Intel MMX vs. Microsoft Talisman

Abbott and Costello Do Multimedia

Francis Vale

"Those who maintain that improvements in CPU and VLSI technology are sufficient to produce low cost hardware or even software (image processing & graphics) systems that we would consider high performance today, have not carefully analyzed the nature of the fundamental forces at work"

-- Microsoft, speaking on its new Talisman multimedia architecture, from the 1996 SIGGRAPH Proceedings.

"Processors enabled with MMX technology will deliver enough performance to execute compute-intensive communications and multimedia tasks with headroom left to run other tasks or applications. They allow software developers to design richer, more exciting applications for the PC."

-- From Intel Corp.'s MMX Chip Overview.

So who is on first? What's on second? I Don't Know is on third? If the techno-marketing issues for the next generation PC multimedia processing systems weren't so vitally important, the above computer replay of Abbott & Costello's classic comedy routine would make for a great TV farce. Unfortunately, also at stake in this looming new battle between Microsoft and Intel are billions of your IS/IT dollars--not to mention the squishing mayonnaise effect it will have on software developers sandwiched between these two warring PC titans. Also caught in the middle are all the consumer electronics companies as they gear up for the great digital TV wars.

But to see how these bases got so crazily loaded, let's go back to this game's very beginning. Your Intel 8088 PC began its life as a very crude graphics-capable computer. Absolutely nothing about it--from CPU to memory bus--was meant to act like a high performance imaging system. This Grand Canyon-sized marketing gap left the business door wide open for other vendors, like Silicon Graphics, Intergraph, and Apollo Computer, to create high performance 2D/3D imaging and CAD systems. These specialized computers invariably did not use Intel CPUs, nor MS operating systems. But over time, graphics capabilities were slowly introduced to the PC; first by clever software hacks, then with specialized chips, and finally, add-on graphics accelerator cards appeared. Then came more user demands for supporting high quality PC video and audio, essentially fueled by the advent of CD-ROMs. And finally, we have the Internet multimedia extravaganza.

But the problem was, no matter how many next-iteration PC CPUs Intel cranked out, from the 8086 to the 32-bit Pentium Pro, user demands for imaging and multimedia were always several steps ahead of what the PC's processor could deliver. But these needs were not so easily met, for greatly compounding Intel's problems were the huge issues of application/operating system backwards compatibility. This PC Gordian Knot effectively precluded Intel from just chucking the whole '86 architecture and starting out with a clean sheet of paper. Multimedia processing is thus a very sore point for Intel. Worse, it is typically done best and cheapest by DSPs, Digital Signal Processors. With its back against the multimedia wall, Intel was subsequently pushed into making a strategic marketing blunder.

In the summer of 1995, Intel mistakenly strayed onto Microsoft's fiercely protected API turf by offering something called NSP, Native Signal Processing. The Intel-specified NSP API off-loaded traditional DSP functions, like audio processing, onto the host CPU. In the case of audio, the NSP API coupled into Intel's IA-SPOX, its real-time kernel. Boom! Microsoft's jealous wrath came down on Intel like that of an Old Testament God. In furious response, Microsoft essentially mandated that software developers only use its new DirectX family of APIs, which replaced the IA-SPOX real time kernel with highly efficient drivers. Initially, the Direct APIs were part of the Windows 95 Game Software Development Kit (SDK) and included DirectDraw, DirectSound, DirectPlay, and DirectInput.

Via DirectX, Microsoft neatly cut the Knot that had so confusticated Intel. Microsoft had also known for some time that it must find a way to cut loose from the PC's multimedia restraints. And even more than Intel, Microsoft knew it had to maintain backwards compatibility with its operating systems and applications. So Gates and friends set out to create a means to preserve their operating system franchise, yet still satisfy users' multimedia cravings. Thus, the DirectX APIs came into being. These APIs constitute an object-oriented, hardware independent, multimedia architecture. A critical piece of the DirectX API family is the newly released Direct3D, a software rendering engine that provides a means to run complex 3D graphics. Direct3D can do its job as a host-based, software-only system, via the native CPU mode (done through HEL, the Hardware Emulation Layer); or it can use any available hardware-specific multimedia capabilities (via HAL, the Hardware Abstraction Layer, which presents that hardware through a device-independent interface).

With the DirectX marketing war now in high gear, the (rather predictable) result of last year's NSP-API brouhaha was that Intel quietly withdrew its support for PC IA-SPOX, and slinked away to lick its marketing wounds. And so, out of this NSP API fiasco, the new Intel MMX multimedia-enabled P55C CPU was born (although Intel would be loath to admit it). The Intel MMX is a very clever design. No new, OS-disrupting, registers are implemented. Nor are there any new APIs to infuriate Microsoft. Instead, the MMX uses the Pentium's existing 80-bit floating point registers, of which there are eight, and aliases them for use by multimedia applications. In other words, the FP registers now do double duty. The 80-bit-long FP registers are logically carved up into 64-bit operands. These MMX-only 64-bit operands can be used in several ways. They can contain:

1) Eight packed bytes

2) Four packed 16-bit words

3) Two packed 32-bit doublewords, or finally,

4) One 64-bit quadword

To use these aliased FP registers, Intel has also provided 57 new MMX instructions. By probing to see if the P55C MMX chip is present, via the CPUID instruction, the application can detect this new CPU, and change its execution course on-the-fly. When your application finishes using the MMX multimedia functionality and subsequently switches over to floating point, the registers must first be cleared for the new FP operation via EMMS (Empty MMX State). In pre-emptive multitasking systems, like Win95 and NT, saving and restoring this register state across task switches is handled by the operating system, so the application does not have to be made context switch-aware. But FP and MMX operations cannot be freely mixed together in the same application without incurring a big performance hit, as this mode switch takes a heavy toll in CPU cycles: each switch costs your MMX chip a cool 50 processor cycles. However, as a practical matter, the switching is probably no big deal, for, as Intel well knew, most multimedia operations typically take place as:

a) Small, highly repetitive loops

b) Frequent multiplies and accumulates

c) Compute-intensive algorithms

d) Highly parallel operations

MMX uses a classic parallel processing technique, known as single instruction, multiple data operation, or SIMD. This technique allows one instruction to chomp away on many pieces of data. Several parallel computer makers have used SIMD; one of the better known, perhaps, being MasPar. This SIMD machine vendor has since joined the infamous and defunct parallel processing ranks of Kendall Square Research and Thinking Machines (now reborn as a software-only company). What MasPar, TMC's Danny Hillis, and many starry-eyed others never could quite get the hang of is that effective parallel computing had little to do with the fancy hardware, and most everything to do with the associated compilers.

It has long been known that it is very difficult to create innovative, high performance algorithms for parallel applications. Only by a painstaking redesign of the parallel algorithm by very talented programmers can the true performance of a parallel architecture, such as SIMD, be exploited. But these are skills not possessed by ordinary programmers. So, in almost all cases, the software developer turns to the machine's vendor to provide the smart compilers necessary to get all the claimed performance. (So important is this critical software area, that, for many years, ARPA, via the High Performance Computing & Communications Initiative, spent millions of dollars on new compiler designs.)

The moral is therefore clear: without good compilers, SIMD, or any other parallel processing architecture, just will not live up to the surrounding hype. But to date, no one has really come up with an earth-shattering SIMD compiler. So guess what? Intel is not providing any SIMD-enabled compilers for the MMX! Instead, it has declared that it's leaving the development of this vitally important software to other companies, such as Microsoft and Watcom. But for Microsoft, MMX is not that kind of party. Rather than supplying a first rate compiler, Microsoft has merely served up to its software developers a new Direct3D API (via HAL) that takes minimal advantage of MMX's 57 additional instructions. Barf!

Now that you know all this SIMD arcana, how likely is it that the "Up To 100% Faster!" claims being touted by Intel for P55C-modified code will ring true? Obviously, the WinTel mighty marketing machine is very much in over-hype gear. At best, you can expect a 15% to 40% speed improvement with MMX-redesigned applications, with performance increases of around 25% being the likely norm. Instead, users might want to take comfort from the fact that the P55C has a new and larger L1 cache, which now sports 32K instead of 16K. This increased cache size should realistically provide a 5% to 20% performance improvement for non-MMX-enabled software, depending upon the application.

Intel is also providing a 32-bit Pentium Pro-MMX, the new Pentium II. Originally code-named Klamath, this is one big mother of a chip set. Incorporating both CPU and L2 cache memory, the whole Pentium Pro MMX lot is encased in a 5" x 2.5" x .5" plastic and metal cartridge. Obviously, this is no simple swap-out-the-CPU-for-another affair, so upgraders beware. Instead, Intel says it will offer Pentium Pro "overdrive" chips with MMX capabilities. But these new Overdrive chips--to be shipped in late 1997--are not to be confused, said Intel, with the Pentium II. So, Pentium II comes first, then the MMX Pentium Pro Overdrive. If you really want an MMX Pentium Pro, just bite the bullet and buy a new Pentium II-equipped PC. But bear in mind that the MMX performance gains, as seen above, will probably not be all that Intel has hyped them to be.

One thing for sure: Intel will not be hyping its support of branch instructions on either type of MMX Pentium CPU. A misdirected MMX branch could stall the Pentium superscalar pipeline quite dramatically. In the case of the Pentium Pro, such a bad branch can cause up to 15 cycles to go by before it can recover. Rather than branch predictions, the MMX/Pentium Pro uses new instructions for conditional select operations on multiple operands; thereby supposedly achieving branch prediction performance, but without taking the potential pipeline hit.

To MMX-sum up, we have: the biggest redesign of the Intel PC CPU architecture since the 386; a parallel processing, multimedia extension with backwards software compatibility; a larger CPU cache; a huge new Pentium Pro MMX implementation; outrageous MMX multimedia performance claims; minimal software support from Microsoft for exploiting all these new and nifty MMX features; and a history of bad blood between the two companies when it comes to enhancing multimedia PC performance. Lastly, there is Digital's recent full body legal slam against Intel. Is the Pentium a rip-off of DEC's Alpha CPU design?

So who holds the aces here? Microsoft, of course, believes it does, mainly by virtue of the machine independent DirectX APIs. Via a paper it presented at the 1996 SIGGRAPH, Microsoft made its DirectX/HAL hole card very clear for all to see. The title of this paper was "Talisman: Commodity Real-time 3D graphics for the PC," and was written by Jay Torborg and James T. Kajiya. This paper focused on new software/hardware ways to do 3D/2D graphics at video rate processing speeds, as well as handling 3D audio spatializing, Dolby Surround decoding, MPEG-2, etc. Finally, in August, 1996, the company generally announced its new Talisman direction. Bottom line, Talisman tries to make the case that current and future PC architectures for doing multimedia are thoroughly broken--and by guilty association, so is MMX.

Microsoft is further demanding that a new class of Talisman-PCI cards be created. In reality, Microsoft has called for a new jihad against Intel, for at Talisman's hardware heart is, surprise, surprise, a DSP chip! (There goes the MMX neighborhood.) As stated in the SIGGRAPH '96 paper, this "Media DSP processor is responsible for video codecs, audio processing, and front-end graphics processing (transformations, lighting, etc.)" In addition, there are some other "proprietary VLSI devices and commercially available components" on this new PCI card. Finally, the Media DAC on this PCI board includes a USB serial channel, for joysticks and such; as well as support for 1394 (aka "FireWire") and its up to 400 Mbit/sec serial bus.

Although Microsoft has stated that the host CPU will still play an equitable role in this new multimedia universe, the fact of the matter is that these Microsoft-specified, Microsoft-controlled, PCI cards could now be handling the majority of your PC's multimedia chores. Where, then, does Talisman leave Intel? Pretty much where IBM and OS/2 were just before Microsoft hit the Win-only button, if Gates gets his way again. What's at stake, therefore, is who gets to control and define the multimedia destiny of the PC. This is no small stakes game. Talisman ain't that kind of party.

Microsoft's Talisman paper roundly slammed the conventional Intel-dominated PC architectures on two big counts: memory bandwidth, and system latency. Microsoft's numbers state that the data from a 75 Hz, 640 x 480 x 8 frame buffer needs 23 MB/sec to scoot across. At the other extreme, a 1024 x 768 x 24 frame buffer for the same 75 Hz update requires 169 MB/sec. And when you add in z-buffering, texture map reads with various aliasing schemes, and so on, your bandwidth requirements have zoomed up to 12,000 MB/sec. (That sound you just heard was your PC going into EDO shock.) Somewhat gratuitously, the authors went on to say that SGI's RE2 machine, with its 10,000 MB/sec memory bandwidth, has "nothing to fear from evolving PC 3D accelerators...for some time to come." (Except, some might say, from Microsoft's raptor marketing tactics.)

With regard to memory advances, Microsoft strongly contends that, although PC DRAM capacity has improved enormously, and also dropped precipitously in price, improvements in latency and bandwidth have not kept lockstep pace, and that this laggard memory situation won't improve any time soon. Microsoft went way out of its marketing way when it made these bad memory allegations, for in the SIGGRAPH paper they were in a special section all their own, entitled "Fundamental Forces." (AKA, Intel Impediments?) Microsoft thereupon summed up its thesis, and the rationale for its new Talisman architecture, by stating that "These charts suggest that achieving high-quality imagery using the conventional pipeline is an inherently expensive enterprise. Those who maintain that improvements in CPU and VLSI technology are sufficient to produce low cost hardware or even software systems that we would consider high performance today, have not carefully analyzed the nature of the fundamental forces at work." And with that Intel-damning, McCarthy-era-style summation, Microsoft launched into its own vision for your PC's multimedia future.

Given Microsoft's full court body slam against memory latency and bandwidth, it's not too surprising that we find the Rambus DRAM at the heart of the Talisman hardware reference. Founded in 1990 by Mike Farmwald (also a co-founder of MIPS Technologies) and Stanford University Professor Mark Horowitz, Rambus (Mountain View, CA) and its 100 or so employees will this year rack up about $300 million in worldwide sales--but Rambus will not make a single one of these memory circuit parts. Instead, the company licenses its technology and engineering services on a royalty basis to semiconductor manufacturing companies like Toshiba, NEC, Hitachi, IBM, LG Semicon, LSI Logic, Cirrus Logic, Samsung, Hyundai and Oki. In addition, other companies like Nintendo, Silicon Graphics, and now Microsoft, also count themselves among Rambus' many licensees.

According to Rambus documentation, the 16/18/64/72-Mbit Concurrent Rambus DRAMs (RDRAM) are extremely high-speed CMOS DRAMs organized as 2M or 8M words by 8 or 9 bits. They are general purpose high speed memory devices suitable for use in a wide variety of computer system applications. A RDRAM is capable of bursting unlimited lengths of data at 1.67 ns per byte (13.3 ns per eight bytes). The use of Rambus Signaling Logic (RSL) technology permits 600 MHz (600 MB/sec) transfer rates over the Rambus Channel--a narrow, byte-wide data bus--while using conventional system and board design methodologies. The smallest block of memory that may be accessed with READ and WRITE commands is an octbyte (eight bytes). Low effective latency is attained by operating the two or four 1 KByte or 2 KByte sense amplifiers as high speed caches, and by using random access mode (page mode) to facilitate large block transfers. Concurrent (simultaneous) bank operations permit high effective bandwidth using interleaved transactions. The standard DRAM core is organized as two or four independent banks, with each bank organized as 512 or 1024 rows, and with each row consisting of 1 KByte or 2 KBytes of memory cells. One row of a bank may be "activated" at any time and placed in the 1 KByte or 2 KByte "page" for the bank. Column accesses (READ and WRITE commands) may be made to this active page.

Says Rambus, "The high bandwidth, low pin-count Rambus technology has a major impact on improving the price/performance and reducing the size of a broad range of systems..."

The Talisman reference spec calls for using 4 Mbytes of shared RDRAM, and two 600 MHz, 8-bit Rambus channels. This configuration thus yields a potential throughput of 1.2 Gbyte/sec. The shared Rambus memory holds the data during image processing. This data is logically carved up by Talisman into 32 x 32 pixel "chunks." This 32 x 32 format also conveniently keeps the Rambus DRAMs in their efficient burst mode, and thus page-miss latencies don't suck up memory bandwidth.

The 32 x 32 chunking approach proposed by Microsoft is radically different from a conventional 3D pipeline, where the entire image and all its associated data must be on immediate beck and call for rendering. Such voluminous image data obviously place huge space and access demands on the memory system. Instead, says Microsoft, let's use the DirectX object-oriented mechanism. We can then carve the objects in the image up into these 32 x 32 pixel chunks, and next place each chunked object into the appropriate image layers. We would then have multiple image layers, with each layer possessing multiple image object chunks. Each image layer is thereby manipulated individually, with each object chunk described and manipulated via the DirectDraw and Direct3D APIs.

(Alternatively, image calculations can be done based on object hierarchy and bounding boxes.) As a consequence of this design, Talisman does not use frame buffers in the usual sense; "Instead, multiple image layers are composited together at video rates to create the video output signal."

Talisman is also a time-based image processing system. For example, if nothing has occurred within a particular image object chunk during the time several screen frames have passed, then it is not updated. If this temporal-processing scheme sounds somewhat similar to MPEG-2 compression techniques, such as those used by DSS (Digital Satellite System) TV, then you are right. Microsoft claims that by using time-based techniques, the image layer transforms can be done considerably faster than if you had to continually recalculate the geometry--typically 10 to 20 times faster, it says. In addition, after each 32 x 32 chunk is sequentially rendered, Talisman uses an image compression algorithm quite similar to that used by JPEG. But, typically, Microsoft gives this JPEG-type system its own corporate moniker, "TREC." Finally, texture maps, which constitute the highest memory overhead, are also compressed this way, and stored in the Rambus graphics memory.

When you chart the overall Talisman image processing flow on the reference hardware, it looks like this:

1) Image pre-processing is done in the DSP, which is operating under a Microsoft-developed real-time kernel (not IA-SPOX!).

2) The DSP transforms the 3D objects into a 2D integer plane.

3) The processed data is carved up into 32 x 32 chunks, and then assigned to a particular image plane, as opposed to an enormous sorted list of visible polygons.

4) The now 2D processed, JPEG/TREC compressed, chunked data go into the Rambus DRAM.

5) The Polygon Object Processor (POP) VLSI chip then pulls out and decompresses the chunked data from the Rambus DRAM chips, and does the usual rendering stuff, like lighting, painting texture maps onto polygons, etc.

6) As no frame buffer is used, the once-again processed object data is put back by the POP chip into the Rambus memory.

7) Meanwhile, the other VLSI chip, the Image Layer Compositor (ILC), is continually scanning the POP-processed data, building up the scene, 32 scan lines at a time. (Both the POP and ILC VLSI chips were designed by Silicon Engineering and Cirrus Logic.)

8) At the same time, the ILC is checking to see if an object has moved since it built up the last screen frame.

9) If the object has moved, then the ILC applies affine transforms--it manipulates the polygons--so as to create the illusion of true 3D motion.

The sum result of Talisman's image processing software/hardware architecture, says Microsoft, is that the memory, and image processing demands would be greatly reduced; while at the same time, users could enjoy 1,344 x 1,024-pixel resolution at 75-Hz rates, with 24-bit true-color pixel data at all resolutions. The SIGGRAPH '96 paper states that a scene complexity of 20,000 to 30,000 rendered polygons, or higher, can be supported via Talisman; comparable, says Microsoft, to a 3D graphics workstation capable of executing 1.5 to 2 million polygons per second. And the Talisman cost? Microsoft quotes a bill of materials cost of $200 to $300, which roughly translates into a sub-$500 add-in PCI card at retail street prices.

Obviously, Talisman seems to be offering a lot of image bang for the buck, not to mention giving Intel some molto agida in the process. But this, after all, is the computer business (as in, Penn & Teller Magic Marketing), so does Talisman really constitute a breakthrough by Microsoft in new multimedia architectures? Well, don't think for a minute that Andy Grove has sold his Intel holdings for a stake in Maalox. As it turns out, Microsoft's Talisman strategy has one huge, fatal logic flaw; namely, its core assertion that improvements in PC (commodity) memory latency and bus speeds are not going to keep pace with the new multimedia imaging requirements.

It now appears that at almost the same time that the Talisman architecture was being pontifically announced, Grove and Company were busily touring all of the big Asian DRAM vendors. Intel's message was quite clear, and it was best summed up by Intel Chairman Gordon Moore, while on a visit to Japan, who said that CPU clock rates are increasing much faster than we originally expected. By the year 2000, the 250-MHz PC clock rates of 1996 will have more than doubled. (Actually, this has already happened. Digital is now shipping in volume its 500+MHz Alpha CPUs running NT, and the Maynard company has started sampling a 750 MHz Alpha chip.) These blistering clock speeds mean that PC memory must also keep up; otherwise, these new CPUs will mostly sit around, ordering coffee and doughnuts. Intel has therefore made it plain that by 1999, it wants memory chips that can do 1.6-Gbyte data transfers! It also wants these new 64-bit DRAMs in mega-quantities from the Asian semi vendors. (Maybe not so coincidentally, the private memory bus to the cache on the Intel P6 is 64 bits wide, and has a theoretical peak transfer rate of 1.6 GB per second.)

Intel has now publicly confirmed that it is working with Rambus, Inc. The two companies have signed a co-development contract, and engineers from the two organizations are now working together to extend the 500 to 600-MHz RDRAM to "ballpark" 1.6 Gbytes/sec by 1999. With Intel throwing its considerable marketing weight behind the Rambus DRAM, it will emerge as the new PC memory standard. These soon to be available Rambus DRAMs, with their extraordinarily high bandwidth and very low latency, will, thanks to the sheer size of the PC market, be amazingly cheap.

The new Rambus/Intel architecture is still not a completely done deal, however. As soon as word hit the street about Intel's new wish list, the semi vendors quickly huddled together and came up with--you guessed it--a new industry standard. This proposed DRAM standard, called SyncLink, outlines a packet-oriented bus architecture operating at 400 MHz, and transferring its data on each clock edge. These two DRAM approaches, Rambus and SyncLink, are not all that dissimilar according to industry sources, save one huge difference: with SyncLink, there are no royalties to pay to Rambus. That omission, in and of itself, is a big motivator.

However, SyncLink critics (including the off-the-record Intel) don't think that this new proposal will be able to meet its design and production goals in time to meet Intel's PC needs. Regardless, Fujitsu, Hyundai, Mitsubishi Electric, and some others are all hard at work on developing SyncLink DRAMs. But many more are betting the silicon farm on Rambus. Either way, we now have a two horse race. Moreover, in computers, three years might as well be three centuries. So don't count SyncLink out just yet. But one thing you can surely count on is oodles of incredibly fast, cheap RAM coming your way soon. Thus, Microsoft's over-the-top, memory-shortcoming thrust goes right out the Talisman window. Oops! Who's reaching for the antacid now?

Therefore, the really big question raised by Talisman is not memory, but rather, whose chips will do the image processing chores? We already know Intel's MMX answer, and its respective strengths and limitations. We also know Microsoft's DSP-take on this subject. However, Gates and Co. seriously blundered by asserting that regular PC memory would continue to be technically bankrupt well into the foreseeable multimedia future. So who is to say Microsoft won't be dramatically wrong again? But helping Gates make sure that he doesn't fall on his visionary sword twice is the fact that DSPs typically enjoy a native (pardon the pun) high speed multimedia advantage over hybrid-function host CPUs, like the MMX. And Talisman-sanctioned DSP vendors, like Philips, are striving mightily to make sure it stays that way.

Philips is a Johnny-come-lately to the Talisman DSP party (filling Samsung's Talisman shoes after the Korean company folded its multimedia DSP joint venture effort with 3DO). But that late arrival hasn't slowed Philips down in its mad rush onto Microsoft's dance floor. Philips' TriMedia DSP is very much unlike the now defunct Samsung unit. The TriMedia uses an instruction technique called VLIW (Very Long Instruction Word). Like SIMD architectures, the VLIW approach is also dependent upon getting the compilers right. But in this case, Philips, unlike Intel, has made the necessary time and money commitments to get the requisite software development tools, including compilers.

Philips formally announced its TriMedia plans in November, 1996. This chip, now called the TM-1, is a brute. At a clock rate of 400 MHz, and an internal bus that also screams along at 400 MB per second, the TM-1 can crank out up to four billion operations per second (importantly, these specs are based on C code execution). The TM-1 can simultaneously handle multiple media data types, does image processing, and also supports Dolby Digital, MPEG, H.320, and H.324 video-conferencing.

Philips has been quietly developing a new and separate version of the TM-1, code-named TM-MS, for Microsoft's Talisman architecture. The TM-MS chip will combine a media processor and a polygon processor onto a single chip. Significantly, the new polygon processor is also being designed by Silicon Engineering and Cirrus Logic, the same companies that designed the two specialized Talisman VLSI chips, the POP and the ILC. (A little Talisman incest, Silicon Valley style.) This combined architecture will have the polygon engine doing the 3-D set-up and rendering, with its silicon sibling off-loading the task of geometry and lighting from the host CPU.

The TriMedia DSP is thus one hot chip, and we don't mean thermally: Apple Computer has also been reported to be coming out in early 1997 with its own add-in PCI card sporting the TriMedia chip, and special image processing memory. This add-in Power Macintosh card will supposedly cost about $400 to $500; the same price range tagged by Talisman. Apple is also porting its own object-oriented--and highly touted--QuickDraw 3D graphics environment onto the TriMedia processor. Apple's QuickDraw software currently enjoys a lead over Direct3D, both in terms of code robustness and capabilities. Furthermore, QuickDraw, unlike DirectX, is also multiplatform, and runs under MS Windows. So, along with Intel's Rambus memory plans, Apple also blows apart any MS allegations that Talisman's DSP/high speed memory/object-oriented multimedia architecture is unique, and offers special advantages over competing rivals. However, Apple, being Apple, hasn't really told anybody about all this great stuff yet. (One day, Apple may learn about marketing, but don't hold your breath.)

So when you come right down to it, what is so new and special about Microsoft's Talisman architecture--aside from maybe giving Intel a few sleepless nights? Apart from its image morphing and compositing functions, not much, really. Moreover, Cirrus Logic, one of the makers of the Talisman custom VLSI chips, has recently pulled out of the deal, causing Microsoft to cancel its plans to roll out a Talisman reference board.

However, stepping into the Cirrus breach are companies like Trident, which has announced plans to be first on the street with a Talisman 3-D single-chip processor. (Rumble on the street has it that Fujitsu, one of the primary players in the original multi-chip Talisman reference spec, will also be throwing its hat into the single 3-D chip ring. Maybe even Cirrus, too.) If the efforts of Trident and rumored others are successful, then the original four-chip Talisman reference design (the "Escalante" board) will be shrunk down to just one chunk of silicon costing anywhere from $20 to $50. This translates into a big reduction in Talisman board costs, as well. Instead of a bill of materials of several hundred dollars, the new boards may cost less than $100 to make.

But also likely shrunk down will be the capabilities of the Talisman system. Although Trident claims that its new single chip implementation will also handle MPEG-2 decompression, it's likely that AC3 surround audio and MPEG-2 decoding will be omitted in most other companies' single chip implementations. In other words, Talisman ends up just being a fancy new kind of graphics accelerator. Indeed, accelerator board companies, like S3 and ATI Technologies, are considering licensing parts of the Talisman design from Microsoft for just this 3-D-only purpose.

In the end, what may have killed off the original, all encompassing Talisman vision was Microsoft's naive belief that it could get four semiconductor companies (Philips, Samsung, Cirrus Logic, and Fujitsu) cooperating together, and all marching in sync to Gates' drumming. Not bloody likely. And so the beat goes on, with other hot silicon designers coming up with their own ideas on how best to do 3-D graphics processing.

Meanwhile, if you have succumbed to Nintendo 64 fever, you already know what less than $200 will buy you in terms of real time, awesome 3D graphics. This 64-bit (!) TV game console puts to shame all of this PC posturing and industry gesturing. When you plunked down your couple of hundred bucks at Toys R'Us for the N64, you also became the proud owner of a custom MIPS RISC chip, the R4300i, which is clocked at 93.75 MHz. Incredibly, the 4300i also sports a 64-bit FPU. You are also holding in your game-sweaty hands another custom-made MIPS co-processor, the Reality Co-processor (RCP), clocked at 62.5 MHz. These two MIPS chips, both manufactured by NEC, interface directly to each other, without any glue logic. To keep Super Mario going non-stop, the N64 sports two large on-chip caches; a 16 KB instruction cache, and an 8 KB data cache.

And, hmm. Guess what? This way cool 3D game system also uses a DSP! The R4300i does the processing chores, while the RCP does most of the audio and graphics. The RCP is actually two chips in one. One half is a DSP core, running at approximately 450 to 500 million operations per second, while the other half is what SGI calls the "Pixel Engine." The Pixel Engine does all the polygon shading, transformations, texturing, etc., and can crank out about 500,000 fully processed polygons per second. And also guess what's in there, doing the memory functions? That's right, the N64 uses a 500 MB/sec data transfer variant of the blazing fast Rambus DRAMs.

So, will Intel be able to counterattack the coming DSP/Talisman/3-D onslaught with a host-based, Rambus-oriented, multimedia architecture? And thereby successfully fend off Microsoft's, et al, attempts to run the PC Multimedia Party?

Just ask Mario--If you can catch him.

Copyright, 1997, Francis Vale, All Rights Reserved

21st, The VXM Network,