New Microprocessor Promises Ubiquitous Supercomputers…And Perhaps Machine Consciousness?

From its press release: ClearSpeed Technology today announced the ClearSpeed CS301, a multi-threaded array processor that enables dramatic improvements in performance and power consumption for intensive floating point applications. At over 25 GFLOPS peak performance, the new chip provides more than twice the processing speed of competitive products. At 10 GFLOPS per Watt, power consumption is also twenty times more efficient. As a result, the CS301 delivers up to a ninety percent reduction in purchase price and running costs, making high performance computing affordable and available to companies of all sizes.

“With conventional processor design, increasing performance has tended to come with real penalties in power consumption and heat dissipation, to the point where computing cannot keep up with the demands of today’s emerging applications and rapidly increasing volumes of data,” said Tom Beese, CEO of ClearSpeed Technology. “The CS301 is designed specifically to meet those needs with high performance, power efficiency and full programmability in C combined into a single chip. The CS301 is the first in a family of ClearSpeed microprocessors that we believe will challenge present day thinking by creating a world where scientists, bio-informaticians, engineers and content creators alike can have access to high performance computing anywhere, anytime.”

The CS301 is based on a multi-threaded array processing (MTAP) architecture and includes 64 processing elements, 384 Kbytes of on-chip SRAM and I/O ports interconnecting through ClearSpeed’s ClearConnect bus. Each processing element has its own floating point units, local memory and I/O capability, making the CS301 ideally suited for applications which have high processing or bandwidth requirements. The ClearConnect bus is a packet switched network that provides high bandwidth and low power consumption, supporting multiple concurrent transfers giving even higher aggregate bandwidth.

As a result, complex mathematically-based applications such as computational biology and drug discovery, digital content creation, nanotechnology development, scientific research and financial modelling can now be executed in a fraction of the time.

“We are gratified to see the immediate high level of interest displayed by OEMs in the overall system improvements enabled by the CS301,” said Mike Calise, president of ClearSpeed U.S. “The dual benefit of performance and efficiency is empowering companies to accelerate existing applications as well as inspiring them to explore new applications that were previously inaccessible.”

The CS301 can serve either as a co-processor alongside an Intel or AMD CPU within a high performance workstation, blade server or cluster configuration, or as a standalone processor for embedded DSP applications like radar pulse compression or image processing. In applications where the CS301 is acting as a co-processor, dynamic libraries offload an application’s inner loops to the CS301. Although these inner loops only make up a small portion of the source code, these loops are responsible for the vast majority of the application’s running time. By offloading the inner loops, the CS301 can bypass the traditional bottleneck caused by a CPU’s limited mathematical capability, executing the core of the application more than twice as fast as anything else in the marketplace.
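The offload argument in the release can be made concrete with Amdahl's law, which the release itself does not mention. The sketch below is only an illustration: the function name, the offloaded fractions and the 2x kernel speedup are assumptions chosen for the example, not ClearSpeed figures. It shows how much of the run time must sit in the inner loops before a faster co-processor pays off for the whole application.

```python
def overall_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only the offloaded
    part of the run time is accelerated by `kernel_speedup`."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Time left on the host CPU plus the shortened time on the co-processor,
    # both expressed as fractions of the original run time.
    host_time = 1.0 - offloaded_fraction
    coprocessor_time = offloaded_fraction / kernel_speedup
    return 1.0 / (host_time + coprocessor_time)


# Hypothetical fractions of run time spent in the offloaded inner loops.
for fraction in (0.50, 0.90, 0.99):
    # 2.0 is an assumed kernel speedup, loosely echoing "more than twice as fast".
    print(f"{fraction:.0%} offloaded -> {overall_speedup(fraction, 2.0):.2f}x overall")
```

The closer the inner loops come to accounting for all of the running time, the closer the whole application gets to the co-processor's own speedup. The model ignores the cost of moving data between host and co-processor, which would lower these numbers.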

“To deliver such high levels of performance with full programmability and outstanding gains in power efficiency is a very significant achievement,” said Chris Piercy, president and chairman of the Northern California Nanotechnology Initiative. “We believe this technology will accelerate the development of nanotechnology and its applications across various industries by making high performance computing more accessible and scalable than ever before.”

The ClearSpeed CS301 is fully programmable in high-level languages and its software development kit is available now with a C compiler, graphical debugger and a full suite of supporting tools and libraries.

Thus endeth the press release.

From Wired coverage: Similar capabilities are already built into Apple’s G4 and G5 Macs, which have a floating-point co-processor called AltiVec, which handles complex, data-intensive calculations for the main processor. But whereas AltiVec is four-way parallel, ClearSpeed’s chip is 64-way, the company said. “You might class it as a big evolutionary step of AltiVec,” said Mike Calise, ClearSpeed’s president. The second generation of the chip will be 128-way parallel, and then 256, and so on, Calise said.

He said server manufacturers are looking at the chip with a view to building petaflop machines — monster supercomputers capable of a quadrillion floating-point operations a second — or the equivalent of 25 Earth Simulators. A petaflop machine based on the second generation of the ClearSpeed chip would take up about 20 server racks, the company said.

Calise said computer manufacturers are very excited about the new chip.

“Right now it’s awe, shock and when can I get my hands on it?” Calise said.

ClearSpeed said the new chip is also very low-power, operating at about 2 watts, which would allow it to run off a laptop battery and wouldn’t require special cooling. “At 3 watts, you could put it in a PCMCIA card,” said ClearSpeed’s Simon McIntosh-Smith. “With two chips on a PC Card, you can have 50 gigaflops on a laptop, running off a battery. That’s equivalent to a small Linux cluster on your notebook.” McIntosh-Smith said that down the line, a PC Card with a pair of second-generation chips would perform at about 200 gigaflops, which is equivalent to a big Linux cluster and would nearly qualify the laptop for today’s Top 500 supercomputers list.
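The figures quoted so far can be cross-checked with a little arithmetic. The sketch below uses only numbers from the press release and the Wired excerpt, and it assumes that every figure is a peak rate and that a petaflop machine would be built from nothing but second-generation chips. The variable names are made up for the illustration.

```python
# Figures quoted in the press release and the Wired coverage (all peak rates).
CS301_PEAK_GFLOPS = 25.0        # "over 25 GFLOPS peak performance"
CS301_GFLOPS_PER_WATT = 10.0    # "10 GFLOPS per Watt"
CHIPS_PER_PC_CARD = 2           # "two chips on a PC Card"
GEN2_PC_CARD_GFLOPS = 200.0     # PC Card with two second-generation chips
PETAFLOP_GFLOPS = 1_000_000.0   # a quadrillion operations per second
PETAFLOP_RACKS = 20             # "about 20 server racks"
EARTH_SIMULATORS_PER_PETAFLOP = 25

# Power implied by the press-release numbers for one CS301.
cs301_watts = CS301_PEAK_GFLOPS / CS301_GFLOPS_PER_WATT

# First-generation PC Card: two CS301 chips.
gen1_pc_card_gflops = CHIPS_PER_PC_CARD * CS301_PEAK_GFLOPS

# Implied peak rate of one second-generation chip.
gen2_chip_gflops = GEN2_PC_CARD_GFLOPS / CHIPS_PER_PC_CARD

# Chips needed for a petaflop, and how densely they would have to be packed.
gen2_chips_for_petaflop = PETAFLOP_GFLOPS / gen2_chip_gflops
gen2_chips_per_rack = gen2_chips_for_petaflop / PETAFLOP_RACKS

# Rate the article implicitly assigns to one Earth Simulator, in TFLOPS.
earth_simulator_tflops = PETAFLOP_GFLOPS / EARTH_SIMULATORS_PER_PETAFLOP / 1000.0

print(f"CS301 implied power:        {cs301_watts} W")
print(f"First-gen PC Card:          {gen1_pc_card_gflops} GFLOPS")
print(f"Second-gen chip (implied):  {gen2_chip_gflops} GFLOPS")
print(f"Chips for one petaflop:     {gen2_chips_for_petaflop:.0f}")
print(f"Chips per rack (20 racks):  {gen2_chips_per_rack:.0f}")
print(f"Earth Simulator (implied):  {earth_simulator_tflops} TFLOPS")
```

Under these assumptions the press-release numbers imply about 2.5 watts per chip, which sits between the "about 2 watts" and "3 watts" figures quoted by Wired, and the 20-rack petaflop claim works out to roughly 500 second-generation chips per rack.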

Appropriately, the chip will be described at the Microprocessor Forum during a discussion of extreme processors.

Though supercomputer performance on a desktop machine may seem like overkill, Calise said there is ever-growing demand in science, government and industry, especially Hollywood, for more-powerful computers.

“If everything they say is true, they really do kick butt,” said Will Strauss, an analyst with Forward Concepts of Tempe, Arizona. “The proof of the pudding is in the eating, of course, but they do have a very well-thought-out architecture.”

Strauss said the PCMCIA card intrigued him. “It’s the first time I’ve seen a reasonable way to get that much power into a laptop,” he said. “That it is low-power enough to bring that kind of processing power to a laptop is remarkable.”

Strauss warned that writing software for the chip’s complex architecture might be a stumbling block, but the company has assured him that its compiler makes it easy to program.

“It’s a refreshing new approach to high-powered chips, and they seem to be pretty well ahead with it, too,” Strauss said. “I’m pretty impressed. I’ve seen lots of things like this over the years, but this breaks new ground.”