ASCI Q Clusters at LANL
From an architectural point of view, the Q-machine can be described as a cluster of Shared Memory Multiprocessors (SMPs), which is expected to deliver a peak performance in excess of 30 TeraOps with approximately 12000 processors. Multiple independent QsNet network rails will interconnect the SMPs through their I/O ports (PCI or PCI-X).
An interesting feature of QsNet is the native support for collective communication.
Experimental results obtained on a 64-node AlphaServer cluster with 256 processors indicate that the hardware-based barrier synchronization completes across the whole network in as little as 6 μs, with very good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 μs. For broadcast, the hardware- and software-based implementations perform similarly: both deliver messages of up to 256 bytes in 13 μs and sustain an asymptotic bandwidth of 288 Mbytes/sec on all the nodes. The hardware-based barrier is almost insensitive to network congestion, with 93% of the synchronizations taking less than 20 μs when the network is flooded with background unicast traffic.
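To put the broadcast figures in perspective, the sketch below plugs them into a simple linear cost model, T(n) = t0 + n / B. This model is an assumption made for illustration and is not taken from the measurements themselves. It uses the 13 μs small-message latency as the startup term t0, takes 1 Mbyte to be 10^6 bytes, and all function names are hypothetical. Because it is only a first-order approximation, it slightly overestimates the time for a 256-byte message.

```python
# Linear (latency + size / bandwidth) cost model for broadcast.
# Assumptions, not from the source: t0 is approximated by the 13 us
# small-message broadcast latency, and 1 Mbyte = 10**6 bytes.

T0_SECONDS = 13e-6      # broadcast latency reported for messages up to 256 bytes
BANDWIDTH_BPS = 288e6   # reported sustained asymptotic bandwidth, bytes/second


def broadcast_time(nbytes: int) -> float:
    """Estimated broadcast completion time in seconds for nbytes."""
    return T0_SECONDS + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes: int) -> float:
    """Delivered bandwidth in bytes/second predicted by the model."""
    return nbytes / broadcast_time(nbytes)


def half_performance_size() -> float:
    """Message size (bytes) at which half the asymptotic bandwidth is reached."""
    return T0_SECONDS * BANDWIDTH_BPS


if __name__ == "__main__":
    # Print the model's estimate for a few illustrative message sizes.
    for size in (256, 4096, 65536, 1048576):
        time_us = broadcast_time(size) * 1e6
        mb_per_s = effective_bandwidth(size) / 1e6
        print(f"{size:>8} bytes: {time_us:8.1f} us, {mb_per_s:6.1f} Mbytes/sec")
```

Under these assumptions, the model places the half-performance message size at roughly 3.7 Kbytes (13 μs × 288 Mbytes/sec), which suggests that messages of a few tens of Kbytes would already come close to the asymptotic bandwidth.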