Monday, August 29, 2011

200,000 disk drives in a 120-petabyte cluster

For an as-yet-undisclosed customer, IBM has developed a storage cluster that can hold 120 petabytes of data. It is to be paired with a new supercomputer for simulations. The still-unnamed system combines 200,000 hard drives. To accommodate them all, the developers at IBM's Almaden Research center used extra-wide, extra-tall racks with horizontal drive slots. The drives are cooled with water instead of fans.

Since drive failures are inevitable in such a cluster of ordinary hard drives, a new resiliency strategy was also needed: instead of computed parity data, the system works with distributed copies of the data, which the supercomputer can access without any loss of performance. Meanwhile, the lost copies are rebuilt onto the replaced drives in the background at low priority. If more drives fail than is normal on average, the system increases the speed of recovery to head off a potential data loss.
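To make that idea concrete, here is a minimal Python sketch of the scheme described above: whole replicas spread across different drives instead of parity stripes, with a background rebuild whose priority rises when more drives than usual have failed. The class, constants, and method names are hypothetical illustrations, not IBM's actual implementation.

    import random

    REPLICAS = 3           # assumed replica count per data block
    EXPECTED_FAILURES = 2  # assumed "normal" number of concurrently failed drives

    class Cluster:
        def __init__(self, num_drives):
            self.drives = {d: set() for d in range(num_drives)}  # drive -> block ids
            self.failed = set()

        def store(self, block_id):
            # place copies of the block on distinct, healthy drives
            healthy = [d for d in self.drives if d not in self.failed]
            for d in random.sample(healthy, REPLICAS):
                self.drives[d].add(block_id)

        def fail_drive(self, drive):
            self.failed.add(drive)

        def rebuild_priority(self):
            # rebuild normally runs at low priority; ramp it up once
            # more drives than the expected average have failed
            return "high" if len(self.failed) > EXPECTED_FAILURES else "low"

        def rebuild(self, replacement_drive):
            # restore the blocks of one failed drive onto its replacement,
            # reading the data from the surviving replicas
            lost_drive = self.failed.pop()
            for block in self.drives[lost_drive]:
                self.drives[replacement_drive].add(block)
            self.drives[lost_drive] = set()

The point of the sketch is only the control flow: reads never wait on the rebuild, and the rebuild rate is tied to how far the current failure count exceeds the expected one.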

The decisive factor in such large systems, however, is the speed of access to the data and the file system, because gains in hard drive performance have long since stopped keeping pace with those of processors and supercomputers. IBM therefore spans all the disks in the cluster with its own GPFS (General Parallel File System). To speed up access, it distributes individual files across multiple drives, which can then read or write in parallel. Additional file system indexes avoid the tedious scanning of volumes when searching for individual files. Just last month, an IBM team set a new record for file system access by indexing 10 billion files in 43 minutes. On the new system, however, GPFS needs only two petabytes for file management.
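As an illustration of the striping idea, the following Python sketch splits a file into fixed-size stripes placed round-robin on several toy "drives" and reads them back in parallel. The function names and the stripe size are invented for the example; the real GPFS on-disk layout and interfaces differ.

    from concurrent.futures import ThreadPoolExecutor

    STRIPE_SIZE = 4  # bytes per stripe; real systems use far larger blocks

    def write_striped(data, drives):
        """Split data into stripes and distribute them round-robin over drives."""
        stripes = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)]
        layout = []  # (drive index, slot on that drive) per stripe, i.e. the file metadata
        for i, stripe in enumerate(stripes):
            drive = drives[i % len(drives)]
            drive.append(stripe)
            layout.append((i % len(drives), len(drive) - 1))
        return layout

    def read_striped(layout, drives):
        """Fetch every stripe concurrently, then reassemble the file in order."""
        def fetch(entry):
            drive_idx, slot = entry
            return drives[drive_idx][slot]
        with ThreadPoolExecutor(max_workers=len(drives)) as pool:
            return b"".join(pool.map(fetch, layout))

    drives = [[] for _ in range(4)]  # four toy "drives"
    meta = write_striped(b"hello parallel storage", drives)
    assert read_striped(meta, drives) == b"hello parallel storage"

Because each stripe lives on a different drive, a single large file can be read or written at roughly the combined bandwidth of all the drives it touches, which is the property the article attributes to GPFS.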


