"
Striping data across multiple nodes is recognized as an effective technique for delivering high-bandwidth I/O to applications running on clusters. However, the technique is vulnerable to disk failure, and a number of research efforts have focused on improving the reliability of striped storage systems at minimal cost in performance and scalability. In this paper, we present a novel I/O architecture for clusters called Reliable Array of Autonomous Controllers (RAAC), which builds on RAID-style data redundancy. The RAAC architecture uses a two-tier layout that enables the system to scale in storage capacity and transfer bandwidth while avoiding the synchronization overhead incurred in a distributed RAID system. We describe our implementation of RAAC in PVFS, a popular parallel file system for Linux clusters. We compare the performance of parity-based redundancy in RAAC and in a conventional distributed RAID architecture using microbenchmarks as well as recognized parallel benchmarks and application kernels.