Data-driven scientific applications have become an important application class that can benefit from high-performance and distributed computing. Examples include applications that explore a parameter space to understand a physical phenomenon and applications that integrate and analyze data through a sequence of processes. A key challenge is the storage and management of input and output data in a distributed environment. In this paper, we describe a middleware framework that addresses this problem. In this framework, applications define the structure of their input and output data using XML schemas. The framework supports 1) registration, versioning, and management of schemas, and 2) storage, querying, and retrieval of instance data corresponding to the schemas in distributed databases. We evaluate the system experimentally on a set of PC clusters connected over wide- and local-area networks.
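
To make the two kinds of support concrete, the following is a minimal single-process sketch, not the framework's actual interface. The class `SchemaRegistry`, its methods, and the `simulation` schema are hypothetical names introduced only for illustration. The sketch keeps everything in memory instead of in distributed databases, and it checks only that an instance's root element matches the schema's top-level element, which is far weaker than full XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema describing the input of a parameter-space run.
SIMULATION_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="simulation">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="gridSize" type="xs:int"/>
        <xs:element name="outputFile" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


class SchemaRegistry:
    """Toy in-memory stand-in for schema and instance management (assumed API)."""

    def __init__(self):
        self._schemas = {}    # schema name -> list of schema texts (index = version - 1)
        self._instances = {}  # (schema name, version) -> list of parsed instances

    def register(self, name, xsd_text):
        """Store a new version of a schema and return its version number."""
        ET.fromstring(xsd_text)  # raises ParseError if the schema is not well-formed XML
        versions = self._schemas.setdefault(name, [])
        versions.append(xsd_text)
        return len(versions)

    def get_schema(self, name, version=None):
        """Return the requested schema version, or the latest if version is None."""
        versions = self._schemas[name]
        return versions[-1] if version is None else versions[version - 1]

    def store(self, name, version, xml_text):
        """Store an instance after a root-element check against the schema."""
        schema_root = ET.fromstring(self.get_schema(name, version))
        declared = schema_root.find(XS + "element")
        instance = ET.fromstring(xml_text)
        if declared is None or instance.tag != declared.get("name"):
            raise ValueError("instance root does not match the schema's top-level element")
        self._instances.setdefault((name, version), []).append(instance)

    def query(self, name, version, path, value):
        """Return stored instances whose element at `path` has text equal to `value`."""
        return [doc for doc in self._instances.get((name, version), [])
                if doc.findtext(path) == value]


registry = SchemaRegistry()
v1 = registry.register("simulation", SIMULATION_XSD)
registry.store(
    "simulation", v1,
    "<simulation><gridSize>64</gridSize><outputFile>run1.dat</outputFile></simulation>",
)
matches = registry.query("simulation", v1, "gridSize", "64")
print(len(matches), "matching instance(s)")
```

Keying instances by schema name and version is one simple way to let old data remain queryable after a schema is revised; a distributed deployment would additionally need to decide where each instance is stored, which this sketch does not attempt.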