Overview

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs.

We present PARLO, a parallel run-time layout optimization framework that performs multi-level data layout optimization for scientific applications at run-time, before data is written to storage. Its layout schemes optimize for heterogeneous access patterns according to user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to provide user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration, so users can adjust layout optimization without modifying or recompiling application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its run-time layout optimization saves 56% of processing time and reduces storage overhead by up to 50%. PARLO also has a low run-time resource requirement and limits its performance impact on running applications to a reasonable level.
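To make the idea of optimizing for heterogeneous access patterns with user-specified priorities more concrete, here is a minimal sketch in Python. It is not PARLO's algorithm or API: the class `AccessPattern`, the functions `weighted_cost` and `choose_layout`, the layout names, and the cost numbers are all hypothetical and chosen only for illustration. The sketch assumes that each candidate layout comes with an estimated cost per access pattern, and it picks the layout with the lowest priority-weighted cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical description of one query access pattern."""
    name: str
    priority: float  # user-specified weight; higher means more important


def weighted_cost(costs, patterns):
    """Priority-weighted sum of the per-pattern cost estimates of one layout."""
    return sum(p.priority * costs[p.name] for p in patterns)


def choose_layout(candidates, patterns):
    """Return the name of the candidate layout with the lowest weighted cost."""
    return min(candidates, key=lambda name: weighted_cost(candidates[name], patterns))


# Illustrative priorities: value-constrained queries matter more here.
patterns = [
    AccessPattern("value_range_query", priority=0.7),
    AccessPattern("spatial_region_query", priority=0.3),
]

# Made-up relative cost estimates per layout and pattern (lower is better).
candidates = {
    "row_major_chunks": {"value_range_query": 10.0, "spatial_region_query": 2.0},
    "value_binned": {"value_range_query": 3.0, "spatial_region_query": 6.0},
}

best = choose_layout(candidates, patterns)
print(best)
```

With these assumed numbers the value-oriented layout wins; shifting most of the priority to the spatial pattern would favor the chunked layout instead. A real system would derive such estimates from the data and the storage system rather than from fixed constants.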

Authors

Z. Gong, D. Boyuka, X. Zou, Q. Liu, N. Podhorszki, S. Klasky, N. F. Samatova.

Acknowledgement

Funding

Publications

  1. Z. Gong, D. Boyuka, X. Zou, Q. Liu, N. Podhorszki, S. Klasky, N. F. Samatova,
    "PARLO: PArallel Run-time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Patterns",
    IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, The Netherlands, May, 2013. [pdf]

Contact

Dr. Nagiza Samatova (samatova@csc.ncsu.edu)

News

July, 2013:

PARLO code Alpha release. [parlo-0.1.0.tar.gz]