Overview

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. The growing gap gets exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data size issue, these techniques introduce yet another mix of access patterns to a heterogeneous set of possibilities. Moreover, how extreme-scale datasets are partitioned into multiple files and organized on a parallel file systems augments to an already combinatorial space of possible access patterns.

To address this challenge, we present MLOC, a parallel Multi-level Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable an effective data exploration with various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture, on which all the levels can be flexibly re-ordered by user-defined priorities. When tested on query-driven exploration of compressed data, MLOC demonstrates a superior performance compared to any state-of-the-art scientific database management technologies.

Authors

Z. Gong, T. Rogers, J. Jenkins, H. Kolla, S. Ethier, J. Chen, R. Ross, S. Klasky, N. F. Samatova.

Acknowledgement

Funding

Publications

  1. Z. Gong, T. Rogers, J. Jenkins, H. Kolla, S. Ethier, J. Chen, R. Ross, S. Klasky, N. F. Samatova,
    "MLOC: Multi-level Layout Optimization Framework for Compressed Scientific Data Exploration with Heterogeneous Access Patterns",
    International Conference on Parallel Processing (ICPP), Pittsburgh, PA, September, 2012. [pdf]

Contact

Dr.Nagiza Samatova (samatova@csc.ncsu.edu)

News

July, 2013:

MLOC code Alpha release. [mloc-0.1.0.tar.gz]