Block-Parallel Data Analysis with DIY2

LDAV: Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV), 2016.
LBNL: LBNL Tech Report LBNL-1005149, 2016.
PDF LDAV
PDF LBNL Tech Report
Code: DIY2
Abstract
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on complete analysis codes.