Scaling Generalized N-Body Problems, A Case Study from Genomics
This work examines a data-intensive irregular application from genomics that represents a type of Generalized N-Body problems, one of the “seven giants” of the NRC Big Data motifs. In this problem, computations (genome alignments) are performed on sparse data-dependent pairs of inputs, with variable cost computation and variable datum sizes. Unlike simulation-based N-Body problems, there is no inherent locality in the pairwise interactions, and the interaction sparsity depends on particular parameters of the input, which can also affect the quality of the output. We build-on a pre-existing bulk-synchronous implementation, using collective communication in MPI, and implement a new asynchronous one, using cross-node RPCs in UPC++. We establish the intranode comparability and efficiency of both, scaling from one to all core(s) on node. Then we evaluate the multinode scalability from 1 node to 512 nodes (32,768 cores) of NERSC’s Cray XC40 with Intel Xeon Phi “Knight’s Landing” nodes. With real workloads, we examine the load balance of the irregular computation and communication, and the costs of many small asynchronous messages versus few large-aggregated messages, in both latency and overall application memory footprint. While both implementations demonstrate good scaling, the study reveals some of the programming and architectural challenges for scaling this type of data-intensive irregular application, and contributes code that can be used in genomics pipelines or in benchmarking for data analytics more broadly.