Scheduling Dynamic Parallelism on Accelerators
Resource management on accelerator-based systems is complicated by the
disjoint nature of the main CPU and the accelerators, which involves separate
memory hierarchies, different degrees of parallelism, and a relatively high
cost of communication between them. This study addresses the problem of
orchestrating and scheduling parallelism at multiple levels of granularity.
We present mechanisms and policies for adaptive exploitation and scheduling
of layered parallelism on accelerator-based architectures. Our policies
combine event-driven task scheduling with malleable loop-level parallelism,
which the runtime system exploits whenever task-level parallelism
leaves cores idle. We focus on the IBM Cell processor, a representative
accelerator-based architecture. We investigate performance with RAxML,
a bioinformatics application that infers large phylogenetic trees using the
Maximum Likelihood method. Our experiments show that accelerator-based
architectures benefit significantly from dynamic methods that selectively exploit
the layers of parallelism in the system, in response to workload fluctuation. Our
scheduler outperforms the MPI version of RAxML, scheduled by the Linux kernel,
by up to a factor of 2.6.
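The policy described above can be illustrated with a minimal sketch. It is not the scheduler presented in the talk; the function name `assign_cores` and the even-split rule are assumptions made only to show how loop-level parallelism might absorb cores that task-level parallelism leaves idle.

```python
from typing import List


def assign_cores(num_cores: int, num_tasks: int) -> List[int]:
    """Return how many accelerator cores each running task receives.

    Hypothetical policy (an assumption, not the authors' algorithm):
    - With at least as many tasks as cores, each running task gets one
      core and uses task-level parallelism only; extra tasks wait.
    - With fewer tasks than cores, the idle cores are split as evenly
      as possible so each task can parallelize its loops across them.
    """
    if num_cores <= 0 or num_tasks <= 0:
        return []
    if num_tasks >= num_cores:
        # Enough task-level parallelism to occupy every core.
        return [1] * num_cores
    # Too few tasks: widen each task's loop-level parallelism.
    base, extra = divmod(num_cores, num_tasks)
    return [base + 1 if i < extra else base for i in range(num_tasks)]
```

Calling the function again whenever a task starts or finishes mimics the event-driven, malleable behavior: with eight accelerator cores, for example, the allocation widens as the number of active tasks drops from eight to three to one.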
Filip Blagojevic is a Research Scientist in the Future Technologies Group, Lawrence Berkeley National Laboratory. He received a Ph.D. degree in Computer Science from Virginia Tech in 2008 for research in the area of "Scheduling for Asymmetric Architectures". Prior to obtaining his Ph.D., Dr. Blagojevic received an M.S. degree in Computer Science from the College of William and Mary (2005) and a B.S. degree in Mathematics from the University of Belgrade (2002). His professional interests include, but are not limited to, Adaptive Scheduling for Asymmetric Systems; Power-Aware/Energy-Aware Execution; Emerging Accelerator-Based Architectures (Cell BE, GPGPU); and PGAS Languages.