Shantenu Jha receives DOE Advanced Scientific Computing Research Award

Professor Shantenu Jha is the PI of a Department of Energy (DOE) ASCR award to Rutgers, along with Brookhaven National Laboratory (lead), Oak Ridge National Laboratory and the University of Texas, for the project titled "Workflow Management on Titan for High Energy and Nuclear Physics and for Future Extreme Scale Scientific Applications". The two-year project is funded at $2M; Rutgers will receive $565K and be responsible for research into workload management and placement. Working with the ATLAS experiment, one of two major experiments at CERN in Geneva, this project will investigate new and scalable high-performance and distributed computing methods to federate DOE leadership computing facilities with the Large Hadron Collider (LHC) Grid. The research outcomes and solutions from this project are likely to guide the design and implementation of future computing infrastructure that CERN will employ as it plans for Run 3 and Run 4 of the LHC, which will produce thousands to millions of times greater volumes of complex data.

Congratulations on yet another high-profile project, Shantenu!

A detailed abstract follows:

Scientific priorities in High Energy and Nuclear Physics continue to serve as drivers of integrated computer and data infrastructure. The lack of scalable and extensible workload management capabilities across heterogeneous computing infrastructure, however, presents a barrier to scientific progress.

Our approach will demonstrate integration of non-traditional, data-intensive, high-throughput workloads and traditional compute-intensive workloads within leadership computing facilities, and yield important physics simulations and data analysis that would otherwise be impossible or far too slow for the rapidly increasing pace of data collection at the Large Hadron Collider (LHC).

The proposed solution will provide an important model for future exascale computing, increasing the coherence between the technology base used for high-performance, scalable modeling and simulation and that used for data-analytic computing. This work represents important conceptual advances and novel capabilities for workload management.

This project will translate the research artifacts into OLCF operational advances and enhancements. We propose to deploy and bring into production BigPanDA workflow management techniques on the Oak Ridge Leadership Computing Facility (OLCF) Titan supercomputer. This will significantly impact scientific communities in High Energy and Nuclear Physics, and beyond, for current and future leadership computing facilities.