
RCAC’s HyperShell software decreases time to science for animal infectious disease researcher


One Purdue researcher is using an innovative piece of software developed by Rosen Center for Advanced Computing (RCAC) staff to dramatically reduce the time he spends on computations.

Jonathan Brooks, a postdoctoral research assistant in forestry and natural resources and comparative pathobiology, studies bovine tuberculosis (TB) in cattle, deer and small carnivores in Indiana. Isolated cases of bovine TB have occurred in southeastern Indiana, but the disease comes and goes, and its patterns of infection are not well understood. Brooks develops disease models to help guide the Indiana Department of Natural Resources’ bovine TB surveillance efforts, which help ensure that wildlife and cattle in Indiana remain free of the disease.

Brooks developed his models in the R statistical software. Because he needed to simulate more than 1,000 different parameter combinations, he sought help at RCAC’s coffee hour consultations, where student and faculty researchers can connect with RCAC’s expert staff about computational and data management issues.

After describing his problem to the staff at coffee hour, Brooks was connected with RCAC lead research data scientist Geoffrey Lentner, the developer of HyperShell. The software is designed to divide and conquer large volumes of discrete tasks, helping to schedule and manage what is known as “many-task computing,” in which a supercomputer performs a very large number of small, independent tasks.

Traditional HPC job schedulers like Slurm are not well-suited to managing this kind of computing, which led Lentner to build HyperShell. HyperShell, which is written in Python, is ideal for researchers like Brooks who are doing large volumes of data analysis and processing.
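To make the idea of many-task computing concrete, here is a minimal Python sketch of the general pattern: a large set of independent parameter combinations is handed to a pool of workers. It is not HyperShell itself and does not use its interface. The function names, the two parameters and the toy calculation are all hypothetical stand-ins for a real model run, and a thread pool is used only to keep the example short.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def run_simulation(params):
    """Hypothetical stand-in for a single, independent model run."""
    rate, density = params
    return rate * density  # toy calculation, not a disease model


def build_tasks():
    """Build 1,000 parameter combinations (40 rates x 25 densities)."""
    rates = [i / 100 for i in range(1, 41)]   # assumed parameter values
    densities = list(range(1, 26))            # assumed parameter values
    return list(itertools.product(rates, densities))


def run_all(tasks, workers=4):
    """Hand every task to a pool of workers; results keep task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_simulation, tasks))


if __name__ == "__main__":
    tasks = build_tasks()
    results = run_all(tasks)
    print(f"Completed {len(results)} independent tasks")
```

Because no task depends on another, the work can be spread across as many workers as are available, which is what makes this kind of workload a good fit for a cluster.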

“HyperShell has allowed us to take our code and parallelize it on the Bell cluster and get our results much faster,” says Brooks.

On its own, the R package used to build Brooks’ models cannot parallelize the code, so the models would have taken more than 5,000 hours to run sequentially. HyperShell reduced that time to just 24 hours, a more than 200-fold reduction in wait time.

“I really appreciate RCAC and the support that they provide, and how easy they are to work with,” says Brooks, who has continued to check in with RCAC staff as his work progresses to ensure his computational workflows are optimized.

Lentner and former RCAC staff member Lev Gorenstein presented an early version of HyperShell at the 2022 Practice and Experience in Advanced Research Computing (PEARC) conference, where their poster won the best poster award.

Learn more about HyperShell here, or contact rcac-help@purdue.edu for more information.
