Anvil supports BigCARE 2024 Summer Workshop
Purdue University’s Anvil supercomputer recently supported the 2024 BigCARE Summer Workshop, which took place at the University of California, Irvine (UCI). The workshop was a two-week intensive class aimed at helping cancer researchers, particularly those new to big data science, develop skills for managing, visualizing, analyzing, and integrating various types of omics data in cancer studies. Anvil was integral to the workshop, providing the attendees with access to a high-performance computing (HPC) resource designed to have a low barrier to entry for newcomers.
The Big Data Training for Cancer Research (BigCARE) workshop is a program funded by the National Cancer Institute (NCI). It was founded in 2020 by Min Zhang, MD, PhD, a professor of epidemiology and biostatistics at UCI’s Joe C. Wen School of Population & Public Health, and the biostatistics shared resources director for the UCI Chao Family Comprehensive Cancer Center. Dr. Zhang recognized a need for specialized HPC and big data training for cancer researchers and designed BigCARE to meet that need. This year’s workshop focused on analyzing and interpreting genomic and genetic data, including transcriptomic analyses, epigenomic analyses, genome-wide association analyses, and network analyses. Thanks to supplemental funding from the National Institute of Allergy and Infectious Diseases (NIAID), the workshop also covered COVID and microbiome data analysis by introducing infectious and immune-mediated disease-related data sets, a first for BigCARE.
“During the previous big data workshops I organized,” says Zhang, “participants faced significant challenges as they had to navigate both the command line interface and the R programming environment, which often led to difficulties as most participants have limited computing skills. Anvil’s powerful computing capabilities allow participants to handle large-scale omics data more efficiently, making analysis of next-generation sequencing data more accessible.”
Anvil’s role in the BigCARE workshop was to provide HPC resources through Open OnDemand and Jupyter Notebooks, which limit the need for in-depth knowledge of command line interfaces or HPC server environments. The course material was developed as Jupyter notebooks, and thanks to Open OnDemand, the researchers had direct web access to the notebooks. Together, these choices gave the workshop participants a low barrier to entry.
Aside from providing the hardware and software needed to run the workshop, Anvil added value to BigCARE through the user support provided by the RCAC (Rosen Center for Advanced Computing) team. Before the start of the workshop, the Anvil team created a custom Open OnDemand-Jupyter deployment that automatically handled all of the course setup and environment creation, removing much of the HPC work that participants in such classes typically have to do themselves. Eric Adams, a Senior Manager for User Support, and Ryan DeRue, a Senior Computational Scientist, also attended the event at UCI to help with any support tasks that arose.
“The interactive interface provided by Anvil's Open OnDemand deployment gave us a platform to collaborate with the researchers designing the workshop and create custom Jupyter applications for each unit of the workshop's curriculum,” says DeRue. “These custom applications allowed us to abstract away many of the nuanced environment creation and initialization commands from the participants who did not have a background in HPC. This allowed them to focus on the pieces that were relevant to their research--effectively decreasing the ‘time to science’ and allowing them to make use of an extremely capable national HPC resource without needing to be an expert in HPC.”
This year’s workshop was a huge success. Dr. Zhang and the attendees were thrilled by what they were able to accomplish during the two-week intensive, and by how helpful both Anvil and the RCAC support team were. When asked about Anvil and Jupyter Notebooks in a post-course survey, attendees had this to say:
“I love that we could perform in a short time (minutes), processes that would have taken several days. Jupyter is user-friendly and very easy to use. I like that we could run bash and 'r' commands.”
“I like that the Anvil supercomputer offers high-performance computing capabilities for complex research projects, while the Jupyter Notebook provides an interactive and versatile environment for data analysis, visualization, and collaboration.”
“Easy to use, excellent technical support, and very fast operation.”
“I enjoyed the ease of use and the ability to run large datasets.”
“I thought the Jupyter Notebook was a good way to introduce coding in a nice environment, and using Anvil was a nice introduction to HPC.”
“It was fairly easy to set up an account; I really enjoyed having their IT team here to help answer questions and deal with any issues. I do think R Studio is my preferred method over Jupyter, but I also liked having the pre-filled information in the Jupyter notebooks, so I get how it is useful for a course like this. I think it worked really well for the course and allowed me to go through the material in a seamless way.”
Anvil was so helpful for the workshop that Dr. Zhang intends to keep it as the computing resource supporting BigCARE for the foreseeable future. “We are pleased to announce that our R25 grant, ‘Big Data Training for Cancer Research,’ has been renewed by the National Cancer Institute for the next five years,” says Zhang. “We look forward to the continued fruitful collaboration with the Anvil group, leveraging their expertise to drive our program forward.”
More information about the BigCARE 2024 Summer Workshop can be found on UCI’s “Big Data Training for Cancer Research” webpage. Information about the Anvil supercomputer can be found on Purdue’s Anvil website.
For more information regarding HPC and how it can help you, please visit our “Why HPC?” page. Anvil is funded under NSF award No. 2005632. Researchers may request access to Anvil via the ACCESS allocations process.
Written by: Jonathan Poole, poole43@purdue.edu