Annual Review: RCAC support for life sciences in 2025
Throughout the past year, the team at the Rosen Center for Advanced Computing (RCAC) has been dedicated to expanding the training and cyberinfrastructure available for life sciences research at Purdue. The goal of these new initiatives is to build a robust user community and enable high-impact life sciences research across campus.
High-performance computing (HPC) resources have proven indispensable for bioinformatics research for several reasons, including the massive size of the datasets used in life sciences, the computing power required for numerous research activities (analyzing DNA sequences for disease-related genes, cryo-EM-based 3D reconstruction, etc.), and regulatory requirements. As life sciences research has shifted to rely more heavily on sequencing data, life scientists have increasingly faced complex HPC environments, workflow tools, and large datasets. And while access to computing hardware is essential, it is not enough: researchers also need to know how to use it.
Among the various life sciences initiatives led by RCAC staff, support for bioinformatics received particular attention. Bioinformatics as a research discipline sits at the crossroads of biology and computer science. Often related to genetics and genomics, though not limited to these subfields, bioinformatics seeks to analyze and interpret biological data, which is typically packaged in large and complex datasets. Headed by Dr. Arun Seetharam, Lead Bioinformatics Scientist at RCAC, the development of the new bioinformatics support services has broadened awareness and increased adoption of HPC. Seetharam’s approach to this goal has been threefold: expand the bioinformatics software catalog available on RCAC machines, provide world-class training appropriate for bioinformatics researchers and HPC users of all levels, and partner with researchers to enable new cutting-edge research.
“Our goal has been to make advanced bioinformatics accessible to every researcher at Purdue, regardless of their background,” says Seetharam. “Modern genomics workflows only reach their full potential when paired with high-performance computing, so our focus is on giving scientists not just the software, but the confidence and skills to run these analyses efficiently and reproducibly.” Seetharam continues, “Researchers often tell us that the hardest part isn’t the analysis itself, but navigating the tools and computing environments needed to do it well. By expanding our software catalog and building structured, hands-on training, we’re lowering that barrier and helping labs move from raw data to results much more quickly.”
In order to ensure compliance with data management and sharing policies that went into effect in 2025, RCAC deployed the Rossmann compute cluster and storage environment. As a NIST 800-171–aligned research resource, Rossmann features appropriate computing and data environments for research that involves level-3 restricted data, such as data subject to the NIH Genomic Data Sharing (GDS) policy and a spectrum of licensed data. Furthermore, RCAC launched a data management and facilitation service to ensure organized retention of research data, automate workflows, and guarantee researchers reliable access to raw results for current work and future needs.
Thanks to the efforts of several RCAC team members, RCAC now offers a diverse portfolio of Life Sciences Services to Purdue researchers. Some highlights of the team’s accomplishments from 2025 include:
- Software & infrastructure
  - Deployed the Rossmann Community Cluster, which is optimized for researchers running applications subject to heightened security requirements, such as data subject to the NIH Genomic Data Sharing (GDS) policy and licensed data.
  - Launched a data management and facilitation service to support the Bindley core facilities in establishing data management pipelines to transfer research data to appropriate storage and archival systems.
  - Deployed and actively maintain 750+ biocontainer modules across the Negishi, Bell, Gautschi, Scholar, and Rossmann clusters.
  - Built and deployed specialized environments, including RStudio biocontainer images and CellProfiler OOD applications.
  - Provide ongoing workflow enablement and troubleshooting for Nextflow, Snakemake, and nf-core pipelines on RCAC systems.
  - Expanded the Globus data transfer and sharing tool to include High Assurance collections appropriate for Protected Health Information, Personally Identifiable Information, and Controlled Unclassified Information.
- Datasets
  - Updated iGenomes datasets are available as cluster-wide modules, providing standardized reference genomes for commonly studied species and eliminating the need for users to download or manage their own copies.
  - Maintain centrally updated BLAST databases (nr, nt, RefSeq, SwissProt, and related sets) on all clusters to support high-throughput homology searches and downstream annotation workflows.
- Training, workshops & community programs
  - Genomics Exchange (Spring 2025): 13 sessions; ~7 attendees per session; 77 cumulative participants.
  - Genomics Essentials (Fall 2025): 12 sessions (10 delivered); 10–20 attendees per session; 100+ total participants.
  - RNA-seq Workshop (Nov 20): served 35 researchers.
  - Orientation for Biologists (Oct 16): introductory HPC training session for 22 new life sciences researchers.
- Cross-institutional engagement
  - Upcoming Midwest Bioinformatics Showcase (Spring 2026), a joint Purdue–Northwestern seminar series highlighting graduate and postdoctoral genomics researchers across the Midwest and fostering cross-institutional dialogue, professional development, and HPC-enabled research. Abstract submissions are open through January 16.
  - Delivered a plenary talk titled “Responding to NIH Requirements for Controlled-Access Data” at the 2025 Trusted CI NSF Cybersecurity Summit.
- Campus community building
  - RCAC Genomics Discord server, now with 50+ members, serving as a centralized communication and support hub for Purdue genomics researchers.
  - Proposal support for 7 faculty (NSF, NIH P30/R35/MIRA/R01, NASA), including cost estimation, consultation on sequencing strategy, and workflow feasibility assessments.
- Documentation & resources
  - A newly launched RCAC Bioinformatics Tutorials Site, a comprehensive collection of bioinformatics guides, installation instructions, and RCAC-optimized best-practice workflows.
  - RCAC Bioinformatics Resources page, which consolidates current and past workshops, training materials, and community programs.
While 2025 has been extraordinarily productive for the Life Sciences team at RCAC, Seetharam noted that there is no intent to slow down. The new year will usher in a new collection of Genomics Exchange workshops, running January through April. Also planned for the Spring 2026 semester is the Midwest Bioinformatics Showcase, a joint seminar series between Purdue and Northwestern University. The series will offer training and presentations led by select bioinformatics speakers from multiple institutions, with the goal of giving researchers the ideas, tools, and instruction they need to make an impact and ignite bioinformatics research in the Midwest region.
“The Midwest Bioinformatics Showcase is an opportunity to highlight emerging researchers and strengthen connections across institutions,” says Seetharam. “By bringing together students, postdocs, and faculty working on genomic and computational projects, we’re creating a space for collaboration that extends well beyond a single seminar series.”
To stay apprised of upcoming life sciences news and training opportunities, please subscribe to the RCAC newsletter. For more information on our bioinformatics and computational biology services, please visit our Computational Biology Services page.
The Life Sciences Services team at Purdue’s RCAC provides expert support for researchers across a broad spectrum of fields, including genomics, genetics, computational biology, and any other data- or compute-intensive life sciences research. The team assists with complex biological data analysis, including NGS, transcriptomics, and large-scale data management. Its goal is to equip scientists with the tools and expertise to advance their research and achieve impactful results using advanced computing systems.
RCAC operates all centrally maintained research computing resources at Purdue University, providing access to leading-edge computational and data storage systems as well as expertise and support to Purdue faculty, staff, and student researchers. To learn more about HPC and how RCAC can help you, please visit https://www.rcac.purdue.edu/ or reach out to rcac-help@purdue.edu to request a consultation.
Written by: Jonathan Poole, poole43@purdue.edu