Projects

News & Events

Purdue co-hosts regional NSF AI workshop
- July 8, 2026
Anvil used to advance AI 3D object detection
- June 30, 2026

Current REU Projects

In each research project, students will work closely with two or more members of our staff. The projects will be in a wide variety of areas, including but not limitied to various High Performance Computing (HPC) & Research (RC) topics:

AI & machine learning
Data analytics
Dasboards & visualization
HPC software deployment & solutions
Interface design & user experience
HPC benchmarking & workflow scaling
Scientific & eduacational applications solutions
Containerization
and more!

Summer 2026 Projects

Project #1: Exploring Quantum AI with Anvil

Step into the future of computing with this hands-on project that connects Quantum Computing, Artificial Intelligence (AI), and High-Performance Computing (HPC)! As part of the Anvil REU project team, you'll use Open OnDemand to create interactive tools and learning materials that make cutting-edge Quantum Computing and AI more accessible for students, researchers, and professionals in Science, Engineering, and Medicine (SEM). You'll explore how classical and quantum computing can work together, through quantum simulation, machine learning, and optimization, to tackle some of the biggest challenges in modern research. A working prototype is expected to debut in Spring 2026 to support the Quantum Computing for Materials Science and Chemistry (QC4MC) Summer School. Your contributions will help shape a larger NSF CyberTraining project proposal (QAI4SEM), expanding the impact of Quantum AI education and research nationwide. This is an exciting opportunity to: (1) Build real tools that advance Quantum AI learning and research, (2) collaborate with experts in HPC, AI, and Quantum Computing, and (3) gain experience that bridges computing, science, and innovation.

Requirements: Basic experience with Linux command-line environments, familiarity with Python programming, exposure to Open OnDemand and Slurm workload management, understanding of Git/GitHub for version control and collaboration, foundational knowledge of Artificial Intelligence (AI) and Machine Learning (ML) concepts, and introductory understanding or interest in Quantum Computing.

Prior coursework or hands-on experience in any of these areas is helpful, but curiosity and a willingness to learn are just as important. If there is no direct experience the ability to successfully complete a relevant course, for example, Coursera's Applied Machine Learning in Python and IBM Learn Quantum Computing, before the start of the REU program is required.

Pre-REU Trainings:

Project #2: SciAgents: AI Agents for Scientific Discovery and Reproducibility

Imagine a world where science moves faster, and smarter, thanks to AI working alongside researchers. That's the vision behind SciAgents, a next-generation system of AI-powered agents designed to help scientists discover, test, and validate new ideas more efficiently. As part of this project, you'll help build an integrated network of specialized AI agents that collaborate to perform end-to-end research workflows. Each agent has a unique role: (1) Literature Agent – analyzes research papers to extract key experimental details, (2) Data Agent – discovers, organizes, and preprocesses scientific datasets, (3) Experiment Agent – designs and runs computational experiments, and (4) Synthesis Agent – interprets results and generates new scientific insights. To take it even further, you'll help develop an Interactive Reproducibility Checker, a tool that reads a paper's methods section and tries to recreate the experiment automatically. It flags missing information like software versions, hyperparameters, or unclear data splits, benchmarking each workflow against established standards such as the ACM Artifact Review, FAIR Principles, and the ML Reproducibility Checklist. By bringing together discovery, experimentation, and validation in one intelligent system, SciAgents aims to revolutionize how scientific research is conducted and verified. This project is perfect for students who want to: Work at the intersection of AI, data science, and research innovation, contribute to tools that promote transparency and reproducibility in science, and gain hands-on experience building multi-agent systems with real-world impact.

Requirements: Familiarity with foundational data analysis and AI/ML methods and frameworks, basic understanding of Python programming, exposure to LangChain or LlamaIndex (or a willingness to learn), understanding of APIs (e.g., semantic search, literature mining) and working with JSON schemas, and experience using Git/GitHub for version control and collaboration.

Prior experience with large language models (LLMs) is a plus, but not required. Candidates without direct LLM experience may still qualify if they demonstrate strong foundational skills or successfully complete a relevant course or documentation, for example, Coursera's Applied Machine Learning in Python or LangChain, before the start of the REU program.

Pre-REU Trainings:

Project #3: Elastic Cloud Scheduling for Anvil

Help push the limits of high-performance computing by making GPU use more efficient and flexible on the Anvil supercomputer. Currently, Anvil can convert entire compute nodes into Kubernetes workloads, but GPU nodes, often equipped with multiple GPUs, must be allocated as a whole. In this project, you'll develop a fine-grained scheduling system that enables single-GPU allocation across heterogeneous compute environments and multiple Kubernetes clusters. Your work will help researchers use powerful GPUs more effectively, reducing waste and improving access for AI, machine learning, and data-intensive applications. This project is perfect for students interested in HPC, cloud computing, or systems engineering who want hands-on experience developing real solutions for large-scale research computing.

Requirements: Familiarity with Linux environments and basic Bash scripting, experience with a scripting language such as Python, understanding of Kubernetes concepts and configuration, exposure to Crossplane, Interlink, and DGX SPARKS (or a willingness to learn), knowledge of Slurm workload management and HPC job scheduling, and experience using Git/GitHub for version control and collaboration.

Prior experience with container orchestration or cloud-native infrastructure is helpful but not required; curiosity and a desire to learn new technologies are key. Candidates without direct experience may still qualify if they demonstrate strong foundational skills or successfully complete a relevant course or documentation, for example, Codecademy – Learn Python, before the start of the REU program.

Students must be able to complete the following trainings prior to the start of the REU program:

Project #4: Building a Scalable Data Federation for Anvil Using the Pelican Platform

Unlock the power of data federation and distributed computing in this hands-on project using the Pelican Platform. As an REU student, you'll deploy infrastructure to federate data access on Anvil, providing a unified endpoint to scientific data hosted across Anvil's multiple petabyte (PB) scale distributed storage systems. This project uses Pelican as an abstraction layer to connect multiple, technologically diverse storage resources as unified data origins, enabling seamless access and integration across diverse computing environments. Leveraging Pelican's infrastructure for data distribution and caching, you'll develop example workflows and container-based analysis pipelines that are generalizable and reproducible to support the usage of the Anvil data fabric for AI and scientific computing. This project is ideal for students interested in systems engineering, data science, distributed computing, and workflow automation, especially those eager to build tools that make large-scale, data-driven research more efficient, connected, and reproducible.

Requirements: Basic understanding of Linux or UNIX environments, including basic system administration tasks (e.g., file permissions, process management, software installation), experience with at least one scripting or programming language (e.g., Python, Bash), familiarity with version control tools such as Git/GitHub, interest in data management, storage systems, or high-performance computing (HPC), interest in or exposure to the Pelican platform, and strong attention to documentation and reproducibility practices.

Candidates without direct experience may still qualify if they demonstrate strong foundational skills or successfully complete a relevant course or documentation, for example, Google's Python Class or Learn Python and an understanding of the Pelican Platform, before the start of the REU program.

Students must be able to complete the following trainings prior to the start of the REU program:

Past REU Projects

Summer 2025

Project #1: Building a data warehouse to store and manage logs from data center and compute systems, integrating data sources and creating visual dashboards.

Project #2: Developing a dynamic web interface for building and deploying container workloads on Anvil' Features include Dockerfile/Singularity uploads, Git repository integration, and streamlined container management.

Project #3: Creating user-friendly workflow templates for essential bioinformatics tasks including RNA-seq, variant calling, and genome assembly. Templates include optimized SLURM job scripts and organized file structures for efficient resource utilization.

Project #4: Enhancing TicketHub to automatically generate FAQs and standard responses using NLP and large language models. Extracting key information from systems, past support requests, and documentation to streamline technical support processes.

Summer 2024

Streamlined Software project presentation slide showing automated software deployment on Anvil

Project #1: Streamlined Software: Automating user-requested software deployment on Anvil using agile technologies

Data dashboard project presentation slide showing data impact visualization

Project #2: Unlocking the Impact of Data: The Power of the Dashboard

Gromacs Gateway project presentation slide showing molecular simulation interface

Project #3: Gromacs Gateway: Creating User-friendly Molecular Simulations Online

AI-Powered Operational Data Analytics project presentation slide showing user experience enhancement

Project #4: AI-Powered Operational Data Analytics: Enhancing User Experience on Anvil

Summer 2023

Projects focused on a wide variety of areas, including data analytics, high performance computing, DevOps, and containerization.

XALT job-level monitoring integration with XDMoD project diagram

Project #1: Integrate the XALT job-level usage activity monitoring tool into XDMoD reporting for deeper analysis of workloads.

Cloud burst implementation from Anvil to Azure architecture diagram

Project #2: Implement direct cloud burst from Anvil to Azure for HPC and accelerator workloads based on the work with Microsoft in 2022.

Rancher to Azure Kubernetes Service connection architecture for elastic scaling

Project #3: Connect the Anvil composable subsystem’s Rancher management platform to Azure Kubernetes Service to support elastic-scaling of workloads for science gateway applications.

Jupyter notebook deployment solution for composable subsystem architecture

Project #4: Develop deployment solutions for the Jupyter notebook interactive computing platform on Anvil’s composable subsystem for education and training activities.

Summer 2022

Projects focused on improving data collection and reporting of the Anvil cluster to ensure quality of service, effective system utilization, and performance.

Project #1

While benchmarking compute and storage performance, students:

Learned concepts surrounding benchmarking of HPC systems, including HPL, HPCG, STREAM, and IO500
Measured performance of Anvil's compute nodes, GPU nodes, scratch, project, and BeeOND filesystems
Established baselines for system performance for a continuous measurement framework

Project #2

To improve data collection and reporting, students:

Configured multiple modules on Anvil's XDMoD instance and improved data collection and reporting of system metrics
PCP, SUPREMM
Open OnDemand data collection
Continuous measurement framework via Application Kernels

Project #3

For system environmental measurements, students:

Created monitoring of Anvil's power consumption at the node and rack levels using Prometheus and Grafana

Fall 2022

Enhance Anvil's cybersecurity posture, utilizing an NSF CICI-funded intrusion detection system to monitor aspects of Anvil's network and enable visualization-driven insights into network traffic and cybersecurity alerts. Students helped with:

Visualization of network traffic trends on Anvil and Purdue networks
Dashboard development based on Zeek IDS protocol logs (TCD, UDP, SSH, HTTP)
Port scanning
Per-host traffic visualization

Nondiscrimination Policy Statement

Purdue University prohibits discrimination against any member of the University community on the basis of race, religion, color, sex, age, national origin or ancestry, genetic information, marital status, parental status, sexual orientation, gender identity and expression, disability, or status as a veteran. The University will conduct its programs, services and activities consistent with applicable federal, state and local laws, regulations and orders and in conformance with the procedures and limitations as set forth in Purdue’s Equal Opportunity, Equal Access and Affirmative Action policy which provides specific contractual rights and remedies. Additionally, the University promotes the full realization of equal employment opportunity for women, minorities, persons with disabilities and veterans through its affirmative action program. View a more complete statement of Purdue's policies of equal access and equal opportunity. If you have any questions or concerns regarding these policies, please contact the Office of the Vice President for Ethics and Compliance at vpec@purdue.edu or 765-494-5830.

Anvil is supported by the National Science Foundation under Grant No. 2005632.