New Anvil Object Storage now available to researchers
The Rosen Center for Advanced Computing (RCAC) is excited to announce the arrival of Anvil Object Storage, the newest storage service for Purdue’s Anvil supercomputer. Anvil Object Storage is designed to support large-scale, data-intensive research and enhance modern scientific workflows. By including Anvil Object Storage in its suite of high-performance storage solutions, Purdue is continuing to enable scientific discovery and innovation at scale throughout the nation.
Object storage has exploded in popularity in recent years due to its ability to deliver virtually unlimited storage scalability. Unlike traditional file- or block-based storage, object storage treats each individual piece of data as a self-contained object. Every object contains three things: the data itself, the metadata, and a universally unique identifier. Each object is stored in a structurally flat container known as a “bucket.” Any object within the bucket can be quickly retrieved and analyzed (regardless of file type) based on the custom metadata. Thanks to this unique architecture, object storage allows users to store and manage extremely high volumes of unstructured data while retaining speed and data availability.
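To make the object model described above concrete, here is a minimal, in-memory sketch in Python. It is a toy illustration only, not how Ceph or Anvil Object Storage is implemented; the names `StoredObject`, `Bucket`, `put`, `get`, and `find` are hypothetical and chosen purely for this example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the data, its metadata, and a unique identifier."""
    data: bytes
    metadata: dict
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Bucket:
    """A flat container: no directory tree, just identifier -> object."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, data, **metadata):
        # Store the data with its metadata and return the new identifier.
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        # Retrieve an object directly by its identifier.
        return self._objects[object_id]

    def find(self, **criteria):
        # Return every object whose metadata matches all given key/value pairs.
        return [
            obj for obj in self._objects.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


# Hypothetical usage: two objects of different types in one flat bucket.
bucket = Bucket("demo-bucket")
image_id = bucket.put(b"\x89PNG...", kind="image", project="vision")
text_id = bucket.put(b"hello", kind="text", project="llm")
matches = bucket.find(kind="image")
```

The point of the sketch is that lookups go through identifiers and metadata rather than a nested directory path, which is the property that lets real object stores avoid a central filesystem metadata hierarchy.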
“Traditional filesystems can face significant performance bottlenecks when scaling to tens or hundreds of millions of files,” says Erik Gough, Senior Research Scientist at RCAC. “The performance impacts can be felt by all users of a shared filesystem when a single user is performing millions of metadata operations simultaneously. Object storage is designed specifically to eliminate this contention on filesystem metadata servers at large scale.”
Object storage is ideal for researchers working with large volumes of static data, such as those leveraging artificial intelligence (AI) and machine learning (ML) methods in their work. Typical use cases for object storage include creating and utilizing scientific data lakes, instantly accessing AI/ML training datasets, and archiving large volumes of rich media content. With Anvil Object Storage, researchers now have access to a storage pool that can scale to petabyte levels without the metadata performance bottlenecks typically found in traditional filesystems, allowing for the storage of billions of items without performance degradation.
“With Anvil Object Storage,” says Gough, “we are giving Purdue and national researchers a powerful new storage tier that is optimized for large-scale datasets that are required for AI and LLM training. We’re excited to see how this new storage tier impacts scientific discovery on Anvil.”
Anvil Object Storage is a 1-petabyte (PB), high-performance, software-defined storage tier integrated directly into the Anvil Composable Subsystem. The new storage service is powered by Ceph, providing scalable, all-flash, S3-compatible storage to Anvil users. Researchers can create dedicated storage buckets to securely store, access, and manage their data using familiar S3 client tools such as rclone or s3cmd. The service provides a REST-based API endpoint accessible from Anvil systems and supports flexible access controls for modern data-intensive workflows.
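As a rough illustration of what pointing an S3 client at a Ceph-backed endpoint can look like, the Python sketch below generates an rclone remote definition. The remote name, endpoint URL, and credentials are placeholders invented for this example, not Anvil's actual values, and the helper `build_rclone_remote` is hypothetical. The keys shown (`type`, `provider`, `access_key_id`, `secret_access_key`, `endpoint`) are standard rclone S3 backend options; the real endpoint and keys would come from RCAC once access is granted.

```python
import configparser
import io


def build_rclone_remote(name, endpoint, access_key, secret_key):
    """Return rclone config text for one S3-compatible (Ceph) remote."""
    # interpolation=None so a '%' inside a secret key is written literally.
    config = configparser.ConfigParser(interpolation=None)
    config[name] = {
        "type": "s3",
        "provider": "Ceph",
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "endpoint": endpoint,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# All values below are placeholders (assumptions), not real Anvil settings.
remote_text = build_rclone_remote(
    name="anvil-object",
    endpoint="https://object-storage.example.invalid",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)
print(remote_text)
```

If the generated text were saved to an rclone configuration file, a command such as `rclone lsd anvil-object:` would be the usual way to list the buckets on that remote, assuming the placeholder values had been replaced with real ones.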
Anvil Object Storage is specifically optimized to host the massive image, text, and sensor datasets required for training Large Language Models (LLMs) and complex computer vision systems. The storage is directly accessible from Anvil’s nodes (including NVIDIA A100 and H100 nodes), facilitating rapid data ingestion for AI training and model hosting.
Access is currently available upon request and is subject to available capacity; the total system capacity is 1 PB. To request access, please submit a ticket through the ACCESS or NAIRR ticketing system.
Anvil is one of Purdue University’s most powerful supercomputers, providing researchers from diverse backgrounds with advanced computing capabilities. Built through a $10 million system acquisition grant from the National Science Foundation (NSF), Anvil supports scientific discovery by providing resources through the NSF’s Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS), a program that serves tens of thousands of researchers across the United States. Anvil also supports advanced artificial intelligence research as an official resource provider of the National Artificial Intelligence Research Resource (NAIRR) Pilot.
Researchers may request access to Anvil via the ACCESS allocations process or through the NAIRR allocations process. More information about Anvil is available on Purdue’s Anvil website. Anyone with questions should contact anvil@purdue.edu. Anvil is funded under NSF award No. 2005632.
Written by: Jonathan Poole, poole43@purdue.edu