Third generation of Data Depot Filesystem Storage Service is now online!

The Rosen Center for Advanced Computing (RCAC) is excited to announce that its Data Depot Storage Service has received an upgrade and is available for researchers. This is the third generation of the Data Depot filesystem, which offers research groups a centralized service to hold and work with their research data. Along with the system upgrade, RCAC is also pleased to announce a lower rate for Data Depot of $47 per terabyte (TB) per year. Researchers can now take full advantage of this upgraded, world-class storage solution at a fraction of its previous cost.

Powered by the IBM Storage Scale filesystem through a partnership with Dell Technologies and DataCore Nexus, the Data Depot uses an enterprise-class GPFS storage solution with a total capacity of 10 petabytes (PB). This storage is redundant and reliable, features regular snapshots, and is globally available on all RCAC systems. The Data Depot is a non-purged space suitable for tasks such as sharing data, editing files, developing and building software, and many other uses.

The Data Depot facilitates joint work on shared files across research groups, avoiding the need for numerous copies of datasets across individuals' home or scratch directories. It is an ideal place to store group applications, tools, scripts, and documents. With this upgrade, Data Depot’s expanded capabilities will allow researchers to accomplish even more with their data. For example, the massive datasets often required for advanced artificial intelligence (AI) workloads can be stored and quickly retrieved for immediate user access. This allows researchers to conduct studies of a larger scope and scale than previously possible, while also enabling a faster time-to-science.

Improved Storage and Performance:

The third generation of Data Depot has been upgraded to better support the emerging needs of today’s research environment. The Data Depot now includes:

  • 10 PB total capacity, with 2 PB of flash to maximize performance
  • Low latency, Remote Direct Memory Access (RDMA) connectivity to HPC systems, to better enable AI workloads
  • Total raw performance of 500 GB/sec, a 25x increase in total performance
  • A powerful data tiering policy engine
  • Cloud bursting and syncing tools available

These enhancements to Data Depot’s capabilities are now stacked on top of the features that over 700 research labs have come to depend on to enable their science. Features like Depot’s persistent (not purged) storage, redundancy, and self-service management of research lab permissions allow a PI to effectively govern and manage their lab’s data.

Data Depot is accessible as a Windows or Mac OS X network drive on personal and lab computers on campus, directly on Community Cluster nodes, or from other universities or labs through Globus. No matter where a research group works, Depot provides ubiquitous access to the lab’s data.

Higher performance, lower price

Along with changes to its storage capabilities, Data Depot v3 now offers more affordable pricing. The new price for Data Depot is $47 per TB per year, down from $70 per TB per year. Thanks to Purdue’s purchasing power and innovations in storage hardware, the third generation of Depot is bigger, faster, and cheaper for Purdue researchers to use. Data Depot is available to any Purdue research group as an annual purchase in increments of 1 TB. For groups who are unsure or would like to try the service before buying, RCAC offers a 100 GB trial space, free of charge. Note: Participation in the Community Cluster program is not required. To purchase capacity on Data Depot, please visit RCAC’s dedicated Purchase Page.
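
For groups budgeting a purchase, the pricing above can be sketched as a short calculation. This is a rough illustration only, not an official RCAC tool: the function name `annual_cost` and the 12.5 TB example are hypothetical, and rounding up to the next whole terabyte is an assumption drawn from the 1 TB purchase increment. Only the $47 and $70 rates come from this announcement.

```python
import math

# Rates stated in the announcement (USD per TB per year).
RATE_V3_USD_PER_TB_YEAR = 47   # new Data Depot rate
RATE_V2_USD_PER_TB_YEAR = 70   # previous Data Depot rate


def annual_cost(data_tb: float, rate: int = RATE_V3_USD_PER_TB_YEAR) -> tuple[int, int]:
    """Return (TB to purchase, annual cost in USD) for a given data size.

    Assumption: capacity is bought in whole-terabyte increments, so the
    requested size is rounded up to the next whole TB.
    """
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    purchased_tb = math.ceil(data_tb)
    return purchased_tb, purchased_tb * rate


if __name__ == "__main__":
    # Hypothetical lab with 12.5 TB of data.
    tb, new_cost = annual_cost(12.5)
    _, old_cost = annual_cost(12.5, RATE_V2_USD_PER_TB_YEAR)
    print(f"Purchase {tb} TB: ${new_cost}/year (previously ${old_cost}/year)")
    print(f"Annual savings: ${old_cost - new_cost}")
```

Under these assumptions, a lab holding 12.5 TB would purchase 13 TB, paying $611 per year at the new rate instead of $910 at the previous one. Actual quotes should come from RCAC’s Purchase Page.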

A complete ecosystem for research data

The Data Depot filesystem is part of a whole ecosystem in support of data for open science, which includes:

  • Depot object storage: Depot object storage is a high-capacity, fast, reliable, and secure object storage service. Funded by an NSF CC* award, Depot object is a high-performance Ceph storage solution with an initial total capacity of over 4 PB. This storage is redundant and reliable, with APIs accessible from any Purdue network.
  • Fortress Tape archive: The Fortress system is a large, long-term, multi-tiered file caching and storage system utilizing both online disk and robotic tape library, ideal for near-line caching of raw data, or archiving completed projects.

Depot and Fortress are part of an integrated high-performance computing and storage environment, with Globus-based links to the Purdue University Research Repository (PURR), and easy connectivity to experimental instrumentation facilities. RCAC also provides equivalent offerings for export-controlled data (via the Weber cluster), and sensitive and restricted data, including life sciences (via the Rossmann cluster, which is supported by the Protected Data Filesystem (PDFS) and the Protected Data Archive (PDA)).

“RCAC has always had the mandate to provide Purdue with world-class cyberinfrastructure at the highest proven value, which means powerful solutions at attractive prices,” says Preston Smith, executive director of RCAC. “With this newest generation of the Data Depot filesystem entering production, we look forward to it continuing to provide a flexible and accessible solution to ensure that Purdue researchers’ scientific data are in a resource right where it can be processed or simulated. Depot V3’s speed and accelerated connectivity will further augment the capabilities around artificial intelligence, and the Pixstor software by DataCore Nexus will allow for more effective management of our researchers’ data lifecycle.”

Researchers who are unsure which data storage solutions are right for them should use RCAC’s Storage Solutions Finder, an interactive tool that helps Purdue researchers compare and select from a variety of available storage options based on their usage needs and data security constraints.

RCAC operates all centrally-maintained research computing resources at Purdue University, providing access to leading-edge computational and data storage systems as well as expertise and support to Purdue faculty, staff, and student researchers. To learn more about HPC and how RCAC can help you, please visit: https://www.rcac.purdue.edu/

Written by: Jonathan Poole, poole43@purdue.edu
