Depot Object User Guide

Depot Object storage is a high-capacity, fast, reliable and secure object storage service designed, configured and operated for the needs of Purdue researchers in any field and shareable with both on-campus and off-campus collaborators.

Depot Object Overview

As with the community clusters, research labs will be able to easily purchase capacity in the Depot Object Store through the Data Depot Purchase page on this site. For more information, please contact us.

Depot Object Features

The Depot Object Store offers research groups in need of centralized object storage unique features and benefits:

Available
To any Purdue research group, as a purchase in increments of 1 TB at a competitive annual price. A 100 GB trial space may also be requested free of charge. Participation in the Community Cluster program is not required.
Accessible
Access to the Depot Object Store is via S3-compatible APIs. Applications requiring POSIX filesystem access should continue to use the Data Depot Filesystem.
Capable
The Depot Object Store facilitates joint work on shared files across your research group, avoiding the need for numerous copies of datasets across individuals' home or scratch directories. It is an ideal place to store group datasets, models, or raw data.
Controllable Access
Access management is under your direct control. Unix groups can be created for your group and staff can assist you in setting appropriate permissions to allow exactly the access you want and prevent any you do not. Easily manage who has access through a simple web application — the same application used to manage access to Community Cluster queues.
Data Retention
All data kept in the Depot Object Store remains owned by the research group's lead faculty. When researchers or students leave your group, any files left in their home directories may become difficult to recover. Files kept in the Depot Object Store remain with the research group, unaffected by turnover, and could head off potentially difficult disputes.
Never Purged
Depot Object is never subject to purging.
Reliable
Depot Object is redundant and protected against hardware failures by Ceph replication and erasure coding.
Restricted Data
Depot Object is suitable for non-HIPAA human subjects data. See the Data Depot FAQ for a data security statement for your IRB documentation. The Data Depot is not approved for regulated data, including HIPAA, ePHI, FISMA, or ITAR data.

Depot Object Hardware Details

The Depot Object Store is built on a high-performance Ceph storage solution with an initial total capacity of over 4 PB. This storage is redundant and reliable, with APIs accessible from any Purdue network.

Depot Object Concepts

Depot Object Storage is a highly scalable, durable, and secure S3-compatible object store, based on Ceph, that allows users to store and serve large amounts of data. S3 storage is ubiquitous in enterprise and cloud computing environments, and many of the same use cases apply to scientific data storage.

Depot Object Storage is fully integrated into Purdue's Globus infrastructure for data movement and sharing.

Key Concepts

S3 stores data as objects, which can be up to 5 TB in size. Objects are stored in buckets, which are similar to folders.

Buckets: A bucket is the top-level container for storing objects in S3. Each bucket has a unique name and can be used to store an unlimited number of objects.
Objects: An object is a unit of data stored in a bucket, typically a single file (or an archive bundling several files). Objects can hold data in any format, including text, images, videos, and more.
Keys: A key is the unique identifier for an object within a bucket. Keys are used to retrieve objects from S3 and can be thought of as the "filename" for an object.
Metadata: Metadata is additional information about an object that is stored along with the object itself. This can include things like the object's content type and last modified date.
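The relationship between these four terms can be illustrated with a small in-memory model. This is only a toy sketch, not how Depot Object or Ceph is implemented: the `Bucket` and `StoredObject` classes and the bucket and key names are hypothetical. It shows that keys are flat strings, and that "folders" are simply a shared key prefix.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: the raw bytes plus its metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """Toy stand-in for a bucket: a flat mapping from key to object."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data, **metadata):
        # Writing to an existing key replaces the object stored under it.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # There are no real directories; a "folder" is just a key prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


# Hypothetical bucket and keys, for illustration only.
bucket = Bucket("lab-datasets")
bucket.put("raw/2024/run1.csv", b"a,b\n1,2\n", content_type="text/csv")
bucket.put("raw/2024/run2.csv", b"a,b\n3,4\n", content_type="text/csv")
bucket.put("models/best.pt", b"\x00\x01")

print(bucket.list_keys("raw/"))
```

Listing with the prefix `raw/` returns only the two CSV keys, which is how S3 tools present a folder-like view over a flat key space.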

Authentication and Authorization

Access keys are used to authenticate and authorize requests to access your Depot Object resources. An access key consists of two parts:

Access Key ID: A unique identifier for your Depot Object account, which is used in conjunction with the Secret Access Key.
Secret Access Key: A secret key that is used to sign requests to Depot Object, ensuring that only authorized users can access your resources.
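To make "signing" concrete, the sketch below derives a signing key from a secret key and uses it to sign a string with HMAC-SHA256, following the key-derivation chain of AWS Signature Version 4. It assumes the request is signed with Signature Version 4, which this guide does not state explicitly; the region value and the string being signed are placeholders. In practice, tools such as s3cmd and the S3 SDKs perform this step for you, so the secret key itself is never sent over the network.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="s3"):
    """Derive the per-day signing key (Signature Version 4 chain)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_string(secret_key, date_stamp, region, string_to_sign):
    """Return the hex signature of string_to_sign."""
    key = derive_signing_key(secret_key, date_stamp, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values: the secret, region, and signed string are illustrative only.
signature = sign_string("example-secret", "20240101", "us-east-1", "example-request")
print(len(signature))  # hex digest of a SHA-256 HMAC
```

Because the signature depends on the secret key, a server holding the same secret can recompute it and reject any request whose signature does not match.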

Access keys are used for:

Programmatic access: Scripting or coding interactions with Depot Object using SDKs or APIs.
CLI access: Using the command line tools to manage your Depot Object resources.
Third-party integrations: Integrating Depot Object with third-party applications or services that require authentication.

Use Cases for S3 Storage

Public dataset hosting: S3 is a popular choice for hosting static websites that can be used to share datasets publicly.
Cross-resource workflows: S3 can be used to easily access and process data across multiple RCAC resources, including cloud-based platforms like Geddes.
Cold storage tier: S3 can be used as a cold storage tier for datasets that are not ready to be stored on Fortress.
AI/ML: Many machine learning libraries natively support S3 access for data input for training. Inference engines support accessing trained models directly from S3.
Data lakes: S3 is a popular choice for building data lakes, which are centralized repositories that store raw, unprocessed data in its native format.

Managing Buckets and Objects

Accessing Depot Object Storage

The S3 endpoint provided by Depot Object can be accessed in multiple ways. Two popular options, one command-line tool and one GUI, are described below.

Endpoint: s3.rcac.purdue.edu
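For programmatic access, an S3 SDK can be pointed at the same endpoint. The following is a minimal sketch using the third-party boto3 library, which this guide does not cover; the `https://` scheme and the helper names `client_kwargs` and `make_client` are assumptions made for illustration.

```python
# Assumption: the endpoint is served over HTTPS; this guide lists only the host name.
DEPOT_ENDPOINT = "https://s3.rcac.purdue.edu"


def client_kwargs(access_key, secret_key, endpoint=DEPOT_ENDPOINT):
    """Build the keyword arguments for boto3.client()."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(access_key, secret_key):
    """Create an S3 client for Depot Object (requires `pip install boto3`)."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client(**client_kwargs(access_key, secret_key))


# Example usage (not executed here; needs valid keys and network access):
#   s3 = make_client("<your access key>", "<your secret key>")
#   for b in s3.list_buckets()["Buckets"]:
#       print(b["Name"])
```

Keeping the connection settings in one helper makes it easy to reuse the same endpoint and keys across scripts.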

s3cmd User Guide

s3cmd is a free command line tool for managing data in S3 compatible storage resources that works on Linux, Mac and Windows. This section provides a basic overview of using s3cmd to manage Depot Object storage.

Table of Contents

Installation
Authentication
Basic Commands
Bucket Management
Object Management
More Information

Installation

To use s3cmd, first ensure you have it installed on your system. You can install it via pip:

pip install s3cmd

Authentication

Before using s3cmd to interact with your S3 storage, you need to configure your .s3cfg file in your home directory.

The s3cmd configuration file should have the following format. Access keys and secret keys can be obtained by emailing rcac-help@purdue.edu.

[default]

host_base = s3.rcac.purdue.edu
host_bucket = s3.rcac.purdue.edu
access_key = <your access key>
secret_key = <your secret key>
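If you prefer to generate this file from a script, the sketch below writes the same four settings using Python's standard `configparser` module. The function name `write_s3cfg` is hypothetical, and creating the file with owner-only permissions is a suggested precaution rather than a requirement stated in this guide.

```python
import configparser
import os


def write_s3cfg(path, access_key, secret_key, host="s3.rcac.purdue.edu"):
    """Write a minimal s3cmd configuration file and return its path."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["default"] = {
        "host_base": host,
        "host_bucket": host,
        "access_key": access_key,
        "secret_key": secret_key,
    }
    # Create the file readable and writable by the owner only (0600 on POSIX),
    # since it contains the secret key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        cfg.write(handle)
    return path


# Example usage (not executed here, to avoid overwriting an existing file):
#   write_s3cfg(os.path.expanduser("~/.s3cfg"), "<your access key>", "<your secret key>")
```

Note that this overwrites any existing file at the given path, so back up a customized `.s3cfg` first.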

Basic Commands

s3cmd ls: This lists all the buckets associated with your account.
s3cmd sync: Syncs directories on your machine to or from S3.
s3cmd put: Uploads an object to S3.
s3cmd get: Downloads an object from S3.

Bucket Management

s3cmd mb s3://<bucket>: Creates a new bucket.
s3cmd rb s3://<bucket>: Deletes a bucket. The bucket must be empty unless you add the --recursive option, which also deletes all objects within it. This is irreversible, so use with caution.

Object Management

s3cmd put <file> s3://<bucket>: This is used for uploading files to your bucket.
s3cmd get s3://<bucket>/path/to/object: Use this command to download an object from S3, specifying both the bucket and object name where necessary.
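When these commands are driven from a script, it can help to build them as argument lists rather than shell strings. The sketch below is one hedged way to do that in Python; the helper names `s3cmd_args` and `run_s3cmd`, the bucket name, and the file name are hypothetical, and only the subcommands listed in this guide are accepted. By default it prints the command instead of running it.

```python
import shlex
import subprocess

# Subcommands described in this guide.
ALLOWED_ACTIONS = {"ls", "mb", "rb", "put", "get", "sync"}


def s3cmd_args(action, *operands):
    """Return the argument list for an s3cmd invocation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported s3cmd action: {action!r}")
    return ["s3cmd", action, *operands]


def run_s3cmd(action, *operands, dry_run=True):
    """Print the command (dry run) or execute it with s3cmd."""
    args = s3cmd_args(action, *operands)
    if dry_run:
        print(shlex.join(args))
    else:
        # Requires s3cmd to be installed and configured.
        subprocess.run(args, check=True)
    return args


# Hypothetical bucket and file names; dry run only prints the commands.
run_s3cmd("mb", "s3://my-lab-bucket")
run_s3cmd("put", "results.csv", "s3://my-lab-bucket")
run_s3cmd("get", "s3://my-lab-bucket/results.csv")
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.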

More Information

For detailed usage and options, run `s3cmd --help` from your terminal/command prompt.

Download: https://s3tools.org/download
How-To Documentation: https://s3tools.org/s3cmd-howto

Cyberduck

Cyberduck is a free server and cloud storage browser that can be used on Windows and Mac.

Download and install Cyberduck
Launch Cyberduck
Click + Open Connection at the top of the UI.
Select S3 from the dropdown menu
Fill in the Server (s3.rcac.purdue.edu), Access Key ID, and Secret Access Key fields
Click Connect
You can now right click to bring up a menu of actions that can be performed against the storage endpoint

Further information about using Cyberduck can be found on the Cyberduck documentation site.