<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/">
	<channel>
		<title>RCAC - Announcements, Outages and Maintenance, Outages, Maintenance, Science Highlights</title>
		<link>https://rcac.purdue.edu/index.php/news/rss/2,1,6,7,3,Negishi</link>
		<description><![CDATA[News and announcements from the Rosen Center for Advanced Computing (RCAC)]]></description>
		<atom:link href="https://rcac.purdue.edu/index.php/news/rss/2,1,6,7,3,Negishi" rel="self" type="application/rss+xml" />
		<language>en</language>
		<lastBuildDate>Tue, 07 Apr 2026 08:55:15 EDT</lastBuildDate>
					<item>
				<title><![CDATA[Scheduled RCAC Maintenance – April 22–23 (All Systems and Research Network Unavailable)]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2639</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2639</guid>
				<description><![CDATA[<p>Dear Research Computing Community,</p>
<p>As part of the ongoing effort to upgrade data center capacity for future computing needs, maintenance is scheduled from Wednesday, April 22 at 6 AM through Thursday, April 23 at 5 PM. This will affect all RCAC systems and the Research Network.</p>
<p>During this window, <strong>all RCAC systems and research network services will be unavailable</strong>, including:</p>
<ul>
<li>All Computing Clusters including Bell, Negishi, Gautschi, Scholar, Rowdy, Gilbreth, Hammer, Anvil.</li>
<li>All Data Storage Systems including Data Depot, Fortress, Anvil Ceph storage, scratch and home storage on clusters, Research Network and ScienceDMZ</li>
<li>Gateway services including Hubzero, GenAI Studio, Anvil GPT</li>
<li>
<a href="http://www.rcac.purdue.edu">www.rcac.purdue.edu</a>
</li>
<li>Geddes</li>
<li>Globus</li>
</ul>
<p><strong>How does this maintenance impact you?</strong></p>
<ul>
<li>Any Slurm jobs requesting a walltime that would extend past the start of the maintenance will not start and will remain in the queue until after maintenance is complete.</li>
<li>All active sessions and jobs running on affected systems will be preempted at the start of the outage; any queued jobs will not begin until services are restored.</li>
<li>Access to login nodes, storage systems, and web portals will be unavailable throughout the downtime.</li>
<li>Automated data workflows that rely on affected systems (e.g., rsync, data pipelines, archive processes) will not function until systems are back online.</li>
<li>Globus transfers may time out during the maintenance window.</li>
</ul>
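<p>The queue-hold rule above can be sketched with plain date arithmetic. This is only an illustration (not an RCAC tool); the submit time and 72-hour walltime are made-up example values, and <code>date -d</code> assumes GNU coreutils:</p>

```shell
#!/bin/bash
# Sketch of the queue-hold rule: a job is held when its requested walltime
# would run past the start of the maintenance window. The submit time and
# 72-hour walltime below are hypothetical examples, not real job data.
maint_start=$(date -d "2026-04-22 06:00 EDT" +%s)   # maintenance begins
submit_time=$(date -d "2026-04-20 08:00 EDT" +%s)   # hypothetical submit time
walltime_hours=72                                   # hypothetical --time request
end=$((submit_time + walltime_hours * 3600))        # earliest possible finish
if [ "$end" -gt "$maint_start" ]; then
  echo "held: requested walltime crosses the maintenance window"
else
  echo "eligible to start before maintenance"
fi
```

<p>With these example values the job's earliest finish (April 23, 8 AM) falls past the April 22, 6 AM start of maintenance, so it would be held in the queue.</p>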
<p><strong>To prepare for this maintenance, we suggest the following:</strong></p>
<ul>
<li>Download any needed data or scripts before 6:00 AM on April 22.</li>
<li>Prepare any instrumentation devices for the Data Depot to be unavailable.</li>
</ul>
<p>We appreciate your understanding and cooperation as we complete these necessary upgrades to improve reliability and performance across RCAC infrastructure. For assistance, questions, or concerns contact <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a>.</p>
<p>Best regards,
Rosen Center for Advanced Computing (RCAC) / Purdue IT</p>
]]></description>
				<pubDate>Wed, 22 Apr 2026 06:00:00 -0400</pubDate>
									<category>Announcements</category>
							</item>
					<item>
				<title><![CDATA[Negishi Cluster Filesystem Interruption (Resolved)]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2630</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2630</guid>
				<description><![CDATA[<p>Between approximately 9:30 PM and 10:45 PM EDT on March 30, 2026, the Negishi home file system experienced issues that prevented users from successfully connecting to login nodes.</p>
<p>Service was fully restored at 10:45 PM EDT, and login functionality has returned to normal. No data loss occurred.</p>
]]></description>
				<pubDate>Mon, 30 Mar 2026 21:30:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Power Outage Impacting Multiple Clusters — Recovery Underway]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2619</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2619</guid>
				<description><![CDATA[<p>At approximately 6:00 AM EDT, a power outage impacted systems in the Math Data Center. Most services have now been restored.</p>
<p>Due to the outage, some jobs on Gilbreth did not requeue automatically. Users should check the status of any jobs that were running early this morning and resubmit them if needed.</p>
]]></description>
				<pubDate>Wed, 18 Mar 2026 06:00:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Negishi experiencing 2nd service disruption]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2612</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2612</guid>
				<description><![CDATA[<p>We are again investigating an issue affecting Negishi. The system is currently unresponsive or unavailable for some users, including SSH access. We will provide an update when service is restored or as soon as we have additional information.</p>
]]></description>
				<pubDate>Tue, 17 Mar 2026 14:30:00 -0400</pubDate>
									<category>Outages and Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Negishi experiencing service disruption]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2611</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2611</guid>
				<description><![CDATA[<p>We are investigating an issue affecting Negishi. The system is currently unresponsive or unavailable for some users, including SSH access. We will provide an update when service is restored or as soon as we have additional information.</p>
]]></description>
				<pubDate>Tue, 17 Mar 2026 11:00:00 -0400</pubDate>
									<category>Outages and Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Negishi Cluster Service Interruption]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2610</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2610</guid>
				<description><![CDATA[<p>We are currently experiencing an unexpected outage affecting the Negishi cluster. We are actively working to restore service and will provide an update once the system is stable.</p>
<ul>
<li>
<p><strong>Impact:</strong> Users are currently unable to access their home directories, and SSH connections or active terminal sessions may freeze or be denied.</p>
</li>
<li>
<p><strong>Current action:</strong> System administrators are performing a system reboot of the affected storage infrastructure to clear the error and restore file access.</p>
</li>
</ul>
]]></description>
				<pubDate>Wed, 11 Mar 2026 15:00:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Data Depot Filesystem issue: Scheduling Resumed]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2594</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2594</guid>
				<description><![CDATA[<p>An internal portion of the Data Depot filesystem is currently offline; as a result, all scheduling has been paused until this issue is resolved.</p>
<p><strong>Impact to you:</strong> Attempts to read files that are on the affected storage may result in error messages.</p>
<p>Our IT team is actively working with the vendor to restore service as quickly as possible. We will send an update as soon as more information is available.</p>
]]></description>
				<pubDate>Wed, 11 Feb 2026 14:30:00 -0500</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[February 5 Maintenance – Math Data Center Upgrades and Service Impact]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2535</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2535</guid>
				<description><![CDATA[<p>On Thursday, February 5, RCAC will perform planned maintenance in the MATH data center to support cooling upgrades and capacity improvements as part of the ongoing MATH datacenter renovation project.</p>
<p>During this maintenance window, several clusters will experience a temporary outage so that hardware can be safely powered down while facility work is performed:</p>
<ul>
<li>
<p>Gautschi, Gilbreth, Negishi, Bell, and Anvil cluster nodes will be powered down.</p>
</li>
<li>
<p>Gilbreth’s legacy V100 GPUs, which are well past their expected lifetime, will be decommissioned.</p>
</li>
<li>
<p>Hammer (Math nodes) and Geddes: a subset of nodes will be powered down, but services will remain available unless communicated separately.</p>
</li>
</ul>
<h3>How does this maintenance impact you?</h3>
<ul>
<li>
<p>Clusters listed in this message won’t be available to run jobs during the maintenance.</p>
</li>
<li>
<p>Any jobs requesting a walltime which would take them past the start of the maintenance will not start and will remain in the queue until after the maintenance is completed.</p>
</li>
<li>
<p>Users can continue to access their data.</p>
</li>
<li>
<p>GenAI Studio will remain available. This maintenance will position Purdue to support growing computational needs. Users should see long‑term benefits in system reliability and our ability to support future computing and AI resources.</p>
</li>
</ul>
<p>If you have questions about how this outage will affect your work or need support, please contact <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a>.</p>
]]></description>
				<pubDate>Thu, 05 Feb 2026 07:00:00 -0500</pubDate>
									<category>Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Network Slowness Notice]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2553</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2553</guid>
				<description><![CDATA[<p>We are currently investigating performance issues affecting network traffic.</p>
<p><strong>Impact to you:</strong> At this time, you may notice latency or brief disruptions when accessing certain on-campus or external resources, especially during peak usage periods.</p>
<p>We appreciate your patience while we work to fully resolve the underlying problem and restore normal network performance. We will provide an update by 5:00PM EST today or sooner.</p>
]]></description>
				<pubDate>Mon, 02 Feb 2026 15:00:00 -0500</pubDate>
									<category>Outages and Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Globus access to Depot degraded; slow Depot logins and Depot access on clusters]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2580</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2580</guid>
				<description><![CDATA[<p>Users of Data Depot on RCAC clusters are currently experiencing degraded performance, and some Globus transfers to and from Depot are failing or running slowly.  In addition, some users may see slow Globus logins or be temporarily unable to log in to Globus when accessing Depot collections.</p>
<p>System monitoring has identified an issue where heavy job activity was overloading the Data Depot filesystem used by the clusters and Globus.</p>
<p>You may see the following impacts:</p>
<ul>
<li>Globus transfers to and from Depot collections may fail, stall, or run much more slowly than usual.</li>
<li>Globus logins may be slow or occasionally fail when accessing Depot endpoints.</li>
<li>Jobs on RCAC clusters that read from or write to Depot may experience slow file access, delayed directory listings, or timeouts.</li>
</ul>
<p>Our engineers are investigating the high load from a large number of concurrent jobs and are working to reduce the impact on Depot, Globus, and cluster workloads.  Existing jobs will continue to run, but any that are heavily Depot‑I/O‑bound may run more slowly or see I/O errors until performance improves.  We will provide another update by 5:00PM EST or sooner if the issue is resolved.</p>
]]></description>
				<pubDate>Fri, 30 Jan 2026 15:00:00 -0500</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Unscheduled Cluster & Data Depot Outage]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2542</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2542</guid>
				<description><![CDATA[<p>Data Depot and clusters began experiencing issues around 8:00AM EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.</p>
<p>We will provide an update by 2:00PM EST today.</p>
]]></description>
				<pubDate>Tue, 20 Jan 2026 08:00:00 -0500</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Negishi Cluster Open OnDemand Maintenance (January 20)]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2529</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2529</guid>
				<description><![CDATA[<p>The Open OnDemand service for Negishi will be unavailable from <strong>Tuesday, January 20, 2026 at 8:00am EST to 5:00pm EST</strong>. During the maintenance, the RCAC team will reconfigure the Open OnDemand dashboard for Negishi, which includes a brand-new design with the new features listed below.</p>
<h3>What’s New on the dashboard?</h3>
<ul>
<li>
<strong>CPU/GPU Usage:</strong> Monitor your group's usage and the remaining available cores on Negishi.</li>
<li>
<strong>Disk Usage:</strong> Monitor your storage utilization across Negishi’s file systems.</li>
<li>
<strong>Job Queue:</strong> View and manage your running and queued jobs on Negishi.</li>
<li>
<strong>News Feed:</strong> Stay updated with the latest Negishi news, outages and announcements.</li>
<li>
<strong>Partition Status:</strong> Monitor the current state of partitions/queues on Negishi.</li>
<li>
<strong>My Jobs Page:</strong> Re-designed page to show detailed job information for your jobs and jobs in your group(s) as well as job management.</li>
<li>
<strong>Performance Metrics Page:</strong> Analyze your job performance and resource utilization patterns over time.</li>
</ul>
<h3>What will impact you?</h3>
<ul>
<li>All Slurm jobs on Negishi (including jobs already submitted through Open OnDemand before this maintenance) will continue and will <strong>NOT</strong> be impacted.</li>
<li>All functions related to Open OnDemand, including login, will be unavailable during the maintenance.</li>
</ul>
<p>Negishi's Open OnDemand service will return to full production by Tuesday, January 20, 2026 at 5:00pm EST.</p>
<p>Please submit a ticket through RCAC Help Desk <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a> if you have any questions or suggestions.</p>
]]></description>
				<pubDate>Tue, 20 Jan 2026 08:00:00 -0500</pubDate>
									<category>Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Upcoming February 5 Maintenance – Math Data Center Upgrades and Service Impact]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2527</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2527</guid>
				<description><![CDATA[<img width="400" style="padding:10px;" class="float-right" alt="Image description" src="https://www.rcac.purdue.edu/files/images/mathrenno.png" />
On Thursday, February 5, RCAC will perform planned maintenance in the MATH data center to support cooling upgrades and capacity improvements as part of the ongoing MATH data center renovation project. This renovation will allow Purdue to better support growing AI, data‑intensive, and HPC workloads for research. When completed, MATH will see a 32% increase in floor space, a 60% increase in usable power, and a two-fold increase in cooling capacity. 
<p>During this maintenance window, several clusters will experience a temporary outage so that hardware can be safely powered down while facility work is performed:</p>
<ul>
<li>
<p>Gautschi, Gilbreth, Negishi, Bell, and Anvil cluster nodes will be powered down.</p>
</li>
<li>
<p>Gilbreth’s legacy V100 GPUs, which are well past their expected lifetime, will be decommissioned.</p>
</li>
<li>
<p>Hammer (Math nodes) and Geddes: a subset of nodes will be powered down, but services will remain available unless communicated separately.</p>
</li>
</ul>
<h3>How does this maintenance impact you?</h3>
<ul>
<li>
<p>Clusters listed in this message won’t be available to run jobs during the maintenance.</p>
</li>
<li>
<p>Any jobs requesting a walltime which would take them past the start of the maintenance will not start and will remain in the queue until after the maintenance is completed.</p>
</li>
<li>
<p>Users can continue to access their data.</p>
</li>
<li>
<p>GenAI Studio will remain available. This maintenance will position Purdue to support growing computational needs. Users should see long‑term benefits in system reliability and our ability to support future computing and AI resources.</p>
</li>
</ul>
<p>If you have questions about how this outage will affect your work or need support, please contact <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a>.</p>
]]></description>
				<pubDate>Mon, 12 Jan 2026 14:30:00 -0500</pubDate>
									<category>Announcements</category>
							</item>
					<item>
				<title><![CDATA[Research Computing Holiday Break]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2487</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2487</guid>
				<description><![CDATA[<p>Research Computing personnel will observe the university winter break beginning at 12:00am EST on 12/23/25 and will resume normal business hours on January 5th, 2026. During this time, Research Computing services will remain available, but all staff will be on leave.</p>
<p>Research Computing staff members will monitor the status of all computing and data resources in an effort to ensure continuous availability.</p>
<p>Research Computing staff members will monitor the ticketing system throughout the holiday period and answer critical issues and problems. Non-critical user issues and questions will be addressed beginning January 5th, 2026. There will also be no coffee hour consultations during this break.</p>
<p><strong>Scratch file purging (on community clusters with scratch space) will continue as normal during the break, so be sure to archive your files in scratch storage. This does not apply to Data Depot or home directories -- only scratch storage.</strong></p>
<p>Have a wonderful break, everyone, and we look forward to great things in the new year!</p>
]]></description>
				<pubDate>Tue, 16 Dec 2025 13:00:00 -0500</pubDate>
									<category>Announcements</category>
							</item>
					<item>
				<title><![CDATA[Unscheduled Data Depot Outage]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2449</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2449</guid>
				<description><![CDATA[<p>The Data Depot storage system began experiencing issues starting around 4:30pm EDT today. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.</p>
<p>We will provide an update by 9pm.</p>
]]></description>
				<pubDate>Sat, 01 Nov 2025 16:30:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Unscheduled Data Depot outage]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2417</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2417</guid>
				<description><![CDATA[<p>Edit:</p>
<p>The Data Depot file system has returned to full service and scheduling has resumed on all clusters.</p>
<hr />
<p>The Data Depot storage system began experiencing issues around 9am EDT this morning. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed.</p>
<p>We will provide an update by 12pm (noon).</p>
]]></description>
				<pubDate>Fri, 17 Oct 2025 09:00:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Unscheduled Data Depot outage]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2409</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2409</guid>
				<description><![CDATA[<p>Edit:</p>
<p>Data Depot functionality has been restored.</p>
<hr />
<p>The Data Depot file system began experiencing issues with writes around 2:30pm EDT. The data migration process currently ongoing from Data Depot 2 to Data Depot 3 ran into an unexpected problem. Engineers have identified the problem and are correcting it. Users may have seen &quot;no space left on device&quot; for approximately 30 minutes. Job scheduling has been paused while this issue is being addressed.</p>
<p>We will provide an update by 5 PM.</p>
]]></description>
				<pubDate>Wed, 15 Oct 2025 14:30:00 -0400</pubDate>
									<category>Outages</category>
							</item>
					<item>
				<title><![CDATA[Purdue professor in Indianapolis uses RCAC clusters to study materials, predict failures]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2381</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2381</guid>
				<description><![CDATA[<p>Shengfeng Yang, an assistant professor of mechanical engineering in Indianapolis, uses the Rosen Center for Advanced Computing (RCAC)’s Negishi community cluster supercomputer to help with his research simulating complex materials. To see how materials fail at the atomic level, he and his research group simulate systems of more than a million atoms, a task that requires a great deal of computational power and wouldn’t be possible without a powerful computer like <a href="https://www.rcac.purdue.edu/compute/negishi">Negishi</a>.</p>
<p>Currently, Yang and his team focus on semiconductor materials and metals such as copper that are used in semiconductor packaging, studying how cracking and deformation happen at the atomic level in critical areas.</p>
<p>Yang and his research group also use the <a href="https://www.rcac.purdue.edu/compute/gilbreth">Gilbreth</a> cluster’s GPUs to train machine learning models to predict material behavior and properties. This means in the future they won’t have to run time-consuming simulations because the trained machine learning model will be able to make fast predictions about material behavior and failure.</p>
<p>Yang says there have been no obstacles to using the clusters as a faculty member at the Indianapolis campus, and he’s been able to access the clusters remotely without any difficulties.</p>
<p>He says tapping into RCAC resources has also connected him to faculty in West Lafayette he might not otherwise have met.</p>
<p>“I’ve gotten a lot of connection opportunities, and chances to collaborate with faculty in West Lafayette that are more focused on the experiment side, so we can have that connection between the computational simulation and the experimental science. So that’s been a big benefit as well.”</p>
<p>To learn more about Negishi, Gilbreth and other RCAC resources, contact <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a> and visit the <a href="https://www.rcac.purdue.edu/">RCAC website</a>.</p>
]]></description>
				<pubDate>Tue, 26 Aug 2025 00:00:00 -0400</pubDate>
									<category>Science Highlights</category>
							</item>
					<item>
				<title><![CDATA[Bell and Negishi Cluster Maintenance]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2321</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2321</guid>
				<description><![CDATA[<p>As part of an ongoing effort to <a href="https://www.rcac.purdue.edu/news/6807">renovate the Mathematical Sciences building (MATH) Data Center</a> and expand its power and cooling capacity, the cooling loop serving the Bell and Negishi clusters will undergo maintenance between Thursday, August 14th, 2025 at 5:00am EDT and Thursday, August 14th, 2025 at 8:00pm EDT. This critical infrastructure update is vital for continued operation of the racks hosting these clusters' compute nodes.</p>
<p>To facilitate the data center renovation project, the clusters will be operating at limited capacity during this window.</p>
<p>Jobs that request a wall time that will take them past the beginning of the maintenance will remain in queue until after the maintenance is complete.</p>
<p><strong>What to Expect During Maintenance:</strong></p>
<ol>
<li>You <strong>can log in</strong> to the cluster front-ends and <strong>access your files</strong> stored on Bell and Negishi.</li>
<li>Jobs running at the start of maintenance will be <strong>preempted</strong> and automatically <strong>resubmitted</strong> to their respective queues.</li>
<li>
<strong>New jobs</strong> submitted during the maintenance window will be <strong>queued</strong>, but they will not begin running until after the maintenance is complete.</li>
</ol>
<p>We appreciate your understanding and cooperation. If you have any questions or concerns, please contact us at <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a>.</p>
]]></description>
				<pubDate>Thu, 14 Aug 2025 05:00:00 -0400</pubDate>
									<category>Maintenance</category>
							</item>
					<item>
				<title><![CDATA[Negishi Scheduler Modernization]]></title>
				<link>https://rcac.purdue.edu/index.php/news/2297</link>
				<guid isPermaLink="true">https://rcac.purdue.edu/index.php/news/2297</guid>
				<description><![CDATA[<p>As part of an ongoing effort to utilize modern features in the Slurm scheduler and to streamline usage reporting for research groups (a frequent request from PIs), the scheduler configuration on the Negishi cluster will be modified <a href="https://www.rcac.purdue.edu/news/7231">in an upcoming maintenance</a>. <strong>Users will be required to update their job scripts</strong> to conform to the guidelines described below.</p>
<ul>
<li>All jobs on the cluster will be required to explicitly specify a partition and an account (i.e. your group's name) at submission time. You can find the names of the available partitions and accounts from the <code>showpartitions</code> and <code>slist</code> commands respectively. Any job that does not specify an account <em>and</em> a partition will be rejected at submission time.</li>
<li>The output of <code>slist</code> and the default output of <code>squeue</code> will be modified to reflect the new scheduler design.</li>
<li>All &quot;shared accounts&quot; such as <code>standby</code>, <code>highmem</code>, etc., which represent resources outside of your typical &quot;group accounts&quot;, will continue to exist but will require a different request syntax.
<ul>
<li>Standby will become a Quality of Service (QoS). Jobs that previously ran under the &quot;standby&quot; account will now be submitted to your &quot;group account&quot; and tagged with the standby QoS. For example, if your job previously used the <code>-A standby</code> option, you would now use <code>-A mylab -q standby</code>
</li>
<li>The <code>highmem</code> and <code>gpu</code> shared accounts will become partitions. Jobs that previously ran under these accounts will now be submitted to your &quot;group account&quot; and to the appropriate partition, i.e., <code>-A highmem</code> will become <code>-A mylab -p highmem</code>
</li>
<li>Groups with access to the <code>interactive</code> account will now submit to their &quot;group account&quot; and the interactive partition, i.e., <code>-A interactive</code> will become <code>-A mylab -p interactive</code>
</li>
</ul>
</li>
</ul>
<table> <caption>Summary of Changes</caption>
<thead>
<tr>
<th scope="col">Use Case</th>
<th scope="col">Old Syntax</th>
<th scope="col">New Syntax</th>
<th scope="col">What Changed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Submit a job to your group's account</td>
<td><code>sbatch -A mygroup</code></td>
<td><code>sbatch -A mygroup -p cpu</code></td>
<td>The <code>cpu</code> partition must be specified.</td>
</tr>
<tr>
<td>Submit a standby job</td>
<td><code>sbatch -A standby</code></td>
<td><code>sbatch -A mygroup -q standby -p cpu</code></td>
<td><code>standby</code> is now a QoS instead of an account</td>
</tr>
<tr>
<td>Submit a highmem job</td>
<td><code>sbatch -A highmem</code></td>
<td><code>sbatch -A mygroup -p highmem</code></td>
<td><code>highmem</code> is now a partition instead of an account</td>
</tr>
<tr>
<td>Submit a gpu job</td>
<td><code>sbatch -A gpu</code></td>
<td><code>sbatch -A mygroup -p gpu</code></td>
<td><code>gpu</code> is now a partition instead of an account</td>
</tr>
<tr>
<td>Submit an interactive job</td>
<td><code>sbatch -A interactive</code></td>
<td><code>sbatch -A mygroup -p interactive</code></td>
<td><code>interactive</code> is now a partition instead of an account</td>
</tr>
</tbody>
</table>
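<p>The mapping in the table above can be sketched as a small shell helper. This is a hypothetical illustration, not an RCAC-provided command, and it assumes your group account is the placeholder <code>mylab</code> used in the examples:</p>

```shell
#!/bin/bash
# Hypothetical helper (not an RCAC command) that translates an old-style
# -A value into the new account + partition/QoS flags, assuming the group
# account placeholder "mylab" from the examples above.
migrate_flags() {
  case "$1" in
    standby)                 printf '%s\n' "-A mylab -q standby -p cpu" ;;  # account -> QoS
    highmem|gpu|interactive) printf '%s\n' "-A mylab -p $1" ;;              # account -> partition
    *)                       printf '%s\n' "-A $1 -p cpu" ;;                # group account: add cpu partition
  esac
}
migrate_flags standby      # -A mylab -q standby -p cpu
migrate_flags gpu          # -A mylab -p gpu
migrate_flags mygroup      # -A mygroup -p cpu
```

<p>In each case the output is what you would pass to <code>sbatch</code> under the new scheme in place of the old single <code>-A</code> flag.</p>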
<p><strong>How will this affect you?</strong></p>
<ol>
<li>You will need to update your job scripts and submission commands to include the required options outlined above.</li>
<li>If you have any scripts or tooling that rely on the current output of <code>slist</code> or <code>squeue</code>, those scripts will need to be modified to use the new formatted output.</li>
</ol>
<p>You can prepare for this maintenance by reviewing the new Slurm organization in our user guide's <a href="https://www.rcac.purdue.edu/knowledge/negishi/run/slurm/new-queues">Queues Page</a>.</p>
<p>If you have any questions about these upcoming changes, please reach out to us at <a href="mailto:rcac-help@purdue.edu">rcac-help@purdue.edu</a>.</p>
]]></description>
				<pubDate>Tue, 29 Jul 2025 08:00:00 -0400</pubDate>
									<category>Announcements</category>
							</item>
			</channel>
</rss>