Outages and Maintenance
-
UPDATE: As of 11:45am, the Scholar cluster maintenance was completed. The cluster is back in service. The Scholar cluster will be unavailable beginning on Wednesday, March 15th, 2017 at 8:00am EDT, for scheduled maintenance. The cluster will return to f...
-
Emergency Carter Cluster Maintenance
Update: Owner queues on Carter have been restarted. While Carter is currently deemed stable, performance is still impacted. Engineers are closely monitoring the situation and will take corrective action if necessary. Update: At this time, only Carter...
-
Halstead nodes continue to come back online. While the cluster is operating normally, the total number of available nodes is not yet at full capacity. We will provide an update on the situation by 6:00pm. Update: Scheduling has been restarted and jobs are cur...
-
The Halstead cluster will be unavailable beginning on Thursday, April 6th, 2017 at 8:00am EDT, for scheduled maintenance. The cluster will return to full production by Thursday, April 6th, 2017 at 11:59pm EDT. During this time, Halstead will have the...
-
Rice, Snyder, Hammer, Scholar Maintenance
The Hammer, Rice, Scholar, and Snyder clusters have been returned to service. Please note that ThinLinc clients and web browser access can be found at: Rice: desktop.rice.rcac.purdue.edu Hammer: desktop.hammer.rcac.purdue.edu Snyder: desktop.snyder.r...
-
Update: April 13, 2017 5:02pm: The EXRC cluster has been returned to service. Original Message: The EXRC cluster will be unavailable beginning on Thursday, April 13th, 2017 at 8:00am EDT, for scheduled maintenance. The cluster will return to full prod...
-
The Data Depot file system was sporadically available for 2 hours today. Some jobs running on the Community Clusters paused during the instability but have resumed. We expect no job loss to have occurred. This issue is now resolved.
-
Emergency Maintenance on Rice, Snyder, Hammer
As of 7:15pm, all queues on these clusters have resumed scheduling. Nodes will continue to be upgraded as they finish current jobs and become available. In the interim, the clusters will run in a degraded state, but will continue to start new jobs...
-
As of 2:35 pm, the Conte cluster has been returned to service. Scheduling has resumed in all queues. Update: The source of the problem has been identified and a fix is underway. We anticipate returning Conte to service by 3pm today. Original message: The Conte...
-
Scratch system failure on Rice, Snyder, Hammer
*** Update *** As of 7:00 pm, the problem on the scratch system has been corrected, and scheduling has resumed on all three affected clusters: Rice, Snyder, and Hammer. Update: Storage engineers are working with the system vendor to evaluate a propos...
-
As of 8:48pm, the issue has been resolved. Original message: The Research Data Depot is experiencing a system-wide slowdown. Engineers have isolated the systems which are at the core of this phenomenon and are taking steps to restore normal service....
-
Engineers have restored the failed core servers to a functional state. Data Depot is up and running as normal and job scheduling has resumed. Should you encounter any lingering issues, please let us know at rcac-help@purdue.edu. Original Message: Some core...
-
Nodes have continued to gradually reboot into the new image as jobs complete. At this point, more than 80% of Halstead has completed this process, and we have not seen any issues along the way. This outage is closed. Update: May 25, 2017 5:00pm...
-
The Fortress Archive will be unavailable beginning on Wednesday, May 31st, 2017 at 8:00am EDT, for scheduled maintenance. Fortress will return to full production by Wednesday, May 31st, 2017 at 5:00pm EDT. During this time, Fortress will have the Hig...
-
Extension of monthly GitHub maintenance
Engineers and GitHub support have resolved the issues encountered earlier this afternoon and GitHub is back online and running normally again. We apologize for the disruption this may have caused. If you encounter any issues please let us know at rca...
-
Hathi, Radon, and Specialized Cluster Maintenance
Update message: After performing necessary repairs, Radon has been returned to service. -- Previous message: After consulting with vendor support, we have determined that Radon has experienced a failure in its network hardware. Parts and a vendo...
-
Email notifications from Research Computing website broken
Email notifications are up and running again as usual. Original Message: As of 5pm Thursday evening, email notifications from the Research Computing website are not working. Some people are receiving no email and others are receiving damaged emails. T...
-
Email to "rcac-help@purdue.edu" not Working
As of 3:45pm Friday, the rcac-help@purdue.edu address is working normally again. Original Message: Beginning 5:00pm Thursday, the rcac-help@purdue.edu email address stopped accepting email. Anything sent since then has not been received. We are workin...
-
The Hammer cluster has been successfully returned to full production. This concludes this maintenance. Update: July 18, 2017 5:01pm: The Hammer cluster has most of the reconfiguration complete, but work continues on a good portion of the nodes whic...
-
The issues with Globus have been resolved, and the Fortress archive is fully restored to normal operations. This concludes this maintenance. Update: July 19, 2017 9:36pm: The work on Fortress has been completed and it is in normal production for al...