Outages and Maintenance
-
Fortress: ADIC Scalar 10k tape robot unavailable
Update 12/2/11 (4:15pm) The tape robot has been returned to service and Fortress is back in production. Please contact us at rcac-help@purdue.edu if you encounter further issues. Update 12/2/11: The ADIC Scalar 10K robot is temporarily down again wit...
-
Hansen: unscheduled outage to Lustre scratch
Update The error condition on the Lustre filesystem has been cleared, and Hansen is back in production and accepting new jobs. Jobs already running should have resumed at the point where they were blocked waiting when the Lustre error occurred. This...
-
Fortress: ADIC Scalar 10k tape robot unavailable (1/4/2012)
Update - 1/9/2012 The repairs to the ADIC tape library have been completed and Fortress' tape functionality is back in operation. Update - 1/6/2012 Following work today by vendor engineers, the latest estimate for the ADIC tape robot's return to serv...
-
This morning, the PBS system on Coates developed an issue with the storage holding its internal state. While systems engineers are working on recovering it from backup, new job submissions will not be possible, nor will you be able to query job st...
-
Lustre unavailable on Hansen cluster
Update: As of 9:45pm, Lustre is back in production and scheduling has resumed on Hansen. Original Notice: As of approximately 8:00pm February 7, an issue was found with the Lustre filesystem on Hansen, making the filesystem unavailable for use. ITaP engine...
-
System Maintenance - Spring Break 2012
During the week of spring break, 2012, the Steele, Coates, and Rossmann clusters will each be down for maintenance for one day to install OS patches and update the PBS batch software to version 11.1. Additionally, the Radon cluster will be unavailabl...
-
Unscheduled outage to Rossmann cluster
At approximately 10:50pm, Thursday, March 15, the power distribution to large portions of the Rossmann cluster failed. These feeds also power the cluster's login nodes, so Rossmann is unavailable for use while the login nodes are down. Power was r...
-
PBS unavailable on Rossmann cluster
Due to a network issue, the server running the PBS software for Rossmann is unavailable. While the server is unavailable, attempts to use PBS commands ("qsub", "qstat", "pbsnodes") will fail with error messages like: qst...
-
Unscheduled outage to MATH datacenter
Update - 9:30pm, 4/1/2012: As of about 9:30pm, Sunday, 1 April, ITaP systems staff have returned Hansen to production status, and job scheduling is re-enabled. The scratch filesystem on Hansen has been restored with no apparent loss of files; if you...
-
Partial outage affecting some Coates queues
Update - 6:45 pm Tuesday, 10 April 2012 ITaP engineers have found and repaired the network issue that was affecting Coates node types B, C and E. Job scheduling has been resumed for all queues. If you encounter any problems, please report them to rc...
-
Update: 1:45pm As of 1:45pm this afternoon, systems staff have completed patching the Samba servers used to access storage systems. You should now be able to connect to samba.rcac.purdue.edu for Samba access to home and scratch directories and...
-
Update - April 11, 2012 2:40pm At around 2:40pm, ITaP engineers restored communications between the HPSS system and the tape library. Access to Fortress from Samba, HSI/HTAR and other methods has been restored. I apologize for the inconvenience th...
-
Scheduled Maintenance - May 2012
In May 2012, all RCAC systems will each be down for maintenance for up to three days in order to accommodate electrical service work in the Math building and storage systems maintenance. Some systems will also be receiving OS and scheduler upgrades....
-
Community clusters, storage to be off line for upgrades and maintenance
Purdue’s Community Cluster Program supercomputers, related high-performance data storage and the Fortress archival data storage system will be down for scheduled maintenance for up to three days from May 15-17. For details, see the rcac-help@purdue.e...
-
Scheduled Maintenance - August 2012
In August 2012, some RCAC systems will be down for maintenance for up to three days in order to accommodate electrical service and chilled water upgrades in the Math building and OS and scheduler upgrades on the systems. Planned Maintenance Timelin...
-
Unscheduled Power outage in Math Datacenter
Update: 10:00pm Tuesday As of 8:30pm Tuesday 21 August 2012, the LustreB filesystem has been returned to full service. Our storage engineers, with the assistance of the vendor, have verified that the system is stable. If you encounter any issues, please co...
-
ADIC Scalar 10k tape library maintenance
From 8:00am-12pm on Friday, September 7, 2012, the ADIC Scalar 10K library serving the Fortress archive will be unavailable while emergency preventative maintenance is performed. Fortress will still be able to write files into HPSS, and files already...
-
Scheduled Maintenance - October 2012
UPDATE: 9 October, 2012 The Coates and Rossmann clusters have both returned to production, and their maintenance was completed as of 11:30 am, Tuesday 9 October, 2012. The Coates and Rossmann clusters will go down for scheduled maintenance at 8:00 am...
-
Scheduled Maintenance for Radon Cluster
UPDATE - 12:40 pm 27 Nov 2012: The update went smoothly, with no delays or problems, and Radon has returned to service as of 12:30 pm, 5 and a half hours earlier than expected. Please let us know at rcac-help@purdue.edu if you see any problems with...
-
Scheduling paused on ITaP research clusters
During scheduled maintenance on the network equipment connecting storage to ITaP clusters, all scheduling will be paused from 4-6pm. Running jobs will continue to execute, and new jobs may be submitted to PBS queues, but no new jobs will start u...