Outages and Maintenance
-
Due to power work in the MSEE building, most ECN services will be unavailable between 5:30 pm Thursday, 11 June 2015 and 8:00 am Friday, 12 June 2015. In particular, for Research Computing users this means that software packages licensed through ECN...
-
Software upgrades on the Rice cluster were completed by 7:30pm. It is now open for access by early adopters. Please let us know if you see any issues with the cluster. Maintenance on Snyder, Rossmann, Hansen, Hammer, and Conte has been completed and...
-
Rice job submission failing for some users
Update: The scheduling server has been rebooted and job submissions appear to be working normally again. Please let us know at rcac-help@purdue.edu if you see any further issues. Thanks again for your patience! Job submissions for at least some users...
-
Fortress Service Unavailable June 23
The Fortress data archiving services will be unavailable starting at 8:00AM on 23 June 2015 due to scheduled maintenance. During this outage, our storage engineers will upgrade hardware and configure RAID on the internal servers. Users are reques...
-
Data Depot connectivity issues
ITaP engineers have identified issues causing intermittent failures on Carter. Engineers are currently tuning parameters on the Depot system that have been identified as potential fixes to the issues. Access to Depot on Carter has been stable since tunin...
-
The Hammer, Hathi, Radon, and Snyder clusters will be unavailable on Wednesday, July 1, 2015 from 8:00am to 12:00pm EDT, for scheduled maintenance. The clusters will return to full production by Wednesday, July 1st, 2015 at 12:00pm EDT. The do...
-
Due to power work in the MSEE building, most ECN services will be unavailable between 6:30am and 9:00pm EDT on Saturday, August 15, 2015. For Research Computing users this means that software packages licensed through ECN servers will not be able to ch...
-
Unscheduled scratch outage on Rossmann
UPDATE: As of 8pm on August 15, 2015, the scratch filesystem serving Rossmann is back in full production. Original message: The scratch filesystem serving Rossmann is currently unavailable. Both currently running jobs and attempts to access files in sc...
-
Cluster Maintenance - Peregrine1
The Peregrine1 cluster will be unavailable from August 17, 2015 8:00am to August 19, 2015 6:00pm EDT, for scheduled maintenance. The cluster will return to full production by Wednesday, August 19th, 2015 at 6:00pm EDT. During this time, Pere...
-
As of 11:55 pm August 18, 2015, Fortress/HPSS has been brought back online. Storage engineers continue working on bringing the upgraded Fortress up and deploying new software to all RCAC systems. Current estimate for return to service: 12:00 am August 1...
-
Unscheduled scratch outage on Rossmann
**Update: August 25, 2015 9:00 pm** On Monday, August 24, a disk tray in the Rossmann scratch storage system suffered multiple failures and, despite great effort by both ITaP storage engineers and the system vendor, this portion of the scratch system...
-
Update: September 23, 2015 8am Shortly after 2am, engineers were able to complete the file transfer and return Carter to production. Update: September 22, 2015 11pm The file transfer continues and will last well into the night. The next update...
-
Cluster Maintenance - Hansen/Peregrine1
Update: September 22, 2015 1pm The work affecting the Hansen and Peregrine1 scratch filesystems has been completed and the clusters are back in full production. Original The Hansen and Peregrine1 clusters will be unavailable beginning Tuesday, Septembe...
-
Emergency scratch maintenance on Carter and Scholar
The scratch filesystem serving Carter/Scholar underwent emergency maintenance through Friday night and well into Saturday. We expect this work to resolve the periodic hangs this filesystem has been experiencing for the last two days. Job scheduling...
-
October 30, 2015 11:00am ITaP engineers have made additional timeout changes to the scratch filesystem, which have increased stability. Additional work is being scheduled for Tuesday, December 1, 2015 from 7:00am to 7:00pm. October 8, 2015 5:00pm An e...
-
October 22, 2015 9:15pm All services have been restored and Hammer is now in production. October 22, 2015 7:00pm Engineers continue to work through issues relating to the move. Another update will be sent at 9pm. Original The Hammer cluster will be...
-
Unscheduled outage for Samba/Windows
Service was restored around 7:30pm today. Engineers changed the way Samba authenticates users to avoid this problem going forward. -- Service was restored around 10:30am today, but has since failed again. Engineers are working on the problem, and we...
-
November 3, 2015 6:15pm Maintenance on Radon is complete and the cluster has been returned to production. Original The Radon cluster will be unavailable on Tuesday, November 3, 2015 from 7:00am to 7:00pm EST, for scheduled maintenance....
-
The Fortress archive service will be unavailable starting Wednesday, November 4th, 2015 at 6:00am EST for regular maintenance and will return at 8:00am EST the same day. During this time, access via HSI, HTAR, Globus of...
-
Update - 9:20pm Conte has been returned to full production as of 9:15pm. During the failure earlier today, the internal tracking of jobs within the scheduler on Conte was corrupted. Unfortunately, this resulted in all running and pending jobs being...