Outages and Maintenance
-
Software Stack Changes during Scheduled Maintenance
During the New Year's weekend holiday, all ITaP HPC resources will be unavailable due to a scheduled upgrade of research home directories. While the systems are down they will also receive several updates to the software stack and modules. These upda...
-
Scheduled Maintenance - RCAC home directory upgrades
Update - 7:00pm, 1/4/2013: All community clusters (Steele, Coates, Rossmann, Hansen, Carter, and Peregrine1) are back in production. Radon is currently not in production, as ITaP engineers are addressing issues encountered during the upgrade. T...
-
Unexpected Power Outage in MATH
Update: Noon, 1/8/13. The power issue in MATH has been resolved. Power has been restored to the nodes in the Coates-A subcluster affected by the outage. ITaP engineers have verified that the Coates-A subcluster is operating correctly, and have restart...
-
Campus chilled water serving the MATH data center is experiencing above-normal temperatures, and as a precaution, scheduling on the Coates, Rossmann, Hansen, Carter, and Radon clusters has been stopped. Steele is not affected. There should be no impa...
-
Update: As of about 11:00 am, the problem with the chilled water has been corrected, and scheduling has resumed on all RCAC clusters. Thank you for your patience. If you encounter any issues or have questions, please contact us at rcac-help@purdue.ed...
-
As of 9:00am, we are seeing a problem with the LustreC scratch filesystem that serves Carter, Hansen, and Peregrine1. To prevent any more jobs from running into this, we have temporarily suspended scheduling of new jobs, though you may still submit to...
-
During the March 12 to March 14 maintenance window, the WinHPC cluster will be unavailable due to upgrades to the electrical service in the MATH data center. WinHPC will be shut down at 8:00 am Tuesday, 12 March, 2013, and is expected to return to se...
-
On Tuesday, 12 March 2013, the samba.rcac.purdue.edu host will be offline for about 2 hours between 8:00am and noon for maintenance. This will not affect any running jobs or new job submission, but will mean that people who use Samba to map their home...
-
Update: ITaP engineers have corrected the issue affecting the LustreC filesystem. The system is back in production. Job scheduling on Carter, Hansen and Peregrine1 has been restarted. As always, thank you for your patience. If you encounter any issue...
-
Updated, 3/29/2013: During the datacenter migration, the core server for the Fortress HPSS system experienced a failure with its boot device. Vendor engineers are working on the issue, and as a result, the estimate for Fortress's return to production ha...
-
Scheduling paused on Carter cluster
Update: 8:12pm Scheduling on Carter has been resumed, and Carter is back in full production. Original Message: Beginning the morning of April 16, a number of compute nodes on the Carter cluster are experiencing a connectivity issue. While ITaP engine...
-
Network outage affecting Peregrine1 cluster
On April 24, 2013, network engineers will be relocating fiber optics that connect the Peregrine1 cluster to infrastructure in West Lafayette. This outage is scheduled for 12:00am through 5:00am. This will leave Peregrine1 unable to run jobs. Any PBS j...
-
Resolved: As of about 4:45pm ET, the connectivity issue affecting the Fortress archive has been resolved. The HPSS archive is back in full production. If you encounter any issues, please contact us at rcac-help@purdue.edu. Update: ITaP Storage Enginee...
-
LustreC filesystem unavailable
Update: May 13, 2013 11:00pm: LustreC has been returned to service. Carter, Hansen, and Peregrine1 are back in production with queues enabled. Update: May 13, 2013 3:00pm: storage engineers are continuing to work with vendor support to return Lustre...
-
As you may be aware, on April 5, the Board of Trustees approved the purchase of the next generation of community cluster, to be named "Conte". Since that time, ITaP staff have begun preparations for installing the new system, which will arr...
-
Software Stack Changes during Carter Maintenance
Between July 8 and July 16, Carter will be unavailable due to scheduled maintenance. On July 8, there will be changes made to the software stack on most of ITaP's community clusters. Changes will include updates to the default version of the Intel co...
-
LustreC Filesystem Maintenance
The high performance scratch file system (LustreC) supporting the Carter, Hansen, Peregrine1, and WinHPC research clusters is in need of mandatory maintenance work. The work should be performed as soon as possible in order to ensure full performance...
-
Fortress HPSS Archive Unavailable
Update - 10:15 am Fortress is back in full production. Original Message: As of 8:00am, Thursday, September 19, the Fortress HPSS is temporarily unavailable due to issues with communicating with its tape drives. Storage engineers are working to return...
-
Partial scratch96 filesystem outage
On the evening of 10/10/2013, the fileserver providing the "scratch96" filesystem serving some users of the Steele and Radon clusters suffered a permanent failure of its 2nd tier storage. This means that files on scratch96 that are older th...
-
Update: 11:00pm, Nov. 12, 2013 ITaP storage engineers have returned the offline hardware to production and LustreC is back in production. Queues on Hansen and Carter have been restarted as of 11:45pm. Update: 5:00pm Following consultation with vendor...