Outages and Maintenance
-
The Bell, Brown, Gilbreth, Halstead, Rice, Scholar, and Snyder clusters began experiencing issues with their Data Depot mounts around 10:00pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. To avoid job losses for...
-
A large number of Scholar accounts have been accidentally removed during overnight processing. This manifests as "LDAP authorization check failed", or "Incorrect or Invalid username/password" and similar errors when trying to logi...
-
The Rice cluster has reached the end of its life cycle and is being retired on Friday, January 15th, 2021. Researchers owning nodes on Rice should start archiving any data they may have there to the Fortress Archive now, or move it to other clusters...
-
The bulk of the Snyder cluster (A and B nodes of 2015 vintage) has reached the end of its life cycle and is being retired on Friday, January 15th, 2021. Researchers owning nodes in A or B sub-clusters should start archiving any data they may have th...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, February 3, 2021 from 8:30am - 12:00pm EST for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any trans...
-
The Data Depot storage server began experiencing issues around 3:00pm EST on Thursday, February 4th, 2021. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all clusters while this issue...
-
The Halstead cluster began experiencing issues with its scratch filesystem mount around 4:30pm EST. Users may see "Stale file handle" messages or be unable to navigate to their scratch directories. Engineers are currently diagnosing the iss...
-
Gilbreth queue submission problems
We have received multiple user reports that the Gilbreth cluster began experiencing issues with job submissions over the weekend. The problem manifests as an "Invalid account or account/partition combination specified" error message from sbatch...
-
The Bell cluster will be unavailable Tuesday, February 23, 2021 at 8:00am EST for scheduled maintenance. The cluster will return to full production by %enddatetime%. During this time, Bell will have a maintenance upgrade performed on the software com...
-
The Gilbreth cluster will be unavailable Thursday, February 25, 2021 at 8:00am EST for scheduled maintenance. The cluster will return to full production by %enddatetime%. During this time, Gilbreth will have the latest CUDA driver and toolkit softwar...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, March 3, 2021 from 8:30am - 12:00pm EST for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transfer...
-
The Workbench cluster began experiencing issues with its network uplink around 6:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 10:00pm.
-
The Weber cluster will be taken down for regular maintenance and upgrades beginning on Tuesday, March 16th, 2021 at 8:00am EDT. During this time, Weber will have operating system updates applied. Users will be unable to log in or use Weber for inter...
-
ANSYS Fluent software unavailable on Bell
We have received multiple reports that ANSYS Fluent software is unavailable on the Bell cluster. We are currently diagnosing the issue and working to identify a fix. We will provide an update by 6pm tonight.
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, April 7, 2021 from 8:30am - 12:00pm EDT for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transfer...
-
Unscheduled outage on multiple clusters
Due to problems with the cooling system in the MATH datacenter, the CMS, Bell, Brown, Gilbreth, Halstead, WCERES, and WSC Hadoop clusters began experiencing issues around 4:00pm EDT. Multiple front-end, compute and storage services are affected. Engineer...
-
The Fortress tape archive began experiencing load-induced issues around 1:00pm EDT. Problems manifest as various errors and timeouts while trying to access Fortress or transfer data. Engineers are currently diagnosing the issue and are working to ide...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, May 5, 2021 from 8:30am - 12:00pm EDT for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transfers...
-
Data Depot Hardware Replacement and Migration
On Tuesday, May 11, 2021 at 5:00pm EDT, the Data Depot storage service will be unavailable while it is transitioned to new hardware. All Depot access methods (SCP/SFTP, Windows network drives, Globus, NFS exports, direct mounts on Research Compu...
-
Whole-Floor Cluster Maintenance
The majority of Research Computing computational resources (Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, WCERES, Workbench, and WSC Hadoop clusters) will be unavailable Tuesday, May 11, 2021 at 5:00pm EDT for Data Depot migration work. The clust...