Outages and Maintenance

Unscheduled scratch outage on Conte
- November 28, 2015 12:00pm - 5:30pm EST (updated: November 28, 2015 6:06pm EST)
The scratch filesystem has been restored to full service and all queues have been restarted. Original Message: The scratch filesystem serving Conte is currently unavailable. Both currently running jobs and attempts to access files in scratch will bl...
Cluster Maintenance - Carter
- December 1, 2015 7:00am - December 2, 2015 2:20pm EST (updated: December 2, 2015 2:21pm EST)
Carter has been returned to normal operations. All queues have been enabled. Update: December 2, 2015 12:15pm: Carter is mostly ready to return to service, but the site-wide home filesystem has suffered a failure which is preventing this from being co...
Unscheduled scratch outage on Rice, Hammer, and Snyder
- December 1, 2015 11:15am - 4:15pm EST (updated: December 1, 2015 4:50pm EST)
The scratch filesystem serving Hammer, Rice, and Snyder has been restored to normal operations, and all queues have been re-enabled. Original Message: The scratch filesystem serving Hammer, Rice, and Snyder is partially unavailable. Both currently ru...
Unscheduled Home Filesystem Outage
- December 2, 2015 12:00pm - 12:45pm EST (updated: December 2, 2015 1:52pm EST)
As of 12:46, December 2, the home filesystem serving Conte, Hammer, Hansen, Hathi, Peregrine1, Radon, Rice, and Snyder was restored to normal operation. All queues have been re-enabled. As of Wednesday, December 2nd, 2015 at 12:00pm EST, Conte, Hamm...
Fortress Maintenance
- January 5, 2016 7:00am - January 7, 2016 7:00pm EST (updated: January 7, 2016 6:02pm EST)
January 7, 2016, 6pm: The Fortress move has completed and Fortress has been returned to production. Original Message: Due to a failure in the notice system, the earlier attempts to notify of this work, which were sent on Dec 7th and Jan 3rd, were not delivered. The Fortr...
Cluster Maintenance - Carter
- January 19, 2016 7:00am - January 20, 2016 7:00pm EST (updated: January 20, 2016 7:02pm EST)
Carter has been returned to normal operation. Update: January 20, 2016 3:26pm: We are doing return to service testing now and expect Carter to return to production by 7:00pm. Update: January 20, 2016 12:00pm: Work is being wrapped up on Carter and...
Rice and Snyder Cluster Maintenance
- February 1, 2016 8:00am - February 5, 2016 10:40pm EST (updated: February 5, 2016 10:43pm EST)
As of 10:40pm, the Snyder cluster was returned to normal service in the POD. This concludes this maintenance. Update: February 5, 2016 8:54pm As of 8:25 pm, Friday, 5 Feb 2016, the Rice cluster maintenance has completed and the system is returning...
Unscheduled scratch outage on Hammer
- February 1, 2016 8:00am - February 5, 2016 7:00pm EST (updated: February 5, 2016 7:03pm EST)
The Hammer scratch filesystem has now returned to normal operations. Original Message: During the maintenance of the Rice and Snyder clusters this week, it became necessary to shut down the scratch filesystem which these clusters currently share with...
Unscheduled outage on Carter
- February 2, 2016 6:00pm - February 3, 2016 10:50pm EST (updated: February 3, 2016 10:51pm EST)
The underlying issues affecting Carter are resolved and job scheduling has been resumed. Many individual nodes remain offline for corrective action, and these will be returning to service gradually as engineers are able to fix them. In the interim,...
Fortress Maintenance
- February 3, 2016 8:00am - 9:00am EST
Fortress will be unavailable from 8:00am to 9:00am Wednesday, 3 February, 2016 for routine maintenance.
Hathi & WinHPC Power Maintenance
- February 4, 2016 6:00am - 5:00pm EST
The Hathi and WinHPC clusters will be unavailable beginning Thursday, February 4th, 2016 at 6:00am EST for scheduled maintenance to the power feed. Both clusters will return to full production by Thursday, February 4th, 2016 at 5:00pm EST. During...
Unscheduled outage on Carter
- February 4, 2016 8:00am - 10:30am EST (updated: February 4, 2016 10:42am EST)
The cause of this turned out to be a power loss to Carter's scratch filesystem and portions of the Data Depot, which has been restored now. Carter nodes are returning to normal operations now. Original Message: As of Thursday, February 4th, 2016 at...
Unscheduled Outage in Math Data Center
- February 4, 2016 8:00am - 10:30am EST (updated: February 4, 2016 10:40am EST)
Most of the impact of this turned out to be to the Depot storage system, which has now been restored to normal operations. All the other affected systems are showing a return to normal operations now. Original Message: As of Thursday, February 4th,...
Cluster Maintenance - Conte
- February 10, 2016 7:00am - 9:30pm EST (updated: February 10, 2016 9:33pm EST)
The scheduler issue has been resolved, and Conte has been returned to normal operations as of Wednesday, February 10th, 2016 at 9:30pm EST. Update: February 10, 2016 7:04pm There was a minor issue discovered with the newly upgraded scheduler which i...
Unscheduled scratch outage on Carter
- February 12, 2016 10:20am - 12:00pm EST (updated: February 12, 2016 12:04pm EST)
There was an issue with the cluster's gateway switches that prevented IP over InfiniBand traffic from working. This also destabilized the Lustre scratch servers, which then had to be rebooted. Jobs that were using scratc...
Unscheduled outage on Rice and Snyder
- February 17, 2016 7:30pm - 9:15pm EST (updated: February 17, 2016 9:30pm EST)
As of 9:15 PM, the Snyder and Rice clusters have been brought back into service after cooling was brought back online. Front-ends are operational and scheduling has been resumed. Original Message: At about 7:30 pm Wednesday, 17 February, 2016, the fr...
Unscheduled Outage on Data Depot
- February 23, 2016 11:00am - February 24, 2016 6:00pm EST (updated: February 24, 2016 6:13pm EST)
The Depot filesystem checks have all completed cleanly and the Depot has been fully returned to normal operations. All queues on all clusters are scheduling new jobs again. Any existing jobs which had been waiting for Depot access may also resume....
ECN services outage - ITaP Research Computing systems impacted
- March 1, 2016 6:30am - 9:00am EST
Engineering Computing Network (ECN) will be performing staged patching and reboots of all of ECN's Red Hat Linux workstations and servers to protect against a serious vulnerability in the glibc system library. A significant number of ECN services will be...
Unscheduled outage for Peregrine1
- March 7, 2016 12:30pm - 2:30pm EST (updated: March 7, 2016 2:24pm EST)
As of Monday, March 7th, 2016 at 12:30pm EST, the Peregrine1 cluster is unavailable due to a failed network switch in its datacenter. This switch is currently in the process of being replaced. Estimated time to complete this work and bring the clu...
Unscheduled outage on Peregrine-1
- March 17, 2016 4:00pm - 6:40pm EDT (updated: March 17, 2016 6:41pm EDT)
Outage RESOLVED: A misconfiguration that caused an unneeded IB driver to be loaded was fixed. Peregrine-1 is back online and job scheduling has resumed. Original Message: The Peregrine-1 cluster is currently offline due to problems with the cluster nodes' op...
