June E-mail Outage: Statement from the Vice President for Information Technologies

June 18, 2008 (updated June 20, 2008)

Dear members of the Cornell community:

We are emerging from the storage system failure that took out our Cornell email services. We are still resolving problems with specific email accounts. Please report any problems with email to the CIT Contact Center (255-8990, helpdesk@cornell.edu) so that we can fix them.

A brief summary of the outage: It began on Sunday, June 15, between 11:30 a.m. and 12:30 p.m., when the Sun storage systems that support the Cornell email service spontaneously rebooted, bringing down the email system and severely damaging the file systems. From Sunday through Wednesday, June 18, CIT technical staff and engineers from Sun worked around the clock to restore the email service. Technical details about the outage are posted online.

I want to thank everyone for your constructive patience as we have worked to resolve this outage.

I’m not sure any of us fully anticipated how debilitating this outage would be, but it is clear that we cannot tolerate the loss of what has become our main communication channel. In some ways, the fact that Cornell has not experienced an outage like this in at least the last decade has led all of us to have very high expectations for the availability of our university communications services.

Cornell Information Technologies and Sun are already in the process of an after-action review to determine the root causes of this failure, how we can minimize the probability of a recurrence, and how we can prepare for more effective recovery. The results of this review will be made available to members of the community, and we invite your contributions as well.

It is important for everyone at Cornell to take stock of the impact of this situation on university business and to think creatively about ways we can prepare in advance to be better able to cope with such an emergency.

I fervently hope that the next emergency does not involve university-wide email, but much of what we learn from this thought process will apply to many other types of emergencies, and certainly to other communications failures. If we can do even small things within our individual departments to prepare, we will be ahead of the game.

I thank everyone who has helped the university cope with this outage. In particular:

  • The front-line IT staff in each unit. They are heroes every day, and they have been a key support point as they have helped us understand what was going on, communicate about the outage within their units, and in many cases, put in place alternate email services so that university work could continue.
  • My colleagues in Day Hall. They have offered support, understanding, and many excellent suggestions that have helped guide the priorities of the CIT team doing the systems recovery.
  • The CIT staff in all parts of the organization who have worked tirelessly through the entire period (and still are) to minimize the impact. To mention just a couple of examples, there are maybe half a dozen folks in Systems and Operations who have worked around the clock for nearly 4 days (literally…they slept for a few hours on campus and went back to work). Other staff made personal phone calls in the evening to help desperate faculty members who needed to get information to their students. Still others have rearranged their priorities, including some very early and late hours, to stay on top of the communication process to get information to the community.

You cannot pay people enough to have this kind of dedication to their work and Cornell.

I would appreciate hearing your observations and suggestions about things that went well and ways we could have done better. Please write to me at email-feedback@cornell.edu.

Thank you.

Polley Ann McClure, Vice President for Information Technologies

