E-mail outage caused by failed disk

By Ben Li

About 10 per cent of campus e-mail accounts were temporarily corrupted last Thursday when a computer hard disk array failed. The 1,000 users whose e-mail was stored in a filesystem on the failed array were unable to access their incoming e-mail over the weekend.


On the afternoon of Fri., Nov. 19, soon after the fault was discovered, a notice was e-mailed to subscribers of some e-mail lists. At that time, system administrators estimated that the system would be repaired by 4:30 p.m. that day, but the failure turned out to be more severe than it first appeared.


“We have now discovered that the corruption was much worse than we originally thought,” wrote University of Calgary Web and E-mail Services Manager Jeremy Mortis in an e-mail on Monday. “We are therefore forced to do a file restore back to 16:00, Thursday Nov. 18.”


Restoring the files reverted the contents of users’ mailboxes to their state on Thursday afternoon.


Meanwhile, incoming messages for affected users were stored for delivery this week.


“Obviously, this is a very distressing situation for all concerned,” continued Mortis. “We are doing everything we can to minimize the pain for end users.”


By Tuesday afternoon, most of the affected messages had been re-sent to users whose e-mail was stored on the failed filesystem. Users may receive duplicate copies of some messages sent to them since last Thursday, and may have to delete unwanted messages and sort messages into their e-mail folders a second time.


Redelivered messages are marked “{Redelivered due to a system failure}” in the subject line.


“Because of the way we archive incoming e-mail, we had to redeliver each note to all recipients if any one of the recipients were on the failed filesystem,” explained Mortis on Tue., Nov. 23. “This means that we ended up resending notes to quite a few people not on the affected file system.”


Users with questions can contact the IT Support Centre at itsupport@ucalgary.ca.