Web interface actually still down
Incident Report for Riju
Postmortem
Posted Jul 27, 2021 - 03:24 UTC

Resolved
The situation appears to be resolved, and somebody is running C# code :)
Posted Jul 27, 2021 - 03:17 UTC
Update
We are back up. Monitoring the situation to make sure it doesn't go down again.
Posted Jul 27, 2021 - 03:05 UTC
Update
It occurs to me that a much faster remediation would have been to simply scp a new supervisor binary directly to the existing instance, rather than bringing up a new one. However, I don't think there's a way to abort an instance refresh, so we'll just have to live with the downtime.
Posted Jul 27, 2021 - 02:43 UTC
Monitoring
Using an instance refresh on the ASG, since things can't really get much more broken than they are now. The new node will probably take around 20 minutes to be ready.
Posted Jul 27, 2021 - 02:41 UTC
Identified
Going to roll the AMI back as previously discussed.
Posted Jul 27, 2021 - 02:38 UTC
Investigating
Looks like that was a false alarm. We were up for a few minutes and then back down again.
Posted Jul 27, 2021 - 02:33 UTC
This incident affected: Web interface.