Web interface actually still down

Incident Report for Riju

Postmortem

Posted Jul 27, 2021 - 03:24 UTC

Resolved

Situation appears to be resolved and somebody is running C# code :)
Posted Jul 27, 2021 - 03:17 UTC

Update

We are back up. Monitoring the situation to make sure it doesn't go down again.
Posted Jul 27, 2021 - 03:05 UTC

Update

It occurs to me that a much faster remediation would have been to simply scp a new supervisor binary directly to the existing instance, rather than bringing up a new one. However, I don't think there's a way to abort an instance refresh, so we'll just have to live with the downtime.
Posted Jul 27, 2021 - 02:43 UTC

Monitoring

Running an instance refresh on the ASG, since things can't really get much more broken than they are now. The new node will probably take around 20 minutes to be ready.
Posted Jul 27, 2021 - 02:41 UTC

Identified

Going to roll the AMI back as previously discussed.
Posted Jul 27, 2021 - 02:38 UTC

Investigating

Looks like that was a false alarm. We were up for a few minutes and then back down again.
Posted Jul 27, 2021 - 02:33 UTC
This incident affected: Web interface.