The situation appears to be resolved, and somebody is running C# code :)
Posted Jul 27, 2021 - 03:17 UTC
We are back up. Monitoring the situation to make sure it doesn't go down again.
Posted Jul 27, 2021 - 03:05 UTC
It occurs to me that a much faster remediation would have been to simply scp a new supervisor binary directly to the existing instance, rather than bringing up a new one. However, I don't think there's a way to abort an instance refresh, so we'll just have to live with the downtime.
Posted Jul 27, 2021 - 02:43 UTC
Starting an instance refresh on the ASG, since things can't really get much more broken than they are now. The new node will probably take around 20 minutes to be ready.
Posted Jul 27, 2021 - 02:41 UTC
Going to roll the AMI back as previously discussed.
Posted Jul 27, 2021 - 02:38 UTC
Looks like that was a false alarm. We were up for a few minutes and then back down again.