All systems operational

Resolved
Host down in Canada

Started
November 30, 2023 at 11:36 PM
Status
Resolved after about 10 hours

Impact

Partial outage
Affected
Canada: General Infrastructure
  • Resolved

    All customers have been migrated and are operational. We are now running a full battery of tests on the affected system and will likely decommission it. We apologize for the repeated disruption over the past 48 hours.

  • Identified

    We have decided to go ahead and move all customers away from this failing machine, as we have been unable either to stop the crashes or to determine exactly what is causing them. The issue seems to be load-dependent, and the kernel gives us no information when the halts happen. The system simply freezes, and our hardware monitoring shows no apparent CPU or RAM failures.

    In any case, we will move customers as quickly as we can today on a best-effort basis. As we are already in a bit of a capacity crunch in Canada, some users may find that they have been upgraded to an equivalent Ryzen profile. We will of course notify those users if that is the case.

    You most likely already know whether you are affected. If you are unsure, log in to the Webdock dashboard, where a big red alert will tell you which of your KVM machines are affected.

  • Resolved

    This incident has been resolved for now. The team will take another look at this system tomorrow during office hours.

  • Investigating

    The same host that has given us problems recently has gone down again. The system is currently booting, and all VPS servers should be up shortly.