Webdock - Notice history

All systems operational

Denmark: Network Infrastructure - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Denmark: Storage Backend - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Denmark: General Infrastructure - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 99.99%

Canada: Network Infrastructure - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Canada: Storage Backend - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Canada: General Infrastructure - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 99.83%

Webdock Statistics Server - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Webdock Dashboard - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Webdock Website - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Webdock Image Server - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%

Webdock REST API - Operational

100% uptime
Jul 2024 · 100.0% | Aug 2024 · 100.0% | Sep 2024 · 100.0%
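For reference, a monthly uptime percentage maps directly to a downtime budget in minutes. A minimal sketch of that arithmetic (the helper name and the 30-day month are illustrative assumptions, not part of the status page):

```python
# Convert a monthly uptime percentage into implied downtime minutes.
# Assumes a 30-day month (43,200 minutes), matching September 2024.

def downtime_minutes(uptime_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime implied by a given uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)

# September figures from the tables above:
print(round(downtime_minutes(99.99), 2))  # Denmark: General Infrastructure -> 4.32
print(round(downtime_minutes(99.83), 2))  # Canada: General Infrastructure -> 73.44
```

So the 99.83% September figure for Canada corresponds to roughly 73 minutes of downtime, consistent with the prolonged host outage described in the notice below.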

Notice history

Sep 2024

Host down in Canada, potential prolonged downtime
  • Resolved
    Resolved

    At long last, about 20 minutes ago, we got the system up after remote hands finally woke up and got to it. We will be having a serious discussion with our datacenter provider in Montreal, and if this issue of extremely slow turnaround times stretching into many hours of waiting cannot be resolved, we will consider moving away from this DC entirely, as this is completely unacceptable. What's worse, we experienced this once before, back in October 2022, and after that incident we were promised it would never happen again. But it obviously did.

    We are terribly sorry for the prolonged wait. All affected customers will receive a month of free hosting as a small token of our apology.

  • Monitoring
    Update

    We are still waiting for remote hands to complete the restart.

  • Monitoring
    Update

    We have now been promised a "quick" turnaround after pushing them on the issue. They have declined to define what "quick" actually means, but we hope it means within the coming hour, as anything longer would stretch the definition of quick in this context.

  • Monitoring
    Monitoring

    As we feared, we are completely unable to restart this system using our remote management tools, so we sent a message to remote hands in Canada about 45 minutes ago. We strongly believe all that is needed is a simple power cycle of the affected system. However, we are worried this simple operation may have a very long turnaround time, as it is currently the middle of the night in Canada, around 02:45 AM, and we know from experience that this can mean very slow response from DC staff. We will keep pushing them to get the task done as quickly as possible.

  • Identified
    Identified

    We have a host down in Canada. Unfortunately, efforts so far to reboot the system with remote management tools have failed. We have reached out to remote hands in the Canada DC to get the system rebooted.

Jul 2024

DK DC1 power outage
  • Resolved
    Resolved

    We now believe we have completely recovered from the power outage this morning. We ended up having to roll back a number of customer servers which had experienced data corruption in order to fully resolve all issues. We will review all procedures and operations at our DC, firstly to prevent any such power outage from happening again, whether we are doing maintenance or not, and secondly to see whether we can build better protections into our data pools to avoid the corruption issues we saw today. Known methods exist for this, but they come at a performance penalty, which we will be evaluating in the coming week or so.

    We sincerely apologize for the inconvenience caused today. This was force majeure at work and/or inadequately prepared technical staff working on our UPS systems.

  • Monitoring
    Update

    We are close to having all issues fully resolved. However, the power outage seems to have affected 3 hosts adversely: the storage pools on these hosts are reporting as degraded, which in turn is preventing proper restarts of VPS servers on those hosts. We are looking into how to resolve this. The good news is that all customers have been up for a long while now, and we have no other outstanding issues apart from this storage pool issue on the 3 hosts in question.

    We hope to resolve these last problems within the next few hours. The resolution may involve migrating a small number of customers to other locations, in which case you will receive a migration notification by email.

  • Monitoring
    Update

    Unfortunately we have had to recover from the last known backups on one of our hosts, which for some reason had a completely corrupted storage pool after the power outage. We will look at how we can avoid such corruption in the future. In any case, all customer servers on that system are coming up one by one as they are reprovisioned from the snapshot taken this past evening, about 9 hours ago. We will update here once all servers are up and we are happy with how all systems look.

  • Monitoring
    Update

    We are now down to a single host having problems. It seems we may have to recover from the last known backups for this system (from about 8 hours ago). We will try a few more things first to recover the local storage pool, which was somehow corrupted during the power outage.

    In other news, the UPS crew have completed their maintenance work and believe they have identified the issue that caused the outage this morning. When they isolated one of our UPS units for maintenance, the remaining units were unable to communicate properly, causing them to drop the load to our DC. This is not supposed to happen and points to either incorrect cabling or faulty components that were not caught during the initial power outage testing before we went live with the DC.

    It is ironic that the exact systems designed to protect us from a power outage were the ones responsible for one, but it is what it is, and all we can do from our side is trust that our UPS crew have now returned us to a redundant state.

  • Monitoring
    Update

    Most customer VPS are up now, and we are demoting this to a partial outage. We have a single host system whose storage is showing serious issues after the power outage, and it may take longer to recover than the others. We are working on this system right now.

  • Monitoring
    Monitoring

    We are slowly bringing all customer VPS servers back up. It seems that in some cases a few seconds of data loss is to be expected after such a hard power cut to all systems simultaneously. We hope this does not result in any data corruption, but we have no overview of the impact yet. We will focus on getting customer servers up and running first; then we will inspect all systems one by one.

  • Identified
    Identified

    We have power again, and most services are booting or already booted. However, the UPS crew say the fault should of course never have happened in the first place; that is exactly what we have emergency power systems for. They are investigating the root cause and have asked us to hold off on doing any work on our side, as they say there is a chance we may have another power cut before they are done. We hope this will not be the case.

  • Investigating
    Investigating

    A work crew is doing UPS maintenance today, and it seems they somehow managed to cut power to the DC. We are currently investigating this incident.

Jul 2024 to Sep 2024