Webdock - Network outage in Finland – Incident details

Network outage in Finland

Resolved
Degraded performance
Started 8 months ago · Lasted about 3 hours

Affected

Finland: Network Infrastructure

Degraded performance from 5:46 PM to 8:32 PM

Updates
  • Resolved
    Resolved

    We managed to work around the issue and it turned out we did not need another reboot after all. For this reason, we will call this issue resolved for now. We will keep monitoring the network of course and will open another incident if anything further happens. For now at least, we are good.

  • Monitoring
    Update

    We will be performing another reboot of key systems shortly. This will bring down the network for about 10 minutes. If network connectivity is OK after the reboot, then no further action will be taken and we will mark this incident as resolved.

    We apologize for any and all inconvenience this has caused tonight - trust us when we say that this has been no fun for us either :)

  • Monitoring
    Update

    We implemented a fix and are currently monitoring the result. All connectivity should now be OK. Unfortunately, we may need to reboot some systems causing another brief period of downtime of about 10-15 minutes before we can call this completely resolved. We will update here if that turns out to be required.

  • Monitoring
    Update

    Unfortunately we are still experiencing a severely degraded network. We identified an attack towards one of our customers and discarded that traffic - but it was not a high volume attack, only about 2 Gbit/sec and 1.2 million packets/second. As you can see, mitigating this attack did not improve the situation, so there are more attacks and/or issues at work at this time which we continue to investigate.

  • Monitoring
    Monitoring

    Still seeing degraded performance unfortunately and still investigating the situation. Our network team is hard at work trying to fix this issue and has already tried a few things to narrow down the source, such as disabling IPv6 to see if the traffic was originating on that part of our network, as well as disabling certain other features. So far this looks to be an IPv4-based attack, which is interesting as IPv4 seems a lot less impacted than IPv6 - but this may be a quirk of how our hardware works rather than anything indicative of the source. We are seeing periods of good functionality with periods of packet loss in between, with about 50-60% packet loss on IPv4 continuing at this time.
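    If you want to quantify loss towards your own instance during an event like this, a minimal sketch is to run `ping` and read the summary line it prints. The snippet below (hypothetical helper, assuming the summary format printed by iputils ping on Linux) pulls the loss percentage out of that line:

    ```python
    import re

    def packet_loss(ping_summary: str) -> float:
        """Extract the packet-loss percentage from a ping summary line."""
        m = re.search(r"([\d.]+)% packet loss", ping_summary)
        if m is None:
            raise ValueError("no packet-loss figure found in summary")
        return float(m.group(1))

    # Example summary line, as printed after `ping -c 100 <your-server>`:
    line = "100 packets transmitted, 45 received, 55% packet loss, time 99163ms"
    print(packet_loss(line))  # 55.0
    ```

    Running the same count against the server over IPv4 and IPv6 separately is a quick way to compare how each stack is affected.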

  • Identified
    Update

    After a key equipment reboot and various diagnostics we can now conclusively say this is an attack on our infrastructure and not a hardware fault. The reason we couldn't see this immediately is that our monitoring is not flagging anything and our hardware is not being overloaded in the typical way, where CPU or memory usage is very high (as when we have to deal with a lot of packets) - so this is some new type of attack we are unfamiliar with.

    We are also checking with our upstream provider (Hetzner) why their DoS filtering hasn't caught this and whether they can spot the malicious traffic.

    We will keep updating here as we learn more.

  • Identified
    Update

    Key equipment is being rebooted. This process can take up to 10-15 minutes.

  • Identified
    Identified

    We are now working on a fix which entails taking some equipment down for a reboot. This means we have a complete outage at the moment. We hope to be back up very soon.

  • Investigating
    Update

    We apologize for the wait for a fix here and the continued degraded performance. We are still pinpointing the cause; this is not something we have seen before, and we are trying to figure out whether it is indeed a hardware fault or an attack of some sort. Thank you for your patience this evening.

  • Investigating
    Update

    We are continuing to investigate the incident. IPv6 is much more impacted than IPv4: we are seeing much higher packet loss and latency on IPv6, as high as 90% packet loss and 6 seconds of latency. Without having found the exact root cause, this is now looking less like a DoS attack and more like a hardware fault or an issue with our Arista switches in the Finland DC, as opposed to our firewalls being overwhelmed by malicious traffic. This is not conclusive yet, however, and we are still locating the root cause.

  • Investigating
    Update

    We are seeing high latency and packet loss rather than the complete outage first observed. This is indicative of a DoS attack or similar event. We are continuing to work on identifying the root cause and a fix for this incident.

  • Investigating
    Investigating

    Looks like we are experiencing a network outage in Finland. We are currently investigating this incident.