Webdock - One fiber down in DK DC, some impact was seen on 217.78.237.0/24 – Incident details

Resolved
Partial outage
Started 7 months ago
Lasted about 14 hours

Affected

Denmark: Network Infrastructure
Updates
  • Resolved

    Through pure luck we have been operational almost all day today, as we had not yet shifted our entire network from Finland. It is still unclear to us how much of our network is incorrectly configured at GlobalConnect, but we will make sure everything is in order with them tomorrow.

    We have now learned a bit more about the nature of the incident. It seems they had a major fiber break on a backbone bundle in connection with some freeway work near Kolding. Because the break was beneath a busy freeway and the bundle is large, their repair work has taken a very long time.

    They now report that they should have our fiber up within the next 3 hours. As we are not operationally impacted and we know the timeframe for their fix, we are calling this issue resolved on our side.

  • Update
    Our ISP GlobalConnect still has an outage in about half of their locations in Denmark. We believe this is shaping up to be one of the biggest outages this provider has ever had in the Nordics, at least given the size of the affected area and the duration. We are still waiting to receive light on the affected fiber pair, but thanks to our redundant setup we are currently unaffected and have no connectivity issues. All customers and IP ranges are still operational. We will update here once we are in a redundant and normalized state again in the DK DC.

    In the meantime we have reached out to other ISPs in the region today and are looking into establishing more fiber to our facility for further redundancy. Today's incident was a warning to us that, despite having redundant connections over fully diverse routes, we still rely on the infrastructure (and BGP configuration) of a single upstream provider. We are working towards fixing that as soon as possible.

    In addition, we have implemented further monitoring so we can catch partial outages sooner. This morning we did not realize at first that a single IP range was non-functional, as all the others were working, so it took us 20-30 minutes to realize that we were indeed affected by the ISP outage. This reaction time should now be significantly reduced, as we have automated monitoring on all IP addresses on our network.
  • Update

    It seems our ISP incorrectly configured one of our ranges so that it was not being advertised to the internet in a redundant fashion. We still have a tunnel from our old Finland location, so we asked Hetzner to start advertising the prefix 217.78.237.0/24 again. They did so promptly and we now see connectivity again. The underlying issue is not fixed yet, however, so we will keep this incident open until the situation is fully resolved.

  • Update

    It seems we are not completely unaffected after all: one of our IP ranges, 217.78.237.0/24, is being affected by the outage for unknown reasons. It should be routed the same way as all our other networks, but for some reason the partial ISP outage is affecting this one. This is impacting about 8% of our customers, so if you are one of the unlucky ones, rest assured we are working to resolve this issue.

  • Monitoring

    We saw a fiber connection to our DK DC lose light at about 8:40 CET. Our ISP reports that some core equipment went down at their Kolding location. We do of course have redundant fiber connections, so our other connections took over all traffic, and all we saw was a short period of packet loss while traffic that had been flowing through Kolding was redirected through our other fiber from that ISP. There seems to be no lasting impact on us at this time. The ISP is working on the issue and says we should receive light again sometime today, at which point we will be fully redundant again. We will monitor the situation, but there is nothing we can do on our side at the moment except wait.
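
For readers wondering what monitoring per IP range (rather than per site) could look like, here is a minimal sketch. It is not our actual monitoring setup. It assumes reachability probes are collected elsewhere and passed in as a mapping from address to True/False. The function names, the 50% threshold, the second prefix 192.0.2.0/24 (a documentation range) and the probe results are all hypothetical and for illustration only.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(probe_results, prefixes):
    """Group probe outcomes (True = reachable) by the prefix containing each address."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    grouped = defaultdict(list)
    for address, reachable in probe_results.items():
        ip = ipaddress.ip_address(address)
        for net in networks:
            if ip in net:
                grouped[str(net)].append(reachable)
                break  # each address is counted under the first matching prefix
    return grouped


def unreachable_prefixes(probe_results, prefixes, threshold=0.5):
    """Return the prefixes whose share of failed probes is at least `threshold`.

    The threshold is an assumption chosen for this sketch, not a recommendation.
    """
    grouped = group_by_prefix(probe_results, prefixes)
    flagged = []
    for prefix in prefixes:
        outcomes = grouped.get(str(ipaddress.ip_network(prefix)), [])
        if not outcomes:
            continue  # no probes for this prefix, so nothing to conclude
        failed_share = outcomes.count(False) / len(outcomes)
        if failed_share >= threshold:
            flagged.append(prefix)
    return flagged


# Illustrative probe results only, not real measurements.
probes = {
    "217.78.237.10": False,
    "217.78.237.20": False,
    "192.0.2.10": True,
    "192.0.2.20": True,
}
print(unreachable_prefixes(probes, ["217.78.237.0/24", "192.0.2.0/24"]))
```

Evaluating each range on its own means that one dark range is flagged even while every other range looks healthy, which is the situation that delayed detection in this incident.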