Big social media is still in big trouble: Instagram, WhatsApp, Facebook and Facebook Messenger experienced widespread outages Monday morning, leaving many users around the world unable to use some of the most popular services online. While outages like this aren’t unheard of, the length of this one, spanning over six hours at the time of this writing, is very rare – and it’s not clear when the services will be fully back online across the globe.
Facebook.com seems to be coming back for some users, which started with the company’s Newsroom and seems to be extending to the full site and social network. But Instagram and other services remain down in some areas, leading us to believe services are resuming on a region-by-region basis.
DownDetector – a website that tracks outages of online services – showed that all of the services were struggling in many territories. There have also been reports that services with accounts tied to Facebook logins, like Airbnb and Strava, are not working.
There’s currently no way to avoid the issues, so you’ll just have to wait until they’ve been solved to reconnect to WhatsApp, Instagram or Facebook. The company seems to be slowly implementing fixes for each service.
We’ve yet to hear an official reason for the outage, but we’ll update here when we have more definitive answers from Facebook itself. Analysts have speculated that the outage was caused by severe networking issues, which we’ve detailed below.
Shortly after the outage began, Facebook’s communications executive, Andy Stone, was the first to share an update on Twitter, saying the company was aware of the issues and working on a fix. WhatsApp’s Twitter account followed. Soon thereafter, Facebook’s official account acknowledged that users were having trouble accessing the company’s apps and products. Facebook CTO Mike Schroepfer tweeted an apology four hours into the outage, and after six hours, the company sent another tweet apologizing as its services came back online:
To the huge community of people and businesses around the world who depend on us: we’re sorry. We’ve been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us. October 4, 2021
The issue may have affected other Facebook products, too: some users have also reported issues with the company’s Oculus virtual reality gaming services. Noted Facebook and Twitter data miner Jane Manchun Wong warned users via tweet not to restart their Oculus devices during the outage lest they lose their games. VR game and software designer Julien Dorra tweeted a video of what it’s like to load up Oculus in the midst of the outage:
Facebook brought Oculus down with them 🙁 pic.twitter.com/rfapj1yaSU October 4, 2021
And the outage might have impacted Facebook’s real-world infrastructure as well: according to a tweet by New York Times reporter Sheera Frenkel, a Facebook employee reportedly can’t even enter company buildings due to malfunctioning badges. Another NYT report claims employees struggled to make calls from work-issued phones or receive emails from outside the company.
Facebook outages: what happened?
None of the Facebook, WhatsApp, or Instagram accounts has explained what originally caused the outage, leading to speculation and analysis. At this point, most agree that this isn’t a hack or directed attack on Facebook’s infrastructure, and sources have told the New York Times that it probably wasn’t a cyberattack because ‘one hack was unlikely to affect so many apps at once.’
Instead, evidence shows the company’s network paths to the outside web just disappeared without explanation this morning.
Cybersecurity journalist Brian Krebs of KrebsOnSecurity tweeted his conclusion that the domain name system (DNS) records routing traffic to Facebook sites and services were simply withdrawn – as in, gone from the web – this morning:
Confirmed: The DNS records that tell systems how to find https://t.co/qHzVq2Mr4E or https://t.co/JoIPxXI9GI got withdrawn this morning from the global routing tables. Can you imagine working at FB right now, when your email no longer works & all your internal FB-based tools fail? October 4, 2021
In a follow-up tweet, Krebs clarified that he believes the Border Gateway Protocol (BGP) routes serving Facebook’s DNS were gone, making every site on a Facebook domain inaccessible. This presumably explains why its services and third-party login access, as well as Instagram, WhatsApp and Facebook Messenger, are completely down.
Other networking companies have noticed and theorized the issue is with BGP routes, including Cloudflare SVP Dane Knecht, who tweeted an observation that Facebook DNS and other services are down and ‘their BGP routes have been withdrawn from the internet.’ He noted that Cloudflare also saw its own failures, but a follow-up tweet suggested it was recovering. Separately, Cloudflare CTO John Graham-Cumming tweeted seeing Facebook’s BGP changes as they happened and suggested they were mostly BGP route withdrawals.
This BGP de-routing might be a very severe issue that is harder to fix than a DNS error.
BGP is a big (global) problem
While DNS is the system that translates the ‘www.___.com’ you type in your browser’s address bar into a website’s numerical address on the internet, BGP routes are the pathways that requests take across networks to get to that address. When Facebook’s BGP routes were reportedly withdrawn from the internet, networks that relied on those routes (as Cloudflare observed above) lost their paths to Facebook, and Facebook sites and services became inaccessible.
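To make that dependency concrete, here is a minimal Python sketch of why withdrawn routes can look like a DNS failure. It is a toy model, not how real routers or resolvers are implemented: the route table, the name server address, the hostname and the helper names (`has_route`, `lookup`, `reach`) are all hypothetical, and the addresses come from ranges reserved for documentation rather than from Facebook’s network.

```python
import ipaddress

# Hypothetical data: documentation-only address ranges, not Facebook's real ones.
ROUTES = {
    ipaddress.ip_network("192.0.2.0/24"): "peer-A",     # prefix holding the name server
    ipaddress.ip_network("198.51.100.0/24"): "peer-A",  # prefix holding the web servers
}
NAMESERVER = ipaddress.ip_address("192.0.2.53")
DNS_RECORDS = {"www.example.com": ipaddress.ip_address("198.51.100.10")}


def has_route(routes, address):
    """Return True if any announced prefix covers the address."""
    return any(address in prefix for prefix in routes)


def lookup(routes, hostname):
    """Resolve a hostname, but only if the name server itself is reachable."""
    if not has_route(routes, NAMESERVER):
        return None  # no path to the name server, so the question is never answered
    return DNS_RECORDS.get(hostname)


def reach(routes, hostname):
    """Describe what a user would experience when trying to open the site."""
    address = lookup(routes, hostname)
    if address is None:
        return "DNS lookup failed"
    if not has_route(routes, address):
        return "no route to host"
    return f"connected to {address}"


print(reach(ROUTES, "www.example.com"))  # routes announced
print(reach({}, "www.example.com"))      # every route withdrawn
```

With the routes announced, the lookup succeeds and the connection goes through. With an empty route table, the very first step fails, so from the outside the problem looks like a missing DNS record even though the records themselves were never changed.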
Internet theorizing on the r/sysadmin subreddit suggested that a configuration change happened this morning that caused the BGP routes to go down, and this cut Facebook off from making remote changes – from here on, only physical access could fix the damage (emphasized in a screenshot by Twitter user Andree Toonk).
The aforementioned New York Times report supports this theory, citing an alleged internal Facebook memo saying that a small team of employees was dispatched to the company’s Santa Clara, CA data center to manually reset the company servers.
Just before Facebook services started coming back online, Krebs cited a source in stating that the outage was caused by a faulty BGP update that blocked remote users from reverting changes while locking out local access:
From trusted source: Person on FB recovery effort said the outage was from a routine BGP update gone wrong. But the update blocked remote users from reverting changes, and people with physical access didn’t have network/logical access. So blocked at both ends from reversing it. October 4, 2021
Outages: a continuing problem?
“Outages are increasing in volume and can often point towards a cyber-attack, but this can add to the confusion early on when we are diagnosing the causes,” said Jake Moore, expert at cybersecurity and antivirus company ESET, in an emailed comment to TechRadar. “As we saw with Fastly in the summer, web blackouts more often originate from an undiscovered software bug or even human error.”
March and April 2021 saw similarly major outages in which each of Facebook’s services affected today – Facebook, Instagram, WhatsApp, and Facebook Messenger – was down for over half an hour each time. But given how much faster those issues were resolved, the latest outage seems to be a catastrophe of a much higher magnitude.
Those earlier outages were due to a bug in the Domain Name System (DNS) of these services, which was seemingly not as severe as a BGP issue.