Rogers lacked protections, redundancies that may have prevented 2022 outage: report

An independent report into the 2022 Rogers outage says the company lacked several protections and redundancies that could have either prevented the outage or ended it sooner.

The report delivered to the Canadian Radio-Television and Telecommunications Commission says that since the outage, the telecom company has implemented the changes needed to address the cause of the outage and improve network resiliency and reliability.

In a separate letter posted to its website Thursday, the CRTC confirmed that Rogers has also implemented all the report’s additional recommendations.

“We said we would fix this – we completed a full review of our networks, strengthened our network resiliency, implemented all the report recommendations, and today our networks are recognized as the most reliable by global benchmarking leaders,” said Rogers spokeswoman Sarah Schmidt in a statement.

The outage began in the early morning of July 8, 2022, lasted more than 24 hours and affected more than 12 million customers.

A configuration error during a network upgrade caused a flood of data to the core network routers, which crashed, according to the executive summary of the report by Xona Partners Inc. posted online Thursday.

The network failure could have been prevented if the core network routers had been configured with an overload limit, the report said.

Once the outage occurred, the report says it was prolonged by several factors.

The Rogers network operation centre and other critical remote infrastructure sites did not have redundant connectivity from other service providers, the report said, limiting access to critical equipment during the outage. Staff had to be physically dispatched to remote sites in order to access the affected routers, delaying recovery efforts.

Rogers staff also didn’t have backup connectivity from alternative service providers, so they couldn’t communicate with each other until the company sent SIM cards from other service providers to its remote sites.

The report said that staff also didn’t initially have access to information like the routers’ error logs and were unable to pinpoint the root cause of the outage for around 14 hours. There had also been multiple configuration changes made that day. These two factors contributed to the root cause being initially misdiagnosed, the report said.

The measures taken by Rogers since the outage include addressing the critical deficiencies exposed by the outage, separating the IP core for its wireless and wireline networks, and improving the processes for change management and incident management, the report said.

The report made seven recommendations of additional measures Rogers could take to improve its network resiliency.

Among the recommendations, which Rogers has since implemented, are that the company test emergency roaming with other mobile network operators, develop a detailed root cause analysis for future outages, and expand the scope of incident management drills.

Rogers sent a letter to the CRTC on Jan. 17 outlining how it responded to the report’s recommendations of additional measures.

In the CRTC’s letter confirming those additional measures were implemented, the commission said that by July 4 next year, Rogers must report on whether the measures continue to address reliability issues, and on progress made in separating wireline and wireless core networks.

Rogers is partnering with Cisco in its work to split and build a new dedicated IP core, separating the two networks, said Schmidt. The company has also introduced new change controls that will limit the effects of “customer-impacting events,” she said, as well as “AI-based predictive simulation capabilities to strengthen our testing and monitoring.”

The report also included recommendations for all telecom network operators based on the “important lessons learned” from the outage. These include implementing router overload protection in the IP core and distribution networks; providing backup connectivity for the network operation centre, critical remote sites and critical staff; and simulating network failure and outage scenarios to uncover deficiencies.
