Corporate 'the buck stops here' at CenturyLink / Level3





Key bit here:
"Corrective Actions: ... The individual responsible for this policy change has been identified."

I feel for that worker bee. Sounds like they need a scapegoat for bad process.
Root Cause: A configuration issue impacted IP services in various markets
across the United States.

Fix Action: The IP NOC reverted a policy change to restore services to a stable
state.

Summary: The IP NOC was informed of a significant client impact which seemed to
originate on the east coast. The IP NOC began investigating, and soon
discovered that the service impact was occurring in various markets across the
United States. The issue was isolated to a policy change that was implemented
to a single router in error while trying to configure an individual customer
BGP. This policy change affected a major public peering session. The IP NOC
reverted the policy change to restore services to a stable state.

Corrective Actions: An extensive post analysis review will be conducted to
evaluate preventative measures and corrective actions that can be implemented
to prevent network impact of this magnitude. The individual responsible for
this policy change has been identified.

This service impact has concluded; if additional issues are experienced, please
contact the CenturyLink Technical Service Center. There may be additional
analysis and discovery that occurs as the incident is reviewed by NOC
management. Any available updates will be relayed upon event ticket closure. At
that time, a customer satisfaction survey link may be available. We strive to
provide thorough communications containing the available information during a
service disruption. Please let us know if the updates you received during this
event were satisfactory.
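
The notice says a policy change meant for one customer BGP session ended up affecting a major public peering session. That reads like a process gap, so here is a rough sketch of the kind of pre-change check that could catch it. Everything in it is an assumption for illustration: `PolicyChange`, `unintended_sessions`, `review`, and the session names are hypothetical, not anything CenturyLink or a router vendor actually provides.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PolicyChange:
    """A proposed routing-policy change (hypothetical model, not a vendor API)."""
    intended_sessions: FrozenSet[str]  # sessions the engineer meant to touch
    affected_sessions: FrozenSet[str]  # sessions the rendered config would touch


def unintended_sessions(change: PolicyChange) -> List[str]:
    """Return the sessions the change would touch beyond its stated intent."""
    return sorted(change.affected_sessions - change.intended_sessions)


def review(change: PolicyChange) -> str:
    """Block the change when its reach is wider than what was intended."""
    extra = unintended_sessions(change)
    if extra:
        return "BLOCKED: also affects " + ", ".join(extra)
    return "OK"


# Hypothetical example: a change scoped to one customer that leaks onto a peer.
change = PolicyChange(
    intended_sessions=frozenset({"customer-a"}),
    affected_sessions=frozenset({"customer-a", "public-peer-1"}),
)
print(review(change))  # prints "BLOCKED: also affects public-peer-1"
```

The point of a check like this is that it asks how the change could reach the peering session, not who typed it.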

More light reading:
https://news.ycombinator.com/item?id=15684372
When that link breaks:
https://dyn.com/blog/widespread-impact-caused-by-level-3-bgp-route-leak/
Even more on complex systems and root cause:
https://www.kitchensoap.com/2012/02/10/each-necessary-but-only-jointly-sufficient/



Update Oct 27 2018:
Look for WYLFIWYF (What You Look For Is What You Find).

So, don't ask "why" questions, as they lead down a path toward who is to blame.
Ask "how" questions instead, as they uncover what is responsible.

Once you are on the path of learning from issues, you can move forward.
As I have posted before, you can learn from curious people (thanks again, Jeri!):
blog/01373517643


