As the roll-out of 5G continues and traffic starts to flow, simple assurance automations are already live (for example, certain elements of automated root cause analysis), and various industry working groups are setting out requirements for closed-loop or self-healing assurance as part of their vision for automated networks and services.
The ETSI working group for Zero-touch network and Service Management (ZSM) has defined a framework for end-to-end, cross-domain/cross-technology service and network automations for all operational processes – including assurance. The TM Forum Autonomous Networks project looks beyond the telco and discusses the place of automated assurance and operations in supporting telcos to deploy “a set of innovative business models and network services that enable the digitalization of vertical industries.”
What will automated assurance & operations deliver?
Automated assurance and operations will need to deliver significantly more functionality as 5G moves towards 6G, bringing with it multiple new services, automated and virtualized network architectures, cloud deployments and edge capabilities.
The diagram below shows the span of these functionalities. The rows move from simpler descriptive tasks at the top to prescriptive, closed-loop capabilities at the bottom, and the columns show the main consumers of the task outputs. This blog focuses on the provision of assurance to the operational teams: the colored boxes indicate where automation is, or will be, included, and the prevalence of colored boxes in the first three columns demonstrates the importance of this topic.
Source: Charlotte Patrick Consult, 2022
Looking at the use of automation in each of the colored boxes in turn:
Automation of monitoring and reporting
The pink boxes highlight the automation of more “descriptive” assurance tasks, such as:
- Identification of issues/events across domains to report or initiate alarms
- Tracking of customer experience across network, services, devices, or applications to report or initiate alarms – correlating network and other issues end-to-end
- Assessment of the service impact for network-related issues
- Monitoring of performance against enterprise service level agreements
These descriptive measurements increase in complexity as networks become virtualized and programmable, so their processes must be automated to monitor continuously and alert in real time. Automations such as the activation of virtual probes and active tests could be considered part of these tasks.
Automation of diagnostic tools
The yellow boxes identify the need to automate diagnostics for root-cause analysis (e.g., network resource faults or service configuration issues) to enrich alarming and troubleshooting. Automation includes detecting a problem, deciding how important the issue is, and sending guidance to the network or other systems for resolution. As with the monitoring above, the process can be improved in a closed-loop manner, using automation and intelligence to enhance troubleshooting capabilities.
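The detect, decide and guide steps described above can be illustrated with a minimal rule-based sketch. Everything in it is an assumption made for illustration: the `Alarm` fields, the contents of the `RULES` table, and the severity cut-offs of 10 and 100 affected services are hypothetical and not taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str             # element that raised the alarm (hypothetical naming)
    kind: str               # alarm type, e.g. "link_down"
    affected_services: int  # number of services impacted


# Hypothetical rule table: alarm kind -> (probable root cause, guidance)
RULES = {
    "link_down": ("network resource fault", "reroute traffic and check the port"),
    "config_mismatch": ("service configuration issue",
                        "re-apply the last known good configuration"),
}


def severity(alarm: Alarm) -> str:
    """Decide importance from service impact (cut-offs are illustrative)."""
    if alarm.affected_services >= 100:
        return "critical"
    if alarm.affected_services >= 10:
        return "major"
    return "minor"


def diagnose(alarm: Alarm) -> dict:
    """Map an alarm to a probable cause, a severity and resolution guidance."""
    cause, guidance = RULES.get(
        alarm.kind, ("unknown", "escalate to the operations team")
    )
    return {
        "source": alarm.source,
        "root_cause": cause,
        "severity": severity(alarm),
        "guidance": guidance,
    }


result = diagnose(Alarm("router-7", "link_down", 250))
```

A rule table like this is the simplest starting point. An ML-driven diagnostic could later replace the lookup while keeping the same output shape for downstream systems.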
Automated network changes
The grey boxes describe the use of automation and machine learning to predict the need for change and make the appropriate changes:
- Adaptive thresholds track network KPIs over time and “understand” what an appropriate level might be for times of day or for known situations on the network. Thresholds for alarms are then automatically set based on these revised KPIs
- Dynamic SLAs calculate the minimum resources needed under low, normal, and peak workloads to meet SLAs. Automated assurance then provides the orchestrator with any changes needed to meet these SLAs and predicts expected peak needs to allow resource management ahead of time
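As a rough illustration of the adaptive-threshold idea, the sketch below learns a separate alarm threshold for each hour of the day from historical KPI samples. The choice of mean plus `k` standard deviations, the default `k=3.0`, and the function names are assumptions made for this example. A production system would likely use a more robust baseline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def adaptive_thresholds(samples, k=3.0):
    """Build a per-hour alarm threshold from (hour_of_day, kpi_value) samples.

    The threshold for each hour is mean + k * population standard deviation
    of the values observed in that hour (an illustrative choice).
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(vals) + k * pstdev(vals) for hour, vals in by_hour.items()}


def should_alarm(thresholds, hour, value):
    """Alarm only if a baseline exists for this hour and the value exceeds it."""
    limit = thresholds.get(hour)
    return limit is not None and value > limit


# Hypothetical history: quiet at 03:00, busy at 20:00
history = [(3, 10), (3, 12), (3, 11), (20, 80), (20, 90), (20, 85)]
thresholds = adaptive_thresholds(history)

night_alarm = should_alarm(thresholds, 3, 50)     # 50 is unusual at 03:00
evening_alarm = should_alarm(thresholds, 20, 50)  # 50 is normal at 20:00
```

The same KPI value raises an alarm at night but not in the evening, which is the behavior a single static threshold cannot provide.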
Automation in lifecycle management
The green boxes bring together opportunities for automating assurance and monitoring as part of the lifecycle management of various network functions and domain-specific processes such as those found on the RAN:
- Virtual network functions and related infrastructure require automated assurance as part of their management, to monitor their performance and to spin them up, increase their capacity, restore them, and take them down as needed. Self-testing may be considered part of these activities, including automation of health checks of current vs previous configurations, active testing, and retesting when necessary
- On the RAN, monitoring across diverse types of cells and across front-, mid- and backhaul allows for the automation of alarms, root-cause analysis and prescription of the changes needed to topology, beamforming etc.
Assurance as part of operational automations
The light blue box shows the role of assurance feeding into operational processes:
- Provision of information into incident management systems
- Automatic opening of a trouble ticket and orchestrating of corresponding flow of actions
- Automated triggering of field service activities
Assurance in the self-healing network
This is the automation of assurance capabilities and integration with both service and domain orchestrators to support open and closed-loop automations. These automations include:
- Lifecycle management of services
- Capacity management (e.g., increase port bandwidth, automated rerouting)
- Building of predictive failure models to change network configurations and avoid failure conditions
- Automation of the handoff and compensation process to seamlessly offload workloads to healthy surrounding nodes before initiating troubleshooting procedures
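The handoff step in the last bullet can be sketched as follows: before troubleshooting starts on a failing node, its workload is moved to healthy neighbors that have spare capacity. The `Node` fields, the abstract "workload units", and the most-headroom-first policy are all assumptions made for illustration. A real orchestrator would apply its own placement rules.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool
    capacity: int  # maximum workload units the node can carry
    load: int      # workload units currently carried


def offload(failing, neighbours):
    """Move load from a failing node to healthy neighbours.

    Neighbours with the most spare capacity are tried first. Returns a list
    of (neighbour_name, units_moved); any load that does not fit stays on
    the failing node.
    """
    moves = []
    by_headroom = sorted(neighbours, key=lambda n: n.capacity - n.load, reverse=True)
    for node in by_headroom:
        if failing.load == 0:
            break
        if not node.healthy:
            continue
        moved = min(node.capacity - node.load, failing.load)
        if moved <= 0:
            continue
        node.load += moved
        failing.load -= moved
        moves.append((node.name, moved))
    return moves


failing = Node("a", healthy=False, capacity=100, load=60)
neighbours = [
    Node("b", healthy=True, capacity=100, load=70),
    Node("c", healthy=True, capacity=100, load=50),
    Node("d", healthy=False, capacity=100, load=0),
]
moves = offload(failing, neighbours)
```

Once `failing.load` reaches zero, troubleshooting can begin on that node without affecting customers. If load remains, the loop would need to escalate rather than proceed.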
And finally, there are automations which underpin all boxes on the diagram. These allow assurance functionalities to:
- Automate their lifecycle management (instantiation, configuration and take down as required to provide appropriate assurance at the points of need)
- Automate reporting on the status of the function to allow monitoring, root-cause analysis, and self-healing (e.g., the ability to reconfigure itself in response to changes of status)
Why is this topic important?
In the early days of assuring virtual and programmable networks, simple automations of assurance helped to monitor and diagnose new networks – cutting costs and coping with degrees of complexity which would be beyond the ability of humans in the operational teams.
As automations are rolled out elsewhere in the network, automated assurance and operations becomes an integral part of the design – deploy – assure cycle. Orchestrating the network end-to-end will require increasingly complex choices to be made across multiple networks and vendor tools, and self-healing capabilities will be mandatory to ensure that customers are not affected for lengthy periods.
The concept of “Automated Assurance and Operations” combines granular monitoring, multi-domain analytics, and process and network operations management capabilities – underpinned by both rule-based and AI/ML-driven automation.
This is a core enabler of the broad set of automated operations scenarios outlined above, which makes Automated Assurance and Operations essential for the efficient delivery of guaranteed, high-performance connectivity services.
Key takeaways for telcos
When considering the deployment of automated assurance and operations, telcos should:
- Move away from the past habit of considering assurance requirements only once network management and other functionality is in place, given that closed loops will not be possible without integrating a range of assurance capabilities
- Start by focusing on rules- or metrics-based assurance automations, as more intelligent automations require high-quality data sets which will not be available in the early days
- Create closed-loop assurance first in simple processes which do not have many upstream and downstream dependencies
- Think beyond the left-hand side of the diagram in their longer-term planning – especially towards the potential future needs for assurance by their enterprise customers. Without this planning, marketing and sales will not be able to successfully ramp up sales of many low-latency or other specialist services.
For more on automated assurance and operations, and why an integrated approach is essential to assuring advanced cloudified networks, read our new whitepaper.