Introducing a pragmatic approach for video streaming quality testing

Irina Cotanis

Feb. 8 2024

Video Streaming Quality OTT 5G network testing

The video streaming market is forecast to continue achieving over 18% Compound Annual Growth Rate (CAGR). Behind this are three key factors – the evolution of mobile network technologies, the sophistication of smartphones and the complexity of video streaming technologies. All three significantly rely on ML/AI techniques, making testing and optimizing the user experience of OTT video services potentially complex and costly for operators.

The evolution of mobile network technologies means that today high-resolution 4K and 8K video can be delivered with fewer of the quality issues, such as slow start-up times, rebuffering, resolution switches and frame rate degradation, associated with users’ video experience on older networks. The sophistication of smartphones and their complex display technologies, such as the Retina displays used in iPhones, not only enables video resolutions of up to 4K at 60 fps but makes this the new norm that users expect from OTT video streaming applications running on their mobile phones. Last, but not least, is the complexity of the video streaming technologies themselves. Their technical innovation ranges from smart compression algorithms supporting high-resolution video without increasing bandwidth or sacrificing speed, to ML/AI-based OTT video clients. These clients can sense the network performance conditions and the device’s capabilities, then substantially increase the size of the video buffer and use optimized buffer pre-filling (initial buffering) to significantly reduce or eliminate rebuffering events and video start delay.

Testing challenges and requirements

The growing popularity of OTT video streaming applications, and the need to ensure their user experience, presents operators with the challenge of reconciling the highly demanding network performance these applications require with the need to minimize operational costs. Operators must control OPEX while still supporting a wide variety of applications with proprietary (non-standardized) codecs and clients, and while overcoming the lack of transparency for testing caused by encryption and frequent software updates. Furthermore, all this must be done within the context of the increased complexity of mobile access technologies.

The continuously evolving complexity of OTT video streaming applications with their ML-based proprietary codecs, clients and delivery protocols, combined with the increasing sophistication of devices with Ultra High Definition (UHD) displays, has helped overcome more and more QoE problems. However, network problems, such as degraded RF performance and/or traffic congestion, can still result in poor latency, jitter and loss. Because the OTT applications and the devices have evolved this far, the network is left as the most significant factor affecting the QoE of video content. So, while the performance of OTT applications is generally something that operators cannot control or manage, they do need an OTT quality measurement that quantifies only the network’s contribution to the overall quality.

An OPEX-efficient testing solution focused only on the network itself needs to enable operators to troubleshoot, optimize and benchmark their network so that it meets the minimum performance requirements of highly demanding and diverse OTT video streaming applications.

This can be achieved with a pragmatic OTT video streaming testing solution which needs to satisfy the following criteria:

Run on-device, close to the user’s perception of the streaming quality
Support testing of a variety of OTT video streaming applications characterized by various encryption schemes and lack of testing transparency
Be consistent by providing perceived waiting time, perceived retainability and video quality measurements agnostic to the OTT application
Provide a highly granular video quality measurement suitable for troubleshooting and optimization based on drive testing data
Reliably reveal network-centric problems through a video quality measurement that is independent of the OTT application’s configuration (codec/client), video content and performance, and of the device’s performance

Metrics required for testing user’s perceived OTT video streaming quality

As described in ETSI TR 103.488 (Guidelines on OTT Video Streaming; Service Quality Evaluation Procedures), the user’s perceived quality of an OTT video streaming session has three dimensions: waiting time, video playback (also called ‘presentation’) quality and retainability. Evaluating the performance of any OTT video streaming application requires determining all three dimensions, even when the evaluated OTT application lacks transparency for testing, for example because of encryption and minimal access to its KPIs.

The ETSI STQ Mobile group developed and released TR 101.578 and TR 103.488, offering guidance for testing OTT video streaming applications. The defined KPIs, and the way they are measured, cover both OTT video streaming access and retainability and the video streaming presentation (playback) quality. However, when it comes to the latter, ETSI recommends describing it through several KPIs, rather than a single QoE/MOS score.

Meanwhile, ITU-T Study Group 12 has invested significant effort in developing a series of models designed to estimate users’ subjective opinion (QoE/MOS) of video streaming playback quality. However, all these solutions, although showing good accuracy, have important drawbacks when it comes to drive testing and/or on-device OTT application testing. These include a lack of fine measurement resolution (e.g. per second), support for only a reduced number of resolutions, and limited support for high-quality demand video content and OTT applications.

Therefore, while ETSI provides exact guidance for determining the perceived waiting time and perceived retainability of video streaming sessions, the perceived video presentation (playback) quality remains largely unaddressed. The ETSI and ITU-T SG12 solutions are optimal neither for drive testing scenarios nor for the variety of OTT video streaming applications.

A pragmatic testing solution

Combining extensive experience in on-device measurements and in OTT voice and video QoE testing with a deep understanding of operators’ network performance and concerns, Infovista developed a pragmatic OTT video streaming testing solution, following the criteria mentioned above.

The solution comes with two significant characteristics. The first is a generic framework for testing a variety of native OTT video streaming apps (e.g., YouTube, Netflix, TikTok, Facebook). The second is a generic model, the Video Streaming Quality Index (VSQI), designed to provide a video streaming quality index that is expressed in QoE units (MOS), is suitable for any on-device OTT application, and is meaningful for troubleshooting and optimizing network-centric problems based on drive test data.

The generic framework for native OTT application testing ensures consistency and efficiency through:

Automated and fast testing using one script in one drive test to collect field data for several OTT applications
A common set of KPIs for all tested native OTT applications, as defined by ETSI
A generic definition of the triggers for KPI measurements for all tested native OTT applications, as defined by ETSI

All of this is achieved in two testing steps: scripting and IP sniffing. The script contains user interface (UI) actions and trigger points for events to be sent to the On Device Measurement (ODM) app, generating measurement events and KPIs per streaming application while sharing a common ODM service and KPI set. IP sniffing works on commercial devices and is embedded in the generic framework for ODM IP recording, providing KPI triggers as well as payload information for throughput calculations.
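As a rough illustration of how trigger events and sniffed payload sizes could be turned into KPIs, the following Python sketch derives a perceived waiting time, a retainability flag and a mean throughput for one session. The event names, the `SessionEvents` structure and the helper functions are hypothetical and chosen only for this example; they are not the ODM interface or the exact ETSI trigger definitions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SessionEvents:
    """Trigger timestamps (in seconds) for one streaming session.

    Field names are assumptions made for this sketch.
    """
    request_time: float                # UI action that requests the video
    first_frame_time: Optional[float]  # playback start; None if it never started
    end_time: float                    # end of the session (normal or aborted)
    completed: bool                    # True if playback reached the planned end


def waiting_time(ev: SessionEvents) -> Optional[float]:
    """Perceived waiting time: from the request to the first displayed frame."""
    if ev.first_frame_time is None:
        return None  # access failed, so no waiting time can be reported
    return ev.first_frame_time - ev.request_time


def retained(ev: SessionEvents) -> bool:
    """Session counts as retained if it started and ran to its planned end."""
    return ev.first_frame_time is not None and ev.completed


def mean_throughput_kbps(packets: List[Tuple[float, int]],
                         start: float, end: float) -> float:
    """Mean throughput from sniffed (timestamp_s, payload_bytes) pairs."""
    duration = end - start
    if duration <= 0:
        return 0.0
    total_bytes = sum(size for t, size in packets if start <= t <= end)
    return total_bytes * 8 / 1000 / duration


if __name__ == "__main__":
    ev = SessionEvents(request_time=10.0, first_frame_time=11.5,
                       end_time=20.0, completed=True)
    packets = [(10.5, 125_000), (15.0, 125_000), (25.0, 1_000)]
    print(waiting_time(ev), retained(ev),
          mean_throughput_kbps(packets, ev.request_time, ev.end_time))
```

Because the KPIs are computed from generic events rather than from application internals, the same functions could in principle be reused for every tested application, which is the consistency the framework aims for.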

The Video Streaming Quality Index (VSQI) model’s design addresses aspects related to today’s OTT application testing in drive testing scenarios. These include:

selected input parameters (resolution, frame rate, playout state)
calibration to MOS scale
support for UHD video (max MOS for up to 8K resolution)
video content independence by using normalization to the most sensitive video content in the mapping process of the input parameters to the perceived video quality (MOS)

Because of this, VSQI output shows meaningful variability for detecting network problems, troubleshooting and optimizing for high-resolution content, and then scoring and quantifying the user experience (MOS scale).
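To make the idea of mapping the input parameters to a MOS-scaled value more concrete, here is a deliberately simplified Python sketch. It is not the VSQI model: the lookup table, the linear frame rate scaling and the function name `instant_score` are all invented for illustration, and the numeric values are placeholders rather than calibrated coefficients.

```python
# Placeholder values for illustration only; NOT the VSQI calibration.
BASE_MOS_BY_HEIGHT = {360: 2.6, 480: 3.1, 720: 3.8, 1080: 4.3, 2160: 4.7, 4320: 4.8}
REFERENCE_FPS = 60.0   # assumed frame rate at which no penalty applies
MIN_MOS = 1.0          # bottom of the MOS scale


def instant_score(height: int, fps: float, playing: bool) -> float:
    """Toy per-sample quality score on a 1-to-5 MOS-like scale.

    height  -- vertical resolution of the displayed video in pixels
    fps     -- displayed frame rate
    playing -- playout state: False while the player is stalled
    """
    if not playing:
        return MIN_MOS  # a stall is scored as the worst quality
    # Use the highest tabulated resolution that does not exceed the input.
    known = [h for h in sorted(BASE_MOS_BY_HEIGHT) if h <= height]
    base = BASE_MOS_BY_HEIGHT[known[-1]] if known else MIN_MOS
    # Scale the headroom above MIN_MOS linearly with frame rate (an assumption).
    fps_factor = min(fps, REFERENCE_FPS) / REFERENCE_FPS
    return MIN_MOS + (base - MIN_MOS) * fps_factor


if __name__ == "__main__":
    # One sample per second: (height, fps, playing)
    trace = [(2160, 60, True), (1080, 60, True), (1080, 30, True), (1080, 0, False)]
    for sample in trace:
        print(sample, round(instant_score(*sample), 2))
```

Even in this toy form, a resolution switch, a frame rate drop and a stall each move the score, which is the kind of variability a network-centric index needs in order to flag problems.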

The suitability for drive testing scenarios is also defined by a fine measurement resolution and two quality index outputs: VSQIinstant and VSQIsession. The fine geographical/spatial resolution of VSQIinstant enables accurate geographical pinpointing of where a network problem caused a degradation of the video quality. The VSQIsession score is provided over short video sessions (30-40 s). Short sessions help keep the video buffer small and consequently keep the delay short between the moment a network problem occurs and the moment it affects the video quality, which is the main purpose in drive testing scenarios. VSQIsession thus captures the longer-term effects of video playout interruptions (rebuffering) and possible resolution changes while remaining suited to drive test scenarios. Finally, the VSQI model has been shown to perform reliably, with performance statistics within the requirements for QoE video models (per ITU-T requirements), such as a correlation coefficient R > 80% and a Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) < 0.5.
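The performance statistics named above can be computed directly from paired model outputs and subjective scores. The sketch below shows one common way to do this in Python, using the Pearson correlation coefficient and the plain (population) forms of RMSE and MAE. The sample scores are made up for the example, and the exact statistical procedure used in ITU-T evaluations may differ, for instance in how degrees of freedom are handled.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root Mean Square Error between predicted and reference scores."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean Absolute Error between predicted and reference scores."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def meets_requirements(pred: Sequence[float], ref: Sequence[float],
                       min_r: float = 0.80, max_err: float = 0.5) -> bool:
    """Check the thresholds quoted in the text: R > 80%, RMSE and MAE < 0.5."""
    return (pearson_r(pred, ref) > min_r
            and rmse(pred, ref) < max_err
            and mae(pred, ref) < max_err)


if __name__ == "__main__":
    subjective = [1.5, 2.5, 3.5, 4.5]   # made-up subjective MOS values
    predicted = [1.7, 2.4, 3.6, 4.3]    # made-up model outputs
    print(pearson_r(predicted, subjective),
          rmse(predicted, subjective),
          mae(predicted, subjective),
          meets_requirements(predicted, subjective))
```

All three statistics are needed together: a model can correlate well with subjective scores while still being offset from them, which only the error measures would reveal.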

As the whitepaper shows in more detail, VSQI provides network-centric video quality with the accuracy characteristic of QoE video quality models, with the added value of supporting high resolutions and frame rates and of being feasible for and suited to drive test scenarios.

Conclusions

Supporting OTT video streaming applications with seamless user experience is becoming a significant challenge for operators due to the variety and diversity of applications, their lack of transparency for testing, the increased complexity of the mobile access technologies and OPEX constraints requiring testing to be minimized. With the performance of the OTT applications showing continuous improvement but operators unable to control and/or manage it, a pragmatic network-centric testing approach is the most cost-efficient solution.

Infovista has developed a solution based on a generic framework for testing a variety of native OTT applications with a common set of KPIs describing user-perceived waiting time, retainability and video quality during playback. With this solution, Infovista empowers operators with a generic tool which enables consistency across the testing of various native OTT applications. The benefits of the network-centric Video Streaming Quality Index, with its calibration to the most sensitive video content and its dual scoring (instantaneous and per session), are twofold: accurate fine-resolution troubleshooting suited to drive testing, and benchmarking of overall OTT video streaming session quality.

Learn More

Download OTT Telephony Application Testing (1/2) Whitepaper

Download OTT Telephony Application Testing (2/2) Whitepaper

How to successfully test the user experience of OTT video streaming on mobile networks

Irina Cotanis

Dr. Irina Cotanis has more than 25 years of experience in wireless communications systems, statistical signal processing and analysis, and statistics. Her expertise covers mobile network troubleshooting and service quality assurance for user experience optimization, machine learning-based QoE modelling and predictive QoS/QoE testing. Dr. Cotanis's latest research is focused on the need for advanced testing techniques arising from the evolution to 5G technologies designed to simultaneously deliver consumer and IoT services with highly demanding user/device and/or machine-centric quality performance and spectral efficiency. These techniques are based on user/device/machine-centric predictive testing and ML/AI analytics for autonomous cloud-based testing systems suitable to work within the context of 5G self- and content-aware networks and devices. For the past 20 years, she has been actively involved in the development of various ITU-T Study Group 12 recommendations and ETSI STQ Mobile specifications. During her career, she earned several patents, authored several IEEE papers, acted as reviewer and chair, and delivered keynote talks at various IEEE conferences and industry workshops.