It's already a reality ... mobile video services account for 70 per cent of all mobile data consumption, and that is only set to grow. As a result, there is an urgent need for operators to test, monitor and, more importantly, optimize video quality in order to match customer expectations.
For operators, controlling and cost-efficiently balancing quality of experience (QoE), network resources, bandwidth and capacity is therefore a necessity. But one of the most challenging questions is: ‘Which metric should be used to measure video QoE, given the huge range of both standardized and non-standardized solutions?'
Don't panic! Here's the good news... I outline below five simple questions that you can apply to help solve that dilemma.
First, what types of QoE metrics are available?
There are three main categories:

- Perceptual: based on models of human perception and cognition, applied to the video signal itself.
- Parametric: based on transport-layer and client parameters; when the payload is also used, the metric is known as ‘bitstream'.
- Hybrid: combines all of these inputs.

Within each category there are several variations, depending on the type of input data. Perceptual metrics may use both the reference and the received video clip (full reference models), only the received clip (no-reference models), or condensed information about the reference video rather than the full clip (reduced reference models). Hybrid metrics, which also use the video signal itself, can likewise be full, reduced or no reference, and can be based either on the payload (bitstream category) or on transport header and client information only (parametric category).
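As a mental model, the taxonomy above can be sketched as a small data structure. This is purely illustrative: the class and field names are invented for this sketch, not part of any standard; only the category/reference-mode distinctions and the ITU-T numbers (cited later in this article) come from the text.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    PERCEPTUAL = "perceptual"  # models human perception/cognition of the video signal
    PARAMETRIC = "parametric"  # transport-header and client parameters ('bitstream' if payload is added)
    HYBRID = "hybrid"          # combines signal-based and parametric/bitstream inputs

class ReferenceMode(Enum):
    FULL = "full reference"        # needs the original (reference) clip
    REDUCED = "reduced reference"  # needs only condensed information about the reference
    NONE = "no reference"          # works on the received clip alone

@dataclass(frozen=True)
class QoEMetric:
    name: str
    category: Category
    reference: ReferenceMode
    uses_payload: bool  # True marks a 'bitstream' variant

# Illustrative entries; the standard numbers are those cited later in the article.
EXAMPLES = [
    QoEMetric("ITU-T J.247", Category.PERCEPTUAL, ReferenceMode.FULL, uses_payload=False),
    QoEMetric("ITU-T J.343.5/6", Category.HYBRID, ReferenceMode.FULL, uses_payload=True),
    QoEMetric("ITU-T J.343.1-4", Category.HYBRID, ReferenceMode.REDUCED, uses_payload=True),
]
```

The point of the sketch is simply that a metric is identified by two independent axes (category and reference mode), plus whether it inspects the payload.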
Second, why are there so many types of video QoE metrics instead of just one, such as POLQA (ITU-T P.863)?
The answer is the multi-dimensionality of the video service itself. Various types of video content, resolutions, formats, and bit rates, as well as delivery protocols (e.g. real time, streaming) and end-user clients, require different QoE metrics, especially when it comes to parametric and hybrid solutions, which rely on fine-tuning various combinations of all of these factors. Regardless of how intelligent an algorithm is, it cannot single-handedly measure such a broad variety of conditions and scenarios.
In addition, the encryption of video content during transport or delivery requires different QoE estimation models, depending on the input information that remains available. There is an exception, however: perceptual full reference QoE metrics are more or less independent of encryption, as these models use the video signal itself, which can be recorded and analyzed after playback. Similarly, hybrid full reference QoE metrics can use the video signal to compensate for the limited parametric and bitstream information available in an encrypted context.
Third, are there any standardized and non-standardized video QoE solutions?
Yes, there are several standardized solutions already available, with more under development, that take into account the evolving characteristics of video services, such as the shift from SD to HD to 3D video, as well as new streaming services using new clients and codecs. Perceptual and hybrid QoE metrics are emerging from work performed by the Video Quality Experts Group (VQEG) and being standardized within the ITU-T J.2xx and J.3xx.x series. Meanwhile, parametric QoE metrics are being developed and standardized within ITU-T Study Group 12 in the P.120x.x series.
And yes, there are also many third-party solutions available, but the vast majority relate to parametric data only (including bitstream, where encryption is not present). Very few are available for the perceptual full reference and hybrid types, and those that exist are generally tuned and/or adapted versions of the standardized solutions, provided by the same third parties that worked on the standards themselves. The main reason for this is that using the video signal itself requires complex knowledge and modeling of human perception and cognition, whereas parametric solutions (including bitstream) offer a more straightforward model based on network and client KPIs and/or payload.
Fourth, what are the differences between each metric?
The main difference is accuracy and yes, you guessed it: that depends on the algorithms used and the type of input data. As might be expected, solutions that use models of human perception and cognition, as well as the reference video clips, are the most accurate (e.g. perceptual and hybrid bitstream metrics based on both the reference and received video clips). All other types are less accurate, with accuracy deteriorating as less input data is available. Parametric solutions that use just header and client information would therefore be expected to show the least accuracy, with parametric bitstream metrics (which also use the payload) displaying better accuracy. Accuracy improves further if information from the video signal itself is used, as with hybrid reduced and no-reference models.
Of course, the more complex the algorithm, the better the accuracy. But, as with everything, these benefits do not come for free. Complex algorithms require intensive processing time and extensive memory, and so are not suitable for some testing solutions, such as on-device testing, even though on-device testing can provide the most meaningful metrics because it most accurately reflects the user experience. In addition, not all video services allow testing with every type of QoE metric. For example, applications such as Netflix are incompatible with testing solutions that use full reference input data, whether perceptual or hybrid.
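The accuracy ordering described above can be captured in a toy ranking. This is an illustrative sketch only: the labels and richness scores are invented, and real-world accuracy depends on the specific algorithm and test conditions.

```python
# Richer input data generally means higher expected accuracy (illustrative ordering,
# following the article: header/client < + payload < + video signal < + reference clip).
INPUT_RICHNESS = {
    "parametric (header + client only)": 1,
    "parametric bitstream (adds payload)": 2,
    "hybrid reduced/no reference (adds received video signal)": 3,
    "perceptual/hybrid full reference (adds reference clip)": 4,
}

def expected_more_accurate(a: str, b: str) -> str:
    """Return whichever metric type is expected to track user experience better."""
    return max(a, b, key=INPUT_RICHNESS.__getitem__)
```

The trade-off in the paragraph above is the other side of this ranking: the richer the input, the heavier the processing, and the harder it is to run the metric on-device.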
Fifth, which metric should you use?
Unfortunately, there is no ‘one-size-fits-all' answer. Rather, it depends on which service you are testing. Ask: ‘What is the goal of the testing?' Here is some guidance:
There are three main mobile video service categories: conversational/video telephony (e.g. ViLTE); on-demand streaming (e.g. YouTube, Netflix, Amazon, Hulu); and live (real-time) streaming (e.g. eMBMS).
- Conversational scenarios can typically be described as ‘head and shoulders' video clips, and the test set-up is similar to voice service testing. Perceptual full reference (e.g. J.247) and/or hybrid full reference (J.343.5/6) solutions are therefore suitable and can accurately reflect real user experience in both drive testing and monitoring situations. For drive testing, however, there are two caveats: first, the implementation must be optimized for real-time QoE scoring; and second, today's smartphones do not yet offer a video input/output feature. Selecting one of the parametric/bitstream solutions (P.120x.x, x=1,2, or third-party solutions based on these) or hybrid reduced/no-reference solutions (J.343.1-4) is therefore likely to be the best alternative currently available. Monitoring solutions that detect quality trends across regions and/or time windows (e.g. days, weeks, months) are likewise better served by parametric/bitstream or hybrid reduced/no-reference QoE metrics. The choice between those two categories should be based mainly on the complexity of the testing procedure, as there are no convincing technical arguments for one over the other; because hybrid solutions require the recorded received video, as well as its processing and analysis, they are more complex and more prone to measurement set-up errors.
- For on-demand streaming services, which today are mostly dynamic adaptive, standardization does not yet offer any QoE metrics. However, ITU-T P.NATS is expected to become a standard for such video services early next year. Until then, a third-party offering is the best solution, either based on previous standards (e.g. the perceptual full reference PEVQ-S) or non-standard (all parametric, or bitstream if encryption is not present). It is important to recognize that, when dealing with encryption, very poor accuracy is to be expected. Monitoring solutions that simply reveal trends can be good enough to some extent, especially if aggregation is applied over very long time windows and other network KPIs are monitored and analyzed to understand the behavioral trend. Drive testing, however, requires higher accuracy, similar to that provided by perceptual full reference testing such as PEVQ-S optimized to run on devices. Note, though, that perceptual full reference solutions can only be used for YouTube-type video services, where a reference video clip can be uploaded to the server, for example; Netflix, Hulu and others are only suitable for parametric solutions, as encryption is always present.
- Last, but not least, it is important to note that within the video standardization community it is fully expected that video service QoE evaluation will always require various QoE metrics for different service and test scenarios and, more importantly, that results from different QoE metrics cannot be directly compared with each other. In addition, unlike voice services (for which POLQA has been safely used for many years), the field of video QoE metrics is set to remain a very dynamic and adaptive environment, in which new delivery/transport and container video formats, as well as new clients and codecs/codec profiles with improved error concealment schemes, can be expected on a regular basis; something we are already experiencing today.
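The guidance above can be condensed into a rough lookup. This is a sketch only: the helper function and scenario labels are invented for illustration, while the recommendations and standard numbers are taken from the discussion above (note it gives no specific guidance for live streaming, so that case is deliberately left uncovered).

```python
# (service, test scenario) -> recommended QoE metric family, per the guidance above.
RECOMMENDATIONS = {
    ("conversational", "drive"):
        "parametric/bitstream (P.120x.x) or hybrid reduced/no reference (J.343.1-4); "
        "full reference needs a video input/output feature today's smartphones lack",
    ("conversational", "monitoring"):
        "parametric/bitstream or hybrid reduced/no reference for trend detection",
    ("on-demand", "drive"):
        "perceptual full reference (e.g. PEVQ-S on-device) where a reference clip "
        "can be uploaded (YouTube-type); otherwise parametric only, as with Netflix/Hulu",
    ("on-demand", "monitoring"):
        "parametric, aggregated over long time windows alongside other network KPIs",
}

def recommend(service: str, scenario: str) -> str:
    """Look up the recommendation; raises KeyError for cases the guidance
    does not cover (e.g. live streaming)."""
    return RECOMMENDATIONS[(service, scenario)]
```

Treat this as a starting point for a test plan, not a substitute for validating the chosen metric against your own service mix.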
You can find out more about video service QoE metrics and InfoVista's evaluation solutions in a selection of white papers here.