As background, here is an overview of H264, the most common video compression standard in use today: https://www.maketecheasier.com/how-video-compression-works/
H265 improved on it, but it was something of a flop for a number of reasons. It is in use and will continue to grow, but licensing is a problem. To fix this, MPEG has approved 3 new video standards: https://ottverse.com/vvc-evc-lcevc-mpeg-video-codecs/
A consortium of companies has also released a royalty-free codec called AV1. Here, however, I will mainly be talking about the EVC baseline, used at 512p resolution. At this resolution, known as SD, codecs do differ in efficiency, but not by much: only about 20% between the worst and the best (excluding H264). SD is not used a lot these days; people are switching to HD (720p) or Full HD (1080p). Next comes 4K, which is now becoming common with streaming services like Netflix, and the push for 8K is on. At these higher resolutions, codecs differ widely in both efficiency and complexity. The bitrate for 4K over the internet is basically a solved problem: 4K needs around 15 Mbps, although Netflix recommends 25 Mbps to be on the safe side. 8K has been a problem so far, and hopefully the new codecs will help tame it to around 20 Mbps. With new techniques like Shot Based Encoding, Netflix has made 4K even more efficient: https://hackaday.com/2020/09/16/dec…laining-optimized-shot-based-encoding-for-4k/
LCEVC and basic EVC
Here I am going to talk about LCEVC and basic EVC. Baseline EVC at 512p comes close to the efficiency of the most efficient codec, VVC, but has very little complexity and no license fees. The choice is mostly arbitrary; any codec will do without making much of a difference in efficiency.
The idea behind LCEVC is that traditional video codecs don’t compress the high-frequency portion of the video well. To fix this problem, a company called V-Nova developed LCEVC. You will find a lot of information at the following link: https://www.lcevc.org/
You can find an overview at: https://www.lcevc.org/how-lcevc-works/
The video is scaled down by 4 (two downscales by 2) and encoded using a standard codec (EVC baseline in this description, but any codec will do). That base stream is transmitted, together with the differences between the original and the upscaled reconstruction at each level. To restore the original, the decoder scales the base up by 2 and then adds the difference between the original downscaled by 2 and that upscaled version. Once corrected, the picture is scaled up by 2 again and the second set of differences is added to recover the original. The scheme is flexible: you can skip the second downscale if you want, or add further downscaling steps (I don’t think V-Nova has made a version with 3 downscales, but IMHO it would be an advantage at 8K, as explained later). They developed some “tricky” techniques to make it more efficient, like m-prediction and using the previous frame to predict the difference. Details can be found in the patent: https://patents.google.com/patent/WO2020188273A1/de
I read it – it takes a while, but it’s the only way to understand exactly what’s going on.
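To make the layering described above concrete, here is a minimal sketch in plain Python. It is not V-Nova’s implementation: all function names are made up for illustration, the “base codec” is just coarse quantisation standing in for EVC, the upscaler is simple pixel repetition, and the differences are stored unquantised (the real codec transforms, quantises and entropy-codes them, and adds the prediction tricks from the patent).

```python
def downscale2(img):
    """Halve width and height by averaging each 2x2 block (integer average)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upscale2(img):
    """Double width and height by repeating each pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def diff(a, b):
    """Per-pixel difference a - b of two equally sized images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Per-pixel sum a + b of two equally sized images."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def fake_base_codec(img, step=8):
    """Hypothetical stand-in for a lossy base codec: coarse quantisation."""
    return [[(p // step) * step for p in row] for row in img]


def encode(frame):
    """Return the decoded base picture plus one difference layer per upscale."""
    half = downscale2(frame)
    quarter = downscale2(half)
    base = fake_base_codec(quarter)                 # quarter resolution, lossy
    residual1 = diff(half, upscale2(base))          # correction at half resolution
    corrected_half = add(upscale2(base), residual1)
    residual2 = diff(frame, upscale2(corrected_half))  # correction at full resolution
    return base, residual1, residual2


def decode(base, residual1, residual2):
    """Upscale by 2, correct, upscale by 2 again, correct again."""
    half = add(upscale2(base), residual1)
    return add(upscale2(half), residual2)


# Usage: an 8x8 test frame (width and height must be multiples of 4 here).
frame = [[(x * 7 + y * 13) % 256 for x in range(8)] for y in range(8)]
base, residual1, residual2 = encode(frame)
assert decode(base, residual1, residual2) == frame
```

Because the differences are kept exactly in this sketch, the round trip is lossless; in practice the bit rate saving comes from coding those differences coarsely and cheaply.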
Since the base encoding is done at a lower resolution, it is computationally fast. Slower presets that give better compression can be used while still encoding faster overall. Because upscaling works on 2x2 or 4x4 blocks, it can be spread across many concurrent threads. In general, it adds little to decode time (with one exception that I’ll mention later).
Even with just one reduction (with 2 or 3 reductions I would expect better results) the performance was impressive: https://www.lcevc.org/wp-content/up…rmance-of-LCEVC-Meeting-MPEG-134 – May-2021.pdf
For 8K content you should use 3 downscales, so that the encoder can run the base codec at 512p. At this resolution, the difference in codec performance is minimal compared with the enhancement data, especially with the newer codecs. For example, the free baseline EVC is only about 15% worse than the most efficient but complex VVC, with all of its licensing problems. At the expense of increased processing, task-aware upscaling (TAU) and downscaling can be used, which leaves only minor corrections after upscaling and gives greater efficiency: https://cv.snu.ac.kr/research/taid/
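As a quick check of the 3-downscale arithmetic for 8K, here is a small sketch. It assumes 8K means 7680x4320 and that each downscale halves both dimensions; the function name is made up for illustration. Under those assumptions the base layer lands at 960x540, in the same ballpark as the 512p mentioned above.

```python
def resolution_ladder(width, height, downscales):
    """List the (width, height) of each layer, from full size down to the base."""
    ladder = [(width, height)]
    for _ in range(downscales):
        width, height = width // 2, height // 2
        ladder.append((width, height))
    return ladder


# Assumption: 8K is 7680x4320 and every downscale is a factor of 2 per dimension.
ladder = resolution_ladder(7680, 4320, 3)
for w, h in ladder:
    print(f"{w}x{h}")

full_w, full_h = ladder[0]
base_w, base_h = ladder[-1]
# How many times fewer pixels the base codec has to encode per frame.
pixel_ratio = (full_w * full_h) // (base_w * base_h)
print("base layer has", pixel_ratio, "times fewer pixels")
```

The base codec only sees 1/64 of the pixels, which is why slower presets on the base layer remain affordable.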
Experiments Samsung has done with its Scalenet technology (a version of task-aware upscaling) show that it’s very, very difficult to tell the difference (I heard it has a VMAF of almost 100 and a PSNR of somewhat under 40). Given that it’s hard to tell 8K apart from 8K scaled down to 4K, the third layer will not have a lot of work to do; it will all be in the first and second layers. Since task-aware upscaling is also used for the other upscales, those will be more efficient too. The downside is the processing power required, but processing power keeps getting cheaper, so it will likely happen. But even without super resolution, it is possible to reduce the bit rate by 40 to 50%. See: https://8kassociation.com/lcevc-licensing-offers-different-model-to-kickstart-8k-market/
A VMAF of 93 and above is generally considered indistinguishable from the original for all practical purposes, and H265 with LCEVC achieved this at around 20 Mbps. 8K transmission is therefore now possible at bit rates that almost everyone has available. With Per Scene Encoding, Netflix has made it even more efficient: https://hackaday.com/2020/09/16/dec…laining-optimized-shot-based-encoding-for-4k/
That should bring even the 20 Mbit/s down significantly. Companies are working to integrate LCEVC with Content Adaptive Encoding (CAE), which lowers the bit rate even further: https://blog.beamr.com/2019/09/11/cabr-content-adaptive-rate-control/
Harmonic managed to achieve around 25 Mbit/s for 8K using CAE alone, without LCEVC. Work is currently underway to combine it with LCEVC. The combination of all these “tricks” means that distributing 8K content will no longer be an issue at all in the future. Even now, bit rate is not the real problem: we need the content to make it worthwhile, and a new infrastructure.