From Wikipedia, the free encyclopedia
High Efficiency Video Coding (HEVC) is a video compression standard, a successor to H.264/MPEG-4 AVC (Advanced Video Coding), currently under joint development by the ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) as ISO/IEC 23008-2 MPEG-H Part 2 and ITU-T H.265. MPEG and VCEG have established a Joint Collaborative Team on Video Coding (JCT-VC) to develop the HEVC standard. HEVC is said to improve video quality and to double the data compression ratio compared to H.264/MPEG-4 AVC, and it can support 8K UHD and resolutions up to 8192×4320.
In 2004, the ITU-T Video Coding Experts Group (VCEG) began significant study of technology advances that could enable creation of a new video compression standard (or substantial compression-oriented enhancements of the H.264/MPEG-4 AVC standard). In October 2004, various techniques for potential enhancement of the H.264/MPEG-4 AVC standard were surveyed. In January 2005, at the next meeting of VCEG, VCEG began designating certain topics as "Key Technical Areas" (KTA) for further investigation. A software codebase called the KTA codebase was established for evaluating such proposals. The KTA software was based on the Joint Model (JM) reference software that was developed by the MPEG & VCEG Joint Video Team for H.264/MPEG-4 AVC. Additional proposed technologies were integrated into the KTA software and tested in experiment evaluations over the next four years.
Two approaches for standardizing enhanced compression technology were considered: either creating a new standard or creating extensions of H.264/MPEG-4 AVC. The project had tentative names H.265 and H.NGVC (Next-generation Video Coding), and was a major part of the work of VCEG until its evolution into the HEVC joint project with MPEG in 2010.
The preliminary requirements for NGVC were the capability to achieve a bit rate reduction of 50% at the same subjective image quality compared to the H.264/MPEG-4 AVC High profile, with computational complexity ranging from 1/2 to 3 times that of the High profile. NGVC would be able to provide a 25% bit rate reduction along with a 50% reduction in complexity at the same perceived video quality as the High profile, or to provide greater bit rate reduction with somewhat higher complexity.
The ISO/IEC Moving Picture Experts Group (MPEG) started a similar project in 2007, tentatively named High-performance Video Coding. By July 2007, a bit rate reduction of 50% had been agreed as the goal of the project. Early evaluations were performed with modifications of the KTA reference software encoder developed by VCEG. By July 2009, experimental results showed an average bit rate reduction of around 20% compared with AVC High Profile; these results prompted MPEG to initiate its standardization effort in collaboration with VCEG.
A formal joint Call for Proposals (CfP) on video compression technology was issued in January 2010 by VCEG and MPEG, and proposals were evaluated at the first meeting of the MPEG & VCEG Joint Collaborative Team on Video Coding (JCT-VC), which took place in April 2010. A total of 27 full proposals were submitted. Evaluations showed that some proposals could reach the same visual quality as AVC at only half the bit rate in many of the test cases, at the cost of 2×-10× increase in computational complexity; and some proposals achieved good subjective quality and bit rate results with lower computational complexity than the reference AVC High profile encodings. At that meeting, the name High Efficiency Video Coding (HEVC) was adopted for the joint project. Starting at that meeting, the JCT-VC integrated features of some of the best proposals into a single software codebase and a "Test Model under Consideration", and performed further experiments to evaluate various proposed features. The first working draft specification of HEVC was produced at the third JCT-VC meeting in October 2010. Many changes in the coding tools and configuration of HEVC were made in later JCT-VC meetings.
On May 25, 2012, the JCT-VC announced that an evaluation of HEVC proposals for Scalable Video Coding (SVC) would be held in October 2012. This will eventually lead to an amendment to HEVC that will add support for SVC.
The Draft International Standard of HEVC, based on the eighth working draft specification, was approved in July 2012. Per Fröjdh, Chairman of the Swedish MPEG delegation, believes that commercial products that support HEVC could be released in 2013.
On January 25, 2013, the ITU announced that HEVC had received first stage approval (consent) in the ITU-T Alternative Approval Process (AAP). The JCT-VC will continue to work on extensions for HEVC such as support for 12-bit video and 4:2:2/4:4:4 chroma subsampling. On the same day MPEG announced that HEVC had been promoted to Final Draft International Standard (FDIS) status in the MPEG standardization process.
The timescale for completing the HEVC standard is as follows:
On February 29, 2012, at the 2012 Mobile World Congress, Qualcomm demonstrated a HEVC decoder running on an Android tablet, with a Qualcomm Snapdragon S4 dual-core processor running at 1.5 GHz, showing H.264/MPEG-4 AVC and HEVC versions of the same video content playing side by side. In this demonstration HEVC showed almost a 50% bit rate reduction compared with H.264/MPEG-4 AVC.
On August 22, 2012, Ericsson announced that the world's first HEVC encoder, the Ericsson SVP 5500, would be shown at the upcoming International Broadcasting Convention (IBC) 2012 trade show. The Ericsson SVP 5500 HEVC encoder is designed for real-time encoding of video for delivery to mobile devices. On the same day, it was announced that researchers are planning to extend MPEG-DASH to support HEVC by April 2013.
On August 31, 2012, Allegro DVT announced two HEVC broadcast encoders called the AL1200 HD-SDI encoder and the AL2200 IP Transcoder. Allegro DVT says that hardware HEVC decoders shouldn't be expected before 2014 but that HEVC can be used earlier for applications that use software based decoding. At the IBC 2012 trade show Allegro DVT will demonstrate a HEVC delivery system based on the AL2200 IP Transcoder with a live IP video stream.
On September 2, 2012, Vanguard Software Solutions (VSS) announced an x86 PC software-based HEVC encoder, based on the Draft International Standard, that was designed for real-time performance. VSS stated that the encoder would be available later in 2012 and would be shown at the IBC 2012 trade show. On September 9, 2012, VSS demonstrated that their real-time HEVC software encoder could encode 1080p (1920×1080) video at 30 frames per second (fps) using a single Intel Xeon processor.
On September 6, 2012, Rovi Corporation announced that a MainConcept SDK for HEVC would be released in early 2013 shortly after HEVC is officially ratified. The HEVC MainConcept SDK will include a decoder, encoder, and transport multiplexer for Microsoft Windows, Mac OS, Linux, iOS, and Android. The HEVC MainConcept SDK encoder was demonstrated at the IBC 2012 trade show.
On September 7, 2012, Envivio Inc. first demonstrated its next-generation HEVC codec capabilities at IBC in Amsterdam, showing a technology demo of video quality comparable to AVC (H.264) at half the bit-rate. Envivio Muse™ software-based encoders are designed to support HEVC via software upgrade in the future.
On September 9, 2012, ATEME demonstrated at the IBC 2012 trade show a HEVC encoder that encoded video with a resolution of 3840×2160p at 60 fps with an average bit rate of 15 Mbit/s. ATEME is planning to release their HEVC encoder in October 2013.
On January 3, 2013, Allegro DVT announced that they will show HEVC video hardware decoder IP at the 2013 International CES. The HEVC decoder IP can be used on FPGA and SoC with support for up to 4K resolution. The HEVC decoder IP is compliant with the HM 9.1 reference software and will be made compliant with the final standard after it is released.
On January 7, 2013, ViXS Systems announced that they will show the first hardware SoC capable of transcoding video to the Main 10 profile of HEVC at the 2013 International CES. On the same day Rovi Corporation announced that after the HEVC standard is released that they plan to start adding support for HEVC to their MainConcept SDK and to their DivX products.
On January 8, 2013, Broadcom announced the BCM7445 which is an Ultra HD decoding chip capable of decoding HEVC at up to 4096×2160p at 60 fps. The BCM7445 is a 28 nm ARM architecture chip capable of 21,000 Dhrystone MIPS with volume production estimated for the middle of 2014. On the same day Vanguard Video announced the release of the V.265 which is a professional HEVC software encoder.
On January 30, 2013, Elemental Technologies, Inc. announced its implementation of HEVC/H.265 encoding. Video processing solutions from Elemental will offer support for the HEVC/H.265 standard via a software upgrade. Elemental first demonstrated H.265 encoding at IBC in September, 2012 in a side-by-side demonstration with AVC/H.264. Elemental will demonstrate concurrent encoding of MPEG-2, H.264/MPEG-4 AVC, and HEVC/H.265 on a single system at the NAB Show in April 2013.
On February 4, 2013, NTT DoCoMo announced that starting in March it will begin licensing its implementation of HEVC decoding software. The decoding software can allow playback of 4K UHDTV at 60 fps on personal computers and 1080p on smartphones and will be demonstrated at the 2013 Mobile World Congress. In a JCT-VC document NTT DoCoMo showed that their HEVC software decoder could decode 3840×2160 at 60 fps using 3 decoding threads on a 2.7 GHz quad core Ivy Bridge CPU.
On February 11, 2013, researchers from MIT demonstrated the world's first published HEVC ASIC decoder at the International Solid-State Circuits Conference (ISSCC) 2013. Their chip was capable of decoding a 3840×2160p video stream at 30 fps in real time while consuming under 0.1 W of power.
On March 14, 2013, Ittiam Systems announced the immediate availability of its real-time HD HEVC encoder and decoder solutions which were demonstrated at CES 2013 and MWC 2013 and will be demonstrated at NAB 2013. The x86 based encoder running on a multi-core Intel Xeon server class processor is targeted at the broadcast encoding market. The decoder is an optimized multi-core ARM (Cortex A7/A9/A15 cores with Neon acceleration) implementation designed for smartphones, set-top boxes, tablets, and Smart TVs which has been demonstrated on the next generation Qualcomm Snapdragon S800.
The design of most video coding standards is primarily aimed at having the highest coding efficiency. Coding efficiency is the ability to encode video at the lowest possible bit rate while maintaining a certain level of video quality. There are two standard ways to measure the coding efficiency of a video coding standard: using an objective metric, such as peak signal-to-noise ratio (PSNR), or using subjective assessment of video quality. Subjective assessment of video quality is the most important way to measure a video coding standard since humans perceive video quality subjectively.
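As a rough illustration of the objective metric mentioned above, PSNR can be computed from the mean squared error between an original and a reconstructed picture. The function name `psnr` and the flat list of samples are assumptions made for this sketch; real evaluations compute PSNR per colour component and average over many frames.

```python
import math

def psnr(original, reconstructed, bit_depth=8):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures: no distortion
    peak = (1 << bit_depth) - 1  # 255 for 8-bit samples
    return 10 * math.log10(peak * peak / mse)
```

A higher PSNR at the same bit rate, or the same PSNR at a lower bit rate, indicates better coding efficiency by this metric.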
HEVC benefits from the use of larger Coding Tree Block (CTB) sizes. This has been shown in PSNR tests with a HM-8.0 HEVC encoder where it was forced to use progressively smaller CTB sizes. For all test sequences when compared to a 64×64 CTB size it was shown that the HEVC bitrate increased by 2.2% when forced to use a 32×32 CTB size and increased by 11.0% when forced to use a 16×16 CTB size. In the Class A test sequences, where the resolution of the video was 2560×1600, when compared to a 64×64 CTB size it was shown that the HEVC bitrate increased by 5.7% when forced to use a 32×32 CTB size and increased by 28.2% when forced to use a 16×16 CTB size. The tests showed that large CTB sizes become even more important for coding efficiency with higher resolution video. The tests also showed that it took 60% longer to decode HEVC video encoded at 16×16 CTB size than at 64×64 CTB size. The tests showed that large CTB sizes increase coding efficiency while also reducing decoding time.
The HEVC Main Profile (MP) has been compared in coding efficiency to H.264/MPEG-4 AVC High Profile (HP), MPEG-4 Advanced Simple Profile (ASP), H.263 High Latency Profile (HLP), and H.262/MPEG-2 Main Profile (MP). The video encoding was done for entertainment applications and twelve different bitrates were made for the nine video test sequences with a HM-8.0 HEVC encoder being used. Of the nine video test sequences five were at HD resolution while four were at WVGA (800×480) resolution. The bit rate reductions for HEVC were determined based on PSNR.
Average bit rate reduction compared to each reference standard:
|Video coding standard||H.264/MPEG-4 AVC HP||MPEG-4 ASP||H.263 HLP||H.262/MPEG-2 MP|
|H.264/MPEG-4 AVC HP||-||44.5%||46.6%||55.4%|
HEVC MP has also been compared to H.264/MPEG-4 AVC HP for subjective video quality. The video encoding was done for entertainment applications and four different bitrates were made for nine video test sequences with a HM-5.0 HEVC encoder being used. The subjective assessment was done at an earlier date than the PSNR comparison and so it used an earlier version of the HEVC encoder that had slightly lower performance. The bit rate reductions were determined based on subjective assessment using mean opinion score values. The overall subjective bitrate reduction for HEVC MP compared to H.264/MPEG-4 AVC HP was 49.3%.
École Polytechnique Fédérale de Lausanne (EPFL) did a study to evaluate the subjective video quality of HEVC at resolutions higher than HDTV. The study was done with three videos with resolutions of 3840×1744 at 24 fps, 3840×2048 at 30 fps, and 3840×2160 at 30 fps. The five second video sequences showed people on a street, traffic, and a scene from the open source computer animated movie Sintel. The video sequences were encoded at five different bitrates using the HM-6.1.1 HEVC encoder and the JM-18.3 H.264/MPEG-4 AVC encoder. The subjective bit rate reductions were determined based on subjective assessment using mean opinion score values. The study compared HEVC MP with H.264/MPEG-4 AVC HP and showed that for HEVC MP the average bitrate reduction based on PSNR was 44.4% while the average bitrate reduction based on subjective video quality was 66.5%.
HEVC was designed to substantially improve coding efficiency compared to H.264/MPEG-4 AVC HP, i.e. to reduce bitrate requirements by half with comparable image quality, at the expense of increased computational complexity. Depending on the application requirements, HEVC encoders can trade off computational complexity, compression rate, robustness to errors, and encoding delay time. Two of the key areas where HEVC was improved compared to H.264/MPEG-4 AVC were support for higher resolution video and improved parallel processing methods.
HEVC is targeted at next-generation HDTV displays and content capture systems which feature progressive scanned frame rates and display resolutions from QVGA (320×240) to 4320p (8192×4320), as well as improved picture quality in terms of noise level, color gamut, and dynamic range.
The HEVC video coding layer uses the same "hybrid" approach used in all modern video standards, starting from H.261, in that it uses inter-/intra-picture prediction and 2D transform coding. An encoder first splits each picture into block-shaped regions. The first picture of a video sequence, and the first picture at each random access point, is coded using only intra-picture prediction. Intra-picture prediction is when the prediction of the blocks in the picture is based only on the information in that picture. For all other pictures inter-picture prediction is used, in which prediction information is used from other pictures. After the prediction methods are finished and the picture goes through the loop filters, the final picture representation is stored in the decoded picture buffer. Pictures stored in the decoded picture buffer can be used for the prediction of other pictures.
HEVC was designed with the idea that progressive scan video would be used and no coding features are present specifically for interlaced video. HEVC instead sends meta-stream data that tells how the interlaced video is sent. Interlaced video may be sent either by coding each field as a separate picture or by coding each frame as a different picture. This allows interlaced video to be sent with HEVC without needing special interlaced decoding processes to be added to HEVC decoders.
HEVC replaces macroblocks, which were used with previous standards, with a new coding scheme that uses larger block structures of up to 64×64 pixels and can better sub-partition the picture into variable sized structures. HEVC initially divides the picture into coding tree units (CTUs) which are then divided for each luma/chroma component into coding tree blocks (CTBs). A CTB can be 64×64, 32×32, or 16×16 with a larger block size usually increasing the coding efficiency. CTBs are then divided into coding units (CUs). The arrangement of CUs within a CTB is known as a quadtree since a subdivision results in four smaller regions. CUs are then divided into prediction units (PUs) of either intra-picture or inter-picture prediction type which can vary in size from 64×64 to 4×4 (prediction units coded using 2 reference blocks, known as bipredictive coding, are limited to 8×4 or 4×8 so as to save on memory bandwidth). The prediction residual is then coded using transform units (TUs) which contain coefficients for spatial block transform and quantization. A TU can be 32×32, 16×16, 8×8, or 4×4.
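The quadtree subdivision of a CTB into CUs can be sketched as a short recursion. This is a minimal illustration, not the encoder's actual decision process: the `should_split` callback and `demo_rule` are hypothetical stand-ins for the rate-distortion decision a real encoder would make, and the minimum CU size of 8×8 is an assumption not stated in the text above.

```python
def split_ctb(x, y, size, should_split, min_cu=8):
    """Recursively partition a CTB into CUs; returns a list of (x, y, size)."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        cus = []
        # A split always yields four equally sized quadrants.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctb(x + dx, y + dy, half, should_split, min_cu))
        return cus
    return [(x, y, size)]

# Hypothetical decision rule: split the 64x64 CTB once, then split only
# the top-left 32x32 quadrant again.
def demo_rule(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)

cus = split_ctb(0, 0, 64, demo_rule)
```

With this rule the CTB ends up as four 16×16 CUs in the top-left quadrant plus three 32×32 CUs, which together still cover the full 64×64 area.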
At the July 2012 HEVC meeting it was decided, based on proposal JCTVC-J0334, that HEVC level 5 and higher would be required to use CTB sizes of either 32×32 or 64×64. This was added to HEVC in the Draft International Standard as a level limit for the Log2MaxCtbSize variable. Log2MaxCtbSize was renamed CtbSizeY in the October 2012 HEVC draft.
Internal bit depth increase (IBDI) allows for pictures to be internally processed at a bit depth that is higher than the bit depth they are encoded at. IBDI can be done at up to 14-bits and is processed at that bit depth up until the point where the pictures are fed into the loop filters.
HEVC uses a context-adaptive binary arithmetic coding (CABAC) algorithm that is fundamentally similar to CABAC in H.264/MPEG-4 AVC. CABAC is the only entropy coding method that is allowed in HEVC, while H.264/MPEG-4 AVC allows two entropy coding methods. CABAC in HEVC was designed for higher throughput. For instance, the number of context coded bins has been reduced by 8× and the CABAC bypass mode has been redesigned to increase throughput. Another improvement with HEVC is that the dependencies between the coded data have been changed to further increase throughput. Context modeling in HEVC has also been improved so that CABAC can better select a context that increases efficiency when compared to H.264/MPEG-4 AVC.
HEVC specifies 33 directional modes for intra prediction compared to the 8 directional modes for intra prediction specified by H.264/MPEG-4 AVC. HEVC also specifies planar and DC intra prediction modes. The intra prediction modes use data from neighboring prediction blocks that have been previously decoded.
For luma motion compensation HEVC uses quarter-sample precision, with an 8-tap filter for half-sample positions and a 7-tap filter for quarter-sample positions, while in comparison H.264/MPEG-4 AVC uses a 6-tap filter for half-sample positions followed by linear interpolation for quarter-sample positions. For 4:2:0 video, chroma is filtered with eighth-sample precision and a 4-tap filter, while in comparison H.264/MPEG-4 AVC uses a 2-tap filter. Weighted prediction in HEVC can be either uni-prediction, in which a single prediction value is used, or bi-prediction, in which the prediction values from two prediction blocks are used.
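To make the tap counts concrete, the sketch below applies the luma interpolation filters in one dimension. The coefficient values are taken from the published HEVC design rather than from the text above, and the single-pass rounding, the border clamping, and the function name `interpolate` are simplifying assumptions; the standard applies the filters separably in two dimensions with higher intermediate precision.

```python
HALF_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8-tap, half-sample position
QUARTER_TAPS = (-1, 4, -10, 58, 17, -5, 1)    # 7-tap, quarter-sample position

def interpolate(samples, pos, taps, bit_depth=8):
    """Fractional sample between samples[pos] and samples[pos + 1] (1-D)."""
    acc = 0
    for k, tap in enumerate(taps):
        # The filter window starts three samples to the left of pos;
        # indices are clamped so the window never leaves the row.
        idx = min(max(pos - 3 + k, 0), len(samples) - 1)
        acc += tap * samples[idx]
    value = (acc + 32) >> 6  # taps sum to 64, so normalise with rounding
    return min(max(value, 0), (1 << bit_depth) - 1)

ramp = [10 * i for i in range(16)]
midpoint = interpolate(ramp, 8, HALF_TAPS)  # between 80 and 90
```

On the linear ramp the half-sample filter returns the midpoint of the two neighbouring samples, and on a flat row both filters return the input value unchanged.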
HEVC defines a signed 16-bit range for both horizontal and vertical motion vectors (MVs). This was added to HEVC at the July 2012 HEVC meeting with the mvLX variables. HEVC horizontal/vertical MVs have a range of -32768 to 32767 which given the quarter pixel precision used by HEVC allows for a MV range of -8192 to 8191.75 luma samples. This compares to H.264/MPEG-4 AVC which allows for a horizontal MV range of -2048 to 2047.75 luma samples and a vertical MV range of -512 to 511.75 luma samples.
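The ranges quoted above follow directly from the number of bits and the quarter-sample units. The short sketch below reproduces the arithmetic; the function name and the constant are illustrative only.

```python
UNITS_PER_SAMPLE = 4  # motion vectors are stored in quarter-sample units

def mv_range_in_luma_samples(bits):
    """Range of a signed `bits`-wide motion vector component, in luma samples."""
    mv_min = -(1 << (bits - 1))
    mv_max = (1 << (bits - 1)) - 1
    return mv_min / UNITS_PER_SAMPLE, mv_max / UNITS_PER_SAMPLE

hevc_range = mv_range_in_luma_samples(16)
```

By the same arithmetic, the H.264/MPEG-4 AVC ranges quoted above are equivalent to a signed 14-bit horizontal component and a signed 12-bit vertical component.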
HEVC allows for two MV modes which are Advanced Motion Vector Prediction (AMVP) and merge mode. AMVP uses data from the reference picture and can also use data from adjacent prediction blocks. The merge mode allows for the MVs to be inherited from neighboring prediction blocks. Merge mode in HEVC is similar to “skipped” and “direct” motion inference modes in H.264/MPEG-4 AVC but with two improvements. The first improvement is that HEVC uses index information to select one of several available candidates. The second improvement is that HEVC uses information from the reference picture list and reference picture index.
HEVC specifies four transform unit (TU) sizes of 4×4, 8×8, 16×16, and 32×32 to code the prediction residual. A CTB may be recursively partitioned into 4 or more TUs. TUs use integer basis functions that are similar to the discrete cosine transform (DCT). In addition, 4×4 luma transform blocks that belong to an intra coded region are transformed using an integer transform that is derived from the discrete sine transform (DST). This provides a 1% bit rate reduction but was restricted to 4×4 luma transform blocks due to marginal benefits for the other transform cases. Chroma uses the same TU sizes as luma so there is no 2×2 transform for chroma.
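As an illustration of "integer basis functions similar to the DCT", the sketch below uses the 4×4 core transform matrix from the published HEVC design (the matrix values are not given in the text above) and applies it separably to a residual block. The intermediate scaling and rounding stages of the standard are deliberately omitted, and all function names are illustrative.

```python
DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def matmul(a, b):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Unscaled separable 2-D transform: rows and columns both use DCT4."""
    return matmul(matmul(DCT4, block), transpose(DCT4))

def rows_orthogonal(m):
    """True if every pair of distinct rows has a zero dot product."""
    return all(sum(x * y for x, y in zip(m[i], m[j])) == 0
               for i in range(len(m)) for j in range(i + 1, len(m)))

flat = [[1] * 4 for _ in range(4)]
coeffs = forward_transform(flat)
```

A flat residual block produces a single non-zero (DC) coefficient, which is the energy-compaction property that makes the subsequent quantization and entropy coding effective.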
HEVC specifies two loop filters that are applied in order with the deblocking filter (DBF) applied first and the sample adaptive offset (SAO) filter applied afterwards. Both loop filters operate during the inter-picture prediction loop.
The DBF is similar to the one used by H.264/MPEG-4 AVC but with a simpler design and better support for parallel processing. In HEVC the DBF only applies to an 8×8 sample grid, while with H.264/MPEG-4 AVC the DBF applies to a 4×4 sample grid. The DBF uses an 8×8 sample grid since it causes no noticeable degradation and significantly improves parallel processing because the DBF no longer causes cascading interactions with other operations. Another change is that HEVC only allows for three DBF strengths of 0 to 2. HEVC also requires that the DBF first apply horizontal filtering for vertical edges to the picture and only after that apply vertical filtering for horizontal edges to the picture. This allows for multiple parallel threads to be used for the DBF.
The SAO filter is applied after the DBF and is made to allow for better reconstruction of the original signal amplitudes by using offsets from a transmitted look-up table. Per CTB the SAO filter can be disabled or applied in one of two modes: edge offset mode or band offset mode. The edge offset mode operates by comparing the value of a sample to two of its eight neighbors using one of four directional gradient patterns. Based on a comparison with these two neighbors, the sample is classified into one of five categories: minimum, two types of edges, maximum, or neither. For each of the first four categories an offset is applied. The band offset mode applies an offset based on the amplitude of a single sample. The sample is categorized by its amplitude into one of 32 bands. Offsets are specified for four consecutive bands of the 32, because in flat areas, which are prone to banding artifacts, sample amplitudes tend to be clustered in a small range. The SAO filter was designed to increase picture quality, reduce banding artifacts, and reduce ringing artifacts.
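The two SAO modes described above can be sketched as follows. This is a simplified illustration: the category numbering, the equal-width band mapping by the top five bits, and the function names are assumptions chosen for the sketch, and the offsets shown in the usage note are made-up values rather than anything an encoder produced.

```python
def edge_category(a, c, b):
    """Classify sample c against its two neighbours a and b along one direction."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # edge, sample on the lower side
    if (c > a and c == b) or (c == a and c > b):
        return 3  # edge, sample on the higher side
    if c > a and c > b:
        return 4  # local maximum
    return 0      # neither: no offset is applied

def band_index(sample, bit_depth=8):
    """Map a sample amplitude to one of 32 equal-width bands."""
    return sample >> (bit_depth - 5)

def apply_band_offset(sample, first_band, offsets, bit_depth=8):
    """Add an offset if the sample falls in one of four consecutive bands."""
    k = band_index(sample, bit_depth) - first_band
    if 0 <= k < len(offsets):
        sample += offsets[k]
    # Keep the result inside the valid sample range.
    return min(max(sample, 0), (1 << bit_depth) - 1)
```

For example, with hypothetical offsets `[2, 1, -1, -2]` starting at band 8, an 8-bit sample of 66 (band 8) becomes 68, while a sample of 200 (band 25) is left unchanged.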
The January 2013 HEVC draft defines three profiles: Main, Main 10, and Main Still Picture. It also contains provisions for additional profiles. Future extensions that are being discussed for HEVC include increased bit depth, 4:2:2/4:4:4 chroma subsampling, Multiview Video Coding (MVC), and SVC. The first version of HEVC is expected in January 2013 with HEVC range extensions expected in January 2014.
A profile is a defined set of coding tools that may be used to create a bitstream that conforms to that profile. An encoder for a profile may choose which features to use as long as it generates a conforming bitstream while a decoder for a profile must support all features that can be used in that profile. Current HEVC profiles have the following constraints:
The Main profile allows for a bit depth of 8 bits per color.
The Main 10 profile allows for a bit depth of 8 bits to 10 bits per color. A higher bit depth allows for a greater number of colors. The Main 10 profile allows for improved video quality since it can support video with a higher bit depth than what is supported by the Main profile.
The Main 10 profile was added at the October 2012 HEVC meeting based on proposal JCTVC-K0109 which proposed that a 10-bit profile be added to HEVC for consumer applications. The proposal stated that this was to allow for improved video quality and to support the Rec. 2020 color space that will be used by UHDTV. A variety of companies supported the proposal which included ATEME, BBC, BSkyB, CISCO, DirecTV, Ericsson, Motorola Mobility, NGCodec, NHK, RAI, ST, SVT, Thomson Video Networks, Technicolor, and ViXS Systems.
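The effect of bit depth on the number of representable colors is simple arithmetic, sketched below for the 8-bit and 10-bit cases. The function names are illustrative, and three color components per sample are assumed.

```python
def levels_per_component(bit_depth):
    """Number of distinct values one color component can take."""
    return 1 << bit_depth

def representable_colors(bit_depth, components=3):
    """Number of distinct colors with `components` channels at this bit depth."""
    return levels_per_component(bit_depth) ** components

main_colors = representable_colors(8)     # Main profile
main10_colors = representable_colors(10)  # Main 10 profile at 10 bits
```

Going from 8 to 10 bits multiplies the levels per component by 4 (256 to 1024) and the number of representable colors by 64.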
The Main Still Picture profile allows for a single still picture to be encoded with the same constraints as the Main profile. An objective performance comparison was done in April 2012 in which HEVC reduced the average bit rate for images by 56% compared to JPEG. A PSNR based performance comparison for still image compression was done in May 2012 using the HEVC HM 6.0 encoder and the reference software encoders for the other standards. For still images HEVC reduced the average bit rate by 15.8% compared to H.264/MPEG-4 AVC, 22.6% compared to JPEG 2000, 30.0% compared to JPEG XR, 31.0% compared to WebP, and 43.0% compared to JPEG.
A performance comparison for still image compression was done in January 2013 using the HEVC HM 8.0rc2 encoder, Kakadu version 6.0 for JPEG 2000, and IJG version 6b for JPEG. The performance comparison used PSNR for the objective assessment and mean opinion score (MOS) values for the subjective assessment. The subjective assessment used the same test methodology and images as those used by the JPEG committee when it evaluated JPEG XR. For 4:2:0 chroma subsampled images the average bit rate reduction for HEVC compared to JPEG 2000 was 20.26% for PSNR and 30.96% for MOS while compared to JPEG it was 61.63% for PSNR and 43.10% for MOS.
|Still image coding standard (test method)||Average bit rate reduction compared to|
A performance comparison of HEVC for still image compression was done in January 2013 by Nokia. The comparison showed a larger performance improvement for HEVC on higher resolution images than on lower resolution images. For lossy compression, JPEG required on average 2.2× the bit rate of HEVC to code the same image at similar quality.
The January 2013 HEVC draft defines two tiers, Main and High, and thirteen levels. A level is a set of constraints for a bitstream. For levels below level 4 only the Main tier is allowed. The Main tier is a lower tier than the High tier. The tiers were made to deal with applications that differ in terms of their maximum bit rate. The Main tier was designed for most applications while the High tier was designed for very demanding applications. A decoder that conforms to a given tier/level is required to be capable of decoding all bitstreams that are encoded for that tier/level and for all lower tiers/levels.
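The decoder conformance rule above can be sketched as a simple ordering check. The thirteen level names are taken from the published standard rather than from the text above, and the function names and the tuple representation of a (tier, level) pair are assumptions made for this sketch.

```python
LEVELS = ["1", "2", "2.1", "3", "3.1", "4", "4.1",
          "5", "5.1", "5.2", "6", "6.1", "6.2"]
TIERS = ["Main", "High"]  # ordered from lower to higher

def is_valid(tier, level):
    """Levels below level 4 allow only the Main tier."""
    if tier not in TIERS or level not in LEVELS:
        return False
    return tier == "Main" or LEVELS.index(level) >= LEVELS.index("4")

def can_decode(decoder, stream):
    """True if a decoder conforming to (tier, level) must decode the stream."""
    d_tier, d_level = decoder
    s_tier, s_level = stream
    if not (is_valid(d_tier, d_level) and is_valid(s_tier, s_level)):
        return False
    # The stream must not exceed the decoder in either tier or level.
    return (TIERS.index(s_tier) <= TIERS.index(d_tier)
            and LEVELS.index(s_level) <= LEVELS.index(d_level))
```

For example, a High tier level 5.1 decoder must handle a Main tier level 4 bitstream, whereas a Main tier level 4 decoder is not required to handle a High tier level 4 bitstream.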
|Level||Max luma sample rate (samples/s)||Max luma picture size (samples)||Max bit rate for Main and Main 10 profiles (kbit/s), Main tier||Max bit rate for Main and Main 10 profiles (kbit/s), High tier||Example picture resolution @ highest frame rate[A]|