VP9 is an open and royalty-free video coding format.

Efficiency

In Netflix's "Large-Scale Video Codec Comparison of x264, x265 and Libvpx for Practical VOD applications", libvpx came out 30% more efficient than x264 and 20% less efficient than x265 by Netflix's own VMAF metric. The comparison evaluated the slowest encoding speeds available for each encoder. The disparity between libvpx and x265 was much smaller on the SSIM metric (3%). This is consistent with previous findings, which showed x265 narrowly beating libvpx at the very highest quality (slowest encoding), whereas libvpx was superior by SSIM at every other encoding speed.

In a subjective quality comparison conducted in 2014 featuring the reference encoders for HEVC (HM 15.0), MPEG-4 AVC/H.264 (JM 18.6), and VP9 (libvpx 1.2.0 with preliminary VP9 support), VP9, like H.264, required about two times the bitrate to reach video quality comparable to HEVC, while with synthetic imagery VP9 was close to HEVC. By contrast, another subjective comparison from 2014 concluded that at higher quality settings HEVC and VP9 were tied at a 40 to 45% bitrate advantage over H.264.

Performance

An FFmpeg developer compared encoding speed versus efficiency for libvpx (the VP9 reference implementation), x264 and x265 in September 2015. By SSIM index, libvpx was mostly superior to x264 across the range of comparable encoding speeds, but the main benefit was at the slower end, around x264@veryslow (reaching a sweet spot of 30–40% bitrate improvement at up to twice the encoding time of that preset), whereas x265 only became competitive with libvpx at around 10 times the encoding time of x264@veryslow. The comparison concluded that libvpx and x265 were both capable of the claimed 50% bitrate improvement over H.264, but only at 10–20 times the encoding time of x264. Judged by the objective quality metric VQM in early 2015, the VP9 reference encoder delivered video quality on par with the best HEVC implementations.

A decoder comparison by the same developer showed 10% faster decoding for ffvp9 than ffh264 for same-quality video, or "identical" at same bitrate. It also showed that the implementation can make a difference, concluding that "ffvp9 beats libvpx consistently by 25–50%".

Another decoder comparison indicated 10–40 percent higher CPU load for VP9 than for H.264 (without stating whether this was with ffvp9 or libvpx), and found that on mobile, the Ittiam demo player was about 40 percent faster than the Chrome browser at playing VP9.

Comparison of encoding quality (images): VP9, MPEG-2, H.264

Hardware encoding/decoding support

The following chips, architectures, CPUs, GPUs and SoCs provide hardware acceleration of VP9. Some are known to have fixed-function hardware, but this list also includes GPU- or DSP-based implementations, that is, software implementations on non-CPU hardware. The latter category also serves to offload the CPU, but its power efficiency is not as good as that of fixed-function hardware (it is more comparable to well-optimized, SIMD-aware software).

The Intel Kaby Lake and Apollo Lake CPU families and the Nvidia Maxwell GM206 and Pascal GPU families have full fixed-function VP9 hardware decoding, for the highest decoding performance and power efficiency.


Company: Chip/Architecture
  • AMD: Polaris, Bristol Ridge, Stoney Ridge
  • ARM: Mali-V61 ("Egil") VPU
  • Allwinner: A80
  • Amlogic: S9 family
  • HiSilicon: HI3798C
  • Imagination: PowerVR Series6
  • Intel: Bay Trail, Merrifield, Moorefield, Skylake, Kaby Lake
  • MediaTek: MT6595, MT8135, Helio X20/X25, Helio X30
  • NVIDIA: Maxwell GM206, Pascal, Tegra X1
  • Qualcomm: Snapdragon 820/821
  • Realtek: RTD1295
  • Samsung: Exynos 7 Octa 7420, Exynos 8 Octa 8890, Exynos 9 Octa 8895

Argon Streams VP9 Coverage Report

Technology

VP9 is a traditional block-based transform coding format. The bitstream format is relatively simple compared to formats that offer similar bitrate efficiency like HEVC.

VP9 has many design improvements compared to VP8. Its biggest improvement is support for coding units of 64×64 pixels, which is especially useful with high-resolution video. The prediction of motion vectors was also improved. In addition to VP8's four modes (average/"DC", "true motion", horizontal, vertical), VP9 supports six oblique directions for linear extrapolation of pixels in intra-frame prediction.
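The four intra prediction modes inherited from VP8 can be illustrated with a small sketch. It assumes the textbook formulation of each mode (a rounded average for "DC", and left + above - above_left with clipping for "true motion"); the function name, argument names and mode strings are hypothetical and not taken from libvpx, and the six oblique modes are omitted.

```python
def predict_block(above, left, above_left, mode, bit_depth=8):
    """Predict an n x n block from its already decoded neighbours.

    above      -- the n samples directly above the block
    left       -- the n samples directly to its left
    above_left -- the single sample at the top-left corner
    Returns the prediction as a list of n rows.
    """
    n = len(above)
    max_val = (1 << bit_depth) - 1
    if mode == "dc":
        # Rounded-down average of all 2n neighbouring samples (+n rounds to nearest).
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    if mode == "vertical":
        # Every row repeats the row of samples above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "true_motion":
        # Gradient-style predictor, clipped to the valid sample range.
        return [
            [min(max(left[r] + above[c] - above_left, 0), max_val) for c in range(n)]
            for r in range(n)
        ]
    raise ValueError(f"unknown mode: {mode}")
```

An encoder would typically try each mode and keep the one whose prediction leaves the smallest residual to code.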

New coding tools also include:

  • eighth-pixel precision for motion vectors;
  • three different switchable 8-tap subpixel interpolation filters;
  • improved selection of reference motion vectors;
  • improved coding of offsets of motion vectors to their reference;
  • improved entropy coding;
  • improved and adapted (to new block sizes) loop filtering;
  • the asymmetric discrete sine transform (ADST);
  • larger discrete cosine transforms (DCT, 16×16 and 32×32);
  • improved segmentation of frames into areas with specific similarities (e.g. fore-/background).
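As a small illustration of eighth-pixel motion vector precision, the sketch below splits one motion vector component, stored as an integer count of 1/8 pixels, into a whole-pixel offset and a fractional phase. The function name is hypothetical, and how a real decoder maps the phase onto its interpolation filters is not modelled here.

```python
def split_motion_vector(mv_eighths):
    """Split an MV component given in 1/8-pixel units.

    Returns (whole_pixels, eighths) with 0 <= eighths <= 7, so that
    mv_eighths == whole_pixels * 8 + eighths also holds for negative values.
    """
    whole_pixels = mv_eighths >> 3   # floor division by 8
    eighths = mv_eighths & 7         # remaining fraction, in eighths of a pixel
    return whole_pixels, eighths
```

For example, a component of 19 means two whole pixels plus 3/8 of a pixel; a non-zero fraction is what makes subpixel interpolation necessary.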

To enable some parallel processing of frames, video frames can be split along coding unit boundaries into evenly spaced tiles, 256 to 4096 pixels wide, arranged in up to four rows, with each tile column coded independently. Tiling is mandatory for video resolutions wider than 4096 pixels. A tile header contains the tile size in bytes, so decoders can skip ahead and decode each tile row in a separate thread. The image is then divided into coding units called superblocks of 64×64 pixels, which are adaptively subpartitioned in a quadtree coding structure. A superblock can be subdivided horizontally, vertically, or both; square (sub)units can be subdivided recursively down to 4×4 pixel blocks. Subunits are coded in raster scan order: left to right, top to bottom.
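The superblock partitioning can be sketched as a recursive walk. This is a simplified model under stated assumptions: the `decide` callback, the four partition names and both function names are hypothetical stand-ins for the encoder's decisions, only the square four-way split recurses, and the special handling a real codec applies to blocks smaller than 8×8 is ignored.

```python
def superblock_grid(width, height, sb_size=64):
    """Number of superblock columns and rows needed to cover a frame."""
    cols = -(-width // sb_size)    # ceiling division
    rows = -(-height // sb_size)
    return cols, rows


def partition(x, y, size, decide, min_size=4):
    """Yield (x, y, width, height) leaf blocks of one square unit.

    decide(x, y, size) returns "none", "horz", "vert" or "split".
    Sub-blocks are visited left to right, top to bottom.
    """
    choice = decide(x, y, size) if size > min_size else "none"
    half = size // 2
    if choice == "none":
        yield (x, y, size, size)
    elif choice == "horz":              # two stacked halves
        yield (x, y, size, half)
        yield (x, y + half, size, half)
    elif choice == "vert":              # two side-by-side halves
        yield (x, y, half, size)
        yield (x + half, y, half, size)
    elif choice == "split":             # four squares, each partitioned again
        for dy in (0, half):
            for dx in (0, half):
                yield from partition(x + dx, y + dy, half, decide, min_size)
    else:
        raise ValueError(f"unknown partition: {choice}")
```

With a callback that always answers "split", a single 64×64 superblock decomposes into 256 blocks of 4×4 pixels, the finest subdivision described above.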

Starting from each key frame, decoders keep 8 frames buffered to be used as reference frames or to be shown later. Transmitted frames signal which buffer to overwrite and can optionally be decoded into one of the buffers without being shown. The encoder can send a minimal frame that just triggers one of the buffers to be displayed ("skip frame"). Each inter frame can reference up to three of the buffered frames for temporal prediction. Up to two of those reference frames can be used in each coding block to calculate a sample data prediction, using spatially displaced (motion compensation) content from a reference frame or an average of content from two reference frames ("compound prediction mode"). The (ideally small) remaining difference (delta encoding) from the computed prediction to the actual image content is transformed using a DCT or ADST (for edge blocks) and quantized.
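The buffer handling described above can be modelled in a few lines. This is a toy sketch, not decoder code: the class and method names are hypothetical, frames are represented by arbitrary Python objects, the refresh signal is assumed to be an 8-bit mask with one bit per buffer, and the rounding used in the compound average is an assumption.

```python
class ReferencePool:
    """Toy model of the eight reference frame buffers."""

    NUM_SLOTS = 8

    def __init__(self):
        self.slots = [None] * self.NUM_SLOTS

    def decode(self, frame, refresh_mask, show=True):
        """Store `frame` in every slot whose bit is set in `refresh_mask`.

        Returns the frame if it is displayed, or None for a hidden frame.
        """
        for i in range(self.NUM_SLOTS):
            if (refresh_mask >> i) & 1:
                self.slots[i] = frame
        return frame if show else None

    def show_existing(self, slot):
        """Model of a "skip frame": display a buffered frame without decoding."""
        return self.slots[slot]


def compound_prediction(pred_a, pred_b):
    """Average two motion-compensated predictions sample by sample.

    Rounding half up is an assumption made for this sketch.
    """
    return [(a + b + 1) >> 1 for a, b in zip(pred_a, pred_b)]
```

A hidden frame is then simply `decode(frame, mask, show=False)`, followed later by `show_existing(slot)` when its content should appear.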

Something like a B-frame can be coded while preserving the original frame order in the bitstream, using a structure named superframes. A hidden alternate reference frame can be packed together with an ordinary inter frame and a skip frame that triggers display of the previous hidden altref content from its reference frame buffer right after the accompanying P-frame.
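How a demuxer might find the individual frames inside a superframe can be sketched as follows. The byte layout is an assumption not stated in this text: it follows the trailing superframe index as the libvpx parser is understood to read it (a marker byte with the top three bits set to 110, repeated at both ends of the index, encoding the frame count and the number of bytes per size). The function name is hypothetical.

```python
def parse_superframe_index(data):
    """Return the frame sizes listed in a trailing superframe index.

    Returns an empty list when `data` does not end in a valid index,
    in which case the chunk would be treated as a single frame.
    """
    if not data:
        return []
    marker = data[-1]
    if (marker & 0xE0) != 0xC0:
        return []
    frames = (marker & 0x07) + 1             # 1..8 frames
    size_bytes = ((marker >> 3) & 0x03) + 1  # 1..4 bytes per frame size
    index_size = 2 + size_bytes * frames     # two marker bytes plus the sizes
    if len(data) < index_size or data[-index_size] != marker:
        return []
    body = data[len(data) - index_size + 1:-1]
    return [
        int.from_bytes(body[i * size_bytes:(i + 1) * size_bytes], "little")
        for i in range(frames)
    ]
```

The returned sizes let a decoder slice the chunk into its frames, for example a hidden altref frame followed by the frame that is actually shown.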

VP9 enables lossless encoding by transmitting at the lowest quantization level (q index 0) an additional 4×4-block encoded Walsh–Hadamard transformed (WHT) residue signal.
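Why a Walsh–Hadamard transform suits lossless coding can be shown with a textbook 4×4 version: it uses only additions and subtractions, so the inverse recovers the integer residue exactly. This sketch is not VP9's exact integer transform, which is defined differently; the matrix ordering and the function names are assumptions chosen for illustration.

```python
# Symmetric 4x4 Hadamard matrix with mutually orthogonal rows, so H * H = 4 * I.
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def wht4x4(block):
    """Forward 2-D transform: H * block * H."""
    return matmul(matmul(H, block), H)


def inverse_wht4x4(coeffs):
    """Inverse transform: H * coeffs * H equals 16 * block, so the division is exact."""
    scaled = matmul(matmul(H, coeffs), H)
    return [[value // 16 for value in row] for row in scaled]
```

Because no rounding occurs in either direction, `inverse_wht4x4(wht4x4(block))` returns the original block for any integer input, which is the property lossless coding relies on.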

In container formats, VP9 streams are marked with the FourCC VP90 (or in the future possibly VP91, ...) or VP09. In order to be searchable, raw VP9 bitstreams have to be contained either in Google's Matroska-derived WebM format (.webm) or in the older, minimalistic Indeo video file (IVF) format, which is traditionally supported by libvpx.
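Reading the FourCC from an IVF file could look like the sketch below. The 32-byte little-endian header layout used here (a "DKIF" signature, version, header length, FourCC, width, height, two timebase fields, frame count and four unused bytes) is an assumption based on the files libvpx's tools are understood to write; it is not specified in this text, and the function and field names are hypothetical.

```python
import struct

# Assumed layout: signature, version, header length, FourCC, width, height,
# timebase rate, timebase scale, frame count, then 4 unused bytes.
IVF_HEADER = struct.Struct("<4sHH4sHHIII4x")


def parse_ivf_header(data):
    """Parse the file header of an IVF stream into a dictionary."""
    (signature, version, header_len, fourcc,
     width, height, rate, scale, frame_count) = IVF_HEADER.unpack_from(data)
    if signature != b"DKIF":
        raise ValueError("not an IVF file")
    return {
        "version": version,
        "header_len": header_len,
        "fourcc": fourcc.decode("ascii"),
        "width": width,
        "height": height,
        "rate": rate,
        "scale": scale,
        "frame_count": frame_count,
    }
```

A player could use the returned `fourcc` value (for example "VP90") to decide whether to hand the stream to a VP9 decoder.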

Draft VP Codec ISO Media File Format Binding

Download: vp-codec-iso-media-file-format-binding-20160516-draft.pdf (226KB PDF)


Product Support

  • Microsoft Edge: Microsoft announced in April 2016 that the Edge browser will support VP9 (and Opus).
  • WebRTC: VP9 in WebRTC became available in Google Chrome 48 (stable) in January 2016, for both desktop and Android.
  • Google Chrome: VP9 decode support was first enabled by default in Google Chrome 29 Dev channel (r206883) on 2013-06-26.
  • Mozilla Firefox: VP9 decode support was first added to Firefox Aurora ("pre-beta") nightly builds on 2013-12-06.
  • VLC: Experimental VP9 decode support was added to VLC in version 2.1.2.
  • FFmpeg / Libav: Search the FFmpeg codebase for recent libvpx-related commits.