Ultra HD Forum Introduction to Ultra HD
Ultra HD Forum Guidelines
Red Book: Introduction to UltraHD
Ultra HD Forum
8377 Fremont Blvd., Suite 117
Fremont, CA 94538
UNITED STATES
VERSION 3.0.0
Apr 7, 2023
1. Foreword
This new version v3 of the Ultra HD Forum Guidelines provides a holistic view of modern media systems, their mechanisms and workflows, and how those are impacted by the latest generation of improvements – the “Ultra HD” technologies, those that take media beyond the limits established at the start of this millennium, characterized in large part by the video resolutions and the dynamic range offered for media in “high definition”, i.e., ITU-R Rec. BT.709. The Forum considers Ultra HD to be not only any UHD media (i.e., 4K resolution, or higher), but also HD-resolution media with enhancements such as High Dynamic Range, Wide Color Gamut, etc. Ultra HD is a constellation of technologies that can provide significant improvements in media quality and audience experience.
This work represents over eight years of collaborative effort. These new books would not have been possible without the leadership of Jim DeFilippis, who represents Fraunhofer and chairs our Guidelines Work Group with invaluable support from the co-chair, Pete Sellar of Xperi as well as technical assistance from Ian Nock of Fairmile West Consulting.
Our gratitude to all the companies listed in the Acknowledgments that have participated in this effort over the years and specifically to Nabajeet Barman (Brightcove), Andrew Cotton (BBC), Jean Louis Diascorn (Harmonic), Richard Doherty (Dolby), Chris Johns (Sky UK), Katy Noland (BBC), Bill Redmann (InterDigital), Chris Seeger (Comcast/NBCUniversal), and Alessandro Travaglini (Fraunhofer).
This document, Introduction to Ultra HD (Red Book), is one of a series of books, referred to as the Rainbow Books, that compose the Ultra HD Forum Guidelines. If any of these terms sound unfamiliar, follow the link below to the Black Book. If a particular standard is of interest, links such as the one above are available to take you to the White Book, where references are collected.
The Rainbow Books are, in their entirety:
White Book Guidelines Index and References
Red Book Introduction to Ultra HD
Orange Book Foundational Technologies for Ultra HD
Yellow Book Beyond Foundational Technologies
Green Book Ultra HD Distribution
Blue Book Ultra HD Production and Post Production
Indigo Book Ultra HD Technology Implementations
Violet Book Real World Ultra HD
Black Book Terms and Acronyms
Updates in this new version of the Ultra HD Forum Guidelines are described on the following page.
I hope you enjoy reading it.
If you want to know more about Ultra HD, and join our discussions on how it can be deployed, I invite you to join the Ultra HD Forum. You can start by visiting our website: www.ultrahdforum.org.
Nandhu Nandhakumar, President, Ultra HD Forum
April 2023
1.1 Changes from version 2.6 to 3.0
What’s new in the Spring 2023 version of the UHDF Guidelines Red Book, Introduction to Ultra HD (v3.0), edited by Jim DeFilippis.
The Introduction to Ultra HD is the first of the series of Rainbow Books on the Guidelines for Ultra HD. The scope and purpose of this book is to describe the foundations of Ultra HD technology, including a discussion of SDR vs. HDR. Included is a brief description of additional Ultra HD Technologies such as Wide Color, Next Generation Audio (NGA) and High Frame Rate video (HFR).
While most of the information in this edition is material from the previous version of the Guidelines (v2.6), the information has been updated. The section on Wide Color has been expanded to include information from ITU-R Recommendation BT.2020. Each section includes references to more detailed information contained in the companion Rainbow Books.
We hope this new format will be helpful in understanding UHD technologies as well as planning for new or expanded Ultra HD services.
Jim DeFilippis and Pete Sellar,
Guidelines Working Group Co-Chairs, Ultra HD Forum, April 2023
2. Acknowledgements
We would like to acknowledge all the member companies of the Ultra HD Forum, past and present, who have contributed in some small or large part to the body of knowledge collected in the Guidelines Rainbow Books, including the specific subject of this book.
ARRIS |
ATEME |
ATT DIRECTV |
British Broadcasting Corporation |
BBright |
Beamr |
Brightcove Inc. |
Broadcom |
B<>COM |
Comcast / NBC Universal LLC |
Comunicare Digitale |
Content Armor |
CTOIC |
Dolby |
DTG |
Endeavor Streaming |
Eurofins Digital Testing |
Fairmile West |
Fraunhofer IIS |
Harmonic |
Huawei Technologies |
InterDigital |
LG Electronics |
Mediakind |
MovieLabs |
NAB |
Nagra, Kudelski Group |
NGCodec |
Sky UK |
Sony Corporation |
Xperi |
Technicolor SA |
Verimatrix Inc. |
V-Silicon |
3. Notice
The Ultra HD Forum Guidelines are intended to serve the public interest by providing recommendations and procedures that promote uniformity of product, interchangeability and ultimately the long-term reliability of audio/video service transmission. This document shall not in any way preclude any member or nonmember of the Ultra HD Forum from manufacturing or selling products not conforming to such documents, nor shall the existence of such guidelines preclude their voluntary use by those other than Ultra HD Forum members, whether used domestically or internationally.
The Ultra HD Forum assumes no obligations or liability whatsoever to any party who may adopt the guidelines. Such an adopting party assumes all risks associated with adoption of these guidelines and accepts full responsibility for any damage and/or claims arising from the adoption of such guidelines.
Attention is called to the possibility that implementation of the recommendations and procedures described in these guidelines may require the use of subject matter covered by patent rights. By publication of these guidelines, no position is taken with respect to the existence or validity of any patent rights in connection therewith. Ultra HD Forum shall not be responsible for identifying patents for which a license may be required or for conducting inquiries into the legal validity or scope of those patents that are brought to its attention.
Patent holders who believe that they hold patents which are essential to the implementation of the recommendations and procedures described in these guidelines have been requested to provide information about those patents and any related licensing terms and conditions.
All Rights Reserved
© Ultra HD Forum 2023
4. Contents
1.1 Changes from version 2.6 to 3.0
7. Foundation Ultra HD Technologies
9. Additional Ultra HD Technologies
9.2. Next Generation Audio (NGA)
9.3 High Frame Rate Video (HFR)
5. List of Figures
6. List of Tables
7. Foundation Ultra HD Technologies
These guidelines describe a number of Ultra HD-related technologies. The Ultra HD Forum notes that some of these technologies can be considered “Foundation” Ultra HD technologies, which are core aspects of Ultra HD content and services, such as High Dynamic Range (HDR) and Wide Color Gamut (WCG). The Ultra HD Forum also describes additional technologies, which service providers may find to be valuable enhancements or enablers to a core Ultra HD service, such as dynamic HDR metadata.
Foundation Ultra HD content and services are often those that were commercially available as early as 2016, and thus have reached a level of market adoption and maturity. For the purposes of this document, Foundation Ultra HD comprises the following characteristics:
- Resolution – greater than or equal to 1080p and lower than or equal to 2160p (progressive format; BT.2100 [5] does not include interlaced formats)
- Wide Color Gamut – color gamut wider than BT.709 [2]
- High Dynamic Range – image dynamic range larger than SDR, capturing and displaying increased highlight and shadow details, using the tone curves specified in BT.2100 [5] (HLG and PQ)
- Bit depth – 10-bit
- Frame rates – up to 60fps (integer frame rates are preferred; note that cinematic content may opt to use lower frame rates, e.g., see DCI specification [17])
- Audio – 5.1 or higher channel surround sound or channel-based Immersive Audio
- Closed Captions/Subtitles – CTA 708 [19]/608 [18], ETSI 300 743 [20], ETSI 300 472 [21], SCTE-27 [22], IMSC1 [23]
The following terms are used in this document for HDR and HDR plus WCG:
- HLG: Hybrid Log-Gamma (HLG) OETF, EOTF, and OOTF transfer functions specified in BT.2100 [5]
- HLG10: HDR systems or content employing Hybrid Log-Gamma (HLG), the wide color gamut specified in BT.2100 and 10-bit depth
- PQ: Perceptual Quantization (PQ) OETF, EOTF, and OOTF transfer functions specified in BT.2100 [5]
- PQ10: HDR systems or content employing Perceptual Quantization (PQ), the wide color gamut specified in BT.2100, and 10-bit depth
- HDR10: HDR systems or content employing PQ10 and further including or capable of providing SMPTE ST 2086 [10], MaxFALL, and MaxCLL metadata
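The two content light level values named in the HDR10 definition can be illustrated with a short sketch. It assumes the commonly described derivation, in which each pixel's light level is the largest of its linear R, G and B components in cd/m2, MaxCLL is the largest such value over the whole program, and MaxFALL is the largest per-frame average. The function name, the frame representation and the sample values are hypothetical and are not taken from these guidelines.

```python
from typing import Iterable, List, Tuple

# Assumed input form: one (R, G, B) tuple per pixel, linear light in cd/m2.
Pixel = Tuple[float, float, float]
Frame = List[Pixel]


def content_light_levels(frames: Iterable[Frame]) -> Tuple[float, float]:
    """Return (MaxCLL, MaxFALL) for a sequence of non-empty frames."""
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames:
        # Per-pixel light level: the largest of the three colour components.
        levels = [max(pixel) for pixel in frame]
        # MaxCLL tracks the brightest single pixel seen so far.
        max_cll = max(max_cll, max(levels))
        # MaxFALL tracks the brightest frame-average seen so far.
        max_fall = max(max_fall, sum(levels) / len(levels))
    return max_cll, max_fall


# Two tiny hypothetical frames of four pixels each.
frames = [
    [(100.0, 80.0, 60.0), (50.0, 50.0, 50.0), (10.0, 20.0, 30.0), (0.0, 0.0, 0.0)],
    [(1000.0, 200.0, 100.0), (400.0, 400.0, 400.0), (100.0, 100.0, 100.0), (100.0, 50.0, 25.0)],
]
max_cll, max_fall = content_light_levels(frames)
print(f"MaxCLL = {max_cll} cd/m2, MaxFALL = {max_fall} cd/m2")
```

A real implementation would work on decoded video frames and may apply additional filtering; this sketch only shows how the two values differ.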
Table 1. Foundation Ultra HD Workflow Parameters
Content Creation & Mastering | Defined and documented standard workflows for Live and Pre-recorded content
Service Type | Real-time Program Services; On-Demand content that was originally offered as Live content
Network Type | Unicast (including Adaptive Bit Rate), Broadcast, Multicast
Transport/Container/Format | MPEG TS, Multicast IP, DASH ISO BMFF
Interface to TVs (source format) | IP connected (for OTT content delivered via managed or unmanaged network); HDMI (for services delivered via a STB, e.g., OTT, MVPD)
Backward Compatibility | Native (HLG), simulcast (HDR10/PQ10), decoder based (optional)
Table 2. Foundation Ultra HD Content Parameters
Spatial Resolution | 1080p (note 1) or 2160p
System Colorimetry |
Bit Depth | 10-bit
Dynamic Range | SDR, PQ, HLG
Frame Rate (note 2) | 24 (23.976), 25, 30 (29.97), 50, 60 (59.94)
Video Codec | HEVC, Main 10 Profile, Level 5 or 5.1 (single layer) (note 3)
Audio Channels | 5.1 or higher channel surround sound or channel-based Immersive Audio
Audio Codec | AC-3, E-AC-3, DTS-HD, E-AC-3 + JOC, HE-AAC, AAC-LC
Captions/Subtitles Coding | CTA 608/708, ETSI 300 743, ETSI 300 472, SCTE-27, IMSC1
Table 2 Notes:
1. 1080p together with WCG and HDR fulfills certain use cases for Foundation Ultra HD services and is therefore considered to be an Ultra HD format for the purpose of this document. 1080p without WCG or HDR is considered to be an HD format. The possibility of 1080i or 720p plus HDR and WCG is not considered here. HDR and WCG for multiscreen resolutions may be considered in the future.
2. Fractional frame rates for 24, 30 and 60 fps are included for compatibility with current plant video clock reference, but ultimately not preferred. Fractional frame rates will be necessary during migration from legacy video systems. The lower frame rates may be common for cinematic content.
3. For use in China, the AVS2 codec, Main10 profile, is used in addition to HEVC. See Indigo Section 9.1.
For the purpose of this document, and subject to the above constraint on 1080p content, these Foundation Ultra HD parameters can be combined in various ways to produce Ultra HD content. Additional, non-Foundation technologies may also be employed.
The Foundation Ultra HD codec is HEVC[1] for distribution to consumers, due to its support for HDR/WCG (10-bit) as well as its coding efficiency, which makes 4K content more feasible. AVC specifications have been updated to include support for 10-bit depth with BT.2020/2100 system colorimetry and support of PQ/HLG High Dynamic Range formats. However, the Ultra HD Forum finds that deployed consumer decoders are mostly hardware-based with no feasible mechanism to upgrade to match these latest updates, and thus these new AVC capabilities are not generally usable for distribution purposes. Consumer AVC decoders generally support only 8-bit 4:2:0 content, and their maximum bit rate limits would normally preclude decoding 4K content at the compression rates that AVC can achieve.
Using AVC for production, contribution and mezzanine workflows is more viable because changing to encoders with support of the HDR/10-bit capabilities is something that is more easily accomplished.
In the Violet Book, Table 1 and Table 2 categorize decoders and services in terms of Foundation Ultra HD capability. Additionally, Table 2 offers some indications of which types of decoders are compatible with which service formats. Foundation decoder and service formats provide the base encoding formats for a number of enhancement features which may be implemented without causing service incompatibility with Foundation supporting devices. This will normally be described in the sections covering the enhancement features.
8. SDR vs. HDR
HDR is sometimes touted as providing a significant improvement in the “immersivity” of an image, as compared to SDR, thus providing an improved sense of “being there”. To some degree, this stems from a coincidental match between a dynamic range that the human visual system (HVS) can instantly perceive and the dynamic range presentable with SDR displays, vs a greater dynamic range presented by HDR displays that requires adaptation of the HVS.
In Figure 1 the total dynamic range of the human visual system is represented along the horizontal axis[2]. Photographs of scenes representative of and exposed for portions of that range are presented[3]. The left portion, from 10 cd/m² down to around 10⁻⁶ cd/m², represents levels of scene luminance where scotopic (rod) vision is active. There is only one kind of rod in humans, characterized by a specific photopigment, which leads to our perception that “night vision” is monochromatic. Accordingly, scotopic vision is somewhat less interesting for entertainment purposes, for being limited to black and white. The right portion of the axis is the range over which photopic (cone) vision is active, from 0.1 to 10⁸ cd/m². Human cones are present in the typical individual in three distinct kinds, corresponding to three photopigments having peak sensitivities to either long, medium, or short wavelengths of visible light.
The overlap between scotopic and photopic vision amounts to about 14% (2 decades of 14) of the total human visual range and is called mesopic vision, where both rods and cones operate simultaneously, starting from a scene luminance where cones are just barely sensitive enough yet rods are still able to register an image. At the upper end of this overlap range, the scene light is too strong for scotopic vision and saturates the rods, consuming all of the photopigment available in them, a “bleaching” effect leaving them completely desensitized. The cones, being less sensitive than the rods, allow color perception from dawn to dusk. Still, at a scene brightness around 10⁸ cd/m², as might occur for one staring into thin clouds backed by the sun, the cones, too, become saturated and desensitized by bleaching. Recovery from bleaching is not quick. Cones can take two to five minutes to recover, while rods can take 15 to 30 minutes to “dark adapt”.
Figure 1. Comparing Dynamic Range of SDR, HDR, and the Human Visual System
A more subtle adaptation takes place, helping to normalize the significant dynamic range of human vision for higher level visual functions, such as contrast detection, texture recognition, object identification, etc. At any given moment, the human visual system is adapted to a point along the entire visual range, and while adapted to that point, is able to “instantly” (within 100 ms) see details in a small subset of the range. (See the solid white and orange sigmoid curves in Figure 1).
The instantaneous adaptation curve in white represents the simultaneously detectable range of 0.1-100 cd/m2. This corresponds to the SDR range recommended for HD television, as indicated in BT.1886 and other standards.
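The luminance ranges quoted in this section are easier to compare when expressed in decades (factors of 10) and stops (factors of 2). The following is a minimal sketch of that arithmetic; the function and variable names are illustrative only, and the endpoints are the values quoted in the text.

```python
import math


def decades(low_cd_m2: float, high_cd_m2: float) -> float:
    """Width of a luminance range in decades (powers of ten)."""
    return math.log10(high_cd_m2 / low_cd_m2)


def stops(low_cd_m2: float, high_cd_m2: float) -> float:
    """Width of a luminance range in photographic stops (powers of two)."""
    return math.log2(high_cd_m2 / low_cd_m2)


# Endpoints as quoted in the text, in cd/m2.
hvs_total = decades(1e-6, 1e8)   # scotopic floor to photopic ceiling
mesopic = decades(0.1, 10.0)     # overlap of rod and cone vision
mesopic_share = mesopic / hvs_total

sdr_decades = decades(0.1, 100.0)  # simultaneously detectable (SDR) range
sdr_stops = stops(0.1, 100.0)

print(f"HVS total: {hvs_total:.0f} decades")
print(f"Mesopic overlap: {mesopic:.0f} decades ({mesopic_share:.1%} of total)")
print(f"SDR range: {sdr_decades:.0f} decades, about {sdr_stops:.1f} stops")
```

Under these figures the instantaneous (SDR) range covers about 3 of the 14 decades, which is why adaptation is needed to exploit the rest.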
In parallel to adaptation that occurs at a particular light level, the pupillary luminance response is able to adjust quickly at nearly all levels of adaptation. By constricting and dilating, the pupil admits more or less scene light into the eye, with a range of almost four stops (i.e., 2⁴:1, or 16:1), though this luminance response is limited once adapted to very dark or very bright scenes[4]. Overall, the pupillary adjustment represents only a small portion of the overall dynamic range of the human visual system, but is able to quickly extend what can be seen given the current adaptation.
Pupillary response to luminance has a latency of about 1/4 second, or more. After that, the pupil can contract 1 stop (cutting light entering the eye by one-half) in about 1/3 s. Dilation is slower, taking about 3 s to open 1 stop.
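The pupil figures above can be turned into a rough timing estimate. The sketch below assumes that the per-stop times scale linearly with the number of stops, which is a simplification introduced here for illustration and is not stated in the text; the constant and function names are hypothetical.

```python
# Figures quoted in the text.
LATENCY_S = 0.25                 # delay before the pupil starts to move
CONSTRICT_S_PER_STOP = 1.0 / 3   # time to close by one stop
DILATE_S_PER_STOP = 3.0          # time to open by one stop


def light_ratio(stop_count: float) -> float:
    """Ratio of admitted light corresponding to a number of stops."""
    return 2.0 ** stop_count


def pupil_response_time(stops_change: float) -> float:
    """Estimated seconds for the pupil to change by `stops_change` stops.

    Positive values mean dilation (admitting more light), negative values
    mean constriction. Linear scaling per stop is an assumption.
    """
    per_stop = DILATE_S_PER_STOP if stops_change > 0 else CONSTRICT_S_PER_STOP
    return LATENCY_S + abs(stops_change) * per_stop


print(light_ratio(4))            # four stops of pupil range
print(pupil_response_time(-1))   # constrict by one stop
print(pupil_response_time(1))    # dilate by one stop
```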
Pupillary response is further driven by attention. When looking attentively into a shadow, the pupil quickly expands to see more detail, even though the overall amount of light in the scene hasn’t changed. When looking at highlights, the pupil quickly contracts to better see texture. The overall adaptation needn’t change, which would take many seconds, but is quickly extended at sub-second speeds, by the pupil’s attention-based operation. This expanded range of dynamic adaptation is exploitable in HDR video, specified in BT.2100 to extend the luminance range both higher and lower. When properly presented, a greater sense of immersion is induced when HDR video presents details in different parts of an image that are not instantaneously visible, placing those details 400 ms or more away due to the need for dynamic attentive pupillary response and triggering involuntary responses more similar to those presented by the real world.
9. Additional Ultra HD Technologies
In addition to the Foundation Ultra HD technologies, the Ultra HD Forum provides guidance on a number of additional technologies that can enable or enhance Foundation Ultra HD content and services.
The Ultra HD Forum notes that these additional Ultra HD technologies can be “layered” onto Foundation Ultra HD technologies in order to upgrade the consumer experience. For example, 7.1+4 immersive audio can be transmitted instead of 5.1 surround sound and/or dynamic HDR metadata can be included with HDR10-based content. Content-aware encoding can be employed to increase encoding efficiency. Content producers and service providers may elect to implement one or more additional Ultra HD technologies according to market drivers, the capabilities of the end-to-end ecosystem and consumer preferences. See Beyond Foundational Technologies (Yellow Book).
9.1. Wide Color
Part of the Ultra HD basket of new features is the adoption of ITU-R Rec BT.2020 [3] system colorimetry. This color system extends the gamut range of video imagery beyond HDTV color, as defined in ITU-R Rec BT.709 [2], and covers most of the spectrum of reflected colors, also known as Pointer’s Colors. The primaries are located close to the limits of the CIE locus of monochromatic colors and thus require narrow band emitters to reproduce the full range of Rec 2020 color (see Figure 2).
Table 3. System Colorimetry for ITU-R Rec BT.2020
System Colorimetry Parameter | Values
Opto-electronic transfer characteristics before pre-correction | Assumed linear (note 1)
Primary colours and reference white (note 2) | Chromaticity coordinates (CIE, 1931)
 | x | y
Red Primary [R] | 0.708 | 0.292
Green Primary [G] | 0.170 | 0.797
Blue Primary [B] | 0.131 | 0.046
Reference White [D65] | 0.3127 | 0.3290
Table 3 Notes:
1. Picture information can be linearly indicated by the tristimulus values of RGB in the range of 0-1.
2. The colorimetric values of the picture information can be determined based on the reference RGB primaries and the reference white.
Figure 2. CIE Diagram showing Color System Gamuts, Rec2020, Rec709, P3
In comparison to the HDTV color gamut as well as the Digital Cinema (P3) color gamut, the Rec 2020 gamut increases the number of colors that can be represented in Ultra HD. Further, with the addition of HDR, there is an increase in the volume of colors that can be represented (see Figure 3 below).
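One rough way to quantify the gamut comparison is to compute the area of each gamut triangle in the CIE 1931 xy plane. The sketch below uses the BT.2020 primaries from Table 3; the BT.709 and DCI-P3 primaries are not listed in this book and are taken here from their respective specifications as an assumption. Area in xy is not perceptually uniform, so the ratios are indicative only.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# (red, green, blue) chromaticities in CIE 1931 xy.
PRIMARIES: Dict[str, Tuple[Point, Point, Point]] = {
    "BT.2020": ((0.708, 0.292), (0.170, 0.797), (0.131, 0.046)),  # Table 3
    "DCI-P3": ((0.680, 0.320), (0.265, 0.690), (0.150, 0.060)),   # assumed
    "BT.709": ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060)),   # assumed
}


def triangle_area(r: Point, g: Point, b: Point) -> float:
    """Area of the triangle spanned by three xy points (shoelace formula)."""
    return 0.5 * abs(
        r[0] * (g[1] - b[1]) + g[0] * (b[1] - r[1]) + b[0] * (r[1] - g[1])
    )


areas = {name: triangle_area(*pts) for name, pts in PRIMARIES.items()}
ratio_2020_to_709 = areas["BT.2020"] / areas["BT.709"]

for name, area in areas.items():
    print(f"{name}: xy area {area:.4f}, {area / areas['BT.709']:.2f}x BT.709")
```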
Figure 3. Color Volumes, Rec 2020, and Rec 709
Note that Rec 2020 uses an SDR luminance curve with a gamma of 0.45. ITU-R Rec BT.2100 [5] defines the use of both PQ and HLG luminance curves with the BT.2020 color gamut.
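The SDR curve mentioned above can be sketched as follows. The linear segment and the constants 1.099 and 0.018 are not given in this book; they are taken here from the BT.2020 definition for 10-bit systems and should be read as an assumption of this example. The function name is illustrative.

```python
ALPHA = 1.099  # assumed BT.2020 constant for 10-bit systems
BETA = 0.018   # assumed BT.2020 constant for 10-bit systems


def bt2020_oetf(e: float) -> float:
    """Map normalized linear scene light (0..1) to a non-linear signal (0..1)."""
    if not 0.0 <= e <= 1.0:
        raise ValueError("linear light must be normalized to the range 0..1")
    if e < BETA:
        # Linear segment near black avoids an infinite slope at zero.
        return 4.5 * e
    # Power-law segment with the 0.45 exponent referred to in the text.
    return ALPHA * e ** 0.45 - (ALPHA - 1.0)


for value in (0.0, 0.018, 0.18, 1.0):
    print(f"E = {value:5.3f} -> E' = {bt2020_oetf(value):.4f}")
```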
The encoding of BT.2020 uses either constant luminance or non-constant luminance formats. The non-constant luminance format is similar to the format used in Rec 709, with Y’C’BC’R signals conveying the color and luminance information. The constant luminance format in Rec 2020 is not in wide use today. However, another encoding format that also takes a constant luminance approach is ICTCP, which is defined in ITU-R Rec BT.2100. ICTCP is derived from the RGB color values, which are converted to LMS space and then have either the inverse EOTF (EOTF⁻¹) of PQ or the OETF of HLG applied. The final conversion from L’M’S’ to ICTCP values is a 3×3 matrix transform.
Table 4. Color Encoding for ITU-R Rec BT.2020
Table 5. Color Encoding for ICTCP
Parameter | Values PQ | Values HLG
L, M, S Color Space | L = (1688R + 2146G + 262B)/4096; M = (683R + 2951G + 462B)/4096; S = (99R + 309G + 3688B)/4096 (same for PQ and HLG)
Derivation of L’, M’, S’ (see note 1) | {L’, M’, S’} = EOTF⁻¹(FD), where FD = {LD, MD, SD} | {L’, M’, S’} = OETF(E), where E = {LS, MS, SS}
Derivation of I | I = 0.5L’ + 0.5M’ (same for PQ and HLG)
Derivation of Color Difference Signals | CT = (6610L’ − 13613M’ + 7003S’)/4096; CP = (17933L’ − 17390M’ − 543S’)/4096 | CT = (3625L’ − 7465M’ + 3840S’)/4096; CP = (9500L’ − 9212M’ − 288S’)/4096
Table 5 Notes:
1. The subscripts D and S refer to display light and scene light, respectively.
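The PQ column of Table 5 can be exercised with a short sketch. The PQ inverse EOTF constants are not given in this book and are taken here from BT.2100 as an assumption; the CT coefficient for M’ is taken as 13613, the BT.2100 value, which makes the coefficients of each color difference row sum to zero so that neutral grays give CT = CP = 0. Function names are illustrative.

```python
# PQ constants as defined in BT.2100 (assumed here, not listed in this book).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_inverse_eotf(display_cd_m2: float) -> float:
    """Map display light in cd/m2 (0..10000) to a PQ signal value (0..1)."""
    y = max(display_cd_m2, 0.0) / 10000.0
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2


def rgb_to_ictcp_pq(r: float, g: float, b: float):
    """Convert linear BT.2100 RGB display light (cd/m2) to PQ-based I, CT, CP."""
    # Step 1: RGB to LMS (Table 5, first row).
    l = (1688 * r + 2146 * g + 262 * b) / 4096
    m = (683 * r + 2951 * g + 462 * b) / 4096
    s = (99 * r + 309 * g + 3688 * b) / 4096
    # Step 2: apply the PQ inverse EOTF to each component.
    lp, mp, sp = (pq_inverse_eotf(v) for v in (l, m, s))
    # Step 3: 3x3 matrix from L'M'S' to I, CT, CP (PQ column of Table 5).
    i = 0.5 * lp + 0.5 * mp
    ct = (6610 * lp - 13613 * mp + 7003 * sp) / 4096
    cp = (17933 * lp - 17390 * mp - 543 * sp) / 4096
    return i, ct, cp


# A neutral gray at 100 cd/m2: intensity only, no color difference.
i_gray, ct_gray, cp_gray = rgb_to_ictcp_pq(100.0, 100.0, 100.0)
print(f"I = {i_gray:.4f}, CT = {ct_gray:.6f}, CP = {cp_gray:.6f}")
```

The HLG column would follow the same three steps with the HLG OETF and the HLG matrix coefficients in place of the PQ ones.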
9.2. Next Generation Audio (NGA)
Complementing the visual enhancements that Ultra HD will bring to consumers, Next Generation Audio (NGA) provides compelling new audio experiences:
- Immersive – An audio system that enables high spatial resolution in sound source localization in azimuth, elevation and distance, and provides an increased sense of sound envelopment
- Personalized – Enabling consumers to tailor and interact with their listening experience, e.g. selecting alternative audio experiences, switching between languages, enhancing dialogue intelligibility.
- Consistent – Playback experience automatically optimized for each consumer device, e.g. home and mobile
- Object-based Audio – Audio elements are programmed to provide sound from specific locations in space, irrespective of speaker location. By delivering audio as individual elements, or objects, content creators can simplify operations, reduce bandwidth, and provide a premium experience for every audience
- Scene-Based Audio – An arbitrarily large number of directional audio elements composing a 3D sound field are mixed in a fixed number of PCM signals according to the Higher-Order Ambisonics format. Once in the HOA format, the Audio Scene can be efficiently transmitted, manipulated, and rendered on loudspeaker layouts/headphones/soundbars.
- Flexible Delivery – NGA can be delivered to consumers over a number of different distribution platforms including terrestrial, cable, and satellite broadcast, IPTV, OTT, and mobile. It could also be delivered over a hybrid of broadcast and OTT
- Flexible Rendering – NGA can be experienced by consumers through headphones or speakers (e.g., TV speakers, home theater systems including ceiling speakers, sound bars) as shown in Figure 4.
Figure 4. NGA in the consumer domain
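For scene-based audio, the "fixed number of PCM signals" depends only on the Ambisonics order, not on the number of sound sources in the scene. A minimal sketch, assuming full-sphere (3D) Higher-Order Ambisonics, for which an order N representation uses (N + 1)² signals; the function name is illustrative:

```python
def hoa_signal_count(order: int) -> int:
    """Number of PCM signals in a full-sphere HOA representation of a given order."""
    if order < 0:
        raise ValueError("Ambisonics order must be non-negative")
    return (order + 1) ** 2


# Signal counts for orders 1 to 6.
counts = {order: hoa_signal_count(order) for order in range(1, 7)}
for order, count in counts.items():
    print(f"order {order}: {count} signals")
```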
9.2.1 NGA Audio Codecs
There are three Next Generation Audio codecs described in the Guidelines: AC-4, DTS-UHD and MPEG-H.
See Indigo Book Section 8.1 (AC-4), Section 8.2 (DTS-UHD) and Section 8.3 (MPEG-H).
9.2.2 Metadata in NGA
NGA codec systems have a rich set of audio metadata features and functions. Each codec has its own set of definitions; however, there is a common framework for audio metadata developed by the EBU called the Audio Definition Model (ADM) [61].
In general, there are three types of audio metadata:
- Descriptive – Provides information regarding the available audio program features (e.g., channel configuration, alternate languages, VDS).
- Functional – Provides information regarding how the audio should be rendered or presented (e.g., preselections, object audio locations, loudness controls, downmix coefficients).
- Control – Allows for personalization and user preferences (e.g., Dialog Enhancement, language preference, program preselection).
Immersive programming requires generating and delivering dynamic metadata to playback devices. For immersive programming, object position and rendering control metadata are essential for enabling the optimum set of experiences regardless of playback device or application. This section provides an overview of these important metadata parameters and how they are utilized during the creative process.
9.2.3 Audio Rendering
Audio Rendering is the process of composing an Audio Preselection and converting all the Audio Program Components to a data structure appropriate for the audio outputs of a specific receiver. Rendering may include conversion of a Channel Set to a different channel configuration, conversion of Audio Objects to Channel Sets, conversion of Scene-based sets to Channel Sets, and/or applying specialized audio processing such as room correction or spatial virtualization. In addition, the application of Dialog Enhancement as well as Loudness Normalization are parts of the audio rendering functionality.
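As a simple illustration of converting a Channel Set to a different channel configuration, the sketch below downmixes one 5.1 sample to stereo. The −3 dB center and surround gains are a commonly used default chosen here as an assumption; in an NGA system the downmix coefficients would normally be carried as metadata. The LFE channel is omitted from the downmix in this sketch, and all names are hypothetical.

```python
from typing import Dict, Tuple


def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def downmix_51_to_stereo(
    sample: Dict[str, float],
    center_db: float = -3.0,     # assumed default, normally from metadata
    surround_db: float = -3.0,   # assumed default, normally from metadata
) -> Tuple[float, float]:
    """Downmix one 5.1 sample (keys L, R, C, LFE, Ls, Rs) to (left, right)."""
    c = db_to_gain(center_db)
    s = db_to_gain(surround_db)
    # The center is shared equally; each surround folds into its own side.
    left = sample["L"] + c * sample["C"] + s * sample["Ls"]
    right = sample["R"] + c * sample["C"] + s * sample["Rs"]
    return left, right


# A sample containing only center-channel dialogue.
dialogue_only = {"L": 0.0, "R": 0.0, "C": 1.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
left, right = downmix_51_to_stereo(dialogue_only)
print(f"left = {left:.4f}, right = {right:.4f}")
```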
Some uses of audio rendering include support for Audio Description services, multiple languages and personalized audio features without having to create separate Complete Mixes for each additional audio service. There are two main concepts of personalized audio:
- Personalization selection – The bitstream may contain more than one Audio Preselection where each Audio Preselection contains pre-defined audio experiences (e.g., “home team” audio experience, multiple languages, etc.). A listener can choose the audio experience by selecting one of the Audio Preselections.
- Personalization control – Listeners can modify properties of the complete audio experience or parts of it (e.g., increasing the volume level of an Audio Element, changing the position of an Audio Element, etc.).
9.3 High Frame Rate Video (HFR)
For the purpose of this document, High Frame Rate (HFR) refers to frame rates of 100 fps or higher, including 100, 120/1.001[5] and 120. Standard Frame Rate (SFR) refers to the commonly used frame rates of 60 fps or lower, including 24/1.001, 24, 25, 30/1.001, 30, 50, 60/1.001 and 60.
According to a SMPTE/HPA paper authored by Mark Schubin[6], frame rates of 100, 120/1.001 and 120 fps add significant clarity to high motion video such as sports or action scenes. Schubin also notes that high dynamic range puts new demands on temporal resolution. He notes that, “Viewers of HDR imagery sometimes report increased perception of motion judder… Increased frame rate, therefore, might be necessary to accompany HDR.”
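The fractional rates written as "N/1.001" are exact rational numbers, N × 1000/1001, which is where the familiar rounded values such as 23.976 and 59.94 come from. A minimal sketch using exact fractions, together with the HFR threshold of 100 fps used in this document; the function names are illustrative:

```python
from fractions import Fraction

HFR_THRESHOLD_FPS = 100  # "100 fps or higher" per this document


def fractional_rate(nominal_fps: int) -> Fraction:
    """Exact value of nominal_fps / 1.001 as a rational number."""
    return Fraction(nominal_fps * 1000, 1001)


def is_hfr(rate: Fraction) -> bool:
    """True when the frame rate counts as High Frame Rate in this document."""
    return rate >= HFR_THRESHOLD_FPS


for nominal in (24, 30, 60, 120):
    rate = fractional_rate(nominal)
    label = "HFR" if is_hfr(rate) else "SFR"
    print(f"{nominal}/1.001 = {rate} = {float(rate):.3f} fps ({label})")
```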
Citing Schubin again, “… the [EBU[7]] found … that in going from 60 frames per second (fps) to 120 fps or from 120 fps to 240 fps — a doubling of the frame rate — it is possible to achieve a full grade of improvement.” Further, doubling the frame rate from 50/60 fps to 100/120 fps is a very efficient means of gaining that full grade of improvement when compared to going from 2K to 4K spatial resolution, as illustrated in Figure 5.
Figure 5. Bandwidth increases for various video format improvements compared to HD
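The relative cost of these format improvements can be approximated by counting uncompressed active pixels per second. This is only a rough sketch: it ignores blanking, chroma subsampling, bit depth and compression, so the multipliers are not the compressed bandwidth figures and should be read as upper-level intuition only. Here "2K" is taken to mean HD (1920×1080), and the names are illustrative.

```python
def active_pixel_rate(width: int, height: int, fps: int) -> int:
    """Uncompressed active pixels per second for a progressive format."""
    return width * height * fps


FORMATS = {
    "1080p60": (1920, 1080, 60),
    "1080p120": (1920, 1080, 120),
    "2160p60": (3840, 2160, 60),
    "2160p120": (3840, 2160, 120),
}

baseline = active_pixel_rate(*FORMATS["1080p60"])
relative = {
    name: active_pixel_rate(*params) / baseline for name, params in FORMATS.items()
}

for name, factor in relative.items():
    print(f"{name}: {factor:.0f}x the pixel rate of 1080p60")
```

On this measure, doubling the frame rate doubles the raw pixel rate, while moving from 2K to 4K quadruples it.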
HFR has been included in newer DTT television standards including ATSC 3.0 [54] and DVB [63]. As such, the Ultra HD Forum considers HFR to be viable Ultra HD technology that can be layered onto Foundation Ultra HD for DTT. Both systems include a backward compatibility mechanism that enables 50/60 fps decoders to render a 50/60 fps version of the content while 100/120 fps decoders render the full HFR experience. See Section 7.3 in the Yellow Book for more information about backward compatibility.
In HFR deployments, 4K HFR may exceed the capabilities of some portions of the end-to-end ecosystem. For example, while HDMI 2.1 supports 4K 120 fps or 8K 60 fps, most production environment transport systems currently support only 2K with 100/120 fps. Although this is likely to change in the future, the Ultra HD Forum describes HFR with 2K spatial resolution in order to provide an HFR guideline for a full end-to-end system. The parameters for 2K HFR content are shown in Table 6.
Table 6. 2K High Frame Rate Content Parameters
HFR Parameters | Value
Frame Rate | 100 fps, 120 fps (note 1)
Spatial Resolution | HD (1920×1080)
Scan Type | Progressive
Dynamic Range | SDR, HDR
System Colorimetry | ITU-R BT.709, ITU-R BT.2020
Bit Depth | 10
Distribution Codec | HEVC Main 10 Level 5.1
HDMI Interface (note 2) | 100 fps: at least v1.4
Table 6 Notes:
1. Note that 120/1.001 may be used for backward compatibility; however, the Ultra HD Forum recommends using integer frame rates for all Ultra HD content whenever possible.
2. Earliest HDMI interfaces that support 10-bit, 1920x1080p, SDR high frame rate. HDMI 1.4 also supports 120 fps with 4:2:2 chroma subsampling. HDR support requires HDMI 2.0a for HDR10 and 2.0b for HLG.
For more complete information on HFR, please see Section 7.3 in the Yellow Book, Beyond Foundational Technologies.
(End of Red Book)
[1] For use in China, the AVS2 codec, Main10 profile, may be used instead of HEVC. See Indigo Section 9.1.
[2] Timo Kunkel, Erik Reinhard, A Reassessment of the Simultaneous Dynamic Range of the Human Visual System, Proceedings of the 7th Symposium on Applied Perception in Graphics and Visualization, 2010 http://doi.acm.org/10.1145/1836248.1836251
[3] Photo credits to (top to bottom): Artur Tomaszewski, Caleb Mullins, Sterling Davis, Logan Weaver, Mike Erskine, Van Mendoza, and Thomas Ciszewski, all from Unsplash.com
[4] C.J.K. Ellis, The pupillary light reflex in normal subjects, British Journal of Ophthalmology, 1981, 65, 754-759
[5] Although 120/1.001 is considered an example of HFR, the Ultra HD Forum recommends using integer frame rates for all Ultra HD content whenever possible.
[6] “Higher Resolution, Higher Frame Rate, and Better Pixels in Context The Visual Quality Improvement Each Can Offer, and at What Cost”, SMPTE/HPA paper, Mark Schubin, 2014, https://www.smpte.org/publications/industry-perspectives/schubin-HPA2014