MPEG-H Audio is a new, next-generation audio technology providing more realism through sound from above as well as around the listener. With its unique personalization features, MPEG-H Audio offers viewers great flexibility to actively engage with the content and adapt it to their own preferences. Regardless of the device, the MPEG-H Audio System delivers the best sound experience possible.
MPEG-H Audio is a complete audio solution and much more than just a codec. Among other things, it offers the following major advantages compared to legacy audio codecs:
1) Immersive Sound: MPEG-H Audio allows the transmission of three-dimensional immersive audio (3D-audio) by adding elevated sound sources above and below the listener's position. MPEG-H Audio has been specifically designed for flexible loudspeaker signaling, including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D configurations such as 5.1+4H, 7.1+4H or 22.2, and even layouts yet to be defined. Within MPEG-H Audio, immersive sound can be carried as channels, objects or as a combination of those.
2) Interactive and Personalized Sound: MPEG-H Audio enables the listener to interact with the content and create personalized audio experiences. The advanced interactivity options range from simple adjustments, for example, increasing or decreasing the dialogue level in relation to other audio elements, to advanced scenarios in which audio elements may be selected and adjusted in level and/or position as preferred by the listener, within the limits authored by the content creator.
3) Universal Delivery: MPEG-H offers flexibility by delivering the same bit stream through different distribution platforms (e.g., terrestrial, satellite, broadband or mobile networks) to all types of devices (e.g., TV set, AVR, soundbar, set-top box, tablet, virtual reality gear with 360-degree video) in various environments, for example, living room, home theater, or noisy mobile environments.
MPEG-H Audio is an international standard developed by the ISO/IEC Moving Picture Experts Group (MPEG), the organisation which has a long history in audio coding with mp3 and the AAC codec family. The MPEG-H Audio standard (ISO/IEC 23008-3) specifies two relevant profiles – Low Complexity (LC) and Baseline (BL) – essential for the broadcast and streaming industry, which allow decoding and rendering of immersive, 3D-audio content while enabling advanced personalization features. Audio objects may be used alone or in combination with channels for efficient delivery and reproduction of immersive sound. The use of these audio objects allows for interactivity or personalization of a program by adjusting the gain or position of the objects during playback. Details about the MPEG-H Audio standard can be found here
MPEG-H Audio is a complete audio solution. It does not use other audio codecs; instead, its codec functionality builds upon the developments from previous generations of MPEG audio codecs such as the AAC codec family.
MPEG-H Audio enriches the audio experience by combining immersive sound and advanced personalization options with bit-rate-efficient, universal delivery to meet the needs of today’s consumers.
The MPEG-H Audio System has proven to be the most advanced audio solution for enhancing broadcast and streaming services for sports events, empowering the audience to experience the emotion of the sports arena in their living room and to decide what is more important to them, for example, listening only to the crowd of their favorite team or focusing on the commentary.
Similarly to sports events, streaming of live concerts is another major use case where service providers are eager to enhance their services with immersive sound and interactivity options.
The advanced accessibility features of the MPEG-H Audio system are essential for the elderly and visually or hearing impaired audience. With its Dialog Enhancement and advanced Audio Description Services, MPEG-H Audio makes broadcast audio more accessible for all viewers.
MPEG-H was adopted in several broadcast, streaming and virtual reality standards. A list can be found at the end of this page.
MPEG-H Audio powers the music format 360 Reality Audio, initiated by Sony. The first 360 Reality Audio immersive music streaming services from Amazon Music HD, Deezer, nugs.net, Sony Select and TIDAL launched in fall 2019, with currently more than 3000 songs available. Major labels supporting the 360RA initiative include Sony Music Entertainment, Universal Music, and Warner Music.
The MPEG-H Audio System is used as the sole audio system in the world’s first terrestrial UHD TV service in South Korea. The system launched in May 2017, and commercial services from KBS, MBC and SBS have been on air 24/7 since then.
A growing number of devices support MPEG-H Audio, like the Sennheiser Ambeo sound bar, Audio-Video-Receivers from Denon, Marantz and McIntosh, the Amazon Echo Studio smart speaker or the Google ChromeCast Ultra 4K, as well as TV sets from Samsung and LG for the UHD TV service in South Korea.
Because of the flexibility of MPEG-H Audio when it comes to signal configurations, there is no simple answer to that question, as the bitrate depends on the number of signals (channel signals or object signals). With an increasing number of signals in a configuration, the efficiency of the codec increases and the resulting total bitrate is smaller than the sum of single-encoded signals.
The following table indicates bitrates for some common channel configurations and for combinations of channel and object signals, starting with stereo and 5.1 surround and continuing to several 3D configurations (indicated by “H” for the height channels) and combinations of 3D channel configurations with different numbers of object signals.
All given examples use a total of 16 or fewer signals, which is covered by “Level 3” in the MPEG-H Audio standard, except for the last configuration, “22.2”, which is covered by “Level 4”.
|Bit rates in kbit/s for|Good|Excellent|Transparent|
|7.1+4H/5.1+4H + 2 Objects|256 – 288|384 – 420|512 – 576|
|7.1+4H + 3 Objects/5.1+4H + 5 Objects|352 – 384|480 – 576|640 – 768|
Scale according to MUSHRA, Recommendation ITU-R BS.1534-3
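The signal counts behind the level statement above can be checked with simple arithmetic. The following Python sketch is only an illustration: `count_signals` and `fits_level_3` are hypothetical helper names, and the only rule encoded is the 16-signal limit for Level 3 quoted in the text. The level definitions in ISO/IEC 23008-3 contain further constraints that are not modelled here.

```python
import re

LEVEL_3_MAX_SIGNALS = 16  # limit quoted in the text above (assumed to be the only rule)


def count_signals(config: str) -> int:
    """Count channel and object signals in a label such as '7.1+4H + 3 Objects'."""
    total = 0
    for part in config.split("+"):
        part = part.strip().lower()
        if part == "stereo":
            total += 2
        elif m := re.fullmatch(r"(\d+)\.(\d+)", part):
            # e.g. '7.1' means 7 main channels plus 1 LFE channel
            total += int(m.group(1)) + int(m.group(2))
        elif m := re.fullmatch(r"(\d+)h", part):
            # e.g. '4H' means 4 height channels
            total += int(m.group(1))
        elif m := re.fullmatch(r"(\d+)\s*objects?", part):
            total += int(m.group(1))
        else:
            raise ValueError(f"unrecognised layout component: {part!r}")
    return total


def fits_level_3(config: str) -> bool:
    """True if the configuration stays within the 16-signal limit."""
    return count_signals(config) <= LEVEL_3_MAX_SIGNALS


for label in ["stereo", "5.1", "7.1+4H + 2 Objects",
              "7.1+4H + 3 Objects", "5.1+4H + 5 Objects", "22.2"]:
    print(f"{label:20s} {count_signals(label):2d} signals, "
          f"within Level 3 limit: {fits_level_3(label)}")
```

Under these assumptions, 7.1+4H + 3 Objects comes to 15 signals, while 22.2 comes to 24 signals and therefore exceeds the 16-signal limit.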
Existing broadcast services that use AAC/HE-AAC stereo or surround audio can be enhanced with the advanced MPEG-H Audio features by simply adding an MPEG-H Audio stream to the multiplex. All audio and video broadcast encoders that support MPEG-H Audio can create a multiplex containing the AAC stream as well as the MPEG-H Audio stream. The former can be decoded by legacy receivers and the latter will be decoded by newer receivers.
MPEG-H Audio enabled devices natively offer a “User Interface” which displays all the interactivity options enabled by an MPEG-H stream. Based on the content creator’s intentions, for each MPEG-H stream, different interactivity options might be offered to the viewers at home and through the User Interface they have the freedom to personalise their content.
An MPEG-H Audio scene comprises the audio content itself together with additional metadata. This metadata is created during production and contains all necessary information to render the audio content in arbitrary reproduction layouts and to ensure the best audio experience on any platform.
MPEG-H Audio has been carefully designed for enhancing broadcast, streaming and immersive music applications. To ensure the integrity of metadata in an SDI-based environment at any production step, the metadata is delivered in the “Control Track”. The Control Track is a “time-code like” audio signal and can be treated as a regular audio channel. This ensures the synchronization of metadata with its corresponding audio and video signals. The Control Track is robust enough to survive A/D and D/A conversions, level changes, sample rate conversions or frame-wise editing. The Control Track does not force audio equipment to be put into data mode or non-audio mode in order to pass through.
An MPEG-H Master carries all the uncompressed audio content and production metadata of the MPEG-H Audio scene. An MPEG-H Master can either be a Broadcast Wave Format File carrying Audio Definition Model metadata compliant to the MPEG-H Profile (MPEG-H BWF/ADM) or an MPEG-H Production Format (MPF) file carrying the metadata inside an MPEG-H Control Track.
The MPEG-H Control Track is a unique solution for delivering the metadata aligned with the audio and video data through existing SDI-based infrastructures. The Control Track is a “time-code like” PCM audio signal that can be carried on an extra SDI or wave-file channel. It can be edited in a video editor just like any other audio signal.
It allows transport of the metadata tightly coupled with the audio content over any medium offering transport of PCM data, such as SDI, MADI, or AoIP. The Control Track can be treated like any other audio signal and is robust against sample rate conversions or level changes. The metadata contained in the Control Track is aligned to the audio and video data, thus any configuration change in live or post production can be applied at every video frame boundary.
The MPEG-H Production Format (MPF) is a multi-channel PCM audio file which contains all the audio content and production metadata of the MPEG-H Audio scene. The metadata is stored as a Control Track, which is a timecode-like PCM audio signal and one of the audio tracks in the multichannel wave-file.
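Because the Control Track is simply one of the PCM tracks in a multichannel wave-file, generic audio tooling can separate it from the other tracks. The sketch below uses Python's standard `wave` module to de-interleave one channel from a small, synthetic 16-bit file. It makes several assumptions: the function names are hypothetical, the channel index is chosen arbitrarily (the text does not state which track carries the Control Track), real MPF files may use other bit depths or container variants that `wave` cannot read, and decoding the metadata modulated into the Control Track is not shown.

```python
import io
import struct
import wave


def extract_channel(wav_bytes: bytes, channel_index: int) -> list[int]:
    """Return the 16-bit samples of one channel from interleaved PCM WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        n_channels = wav.getnchannels()
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if not 0 <= channel_index < n_channels:
            raise ValueError("channel index out of range")
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    # Interleaved layout: ch0, ch1, ..., chN-1, ch0, ch1, ...
    return list(samples[channel_index::n_channels])


def make_demo_wav(n_channels: int, n_frames: int) -> bytes:
    """Build a tiny synthetic WAV where channel c holds the values c*100 + frame."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(2)
        wav.setframerate(48000)
        data = [ch * 100 + frame
                for frame in range(n_frames)
                for ch in range(n_channels)]
        wav.writeframes(struct.pack(f"<{len(data)}h", *data))
    return buf.getvalue()


demo = make_demo_wav(n_channels=3, n_frames=4)
# Assumption for illustration only: treat the last channel as the Control Track.
print(extract_channel(demo, channel_index=2))
```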
The Audio Definition Model (ADM) according to ITU-R BS.2076 defines an open metadata format for production, exchange and archiving of next-generation audio (NGA) content in file-based workflows. Its comprehensive metadata syntax allows describing many types of audio content including channel-, object-, and scene-based representations for immersive and interactive audio experiences. A serial representation of the Audio Definition Model (S-ADM) is specified in ITU-R BS.2125 and defines a segmentation of the original ADM for use in linear workflows such as real-time production for broadcasting and streaming applications.
The MPEG-H ADM Profile defines constraints on ITU-R BS.2076 and ITU-R BS.2125 that enable interoperability with established NGA content production and distribution systems for MPEG-H Audio as defined in ISO/IEC 23008-3.
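In BWF-style files, ADM metadata is commonly carried as XML in a chunk with the ID `axml`. As a rough sketch of how a tool might locate that XML before running any profile checks, the following Python code walks the top-level chunks of a plain RIFF/WAVE byte string. The function names are hypothetical, the demo file and its XML payload are placeholders, and 64-bit containers (RF64/BW64) as well as any MPEG-H ADM Profile validation are outside the scope of this example.

```python
import struct


def iter_riff_chunks(data: bytes):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a plain RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def find_adm_xml(data: bytes):
    """Return the text of the first 'axml' chunk, or None if there is none."""
    for chunk_id, payload in iter_riff_chunks(data):
        if chunk_id == b"axml":
            return payload.decode("utf-8")
    return None


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Serialise one chunk, adding a pad byte for odd-sized payloads."""
    pad = b"\x00" if len(payload) & 1 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Synthetic demo file: a 3-byte 'data' chunk (exercises padding) plus 'axml'.
placeholder_xml = "<audioFormatExtended/>"
body = (b"WAVE"
        + make_chunk(b"data", b"\x00\x00\x00")
        + make_chunk(b"axml", placeholder_xml.encode("utf-8")))
demo_file = b"RIFF" + struct.pack("<I", len(body)) + body

print(find_adm_xml(demo_file))
```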
The freely available Fraunhofer ADM Info Tool is a software utility that provides support in creating profile-conformant ADM metadata. Its conformance check framework runs input ADM metadata against an exhaustive set of checks derived from the MPEG-H ADM Profile, gathering detailed reports of any encountered conformance issues and providing information on how to resolve them.
With the MPEG-H Conversion Tool, Fraunhofer offers a simple one-click solution for converting existing Dolby Atmos BWF/ADM files into the MPEG-H Production Format. The tool is available as part of the MPEG-H Authoring Suite (MAS).
Fraunhofer IIS offers Production Tools, bundled in the MPEG-H Authoring Suite. The suite consists of the MPEG-H Authoring Plug-in (MHAPi), the standalone MPEG-H Authoring Tool (MHAT) and the MPEG-H Conversion Tool (MCO).
Register here for a download of the MPEG-H Authoring Suite
Other options for producing MPEG-H include the New Audio Technology Spatial Audio Designer and Blackmagic DaVinci Resolve Studio for post-production workflows, as well as the Linear Acoustic AMS and the Jünger MMA Hardware for live production with MPEG-H Audio.
The MPEG-H Authoring Suite (MAS) is a set of tools that make the production of MPEG-H Audio content easier, faster, more intuitive, and more powerful. They support the recently published MPEG-H ADM Profile, as well as binaural monitoring for immersive audio reproduction over headphones.
The MPEG-H Authoring Plug-in (MHAPi) takes you through all the steps of creating object- or channel-based MPEG-H Audio productions inside a VST3- or AAX-enabled digital audio workstation (DAW). You will be able to export your immersive and interactive MPEG-H Audio scenes to either MPEG-H Production Format (MPF) or MPEG-H BWF/ADM, containing audio and metadata and ready for distribution via MPEG-H-enabled channels.
The MPEG-H Authoring Tool (MHAT) is a new software tool for Mac and Windows that helps you create MPEG-H metadata with existing audio material. The MHAT allows for easy MPEG-H authoring without the need of a digital audio workstation (DAW). You can define specific MPEG-H parameters, instantly listen to your configurations and export your authored mixes as MPEG-H Production Format (MPF), MPEG-H BWF/ADM or as a template export in an XML file.
The MPEG-H Conversion Tool (MCO) is a software tool for Mac and Windows that can be used to convert MPEG-H compliant content masters. The MCO serves as interface to the MPEG-H Audio ecosystem and supports the import and export of MPEG-H Production Format (MPF) and BWF/ADM files.
The MPEG-H Production Format Player (MPF-Player) is a software tool for Mac and Windows to check the quality of already authored MPEG-H metadata and the accompanying audio mix, with or without a corresponding video.
Object-based production requires a metadata authoring step for the object-based interactivity and accessibility features as well as for loudness measurement. There is no single answer that fits all kinds of production environments and production requirements, but a range of typical workflows starting at simple, automated or preset-based authoring that fits the most common content types, up to comprehensive authoring workflows for advanced applications. See here for more information
The MPEG-H Audio System has been designed such that content creators can define multiple presets and explore new creative options. A broadcaster can prepare mixes (including the default or main mix of the program) using authoring tools that specify an ensemble of gain and position settings for objects to create preset mix selections that can be presented on a simple menu to the user. Even more control of the audio elements in a program is possible and can be enabled in the »advanced MPEG-H Audio interactivity menu« by enthusiast viewers. All interactivity features offered to the user are strictly defined by the broadcaster during metadata creation. This process of generating metadata is called »authoring« and is the most important difference in production of MPEG-H Audio content compared to a legacy production.
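The relationship between authored presets, user interaction and broadcaster-defined limits can be pictured with a small data model. The Python sketch below is purely illustrative: the class, field and function names, the object names and all gain values are hypothetical and do not reflect the actual MPEG-H metadata syntax. It only shows the idea that a user request is honoured within the range set during authoring.

```python
from dataclasses import dataclass


@dataclass
class AuthoredObject:
    """One audio object with the interactivity range set during authoring."""
    name: str
    default_gain_db: float
    min_gain_db: float  # lowest gain the user may select
    max_gain_db: float  # highest gain the user may select


def clamp_gain(obj: AuthoredObject, requested_db: float) -> float:
    """Limit a requested gain to the range authored for this object."""
    return max(obj.min_gain_db, min(obj.max_gain_db, requested_db))


def resolve_preset(objects, preset_gains_db):
    """Gain per object for a preset; objects not listed keep their default."""
    return {
        obj.name: clamp_gain(obj, preset_gains_db.get(obj.name, obj.default_gain_db))
        for obj in objects
    }


# Hypothetical scene with two objects and two presets.
objects = [
    AuthoredObject("dialogue", default_gain_db=0.0, min_gain_db=-6.0, max_gain_db=9.0),
    AuthoredObject("crowd", default_gain_db=0.0, min_gain_db=-12.0, max_gain_db=3.0),
]
presets = {
    "Default": {},
    "Dialogue+": {"dialogue": 6.0, "crowd": -3.0},
}

print(resolve_preset(objects, presets["Dialogue+"]))
print(clamp_gain(objects[0], 12.0))  # request above the authored maximum
```

In this model, a request of +12 dB for the dialogue object is reduced to the authored maximum of +9 dB, which mirrors the statement that interactivity features are strictly defined during metadata creation.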
There are multiple solutions, depending on the production scenario. Using the tools of the MPEG-H Authoring Suite in post-production, audio and metadata can be exported as:
MPEG-H BWF/ADM: An MPEG-H BWF/ADM (short for Broadcast Wave Format with embedded Audio Definition Model metadata) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The exported BWF/ADM file is compliant to the MPEG-H ADM Profile. Loudness will be measured during export and will be embedded into the exported file.
MPF: An MPF (short for MPEG-H Production Format) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The metadata is stored in the Control Track, which is one of the audio tracks in the multichannel wave-file and contains a modulated signal that is robust against sample rate conversions or level changes. Loudness will be measured during export and will be embedded into the exported file.
XML: This export option is intended for special applications that make use of MPEG-H scene definitions as XML representation. The XML is accompanied by a multichannel wave file containing the audio essence.
For MPEG-H live productions, the Authoring and Monitoring Units (AMAU) export the audio signals and the Control Track in real time. The Control Track allows transport of the metadata tightly coupled with the audio content over any medium offering transport of PCM data, such as SDI, MADI, or AoIP. It can be treated like any other audio signal and is robust against sample rate conversions or level changes.
For more information watch this video
MPEG-H Audio has been specifically designed for flexible loudspeaker rendering, including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D-audio configurations with height channels, like 5.1+4H and 7.1+4H, or configurations with height, mid and lower-layer channels, for example 22.2, or even yet to be defined layouts.
The loudspeaker configuration depends on the requirements of the intended production. Recommendations for loudspeaker placement, studio design and production workflows can be found here.
Yes, this option is available in version 3.5 of the MPEG-H Authoring Suite.
MPEG-H Audio supports downmixing to typical, common speaker layouts with a set of pre-defined downmix configurations. Additionally, it comes with customizable downmix options enabling content-specific downmixing that is configurable for each layout.
Yes, this functionality can be enabled using the Dynamic Gains feature in the MPEG-H Authoring Plug-in version 3.0 and higher and in the MPEG-H Authoring Suite.
Yes, the MPEG-H Authoring Suite comes with a set of template sessions for Nuendo, Pro Tools, Reaper and Sequoia.
As a first step, we’d like to recommend our series of tutorial videos to help you get started with MPEG-H Authoring using our MPEG-H Authoring Plug-in.
If you have further questions, you can always get in touch with our MPEG-H Tool experts via: email@example.com
STANDARDS & SPECIFICATIONS
ISO/IEC 23008-3: “Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio”
ATSC: A/342 Part 3:2021, MPEG-H System
Digital Video Broadcasting (DVB): ETSI TS 101 154, Specification for the use of Video and Audio Coding in Broadcasting and Broadband Applications
TTA (TTAK-KO-07.0127R3): Transmission and Reception for Terrestrial UHDTV Broadcasting Service
ABNT NBR 15602-2, Digital terrestrial television – Video coding, audio coding and multiplexing Part 2: Audio coding, Amendment 1
SCTE: SCTE 242-3, Next Generation Audio Coding Constraints for Cable Systems: Part 3 – MPEG-H Audio Coding Constraints
UHD Forum: Ultra HD Forum Guidelines
International Telecommunication Union (ITU): Recommendation ITU-R BS.1196-7 (01/2019), Audio coding for digital broadcasting
ISO/IEC 23000-19:2020, Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media
CTA: CTA-5001, Web Application Video Ecosystem – Content Specification
DASH-IF: Guidelines for Implementation: DASH-IF Interoperability Point for ATSC 3.0
HbbTV: HbbTV 2.0.2 Specification (ETSI TS 102 796): Hybrid Broadcast Broadband TV
3GPP: ETSI TS 126 118 v15.0.0 (2018-10) 5G: 3GPP Virtual reality profiles for streaming applications (3GPP TS 26.118 version 15.0.0 Release 15)
VR-IF: VR Industry Forum Guidelines
ISO/IEC 23090-2:2019, Information technology — Coded representation of immersive media — Part 2: Omnidirectional media format
Digital Video Broadcasting (DVB): ETSI EN 300 468, Specification for Service Information (SI) in DVB systems
Digital Video Broadcasting (DVB): MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services over IP Based Networks
SCTE: SCTE 243-3, Next Generation Audio Coding Constraints for Cable Systems: Part 3 – Carriage of MPEG-H Audio
White Papers and Technical Reports
MPEG-H ADM Profile 1.0.0, April 2020, Requirements and Recommendations: https://www.iis.fraunhofer.de/en/ff/amm/dl/whitepapers/adm-profile.html
MPEG-H Audio for Improving Accessibility in Broadcasting and Streaming https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/wp/20191001_MPEG-H_Accessibility_White_Paper.pdf
Scientific Papers (open access)
Bleidt et al (2017): https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7874294
Herre et al (2015): https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7056445
Füg, Kuntz (2015): https://pub.dega-akustik.de/DAGA_2015/data/articles/000515.pdf
Paulus, et al (2019): https://www.aes.org/tmpFiles/elib/20200909/20489.pdf
Torcoli, et al (2021): https://arxiv.org/abs/2112.0949
Murtaza, et al (2021): MPEG-H Audio System for SBTVD TV 3.0 Call for Proposals