MPEG-H Audio Academy

Take a deep dive into the MPEG-H Audio universe and explore tools for every stage of your journey.

Learn all about MPEG-H Audio

Your journey starts here. Our tutorials and webinars are the ideal kick-off for working with MPEG-H Audio and experiencing first-hand the multi-layered production opportunities it offers. Tailored learning material guides you through the sophisticated production process.

Authoring Suite – First Steps

The MPEG-H Authoring Suite (MAS) is a set of tools that make the production of MPEG-H Audio content easier, faster, more intuitive, and more powerful. They support the recently published MPEG-H ADM Profile, as well as binaural monitoring for immersive audio reproduction over headphones.


Learn

All the resources about MPEG-H Audio creation at your fingertips.

Play

Demo content for your first object-based MPEG-H Audio creation. Coming soon!

Dialog+ Enhances Speech

Many people find it hard to follow speech in broadcasting and streaming because of loud background sounds. A survey carried out by Fraunhofer IIS and the German public broadcaster WDR showed that 68% of the audience, across all demographics, frequently or very frequently had trouble understanding speech on TV. Dialog+ is an MPEG-H production technology that is particularly well suited to legacy content where only the final audio mix is available. It makes speech clearer by allowing the loudness levels of speech and background sounds to be adapted. To achieve this, it uses a deep-learning-based solution that separates speech from the background and remixes them so that listeners can adjust the speech level to their individual requirements.
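The remix stage described above can be pictured with a minimal Python sketch. It assumes the deep-learning separation has already produced a speech signal and a background signal that add up to the original mix (the separation model itself is not shown), and it models the listener's choice as a hypothetical background attenuation in dB. The function and parameter names are illustrative only and are not part of any Dialog+ interface.

```python
def remix(speech, background, background_atten_db=0.0):
    """Recombine separated speech and background, sample by sample.

    speech, background: equal-length sequences of float samples, assumed to be
    the output of a separation stage (not shown) that sum to the original mix.
    background_atten_db: listener-chosen attenuation of the background in dB;
    0 dB reproduces the original mix, larger values make speech stand out.
    """
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    if background_atten_db < 0:
        raise ValueError("attenuation must be non-negative")
    gain = 10 ** (-background_atten_db / 20)  # dB -> linear amplitude gain
    return [s + gain * b for s, b in zip(speech, background)]


original = remix([0.5, 0.25], [0.25, 0.25])               # 0 dB: the original mix
clearer = remix([0.5, 0.25], [0.25, 0.25], background_atten_db=20.0)
```

With 0 dB attenuation the function returns the original mix; raising the attenuation lowers only the background, so the speech level rises relative to everything else.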

Read the Blog

Keep up to speed with the latest news and developments around MPEG-H Audio.

Test Signals

Creating a suitable environment is indispensable for a successful workflow. Setting up your speakers correctly is therefore crucial for MPEG-H Audio to unfold its full potential and beauty. The set-up guide, channel identification signals and various technical notes set your work environment up for success.

TV Audio Setup

While setting up a home entertainment system is a breeze for gearheads, it may be a bit overwhelming for tech novices. With the easy-to-understand instructions in this short PDF guide, users will be plugging the right cables into the right sockets in no time. Easy-to-follow steps provide the information needed for any configuration.

Publications

Learn more about MPEG-H Audio in practice from the publications that Fraunhofer IIS has released or contributed to. They cover all relevant topics, from standardisation issues to technical reports and scientific papers.

FAQ

What is MPEG-H Audio?

MPEG-H Audio is a new, next-generation audio technology providing more realism through sound from above as well as around the listener. With its unique personalization features, MPEG-H Audio offers viewers great flexibility to actively engage with the content and adapt it to their own preferences. Regardless of the device, the MPEG-H Audio System delivers the best sound experience possible.

What's the benefit of MPEG-H Audio compared to legacy audio codecs?

MPEG-H Audio is a complete audio solution and much more than just a codec. Among other things, it offers the following key advantages over legacy audio codecs:

1) Immersive Sound: MPEG-H Audio allows the transmission of three-dimensional immersive audio (3D-audio) by adding sound sources above and below the listener's position. MPEG-H Audio has been specifically designed for flexible loudspeaker rendering, including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D configurations such as 5.1+4H, 7.1+4H or 22.2, and even layouts yet to be defined. Within MPEG-H Audio, immersive sound can be carried as channels, objects or a combination of both.

2) Interactive and Personalized Sound: MPEG-H Audio enables listeners to interact with the content and create personalized audio experiences. The interactivity options range from simple adjustments, for example increasing or decreasing the dialogue level relative to other audio elements, to advanced scenarios in which listeners can select audio elements and adjust their level and/or position as preferred, within the limits set by the content creator.

3) Universal Delivery: MPEG-H Audio offers flexibility by delivering the same bitstream through different distribution platforms (e.g., terrestrial, satellite, broadband or mobile networks) to all types of devices (e.g., TV sets, AVRs, soundbars, set-top boxes, tablets, virtual reality headsets with 360-degree video) in various environments, for example the living room, a home theater or noisy mobile environments.

What is the MPEG-H Audio standard?

MPEG-H Audio is an international standard developed by the ISO/IEC Moving Picture Experts Group (MPEG), an organisation with a long history in audio coding, including mp3 and the AAC codec family. The MPEG-H Audio standard (ISO/IEC 23008-3) specifies two profiles essential for the broadcast and streaming industry, Low Complexity (LC) and Baseline (BL), which allow decoding and rendering of immersive 3D-audio content while enabling advanced personalization features. Audio objects may be used alone or in combination with channels for efficient delivery and reproduction of immersive sound. These audio objects allow a program to be made interactive or personalized by adjusting the gain or position of the objects during playback. Details about the MPEG-H Audio standard can be found here.

Which audio codec is used for MPEG-H Audio?

MPEG-H Audio is a complete audio solution. It does not use other audio codecs; instead, its codec functionality builds on developments from previous generations of MPEG audio codecs such as the AAC codec family.

What are the use cases of MPEG-H Audio?

MPEG-H Audio enriches the audio experience by combining immersive sound and advanced personalization options with bitrate-efficient, universal delivery to meet today’s consumer needs.

The MPEG-H Audio System has proven to be the most advanced audio solution for enhancing broadcast and streaming services for sports events, letting the audience experience the emotion of the sports arena in their living room and decide for themselves what matters most, for example listening only to the crowd of their favorite team or focusing on the commentary. Read more here.

As with sports events, streaming of live concerts is another major use case where service providers are eager to enhance their services with immersive sound and interactivity options. Read more here and here.

The advanced accessibility features of the MPEG-H Audio System are essential for elderly viewers and for visually or hearing-impaired audiences. With its Dialog Enhancement and advanced Audio Description services, MPEG-H Audio makes broadcast audio more accessible for all viewers.

What standards currently support MPEG-H Audio?

MPEG-H Audio has been adopted in several broadcast, streaming and virtual reality standards. A list can be found here.

What MPEG-H Audio content and services are available?

MPEG-H Audio powers the music format 360 Reality Audio, initiated by Sony. The first 360 Reality Audio immersive music streaming services from Amazon Music HD, Deezer, nugs.net, Sony Select and TIDAL launched in fall 2019, with more than 3,000 songs currently available. Major labels supporting the 360RA initiative include Sony Music Entertainment, Universal Music and Warner Music.

The MPEG-H Audio System is the sole audio system in the world’s first terrestrial UHD TV service, in South Korea. The system launched in May 2017, and commercial services from KBS, MBC and SBS have been on the air 24/7 since then.

What available end-consumer devices support MPEG-H Audio?

A growing number of devices support MPEG-H Audio, such as the Sennheiser Ambeo soundbar, audio-video receivers from Denon, Marantz and McIntosh, the Amazon Echo Studio smart speaker and the Google Chromecast Ultra 4K, as well as TV sets from Samsung and LG for the UHD TV service in South Korea.

What is the bitrate of an MPEG-H program?

Because MPEG-H Audio is so flexible in its signal configurations, there is no simple answer to this question: the bitrate depends on the number of signals (channel signals or object signals). As the number of signals in a configuration increases, the efficiency of the codec increases, and the resulting total bitrate is smaller than the sum of the bitrates of individually encoded signals. The following table indicates bitrates for some common channel configurations, or combinations of channel and object signals, ranging from stereo and 5.1 surround to several 3D configurations (indicated by “H” for the height channels) and combinations of 3D channel configurations with different numbers of object signals. All examples use a total of 16 or fewer signals, which is covered by “Level 3” in the MPEG-H Audio standard, except for the last configuration, “22.2”, which is covered by “Level 4”.

Configuration (bitrates in kbit/s)	Good	Excellent	Transparent
2.0	48	64	96
5.1	128	192	256
5.1+2H	160	256	320
5.1+4H	192	320	448
7.1+4H, or 5.1+4H + 2 Objects	256–288	384–420	512–576
7.1+4H + 3 Objects, or 5.1+4H + 5 Objects	352–384	480–576	640–768
22.2	512	768	1024

Quality scale according to MUSHRA, Recommendation ITU-R BS.1534-3.
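To make the signal counting behind the Level 3 remark concrete, here is a small Python sketch that parses configuration labels like those in the table and counts their signals (full-band channels, LFE channels, height channels and objects). The 16-signal threshold is taken from the text above; the actual level definitions in ISO/IEC 23008-3 contain further constraints that are not modelled here, and the function names are illustrative.

```python
import re

# Signal limit for "Level 3" as stated in the text above (simplified).
LEVEL_3_MAX_SIGNALS = 16

# "Good" column of the table above (single-value rows only), in kbit/s.
GOOD_KBPS = {"2.0": 48, "5.1": 128, "5.1+2H": 160, "5.1+4H": 192, "22.2": 512}

_CONFIG = re.compile(
    r"\s*(\d+)\.(\d+)"                    # base layout, e.g. 7.1 -> 7 full-band + 1 LFE
    r"(?:\s*\+\s*(\d+)H)?"                # optional height channels, e.g. +4H
    r"(?:\s*\+\s*(\d+)\s*Objects?)?\s*"   # optional object signals, e.g. + 3 Objects
)


def count_signals(config: str) -> int:
    """Count the signals in a label such as '7.1+4H + 3 Objects'."""
    m = _CONFIG.fullmatch(config)
    if m is None:
        raise ValueError(f"unrecognised configuration: {config!r}")
    base, lfe, height, objects = m.groups()
    return int(base) + int(lfe) + int(height or 0) + int(objects or 0)


def fits_level_3(config: str) -> bool:
    return count_signals(config) <= LEVEL_3_MAX_SIGNALS


for cfg, kbps in GOOD_KBPS.items():
    n = count_signals(cfg)
    print(f"{cfg:>7}: {n:2d} signals, {kbps / n:.1f} kbit/s per signal, "
          f"within Level 3: {fits_level_3(cfg)}")
```

In the printed output, the average rate per signal at “Good” quality falls from 24.0 kbit/s for stereo to 19.2 kbit/s for 5.1+4H, and 22.2 (24 signals) is the only listed configuration above the 16-signal limit.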

Can the MPEG-H stream be multiplexed alongside AAC streams?

Existing broadcast services that use AAC/HE-AAC stereo or surround audio can be enhanced with the advanced MPEG-H Audio features by adding an MPEG-H Audio stream to the multiplex. All audio and video broadcast encoders that support MPEG-H Audio can create a multiplex containing both the AAC stream and the MPEG-H Audio stream. Legacy receivers decode the AAC stream, while newer receivers decode the MPEG-H Audio stream.
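For a feel of how legacy and newer receivers can share one multiplex, here is a simplified Python sketch of audio stream selection from an MPEG-2 Transport Stream program map. The stream_type values are our reading of the ISO/IEC 13818-1 assignment table and should be checked against the current edition; the class and function names are hypothetical, and real receivers also take descriptors, languages and user preferences into account.

```python
from dataclasses import dataclass

# stream_type codes as we understand the ISO/IEC 13818-1 table (verify before use).
STREAM_TYPE_AAC_ADTS = 0x0F    # ISO/IEC 13818-7 AAC with ADTS transport
STREAM_TYPE_AAC_LATM = 0x11    # ISO/IEC 14496-3 audio with LATM transport
STREAM_TYPE_MPEGH_MAIN = 0x2D  # ISO/IEC 23008-3 audio with MHAS, main stream


@dataclass(frozen=True)
class PmtAudioEntry:
    """One audio elementary stream listed in a (hypothetical) program map."""
    pid: int
    stream_type: int


def select_audio_pid(entries, supports_mpegh):
    """Return the PID a receiver would decode, or None if nothing is usable.

    MPEG-H capable receivers prefer the MPEG-H stream; legacy receivers
    only know the AAC stream types and therefore ignore the MPEG-H entry.
    """
    preferred = [STREAM_TYPE_MPEGH_MAIN] if supports_mpegh else []
    preferred += [STREAM_TYPE_AAC_ADTS, STREAM_TYPE_AAC_LATM]
    for stream_type in preferred:
        for entry in entries:
            if entry.stream_type == stream_type:
                return entry.pid
    return None


mux = [PmtAudioEntry(pid=0x101, stream_type=STREAM_TYPE_AAC_LATM),
       PmtAudioEntry(pid=0x102, stream_type=STREAM_TYPE_MPEGH_MAIN)]
legacy_pid = select_audio_pid(mux, supports_mpegh=False)  # the AAC stream
new_pid = select_audio_pid(mux, supports_mpegh=True)      # the MPEG-H stream
```

Because the legacy path simply never looks for the MPEG-H stream type, adding the MPEG-H stream to the multiplex leaves existing receivers untouched.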

How can a user interact with audio elements inside an MPEG-H Audio stream?

MPEG-H Audio enabled devices natively offer a “User Interface” that displays all the interactivity options enabled by an MPEG-H stream. Depending on the content creator’s intentions, each MPEG-H stream may offer different interactivity options to viewers at home, and the User Interface gives them the freedom to personalise their content.

How do the encoder and decoder know which options are embedded in the MPEG-H Audio stream?

An MPEG-H Audio scene comprises the audio content itself together with additional metadata. This metadata is created during production and contains all necessary information to render the audio content in arbitrary reproduction layouts and to ensure the best audio experience on any platform.

How do I ensure the integrity of metadata during production?

MPEG-H Audio has been carefully designed for enhancing broadcast, streaming and immersive music applications. To ensure the integrity of metadata in an SDI-based environment at any production step, the metadata is delivered in the “Control Track”. The Control Track is a “timecode-like” audio signal and can be treated as a regular audio channel. This ensures the synchronization of metadata with its corresponding audio and video signals. The Control Track is robust enough to survive A/D and D/A conversions, level changes, sample rate conversions or frame-wise editing. Audio equipment does not need to be switched into data mode or non-audio mode to pass it through.

What is an MPEG-H Master?

An MPEG-H Master carries all the uncompressed audio content and production metadata of the MPEG-H Audio scene. An MPEG-H Master can either be a Broadcast Wave Format file carrying Audio Definition Model metadata compliant with the MPEG-H ADM Profile (MPEG-H BWF/ADM), or an MPEG-H Production Format (MPF) file carrying the metadata inside an MPEG-H Control Track.

What is the MPEG-H Control Track?

The MPEG-H Control Track is a unique solution for delivering metadata aligned with the audio and video data through existing SDI-based infrastructures. The Control Track is a “timecode-like” PCM audio signal that can be carried on an extra SDI or wave-file channel. It can be edited in a video editor just like any other audio signal.

It allows transport of the metadata tightly coupled with the audio content over any medium offering transport of PCM data, such as SDI, MADI, or AoIP. The Control Track can be treated like any other audio signal and is robust against sample rate conversions or level changes. The metadata contained in the Control Track is aligned to the audio and video data, thus any configuration change in live or post production can be applied at every video frame boundary.
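Frame-accurate alignment is largely a matter of timing arithmetic: at 48 kHz, a 25 fps video frame spans exactly 1920 audio samples, while a 29.97 fps (30000/1001) frame spans 1601.6 samples, so frame boundaries fall on alternating sample counts. The Python sketch below computes where video frame boundaries land in the audio sample stream. It only illustrates this arithmetic and assumes nothing about the internal signal format of the Control Track, which is not described here.

```python
from fractions import Fraction


def frame_start_samples(sample_rate, fps, n_frames):
    """First audio sample index of each of the first n_frames video frames.

    Exact rational arithmetic avoids drift for fractional frame rates such as
    30000/1001 (29.97 fps), where a frame is not a whole number of samples.
    """
    samples_per_frame = Fraction(sample_rate) / Fraction(fps)
    # int() truncates; the values are non-negative, so this is a floor.
    return [int(k * samples_per_frame) for k in range(n_frames)]


print(frame_start_samples(48000, 25, 4))
print(frame_start_samples(48000, Fraction(30000, 1001), 6))
```

This prints [0, 1920, 3840, 5760] and [0, 1601, 3203, 4804, 6406, 8008]: five 29.97 fps frames add up to exactly 8008 samples, after which the pattern repeats.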

What is the MPEG-H Production Format?

The MPEG-H Production Format (MPF) is a multi-channel PCM audio file which contains all the audio content and production metadata of the MPEG-H Audio scene. The metadata is stored as a Control Track, which is a timecode-like PCM audio signal and one of the audio tracks in the multichannel wave-file.
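Because the Control Track is just one PCM track in the multichannel file, ordinary audio tooling can carry it. The following Python sketch copies a single channel bit-exactly out of a multichannel WAV file using the standard wave module. Which channel holds the Control Track depends on the file and is an assumption you have to supply; decoding the Control Track is not shown, and older Python versions may not open WAV files that use the WAVE_FORMAT_EXTENSIBLE header.

```python
import wave


def split_channel(data: bytes, n_channels: int, sample_width: int, index: int) -> bytes:
    """Copy one channel out of interleaved PCM bytes without altering any sample."""
    if not 0 <= index < n_channels:
        raise IndexError(f"channel {index} out of range for {n_channels} channels")
    frame_size = n_channels * sample_width
    offset = index * sample_width
    # Step through whole frames; a trailing partial frame (if any) is ignored.
    return b"".join(
        data[pos + offset:pos + offset + sample_width]
        for pos in range(0, len(data) - frame_size + 1, frame_size)
    )


def read_channel(path: str, index: int) -> bytes:
    """Read one channel (for example, a Control Track) from a multichannel WAV file."""
    with wave.open(path, "rb") as w:
        data = w.readframes(w.getnframes())
        return split_channel(data, w.getnchannels(), w.getsampwidth(), index)
```

The copy is bit-exact on purpose: no resampling or gain is applied, so the extracted track is identical to what was stored in the file.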

What is the Audio Definition Model (ADM)?

The Audio Definition Model (ADM) according to ITU-R BS.2076 defines an open metadata format for production, exchange and archiving of next-generation audio (NGA) content in file-based workflows. Its comprehensive metadata syntax allows describing many types of audio content including channel-, object-, and scene-based representations for immersive and interactive audio experiences. A serial representation of the Audio Definition Model (S-ADM) is specified in ITU-R BS.2125 and defines a segmentation of the original ADM for use in linear workflows such as real-time production for broadcasting and streaming applications.

What is the MPEG-H ADM Profile?

The MPEG-H ADM Profile defines constraints on ITU-R BS.2076 and ITU-R BS.2125 that enable interoperability with established NGA content production and distribution systems for MPEG-H Audio as defined in ISO/IEC 23008-3.

The freely available Fraunhofer ADM Info Tool is a software utility that provides support in creating profile-conformant ADM metadata. Its conformance check framework runs input ADM metadata against an exhaustive set of checks derived from the MPEG-H ADM Profile, gathering detailed reports of any encountered conformance issues and providing information on how to resolve them.
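To give an idea of what an automated conformance check looks like, here is a small Python sketch that verifies two kinds of ID references in ADM XML: audioProgramme to audioContent, and audioContent to audioObject. It is a hypothetical, heavily reduced example, not the Fraunhofer ADM Info Tool; the MPEG-H ADM Profile defines far more checks, and the sample XML is a minimal fragment rather than a complete, conformant ADM document.

```python
import xml.etree.ElementTree as ET

ADM_SNIPPET = """<audioFormatExtended>
  <audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="Main">
    <audioContentIDRef>ACO_1001</audioContentIDRef>
    <audioContentIDRef>ACO_1002</audioContentIDRef>
  </audioProgramme>
  <audioContent audioContentID="ACO_1001" audioContentName="Dialogue">
    <audioObjectIDRef>AO_1001</audioObjectIDRef>
  </audioContent>
  <audioObject audioObjectID="AO_1001" audioObjectName="Dialogue_EN"/>
</audioFormatExtended>"""

# (parent element, its ID attribute, reference element, target element, target ID attribute)
RULES = [
    ("audioProgramme", "audioProgrammeID", "audioContentIDRef", "audioContent", "audioContentID"),
    ("audioContent", "audioContentID", "audioObjectIDRef", "audioObject", "audioObjectID"),
]


def _local(tag):
    return tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix


def _find(root, name):
    return [el for el in root.iter() if _local(el.tag) == name]


def check_id_references(xml_text):
    """Return a list of human-readable issues for dangling ID references."""
    root = ET.fromstring(xml_text)
    issues = []
    for parent, parent_id, ref_tag, target, target_id in RULES:
        known = {el.get(target_id) for el in _find(root, target)}
        for p in _find(root, parent):
            for ref in p:  # direct children only
                if _local(ref.tag) == ref_tag:
                    ref_id = (ref.text or "").strip()
                    if ref_id not in known:
                        issues.append(f"{p.get(parent_id)}: {ref_tag} {ref_id} "
                                      f"has no matching {target}")
    return issues


for issue in check_id_references(ADM_SNIPPET):
    print(issue)
```

The sample fragment deliberately references a missing audioContent, so the check reports one issue for APR_1001. Stripping namespaces lets the same code handle ADM embedded in namespaced XML.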

Is there an automatic conversion of Dolby Atmos content to MPEG-H?

With the MPEG-H Conversion Tool, Fraunhofer offers a simple one-click solution for converting existing Dolby Atmos BWF/ADM files into the MPEG-H Production Format. The tool is available as part of the MPEG-H Authoring Suite (MAS).

Where can I get MPEG-H Audio Production Tools?

Fraunhofer IIS offers Production Tools, bundled in the MPEG-H Authoring Suite. The suite consists of the MPEG-H Authoring Plug-in (MHAPi), the standalone MPEG-H Authoring Tool (MHAT) and the MPEG-H Conversion Tool (MCO).

Other options for producing MPEG-H include the New Audio Technology Spatial Audio Designer and Blackmagic DaVinci Resolve Studio for post-production workflows, as well as the Linear Acoustic AMS and the Jünger MMA hardware for live production with MPEG-H Audio.

What can I do with the freely available MPEG-H Authoring Suite (MAS)?

The MPEG-H Authoring Suite (MAS) is a set of tools that make the production of MPEG-H Audio content easier, faster, more intuitive, and more powerful. They support the recently published MPEG-H ADM Profile, as well as binaural monitoring for immersive audio reproduction over headphones.

The MPEG-H Authoring Plug-in (MHAPi) takes you through all the steps of creating object- or channel-based MPEG-H Audio productions inside a VST3- or AAX-enabled digital audio workstation (DAW). You will be able to export your immersive and interactive MPEG-H Audio scenes to either MPEG-H Production Format (MPF) or MPEG-H BWF/ADM, containing audio and metadata and ready for distribution via MPEG-H-enabled channels.

The MPEG-H Authoring Tool (MHAT) is a new software tool for Mac and Windows that helps you create MPEG-H metadata with existing audio material. The MHAT allows for easy MPEG-H authoring without the need for a digital audio workstation (DAW). You can define specific MPEG-H parameters, instantly listen to your configurations and export your authored mixes as MPEG-H Production Format (MPF), as MPEG-H BWF/ADM or as a template in an XML file.

The MPEG-H Conversion Tool (MCO) is a software tool for Mac and Windows that can be used to convert MPEG-H compliant content masters. The MCO serves as an interface to the MPEG-H Audio ecosystem and supports the import and export of MPEG-H Production Format (MPF) and BWF/ADM files.

The MPEG-H Production Format Player (MPF-Player) is a software tool for Mac and Windows to check the quality of already authored MPEG-H metadata and the accompanying audio mix, with or without a corresponding video.

What is required to enable object-based production with MPEG-H in existing production workflows?

Object-based production requires a metadata authoring step for the object-based interactivity and accessibility features as well as for loudness measurement. No single answer fits all production environments and requirements, but there is a range of typical workflows, from simple, automated or preset-based authoring that fits the most common content types up to comprehensive authoring workflows for advanced applications. See here for more information.

What is the "MPEG-H Audio authoring step"?

The MPEG-H Audio System has been designed such that content creators can define multiple presets and explore new creative options. A broadcaster can prepare mixes (including the default or main mix of the program) using authoring tools that specify an ensemble of gain and position settings for objects, creating preset mixes that can be presented to the user in a simple menu. Even more control over the audio elements of a program is possible and can be enabled by enthusiast viewers in the »advanced MPEG-H Audio interactivity menu«. All interactivity features offered to the user are strictly defined by the broadcaster during metadata creation. This process of generating metadata is called »authoring« and is the most important difference between producing MPEG-H Audio content and a legacy production.
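The principle that presets and user adjustments always stay within limits authored by the broadcaster can be sketched as a small data model. The class names, fields and preset values below are hypothetical and do not reflect the actual metadata syntax of ISO/IEC 23008-3; only gain interactivity is modelled, and position interactivity is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AudioElement:
    """An audio element whose gain range is authored by the content creator."""
    name: str
    min_gain_db: float
    max_gain_db: float

    def clamp(self, gain_db: float) -> float:
        return max(self.min_gain_db, min(self.max_gain_db, gain_db))


@dataclass
class Preset:
    """A preset mix: gain offsets in dB; unlisted elements stay at 0 dB."""
    name: str
    gains_db: dict = field(default_factory=dict)


def resolve_gains(elements, preset, user_overrides=None):
    """Combine a preset with user adjustments, never exceeding the authored limits."""
    user_overrides = user_overrides or {}
    result = {}
    for el in elements:
        requested = user_overrides.get(el.name, preset.gains_db.get(el.name, 0.0))
        result[el.name] = el.clamp(requested)
    return result


elements = [AudioElement("dialogue", -6.0, 9.0), AudioElement("ambience", -12.0, 0.0)]
dialogue_plus = Preset("Dialogue+", {"dialogue": 6.0, "ambience": -6.0})

preset_only = resolve_gains(elements, dialogue_plus)
# The user asks for +12 dB dialogue; the authored maximum of +9 dB wins.
user_tweaked = resolve_gains(elements, dialogue_plus, {"dialogue": 12.0})
```

Keeping the limits on the element rather than on the preset means every path, whether preset selection or free adjustment in an advanced menu, goes through the same clamp.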

How can I export the audio alongside the metadata and ensure the integrity in all production steps?

There are multiple solutions, depending on the production scenario. Using the tools of the MPEG-H Authoring Suite in post-production, audio and metadata can be exported as:

MPEG-H BWF/ADM: An MPEG-H BWF/ADM (short for Broadcast Wave Format with embedded Audio Definition Model metadata) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The exported BWF/ADM file is compliant with the MPEG-H ADM Profile. Loudness will be measured during export and will be embedded into the exported file.

MPF: An MPF (short for MPEG-H Production Format) file is a multichannel wave-file which contains all the audio and metadata for the MPEG-H scene. The metadata is stored in the Control Track, which is one of the audio tracks in the multichannel wave-file and contains a modulated signal that is robust against sample rate conversions or level changes. Loudness will be measured during export and will be embedded into the exported file.

XML: This export option is intended for special applications that make use of MPEG-H scene definitions as XML representation. The XML is accompanied by a multichannel wave file containing the audio essence.

For more information, watch this video on Vimeo or this video on YouTube.

For MPEG-H live productions, the Authoring and Monitoring Units (AMAU) export the audio signals and the Control Track in real time. The Control Track allows the metadata to be transported tightly coupled with the audio content over any medium that carries PCM data, such as SDI, MADI or AoIP. It can be treated like any other audio signal and is robust against sample rate conversions or level changes.

For more information watch this video.

Can I export MPEG-H compliant ADM using the Authoring Tools?

Yes, the MPEG-H Authoring Suite supports the export of audio and metadata as BWF/ADM according to the MPEG-H ADM Profile (MPEG-H BWF/ADM). You can download the profile here.

What is the recommended MPEG-H loudspeaker configuration?

MPEG-H Audio has been specifically designed for flexible loudspeaker rendering, including traditional layouts such as stereo, 5.1 and 7.1, as well as 3D-audio configurations with height channels, like 5.1+4H and 7.1+4H, configurations with height, mid and lower-layer channels, for example 22.2, and even layouts yet to be defined.

The loudspeaker configuration depends on the requirements of the intended production. Recommendations for loudspeaker placement, studio design and production workflows can be found here.

How do I check that speakers are connected correctly for MPEG-H playback?

We offer MPEG-H test signals including channel identification, lip sync, and level checks for verifying that the speakers are connected and adjusted properly.
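If you want to see how a channel identification signal works in principle, the Python sketch below writes a 16-bit WAV file in which a tone steps through the channels one at a time, so each loudspeaker should sound in turn. It is only an illustration, not the MPEG-H test signals offered here; the tone frequency, level and duration are arbitrary choices, and because the wave module writes a basic PCM header without a channel mask, the mapping of channels to loudspeakers depends on the channel order assumed by the playback tool.

```python
import math
import struct
import wave


def channel_ident_frames(n_channels, sample_rate=48000, tone_hz=1000.0,
                         seconds_per_channel=1.0, level_dbfs=-20.0):
    """Interleaved 16-bit PCM: a sine tone that moves through the channels in order."""
    amplitude = 10 ** (level_dbfs / 20) * 32767
    n = int(sample_rate * seconds_per_channel)
    tone = [round(amplitude * math.sin(2 * math.pi * tone_hz * i / sample_rate))
            for i in range(n)]
    pack = struct.Struct(f"<{n_channels}h").pack  # one little-endian frame
    out = bytearray()
    for active in range(n_channels):
        for s in tone:
            # Only the currently active channel carries the tone; all others are silent.
            out += pack(*[s if ch == active else 0 for ch in range(n_channels)])
    return bytes(out)


def write_channel_ident(path, n_channels, sample_rate=48000):
    with wave.open(path, "wb") as w:
        w.setnchannels(n_channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(channel_ident_frames(n_channels, sample_rate))
```

For example, write_channel_ident("ident_6ch.wav", 6) produces six seconds of audio, one second per channel; if a tone comes from the wrong loudspeaker, the wiring or channel routing needs checking.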

Is there an option to monitor on headphones with binaural rendering?

Yes, this option is available in version 3.5 of the MPEG-H Authoring Suite.

How does MPEG-H support downmixing?

MPEG-H Audio supports downmixing to common loudspeaker layouts with a set of predefined downmix configurations. In addition, it offers customizable downmix options that enable content-specific downmixing, configurable for each layout.
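As a generic illustration of what a downmix with adjustable coefficients does, the Python sketch below folds one 5.1 sample frame down to stereo. The -3 dB defaults approximate the conventional 1/√2 coefficients of ITU-R BS.775; MPEG-H’s own predefined downmix rules and downmix metadata are defined in ISO/IEC 23008-3 and are not reproduced here. The channel order (L, R, C, LFE, Ls, Rs) and the function names are assumptions.

```python
def db_to_gain(db):
    return 10 ** (db / 20)


def downmix_51_to_stereo(frame, center_db=-3.0, surround_db=-3.0, lfe_db=None):
    """Downmix one 5.1 frame, assumed ordered (L, R, C, LFE, Ls, Rs), to (Lo, Ro).

    center_db / surround_db: per-programme downmix gains, the kind of
    content-specific setting described above. The LFE is dropped unless
    lfe_db is given.
    """
    left, right, center, lfe, left_s, right_s = frame
    c, s = db_to_gain(center_db), db_to_gain(surround_db)
    lo = left + c * center + s * left_s
    ro = right + c * center + s * right_s
    if lfe_db is not None:
        lfe_gain = db_to_gain(lfe_db)
        lo += lfe_gain * lfe
        ro += lfe_gain * lfe
    return lo, ro


# A dialogue-heavy programme might keep the centre louder than the default.
default_mix = downmix_51_to_stereo((0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
dialogue_mix = downmix_51_to_stereo((0.0, 0.0, 1.0, 0.0, 0.0, 0.0), center_db=0.0)
```

Exposing the gains as parameters rather than constants is what makes the downmix configurable per programme and per target layout.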

Is it possible to make the selected language dip / duck the bed track by an adjustable amount?

Yes, this functionality can be enabled using the Dynamic Gains feature in the MPEG-H Authoring Plug-in version 3.0 and higher and in the MPEG-H Authoring Suite.
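The ducking behaviour itself can be sketched generically: whenever the selected dialogue is active, the bed is attenuated by an adjustable amount. The block-based Python sketch below is an assumption-laden illustration, not how the Dynamic Gains feature is implemented or signalled in MPEG-H; the block size, RMS threshold and function name are hypothetical choices.

```python
import math


def duck_bed(bed_blocks, dialog_blocks, duck_db=6.0, threshold=1e-3):
    """Attenuate each bed block by duck_db whenever the matching dialogue block is active.

    Activity is detected with a simple RMS threshold per block. A production
    implementation would also smooth gain changes (attack/release) to avoid
    audible steps.
    """
    duck_gain = 10 ** (-duck_db / 20)
    out = []
    for bed, dialog in zip(bed_blocks, dialog_blocks):
        rms = math.sqrt(sum(x * x for x in dialog) / len(dialog)) if dialog else 0.0
        gain = duck_gain if rms > threshold else 1.0
        out.append([gain * x for x in bed])
    return out


bed = [[1.0, 1.0], [1.0, 1.0]]
dialog = [[0.5, 0.5], [0.0, 0.0]]  # dialogue active in the first block only
ducked = duck_bed(bed, dialog, duck_db=20.0)
```

Here the first bed block is lowered by 20 dB while the dialogue is present, and the second block, without dialogue, passes through unchanged.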

Are there example sessions or templates to be used for the MPEG-H Authoring Suite?

Yes, the MPEG-H Authoring Suite comes with a set of template sessions for Nuendo, Pro Tools, Reaper and Sequoia.

How can I get training or tutorials for MPEG-H Audio production?

As a first step, we’d like to recommend our series of tutorial videos to help you get started with MPEG-H Authoring using our MPEG-H Authoring Plug-in.

Watch on YouTube

Watch on Vimeo

If you have further questions, you can always get in touch with our MPEG-H Tool experts via: productiontools-techsupport@iis.fraunhofer.de
