
Produce Content in MPEG-H Audio


Immersive audio, or 3D audio, enables a new realism in sound reproduction by adding speakers above or below the listener. With the MPEG-H Audio System and its production and authoring tools for live and post-production, mixing immersive 3D sound has never been easier.

With MPEG-H Audio, listeners can interact with the audio elements and choose from different languages or commentators or even change the position of audio objects. The configuration of user interactivity is always under the full control of the content creator.

The MPEG-H Audio System is designed to work in streaming systems as well as in existing and future broadcast systems from contribution to emission. In numerous trials around the globe, broadcast specialists experienced the easy integration into their existing workflows.

Introduction to MPEG-H Audio Production

MPEG-H Audio enables content creators to produce immersive and personalized experiences. Sound sources such as vocals, chorus and instruments can be positioned in a three-dimensional space to perfectly match the creative and artistic intent. When playing back the resulting content, users can enjoy music that immerses them in sound from every direction as intended by the content creator.

The MPEG-H Audio System offers the possibility for content providers to define multiple presets and explore new creative options. A broadcaster can define versions of the mix (including the default or main mix of the program) using authoring tools to create preset mix selections that are presented on a simple menu to the user.

All interactivity features offered to the user are controlled through metadata that are defined by the content creator during production. The process of generating this metadata is called “authoring” and is an additional element in the MPEG-H Audio production compared to legacy content creation.

The result of the authoring step is an “MPEG-H Master” that is a bundle of metadata and audio content. MPEG-H Audio metadata contains all control information for user interactivity and also all necessary information that the playback device needs for reproduction and rendering to ensure the best audio experience on any platform.
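The idea that the content creator sets the limits of user interactivity can be illustrated with a small sketch. The data model below is purely hypothetical: the class names, fields and dB ranges are assumptions made for illustration and do not reflect the actual MPEG-H metadata syntax. It only shows the principle that a listener's request is honored within the range that was authored, and ignored for elements that were not opened up for interaction.

```python
from dataclasses import dataclass, field


@dataclass
class GainInteractivity:
    """Hypothetical author-defined limits for user gain changes on one element."""
    min_db: float
    max_db: float

    def apply(self, requested_db: float) -> float:
        # Clamp the listener's request to the range the content creator allowed.
        return max(self.min_db, min(self.max_db, requested_db))


@dataclass
class Preset:
    """Hypothetical preset: default gains plus optional interactivity per element."""
    name: str
    default_gains_db: dict
    interactivity: dict = field(default_factory=dict)


def user_gain(preset: Preset, element: str, requested_db: float) -> float:
    """Return the gain that is actually applied for a user request."""
    limits = preset.interactivity.get(element)
    if limits is None:
        # No interactivity authored for this element: keep the default gain.
        return preset.default_gains_db[element]
    return limits.apply(requested_db)


enhanced = Preset(
    name="Dialogue enhanced",
    default_gains_db={"dialogue": 6.0, "ambience": -3.0},
    interactivity={"dialogue": GainInteractivity(min_db=0.0, max_db=9.0)},
)
```

With this preset, a request of +20 dB on "dialogue" would be limited to the authored maximum, while any request on "ambience" would leave its default gain unchanged.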

MPEG-H Audio Production Workflow

Post-production File-based Workflow

In file-based workflows, the authoring of MPEG-H Audio scenes and the metadata is handled either by stand-alone tools or by plug-ins for Digital Audio Workstations (DAW). In those tools the scene is authored based on the audio mix, all user interactivity options are set and loudness is measured. After completion of the authoring, the metadata is exported together with the audio data. There are multiple file-based delivery solutions available, depending on the production scenario. Using the tools of the MPEG-H Authoring Suite in post-productions, the MPEG-H Master can be exported in the following formats:

MPEG-H BWF/ADM: An MPEG-H BWF/ADM file (short for Broadcast Wave Format with embedded Audio Definition Model metadata) is a multichannel wave file which contains all the audio and metadata of the MPEG-H Audio scene. The exported BWF/ADM file is compliant with the MPEG-H ADM Profile.
MPF: An MPF file (short for MPEG-H Production Format) is a multichannel wave file which contains all the audio and metadata of the MPEG-H Audio scene. The metadata is stored in the “Control Track”, which is one of the audio tracks in the multichannel wave file and contains a “time-code like” signal that is robust against sample rate conversions or level changes.
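As a rough illustration of what "embedded metadata" means for a BWF/ADM file: wave files are RIFF containers made of tagged chunks, and ADM metadata is typically carried as XML in a chunk with the ID `axml`. The sketch below, which is not part of the MPEG-H Authoring Suite, walks the top-level chunks of a small in-memory wave file. The helper names and the demo payload are assumptions for illustration, and large files that use 64-bit sizes (RF64/BW64) are deliberately not handled.

```python
import io
import struct


def list_riff_chunks(stream):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE stream."""
    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("too short for a RIFF header")
    magic, _riff_size, form = struct.unpack("<4sI4s", header)
    if magic != b"RIFF" or form != b"WAVE":
        raise ValueError("not a plain RIFF/WAVE container")
    chunks = []
    while True:
        head = stream.read(8)
        if len(head) < 8:
            break
        chunk_id, size = struct.unpack("<4sI", head)
        chunks.append((chunk_id.decode("ascii"), size))
        # Chunk payloads are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
    return chunks


def _chunk(chunk_id, payload):
    """Serialize one chunk, adding the pad byte for odd-sized payloads."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def build_demo_wave():
    """Build a tiny mono 16-bit PCM wave file with a placeholder axml chunk."""
    fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)
    body = (
        b"WAVE"
        + _chunk(b"fmt ", fmt)
        + _chunk(b"axml", b"<adm />")  # placeholder, not real ADM metadata
        + _chunk(b"data", b"\x00\x00" * 4)
    )
    return b"RIFF" + struct.pack("<I", len(body)) + body


chunks = list_riff_chunks(io.BytesIO(build_demo_wave()))
has_adm_xml = any(chunk_id == "axml" for chunk_id, _ in chunks)
```

Checking for an `axml` chunk in this way only indicates that XML metadata is present; it says nothing about whether that metadata conforms to the MPEG-H ADM Profile.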

For more information, take a look at the MPEG-H Authoring Suite Tutorial series.

Live-linear Realtime Workflow

The MPEG-H Audio System is designed to work with today’s streaming and broadcast equipment. In realtime scenarios, the authoring of MPEG-H Audio scenes and the metadata is handled by a device called “Authoring and Monitoring Unit” (AMAU). This device exports the metadata in realtime, tightly coupled with the audio signals and synchronized with the video signal on any of the connections that are commonly used in linear productions, such as SDI, MADI, or AoIP.
To ensure the integrity of metadata in an SDI environment in any production step, the metadata is delivered in the “Control Track”. The Control Track is a “time-code like” audio signal and can be treated as a regular audio channel. This ensures the synchronization of metadata with its corresponding audio and video signal. The Control Track is robust enough to survive A/D and D/A conversions, level changes, sample rate conversions or frame-wise editing. The Control Track does not force audio equipment to be put into data mode or non-audio mode in order to pass through.

To learn how to create MPEG-H in live productions for sports events, music shows or any other production scenario, watch our MPEG-H Live and Broadcast Tutorial series.

MPEG-H Audio Production Tools

The MPEG-H Authoring Suite (MAS) is a set of tools that make the production of MPEG-H Masters easier, faster, more intuitive and more powerful. They support the MPEG-H ADM Profile, as well as binaural monitoring for immersive audio reproduction over headphones.

Register here for a download of the MPEG-H Authoring Suite.

MPEG-H Authoring Plug-In

The MPEG-H Authoring Plug-in takes you through all the steps of creating object- and channel-based MPEG-H Audio productions inside a VST3- or AAX-enabled Digital Audio Workstation (DAW). You will be able to export your immersive and interactive MPEG-H Audio scenes to MPEG-H Masters, containing audio and metadata, ready for distribution via MPEG-H-enabled channels.

MPEG-H Authoring Tool

The MPEG-H Authoring Tool is a new software tool for Mac and Windows for creating MPEG-H metadata with existing audio material. It allows for easy MPEG-H authoring without the need for a Digital Audio Workstation. You can define specific MPEG-H parameters, instantly listen to your configurations and export your authored mixes as MPEG-H Masters or as a template in an XML file.

MPEG-H Conversion Tool

The MPEG-H Conversion Tool (MCO) is a software tool for Mac and Windows that can be used to convert MPEG-H compliant content masters. The MCO serves as an interface to the MPEG-H Audio ecosystem and supports the import and export of MPEG-H Masters.

MPEG-H Production Format Player

The MPEG-H Production Format Player (MPF Player) is a software tool for Mac and Windows for quality-checking already authored MPEG-H metadata and the accompanying audio mix, with or without the corresponding video.

The Audio Definition Model (ADM)

The Audio Definition Model (ADM) according to ITU-R BS.2076 defines an open metadata format for production, exchange and archiving of Next Generation Audio (NGA) content in file-based workflows. Its comprehensive metadata syntax allows describing many types of audio content including channel-, object-, and scene-based representations for immersive and interactive audio experiences. A serial representation of the Audio Definition Model (S-ADM) is specified in ITU-R BS.2125 and defines a segmentation of the original ADM for use in linear workflows such as real-time production for broadcasting and streaming applications.
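To give a feel for the metadata syntax, the following sketch parses a heavily simplified ADM-style XML fragment and resolves which audio objects belong to a programme. The element and attribute names (`audioProgramme`, `audioContent`, `audioObject` and their ID references) follow ITU-R BS.2076, but the fragment itself is an assumption made for illustration: it omits namespaces, pack and track formats, and many other elements a real file would need, so it is not a conformant ADM document.

```python
import xml.etree.ElementTree as ET

# Simplified, non-conformant fragment for illustration only.
ADM_FRAGMENT = """
<audioFormatExtended>
  <audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="Main mix">
    <audioContentIDRef>ACO_1001</audioContentIDRef>
    <audioContentIDRef>ACO_1002</audioContentIDRef>
  </audioProgramme>
  <audioContent audioContentID="ACO_1001" audioContentName="Bed">
    <audioObjectIDRef>AO_1001</audioObjectIDRef>
  </audioContent>
  <audioContent audioContentID="ACO_1002" audioContentName="Commentary">
    <audioObjectIDRef>AO_1002</audioObjectIDRef>
  </audioContent>
  <audioObject audioObjectID="AO_1001" audioObjectName="Stadium ambience"/>
  <audioObject audioObjectID="AO_1002" audioObjectName="English commentary"/>
</audioFormatExtended>
"""


def programme_objects(xml_text):
    """Map each programme name to the names of the audio objects it references."""
    root = ET.fromstring(xml_text.strip())
    contents = {c.get("audioContentID"): c for c in root.iter("audioContent")}
    objects = {
        o.get("audioObjectID"): o.get("audioObjectName")
        for o in root.iter("audioObject")
    }
    result = {}
    for programme in root.iter("audioProgramme"):
        names = []
        # Follow programme -> content -> object references.
        for content_ref in programme.findall("audioContentIDRef"):
            content = contents[content_ref.text]
            for object_ref in content.findall("audioObjectIDRef"):
                names.append(objects[object_ref.text])
        result[programme.get("audioProgrammeName")] = names
    return result


mapping = programme_objects(ADM_FRAGMENT)
```

The chain of ID references from programme to content to object is what lets one set of audio tracks serve several programmes, for example a main mix and a version with a different commentary.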

The MPEG-H ADM Profile defines constraints on ITU-R BS.2076 and ITU-R BS.2125 that enable interoperability with established NGA content production and distribution systems for MPEG-H Audio as defined in ISO/IEC 23008-3.

The freely available Fraunhofer ADM Info Tool is a software utility that supports the creation of profile-conformant ADM metadata. Its conformance check framework runs input ADM metadata against an exhaustive set of checks derived from the MPEG-H ADM Profile, reports any conformance issues it encounters in detail and provides information on how to resolve them.


Improved speech mix

Many people find it hard to follow speech in broadcasting and streaming due to loud background sounds. A recent survey carried out by Fraunhofer IIS and WDR showed that 68% of the audience across all demographics frequently or very frequently had issues with understanding speech on TV. Dialog+, an MPEG-H production technology, addresses this issue and ensures clear speech by allowing the adaptation of loudness levels of both speech and background sounds. To achieve this, it uses a solution based on deep learning and can be applied when only a final mix is available. This makes it possible to customize the speech level to individual requirements.
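The level adaptation described above can be sketched in a few lines. This is not the Dialog+ algorithm: the deep-learning step that derives speech and background from a final mix is not shown, and the sketch simply assumes that two such signals are already available as lists of samples. The function names and the example gains are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(speech, background, speech_gain_db=0.0, background_gain_db=0.0):
    """Recombine speech and background samples with independent gains."""
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    speech_gain = db_to_linear(speech_gain_db)
    background_gain = db_to_linear(background_gain_db)
    return [
        speech_gain * s + background_gain * b
        for s, b in zip(speech, background)
    ]


# Keep speech as is and lower the background by 20 dB (a factor of 0.1).
clearer = remix([1.0, 0.5], [1.0, 1.0], speech_gain_db=0.0, background_gain_db=-20.0)
```

Keeping the two gains independent is what allows either a fixed "clear speech" version to be produced or the balance to be left to the listener.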

Dialog+ is next generation accessibility

Dialog+ is a technology that works particularly well to upgrade older content for which only the final audio mix is available. It also works on today’s legacy systems. Combined with the MPEG-H Audio system, it provides a whole new level of personalization to its users. Thanks to MPEG-H Dialog+, viewers can now select the mix they like and personalize the sound to meet their preferences.

Experience Dialog+ for yourself

Learn more about the technological background

For papers and other downloads, please visit the download page.

For more information about accessibility solutions from Fraunhofer IIS, contact us by email.

Training Courses

Fraunhofer has an interdisciplinary team of researchers, engineers and Tonmeisters. We support production teams on-site during live broadcast trials. We offer seminars on MPEG-H production at our headquarters in Erlangen, Germany, or on site at broadcast studios around the world. We also offer offline tutorial series and provide dedicated training material to fit your needs. Please contact us for details.

See our Vimeo channel for all the videos:

MPEG-H Audio live production

Tutorials for the MPEG-H Authoring Suite

Studio Recommendations

Immersive audio or 3D-audio mixing for home delivery is done using loudspeakers in an audio control room or near-field mixing environment.

With extensive experience in 3D-audio mixing and in setting up studios and listening rooms, Fraunhofer can provide the structural requirements and technical specifications for a 3D-audio production environment. Such an environment supports accurate mixing and flexible reproduction on loudspeaker systems ranging from 1.0 up to 7.1+4H channel layouts.

Fraunhofer offers consultation for room geometry and room acoustics, loudspeaker positioning and electroacoustic performance, 3D-audio monitoring and mixing capabilities, and provides recommendations for related literature.

Our comprehensive paper on studio recommendations is a great starting point. Download it here.

Once your system is set up, you can test it with our MPEG-H test signals.
