1. The MPEG-H Audio System

Fraunhofer IIS has developed the MPEG-H Audio system, based on the MPEG-H 3D Audio standard, to offer an enhanced sound experience for broadcast and new media services such as UHDTV, immersive music services, 4K video streaming, and virtual reality. In addition to improved bit-rate efficiency, MPEG-H Audio offers:

Immersive audio: Places your audience right in the middle of the action.

Personalized audio: Select your preferred audio presentation from several preconfigured versions within the same stream. Seamlessly switch between languages and adapt the audio mix to your needs and preferences.

Universal delivery: MPEG-H can go everywhere. On the go – in your living room – in your home theatre – in the car. One production is encoded into one bit stream that is delivered to many devices, such as TV sets, loudspeakers, headphones, or MPEG-H-equipped soundbars.

Improved accessibility: Enhanced speech intelligibility with Dialog Boost Presets and advanced audio description features help your content reach an even wider audience.

MPEG-H Audio is a next-generation open audio standard. The system is based on the MPEG-H 3D Audio standard from ISO/IEC MPEG, the international standards group responsible for many globally dominant media standards such as MP3, AAC, MPEG-1, MPEG-2, MPEG-4, AVC/H.264 and HEVC/H.265. The MPEG-H Audio system is included in the ATSC, DVB, TTA (Korean TV) and SBTVD (Brazilian TV) TV standards.

The MPEG-H Audio decoder can render the bit stream to a wide range of speaker configurations. Binaural rendering for 3D headphone reproduction is supported in MPEG-H enabled playback devices and during production. Other features that set MPEG-H apart from legacy audio compression standards used in TV broadcasting, such as AAC, include integrated solutions for broadcast and streaming applications: flexible rendering and downmixing, advanced loudness and Dynamic Range Control (DRC) management, seamless bitrate switching, and a system design for connectivity across multiple devices. This flexibility is achieved through MPEG-H Audio metadata, which contains all the information the playback device needs to offer the best listening experience in any given situation and environment.

Schematic MPEG-H workflow from production to reproduction. During authoring and up to the encoder, all audio is plain, uncompressed PCM. The decoder unpacks the compressed bit stream. Using metadata from the bit stream together with additional information from the playback device, such as the number of available channels, headphone use, and user interaction, the renderer produces loudspeaker or headphone signals.

Most MPEG-H Audio metadata is created during production in a process we refer to as metadata authoring. In a post-production environment, this starts by assigning audio signals to signal groups (Components) and combining them in Presets, and ends with a final loudness measurement, which is performed automatically during the export of your production. Much of the required MPEG-H metadata is created in the background, using default values that have been carefully designed to work for most situations. At the same time, MPEG-H allows most of these values to be customized, should it become necessary.
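The relationship between signals, Components, and Presets described above can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names, and example track names are hypothetical and do not correspond to the actual MPEG-H metadata format or to any authoring tool's API. The loudness measurement performed at export is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A signal group: one or more audio signals authored as a unit."""
    name: str
    signals: list[str]  # hypothetical track names from the session


@dataclass
class Preset:
    """A selectable presentation that combines several Components."""
    name: str
    components: list[str]  # names of the Components this Preset uses


@dataclass
class Scene:
    """Illustrative container for the authored Components and Presets."""
    components: dict[str, Component] = field(default_factory=dict)
    presets: list[Preset] = field(default_factory=list)

    def add_component(self, component: Component) -> None:
        if component.name in self.components:
            raise ValueError(f"duplicate Component: {component.name}")
        self.components[component.name] = component

    def add_preset(self, preset: Preset) -> None:
        # A Preset may only reference Components that were authored first.
        missing = [c for c in preset.components if c not in self.components]
        if missing:
            raise ValueError(f"unknown Components: {missing}")
        self.presets.append(preset)


# Example: one bed plus two dialog languages, offered as two Presets.
scene = Scene()
scene.add_component(Component("Bed", ["L", "R", "C", "LFE", "Ls", "Rs"]))
scene.add_component(Component("Dialog EN", ["dlg_en"]))
scene.add_component(Component("Dialog DE", ["dlg_de"]))
scene.add_preset(Preset("English", ["Bed", "Dialog EN"]))
scene.add_preset(Preset("German", ["Bed", "Dialog DE"]))
```

The check in `add_preset` mirrors the authoring order described above: signals are grouped into Components first, and Presets are then built from those Components.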

In a very basic MPEG-H post-production workflow, video and audio content can be produced in a conventional way. The final step before MPEG-H encoding is MPEG-H authoring. Depending on the distribution workflow, metadata created during authoring can be exported as a Control Track. Usually, the Control Track and the audio content are contained in the same multichannel WAV file for maximum convenience. For file-based workflows, metadata can also be exported as an MPEG-H BWF/ADM or MXF/S-ADM file, which contains both the metadata and the audio of the MPEG-H scene, or as a template export of the scene authoring in an XML file.
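As a rough illustration of the "audio plus Control Track in one multichannel WAV file" export, the sketch below reads a WAV header with Python's standard `wave` module and reports which channels would carry audio and which would carry the Control Track. The function name `describe_layout` is hypothetical, and the default of treating the last channel as the Control Track is purely an assumption of this sketch; check your export settings for the actual channel position. The sketch does not decode the Control Track itself.

```python
import wave


def describe_layout(path, control_channel=None):
    """Summarize the channel layout of a multichannel WAV export.

    control_channel is a zero-based index. Defaulting to the last
    channel is an assumption of this sketch, not a documented rule.
    """
    with wave.open(path, "rb") as wav:
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()

    if n_channels < 2:
        raise ValueError("expected audio channels plus a Control Track channel")
    if control_channel is None:
        control_channel = n_channels - 1
    if not 0 <= control_channel < n_channels:
        raise ValueError(f"control_channel out of range: {control_channel}")

    # Every channel that is not the Control Track is treated as audio.
    audio_channels = [c for c in range(n_channels) if c != control_channel]
    return {
        "channels": n_channels,
        "sample_rate": sample_rate,
        "duration_s": n_frames / sample_rate,
        "audio_channels": audio_channels,
        "control_channel": control_channel,
    }
```

Note that the `wave` module only reads plain PCM files; multichannel exports written with a `WAVE_FORMAT_EXTENSIBLE` header may require Python 3.12 or later, or a third-party audio library.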