Object-Based Audio: In search of an experience

There is more flying around in the innovative audio space than just objects…

In post, yes, the object has arrived; in broadcast the talk is of objects too, but others speak of ‘programme elements’, substreams, presentations, and the forthcoming AC-4 codec. Their bedfellows, immersive audio and 3D audio, have been enabled both by the objects of Dolby Atmos and by the layers of the alternative Auro-3D. Can AC-4 be equally enabling in broadcast? Confused? No longer…

One of the big issues in all of this is the definition of an object and what it means to different technologies and platforms.

From object…to object

In a way, the purest object is the object created by software programmers in OOP (Object-Oriented Programming), a kind of self-sufficient module that can be re-used, reproduced, bred, and left, in a conceptual sense, to its own devices. In games development a ball might travel the virtual world and choose its own sounds depending on what it hits or interacts with. A reverb might change with the environment, or a surface might determine the sample that is triggered. These ideas have certainly been discussed in the mix room (games sound designers have even been invited to speak at large post facilities), so it’s not so far-fetched to imagine these kinds of objects playing a part at some point.

In a very closely related way, we’ve been dealing with objects in DAWs for a very long time already – regions, the audio pool, the clip list, and so on. With clip-based automation, the mix instructions travel with the audio and are rendered on playback. That clip is essentially an object, carrying around the information needed to become a useful part of the mix.

It’s that idea – that the instructions for rendering the mix are delivered separately from the audio itself – that is at the heart of the Dolby Atmos version of an object. The audio on a track is the raw material; when you put an Atmos panner on that track, it becomes an object. The description of where that audio is in a physical space is only imposed on the audio at the RMU (Rendering and Mastering Unit) for monitoring and at the CP850 processor in the cinema.
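To make that separation concrete, here is a minimal Python sketch, assuming a simple 2D room coordinate system. The object carries its audio and its position as separate fields, and speaker gains are only worked out when a speaker layout is supplied at playback, much as the RMU or cinema processor does. The panning law is a generic distance-based amplitude panning (DBAP) stand-in, not Dolby’s renderer, and the `AudioObject` class, the `render_gains` and `render` functions, and the layout coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical object: raw audio plus positional metadata, kept separate."""
    name: str
    samples: list
    x: float  # -1.0 = left wall, +1.0 = right wall (assumed convention)
    y: float  # -1.0 = back wall, +1.0 = screen (assumed convention)


def render_gains(obj, layout, rolloff=1.0):
    """DBAP-style gains: nearer speakers get more signal; total power is normalised to 1."""
    weights = {}
    for name, (sx, sy) in layout.items():
        distance = max(math.hypot(obj.x - sx, obj.y - sy), 1e-3)  # avoid divide-by-zero
        weights[name] = 1.0 / distance ** rolloff
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}


def render(obj, layout):
    """Apply the gains to the object's audio, producing one feed per speaker."""
    gains = render_gains(obj, layout)
    return {name: [g * s for s in obj.samples] for name, g in gains.items()}


STEREO = {"L": (-1.0, 1.0), "R": (1.0, 1.0)}
FIVE_POINT_ZERO = {
    "L": (-1.0, 1.0), "C": (0.0, 1.0), "R": (1.0, 1.0),
    "Ls": (-1.0, -1.0), "Rs": (1.0, -1.0),
}

# The same object (a bullet arriving from behind) rendered for two different rooms.
bullet = AudioObject("bullet", samples=[0.5, -0.5, 0.25], x=0.0, y=-1.0)
stereo_feeds = render(bullet, STEREO)
surround_feeds = render(bullet, FIVE_POINT_ZERO)
```

In stereo the bullet can only be split evenly between L and R; in the 5.0 room most of it lands in Ls and Rs. The audio itself never changes, only the layout handed to the renderer.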

Maybe a more comfortable target for the ‘object’ label in Atmos would be the source itself: the thing that is making the noise on screen (or off), whether it’s the feet of an extra, the mouth of the main character, or a bullet making its way from behind the listener and into the victim, front and centre.

The broadcast object

Moving into the living room, the idea of objects in broadcast has been talked about for a while, but delivering 128 channels of audio, plus metadata, is not practical with current codecs and bandwidths. However, as those codecs improve, there is room for more audio options – and this is the broadcast definition of an object.

In May 2013, Tony Churnside set out a basic vision of what BBC R&D sees as the real-world application of object-based audio. BBC R&D has also conducted tests on object-based experiences and the way in which viewers might interact with the programming, including the ongoing Breaking Out, which you can find at futurebroadcasts.com. A white paper, ‘Object-Based Audio Applied to Football Broadcasts – The 5 live Football Experiment’ (WHP 272), produced by the UK Audio Research Partnership (BBC R&D and the universities of Surrey, Southampton, and Salford), describes a test where listeners were given control over the commentary level in the mix, and the ability to focus on one end of the stadium depending on which supporters they wanted to side with.

On the whole, it was found that listeners tended to find a comfortable setting and stick with it, rather than explore the possibilities throughout the event, though of course this kind of thing rather depends on what options are available.

Dolby has moved this idea on with a wider-ranging concept demonstration at NAB 2014, and is avoiding the term ‘object’ in this context in favour of ‘programme elements’.

Coinciding with the publication of the ETSI AC-4 standard (TS 103 190 V1.1.1), Dolby has demonstrated the flexibility and options that a variety of programme elements and an increased channel count could offer. Dolby’s Prinyar Boon, Director of Systems Architecture, Broadcast, explains the potential in terms of the most important thing: the viewer experience.

“In terms of next-generation audio, there are two possible experiences. One is an immersive experience, which we can roughly call Atmos for home, if you like – there are height elements there. The second one is more personalisation, where you can choose the experience that will be delivered.”

Part of the Dolby demo is the playback of an ice hockey game through an experimental set-top box, where via a GUI the user can choose their ‘experience’…

“Just using the remote, you can choose the neutral experience, or the home crowd, or the away crowd,” says Prin. “You can, in effect, move around the stadium, or choose no crowd at all. You can then choose the commentator – a neutral commentator, a biased commentator, or even no commentator at all.

“On the immersive side, the recently announced Atmos for Home Theatres has reference set-ups that include both 5.1.2 (a 5.1 system plus two ceiling speakers) and 7.1.4 configurations (a 7.1 system plus four ceiling speakers).”
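The x.y.z naming used for these reference set-ups reads as listener-level speakers, then LFE channels, then overhead speakers. A small Python helper makes the arithmetic explicit; `parse_layout` is a hypothetical name used purely for illustration.

```python
def parse_layout(layout: str) -> dict:
    """Split 'x.y' or 'x.y.z' notation into listener-level, LFE and overhead counts."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'x.y' or 'x.y.z', got {layout!r}")
    listener_level, lfe = parts[0], parts[1]
    overhead = parts[2] if len(parts) == 3 else 0  # plain 5.1 or 7.1 has no height layer
    return {
        "listener_level": listener_level,
        "lfe": lfe,
        "overhead": overhead,
        "total": listener_level + lfe + overhead,
    }


for name in ("5.1", "5.1.2", "7.1.4"):
    print(name, parse_layout(name))
```

So 7.1.4 means twelve speaker feeds: seven around the listener, one LFE, and four overhead.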

Interestingly, the idea of height doesn’t necessarily have to come with physical overhead speakers. Prin notes that Atmos-enabled speakers have been developed to create overhead sound from a few feet off the floor, by firing sound upward so that it reflects off the ceiling.
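The trick relies on simple reflection geometry. This Python sketch uses the image-source method: mirror the listener in the ceiling plane, and the bounce becomes a straight line from the speaker to that mirrored point. The room dimensions are illustrative assumptions, and real Atmos-enabled speakers use a fixed driver angle and controlled directivity, which this ignores.

```python
import math


def ceiling_bounce(speaker_h, ceiling_h, ear_h, distance):
    """Image-source sketch of an upward-firing speaker's ceiling reflection.

    Heights and horizontal distance are in metres. Returns the upward tilt the
    driver needs (degrees), the horizontal distance from speaker to the bounce
    point (metres), and the elevation (degrees) at which the reflection arrives.
    """
    image_h = 2 * ceiling_h - ear_h  # listener mirrored in the ceiling plane
    rise = image_h - speaker_h
    tilt = math.degrees(math.atan2(rise, distance))
    bounce_x = distance * (ceiling_h - speaker_h) / rise  # where the line crosses the ceiling
    arrival_elevation = math.degrees(
        math.atan2(ceiling_h - ear_h, distance - bounce_x)
    )
    return tilt, bounce_x, arrival_elevation


# Speaker top 1 m off the floor, 2.5 m ceiling, seated ear height 1.2 m, 3 m away.
tilt, bounce_x, elevation = ceiling_bounce(1.0, 2.5, 1.2, 3.0)
print(f"tilt {tilt:.1f} deg, bounce {bounce_x:.2f} m out, arrives from {elevation:.1f} deg up")
```

The arrival elevation equals the tilt, as the law of reflection predicts, so the reflected sound reaches the listener from above at the same angle the driver fires at.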

A core message here is flexibility. A broadcaster could send a 5.1 mix, separate mono dialogue tracks, and 4.0 crowd options, with the experience determined by the user and the rendering guided by metadata in the AC-4 stream. Further, the metadata carries spatial information, so the experience is tailored to the system it is being played on.

“You can imagine that if you want to transmit all these multiple dialogues then good clean speech performances and low bit rates for all of those feeds becomes quite useful.”
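As an illustration of that flexibility, here is a minimal Python sketch of a receiver assembling a user-chosen experience from separately delivered elements. The element names, the presentation table, and the mixing function are hypothetical and only loosely mirror the ice hockey demo; AC-4 signals this with its own presentation metadata, which is not modelled here.

```python
CHANNELS_5_1 = ("L", "R", "C", "LFE", "Ls", "Rs")
N = 4  # samples per element in this toy example


def constant(value, n=N):
    """Placeholder audio: n samples of a constant value."""
    return [value] * n


# Hypothetical programme elements, each mapped onto 5.1 channel names.
ELEMENTS = {
    "bed_5_1": {ch: constant(0.1) for ch in CHANNELS_5_1},
    "commentary_neutral": {"C": constant(0.5)},  # mono dialogue
    "commentary_home": {"C": constant(0.4)},     # mono dialogue
    "crowd_home": {ch: constant(0.2) for ch in ("L", "R", "Ls", "Rs")},  # 4.0
    "crowd_away": {ch: constant(0.3) for ch in ("L", "R", "Ls", "Rs")},  # 4.0
}

# Hypothetical presentations: which elements make up each selectable experience.
PRESENTATIONS = {
    "neutral": ["bed_5_1", "commentary_neutral"],
    "home_fan": ["bed_5_1", "commentary_home", "crowd_home"],
    "away_crowd_no_commentary": ["bed_5_1", "crowd_away"],
}


def render_presentation(name, gains=None, n=N):
    """Sum the chosen elements into a 5.1 output, applying optional per-element gains."""
    gains = gains or {}
    out = {ch: [0.0] * n for ch in CHANNELS_5_1}
    for element_name in PRESENTATIONS[name]:
        g = gains.get(element_name, 1.0)
        for ch, samples in ELEMENTS[element_name].items():
            for i, s in enumerate(samples[:n]):
                out[ch][i] += g * s
    return out


home_mix = render_presentation("home_fan")
quiet_commentary = render_presentation("neutral", gains={"commentary_neutral": 0.5})
```

Keeping the elements separate until they reach the receiver is what lets one stream serve several audiences; the cost is that every option has to be carried, which is why efficient coding of those extra feeds matters.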

AC-4 (etsi.org) is coming out in two main parts. Part 1 is already published, with Part 2 bringing further enhancements.

For now, Dolby is working to have Part 1 entered into the DVB broadcast standards to provide a state-of-the-art solution for today’s needs that will scale to support these new experiences.

“The syntax for choosing streams is supported in Part 1. There are things called ‘presentations’ in there which allow you to do that. But it’s not a full-blown object system yet.”

Part 2 will bring in the object metadata, additional channels (Part 1 goes up to 7.1), and more. However, one big feature, dialogue enhancement, is already in Part 1. “The encoder can try to analyse the dialogue source in ‘blind mode’ (single ended),” explains Prin, “and send information to the receiver so you get some degree of control. Providing separate dialogue is ideal because, obviously, it gets the clean feed…”
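Where a separate clean dialogue feed is delivered, the receiver-side control can be as simple as a gain on that feed before it is summed with everything else. This is a minimal Python sketch, assuming the content creator supplies limits on how far the viewer may push the dialogue up or down; the 9 dB default limits are invented for illustration and are not AC-4 values, and the blind, single-ended mode Prin describes is not modelled.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def enhance_dialogue(music_effects, dialogue, user_db, max_boost_db=9.0, max_cut_db=-9.0):
    """Mix a clean dialogue feed back over music and effects at a user-chosen level.

    The user's request is clamped to the (hypothetical) limits sent by the content
    creator. Returns the mixed samples and the dB change actually applied.
    """
    if len(music_effects) != len(dialogue):
        raise ValueError("music_effects and dialogue must be the same length")
    applied_db = min(max(user_db, max_cut_db), max_boost_db)
    g = db_to_gain(applied_db)
    mixed = [me + g * d for me, d in zip(music_effects, dialogue)]
    return mixed, applied_db


me = [0.2, -0.1, 0.05]
dlg = [0.1, 0.1, -0.2]
louder, applied = enhance_dialogue(me, dlg, user_db=20.0)  # request is clamped to +9 dB
```

Clamping on the receiver side is one way to let viewers tailor dialogue level while the content creator keeps the final say on how far the mix can be pulled.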

In broadcast, the object, or the element, is a new toy for the customer, provided at the discretion of the content creator. The final question might be: how much control should the viewer or listener have? Is a broadcast best left as a presentation, or is it really time to empower the consumer? In time, this might turn out to be the big creative decision…

Dolby Atmos update

Atmos has exceeded Dolby’s expectations by quite some margin, but cinema refurbishments take time, and committing to an Atmos mix first and then rendering to 7.1 is a big step for many, even though it’s a potential time and money saver.

The great thing about the world of Atmos, even at this relatively early stage, is that in production the discussion has become creative. Some spectacular examples of Atmos soundtracks have been well documented, not least the Oscar-winning Gravity, which saw Glenn Freemantle of Sound 24 add a few expletives to the established language of film sound, and the sublime Life of Pi, which, even though it was an early Atmos trial, found a new freedom for its music through Atmos.

This has been, according to Dolby’s Nick Watson (Technical Marketing Manager), one of Atmos’ creative headlines: “I was certainly surprised that the music community has embraced objects in quite the way they have. I thought music would be the last thing to become ‘objectified’.

“A lot of mixers have started putting music in the front side surrounds – what are being dubbed the ‘widest speakers’ – the left-wide and right-wide… It frees up the screen channels for dialogue and effects and gives a wider stereo image as well.

“There is also a lot of benefit to be gained from putting some aspects of the music into the overhead beds, though actually what seems to be a little more effective is if you take out some of the music elements and place them as music objects in the overhead array. It gives more separation… You don’t necessarily know that you’re hearing it from above you or from a discrete point in the side array, but it just gives the music a definite sense of space. That’s what a lot of music mixers, editors, and composers have really enjoyed.”

The idea of treating reverbs as objects was another relatively early adventure and has worked well for many mixers. “It was something we didn’t initially see happening,” says Watson, “because in the first year we didn’t have object ‘size’ control in the plug-in. We introduced that last year and it means that any reverb can be returned as an object with size, either replicating channel-based reverb or making it slightly more controlled and being able to put different reverbs in different parts of the room.”

One thing that Gravity got straight (amongst others) was the contentious issue of dialogue panning. Before the luxury of full-range surrounds and the panning resolution that Atmos brings, many branded it an unforgivable sin. In Gravity, however, Freemantle took advantage of the technology, and the environment, to keep the audience spinning along with the astronauts, a surprisingly natural experience thanks to the skill and care with which it was mixed.

With eyes on the horizon, in a utopia where bandwidth is ample and budgets are able, there is no doubt that the prize AC-4 seeks is delivery of more Atmos-like object audio into the home, into the mobile device, and into the car. The more imminent challenges for production are simply providing and producing additional presentation material, and being able to monitor those feeds during broadcast.

Dolby Atmos in the studio

Several manufacturers are involved in providing speakers, amps, and management for Dolby Atmos systems in post facilities and cinemas. Higher-profile manufacturers include JBL (which was chosen, along with Crown amps, for the Ray Dolby Theatre at the Dolby HQ in London) and Meyer Sound, while conventional studio monitors from companies such as Genelec are also being specified and used successfully.

Meyer’s Atmos products include the Acheron 80 screen channel loudspeakers, the X-800C and X-400C cinema subwoofers, and the HMS-10 and HMS-12 surround loudspeakers. JBL’s Atmos offerings include the three-way ScreenArray system and the relatively new 9320 large-format cinema surround loudspeaker, with the rotatable Progressive Transition Waveguide.

Plugging Atmos

The main Dolby authoring tools in Pro Tools are the Dolby Atmos Monitor application and the Dolby Atmos panner plug-in. The other core component is the RMU (Rendering and Mastering Unit), which handles monitoring, recording a print master, and playing back a print master. During mixing, the Dolby Atmos panner plug-in pushes metadata to the RMU, which can then render an Atmos mix, or a downmix, for monitoring. In the same way, native Atmos mixing provision in products such as the Avid System 5 and Pro Tools sends metadata directly to the RMU. The Dolby Atmos Monitor application controls and communicates with the RMU and provides plenty of useful visual feedback, such as metering, signal-present indicators, object position display, and speaker configuration.

Metadata is transported to the RMU via Ethernet, while MADI is used to send the audio.

AC-4: A New Experience

AC-4, according to the published Part 1 document, is born of the need to focus on ‘delivering experiences’. To do this, the emphasis has gone onto efficient digital delivery and support for multiple device environments. Those two things go together: we can’t pile on more content without more bandwidth, so the codec has to get better to free up that bandwidth. Thus, the first thing AC-4 does is improve codec efficiency.

From the main text you’ll note that another core advantage is multiple presentations and higher channel counts, up to 7.1 (plus four overheads) in the first instance. It’s this that could, in the future, turn into object-based delivery in the same way that Atmos is object based. In the meantime, though, the vision is to deliver a listening experience that can be customised for, and by, the listeners and viewers.

Other headline features for AC-4 include the following:

Coded audio frame alignment with video framing (configurable) – Audio frame sizes can precisely match the video frame size, which allows the AC-4 data stream to be edited or spliced at video frame boundaries while maintaining synchronization, without the need to decode and re-encode the audio.

Intelligent Loudness – Fully automated loudness management means more precise control and eliminates problems with cascaded processing. It works across a wide range of devices and applications (home theatre to mobile) and can be configured to align with numerous worldwide standards and recommendations. AC-4 supports carriage of the full set of EBU loudness metadata, along with a method of ‘signing’ it: if the metadata is changed, there is ‘tamper evidence’, and a suitably equipped loudness processor can be switched into bypass mode in the knowledge that the source has been correctly loudness managed.

Advanced Dialogue Enhancement – End-users can have control of the dialogue level in relation to other sounds in the programme – suiting individual hearing needs and preferences. Users can modify the dialogue level guided by information from the encoder or content creator, both with and without a clean (separate) dialogue track presented to the encoder.

Advanced Accessibility – Service providers can easily and efficiently deliver secondary audio in 5.1 surround sound for the visually impaired without doubling the file size or bitrate.

Designed for adaptive streaming and advertisement insertions – Bitrate and channel configuration can be switched without audible glitches.

Dual-ended post-processing – Metadata-driven post-processing leverages media intelligence to optimize the experience across device types, and ensures that only a single instance of each post-processing algorithm is enabled throughout the entire chain.

Multiband DRC (Dynamic Range Compression) engine – AC-3 and E-AC-3 have single-band DRC; AC-4 adds a multiband engine. It can use one of the existing E-AC-3 profiles, or a custom DRC profile can be defined.
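The frame-alignment feature listed above matters because video frame rates do not always divide neatly into audio sample rates. A short Python calculation, assuming 48 kHz audio, shows how many samples fall in one video frame and how many frames it takes before the count comes out whole. This is only arithmetic; it is not AC-4’s own frame-size table.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000  # Hz, assumed broadcast audio sample rate


def samples_per_video_frame(fps):
    """Exact number of audio samples in one video frame, as a fraction."""
    return Fraction(SAMPLE_RATE) / Fraction(fps)


def frames_per_whole_sample_cycle(fps):
    """Smallest number of video frames that spans a whole number of audio samples."""
    return samples_per_video_frame(fps).denominator


RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "50": Fraction(50),
}

for label, fps in RATES.items():
    spf = samples_per_video_frame(fps)
    cycle = frames_per_whole_sample_cycle(fps)
    print(f"{label} fps: {float(spf):.1f} samples/frame, "
          f"whole after {cycle} frame(s) = {spf * cycle} samples")
```

At 25 fps each frame holds exactly 1,920 samples, so audio and video frames can share boundaries; at 29.97 fps the figure is 1,601.6, and only every fifth frame lands on a whole sample. That is the kind of mismatch a video-aligned audio frame structure has to accommodate in some way.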
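The ‘signing’ of loudness metadata described under Intelligent Loudness can be pictured with a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library purely as a stand-in; the article does not say how AC-4 actually protects the metadata, and the field names, key, and bypass rule here are hypothetical. Any edit to the metadata changes the signature, so a downstream processor can tell whether it is safe to bypass.

```python
import hashlib
import hmac
import json


def sign_metadata(metadata: dict, key: bytes) -> str:
    """Keyed hash over a canonical (sorted, compact) JSON encoding of the metadata."""
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def loudness_processor_mode(metadata: dict, signature: str, key: bytes) -> str:
    """Bypass only when the metadata verifies; otherwise process the audio as usual."""
    expected = sign_metadata(metadata, key)
    return "bypass" if hmac.compare_digest(expected, signature) else "process"


KEY = b"hypothetical-shared-key"
meta = {"integrated_loudness_lufs": -23.0, "loudness_range_lu": 8.0, "true_peak_dbtp": -1.0}
signature = sign_metadata(meta, KEY)

tampered = dict(meta, integrated_loudness_lufs=-16.0)
print(loudness_processor_mode(meta, signature, KEY))      # bypass
print(loudness_processor_mode(tampered, signature, KEY))  # process
```

A shared-key HMAC is only one option; a real system might prefer public-key signatures so that receivers cannot forge metadata themselves. Either way the principle matches the feature description: altered metadata no longer verifies.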
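The difference between single-band and multiband DRC shows up even in a static gain calculation. In this Python sketch, the threshold, ratio, and band levels are invented; there is no filter bank, attack, or release, and it does not reproduce any E-AC-3 profile. A loud bass hit pulls the whole programme down in the single-band case, but only the low band in the multiband case.

```python
import math


def compressor_gain_db(level_db, threshold_db=-10.0, ratio=4.0):
    """Static compressor curve: above threshold, output rises 1 dB per `ratio` dB of input."""
    over = level_db - threshold_db
    return -over * (1 - 1 / ratio) if over > 0 else 0.0


def combined_level_db(levels_db):
    """Power-sum several band levels (dB) into one overall level (dB)."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def single_band_gains(band_levels_db, **curve):
    """One gain, derived from the overall level, applied to every band."""
    gain = compressor_gain_db(combined_level_db(band_levels_db.values()), **curve)
    return {band: gain for band in band_levels_db}


def multiband_gains(band_levels_db, **curve):
    """Each band gets its own gain, derived from its own level."""
    return {band: compressor_gain_db(level, **curve) for band, level in band_levels_db.items()}


# A loud bass hit over quieter mid and high content (levels in dB, invented).
LEVELS = {"low": 0.0, "mid": -20.0, "high": -25.0}

for label, gains in (("single-band", single_band_gains(LEVELS)),
                     ("multiband", multiband_gains(LEVELS))):
    print(label, {band: round(g, 2) for band, g in gains.items()})
```

The single-band gain comes out slightly deeper than -7.5 dB and is applied to the mid and high bands too, ducking content that was nowhere near the threshold; the multiband version leaves those bands untouched.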

Auro-3D

There has been some confusion over Auro Technologies’ Auro-3D, perhaps because the ideas of 3D audio and object audio got melded together in people’s heads, or possibly because it too is on the trail of an immersive experience in cinema. However, Auro-3D’s ambitions are a little more wide-ranging than that, possibly because the audio is encoded into a standard 5.1 stream in such a way that, without the codec, the 5.1 is replayed unaffected, but in the presence of an Auro-3D codec its full majesty is unpacked. The name of this wizardry is the Auro-3D Octopus Codec. It is not wholly lossless, but Auro Technologies claims near-lossless performance.

Auro-3D output formats currently range up to 13.1, but the basic characteristic is that the system is thought of in terms of ‘layers’ of height; you could also think of them as several planes of panning. The layers are termed Lower (listener level), Height (imagine the upper part of a dome), and Top (the top of the dome).

Auro Technologies calls the Height layer the most important layer in the creation of immersive sound around the listener. It’s situated at 40 degrees above the listener and, according to the company, “The capturing of ambient reflections in this layer allows Auro-3D to deliver a more natural sound and improve sound source localisation. The top layer caps off the ‘enveloping sphere’.”

A creative suite of production tools is available, including the Auro Codec plug-in for the final print, the ‘AuroMatic’ upmixer for anything from mono to 5.1 sources, the AuroPan plug-in for importing ‘acoustical fingerprints’ from the set, and the AuroPan plug-in for spatial panning over the Auro-3D bus.

In 2013, Auro Technologies announced that it would be joining with Barco (which incorporates Auro-3D into its cinema equipment) and DTS to develop an open format for object-based cinema sound…