Introduction

Immersive audio should need no introduction, as it has been around in movie theaters for more than a decade. Commercial music releases in Atmos, Sony 360 Reality Audio, and Apple Spatial Audio formats began to appear in 2021. In 2023, something like 80% of the Billboard Top 100 songs were released in Atmos alongside the stereo version. Apple Music is so dedicated to immersive audio that it announced in early 2024 that it would pay up to 10% more in royalties for songs that are also available in Apple’s Spatial Audio format, a variant of Dolby Atmos.

It's only logical, then, that audio creators should jump into producing music in immersive formats. While stereo will remain the standard release format for now, consumers are swiftly becoming acquainted with and accustomed to 3D sound experiences. Immersive audio is increasingly prevalent on headphones in various binaural flavors, sometimes incorporating features like headtracking. Moreover, smart speakers, soundbars, many cars, and emerging products like AR goggles also support immersive audio playback. These consumer playback devices are rapidly improving in their ability to reproduce 3D sound, suggesting that the distribution of immersive music will likely expand quickly and potentially become the standard format in the near future.

Building an immersive mix setup is no simple task, but with careful planning, a system can be realized effectively. Key considerations encompass the room setup and acoustic treatment, production software, choice and placement of monitors, monitor control, and room calibration. Let's explore each of these aspects to better organize the planning process for an immersive mix room.

Monitoring

Headphones or Speakers for Immersive Music Creation

Stereo mixing traditionally takes place on speakers in an acoustically treated environment, ensuring that most listeners experience playback that faithfully represents the studio creation. Consumers have increasingly shifted to listening on headphones rather than speakers, and, at the same time, creators have grown to rely on headphones as a primary or significant part of their studio setup. For better or worse, this trend extends to immersive music listening. While movies are often experienced in theaters and home systems with full-range speaker systems, music streaming is mainly consumed through headphones, Bluetooth speakers, or in automobiles. Consequently, creators need to understand how their immersive music translates across various playback systems.

Our brain naturally recognizes the space and sound localization of a multichannel speaker system because the rear speakers are physically a few feet behind us and overhead sounds are emitted from speakers above our head. When played back on headphones, immersive multichannel music relies on psychoacoustic techniques to simulate a spacious mix despite only having two sound sources located very close to the ears. With standard headphones, immersive playback can approximate spaciousness that expands beyond a traditional stereo soundfield. 

Headphone playback relies on a binaural translation, utilizing predefined reverbs and other processing to emulate the effects of sounds coming from different distances and locations. Technologies like Dolby Atmos and Sony 360 Reality Audio generate binaural renders that approximate the 3D panning and distance of a speaker setup to a certain degree. In fact, most immersive creation software can re-render immersive mixes into various formats, including standard 7.1, 5.1, stereo, and binaural for headphones, allowing listeners to experience these mixes even without immersive monitor systems.

A woman listening to music with the HEDDphone 2

Mixes will inevitably sound different on immersive speaker systems compared to binaural headphone versions, and further complicating things, Apple Music generates a proprietary Spatial Audio headphone version that differs from Dolby’s binaural version. With at least three different playback systems to target, creators must consider both speaker and headphone versions when making mix decisions to ensure the best experience for the widest audience.

The question remains whether we should mix on speakers or headphones, and the answer for most professionals is that the final mix should cater to speaker playback. We still need to reference the mix on headphones to ensure that the speaker mix translates reasonably well. Headphone playback algorithms are evolving to include custom Head-Related Transfer Functions (HRTFs) and may eventually represent speaker playback accurately. For now, if you consider yourself a professional immersive mixer, you should work on a calibrated multichannel 7.1.4 or larger system.

Can Any Speakers Work for an Immersive Setup?

If you're interested in exploring immersive mixing without a significant investment, you might consider starting with a 5 or 7-speaker system and adding the overhead and LFE speakers over time. These setups could be assembled using your existing monitors along with other affordable speakers or ones you may already have lying around. This type of setup may offer a chance to practice setting up immersive mixes and gain some understanding of how commercial mixes utilize spatial elements. However, this type of setup will not yield professional-grade results. 

Professionals should assemble a proper setup, incorporating the correct number of matched speakers that meet or exceed Dolby’s recommendations for immersive mix rooms. As with stereo mixing, such setups ensure that mixes translate properly to the rest of the world. If a full setup is out of reach due to cost or space limitations, immersive mixes can be started on headphones or makeshift immersive systems, with the final mix taking place in a room equipped with a complete speaker setup. Since immersive mixing often requires significant setup time, this work can be done on a personal system before time is rented in a fully-equipped immersive room. Many mixers already adopt this approach, completing most of their mixes at home and then finalizing multiple songs quickly with the artist or producer at a commercial facility. Working in different rooms this way is also a valuable way to experience monitoring systems that one aspires to eventually own.

Below: HEDD immersive audio installations at Rimshot (UK), Mastering Academy (DE), Marx Audio (DE), and GLAB Studios (SK).

Speaker Mounting

When planning an immersive room, it's important to consider not only where to place the speakers but also the necessary mounting arrangements for each speaker. For example, overhead speakers require mounting to a structure capable of supporting their weight and enabling proper positioning. Speakers designed for immersive setups often integrate with dedicated third-party hardware to facilitate wall or ceiling mounting. Wall studs and ceiling joists can be used as mounting points in some cases, but a truss system is often needed to ensure adequate support and positioning of the speakers.

In smaller rooms, it may be desirable to use surround and overhead speakers that are smaller than the main left, center, and right speakers. While this is acceptable, all speakers should come from the same brand and product line to maintain consistent tone and performance across the entire speaker system.

HEDD has teamed up with IsoAcoustics to offer a range of mounting solutions for our MK2 monitor range designed for the ceiling and wall requirements of immersive installations.

Bass Management and LFE In Immersive Rooms

Immersive audio setups require at least one subwoofer dedicated to the low-frequency effects (LFE) channel, and its location should be optimized for in-room performance. Multiple LFE subs are recommended for large rooms, where additional output and headroom are needed.

One or more additional subwoofers can be employed to support the low-frequency performance of the main speakers, a practice known as bass management. However, implementing bass management in immersive systems presents challenges. At crossover points above about 80 Hz, listeners can localize the subwoofer. This can lead to confusion, particularly if a floor-mounted subwoofer is used to manage bass for overhead speakers: sounds intended to come from above may appear to come from both overhead and the floor. It is therefore important that all the speakers are capable of full-range performance, and bass management should only supplement frequencies below about 80 Hz.

  • Group Delay Compensation

    To achieve true sound integration and system-wide phase linearity between the MK2 monitors and BASS subwoofers, we created a tool that compensates for the longer group delay of the subwoofers. This removes standard problems such as smeariness and disorientation that are often inherent in the use of traditional subwoofers.

  • HEDD Lineariser™

    Ensuring multiple sources are time-aligned eliminates smearing and enhances clarity, a key issue in multi-speaker immersive setups. All HEDD MK2 monitors feature the HEDD Lineariser, an integrated zero-phase filter that will correct any phase issues with a simple individual delay in the DAW. The Lineariser surpasses the usual analog speaker limitations and unlocks new levels of resolution, separation and imaging accuracy.

  • Closed or Ported

    HEDD's Closed or Ported feature combines the best of two monitoring systems into one. Each MK2 monitor offers the option to switch between a closed and ported cabinet, providing the advantages of both: increased precision, transient response, and texture quality in Closed Mode or more impactful bass and enhanced headroom in Ported Mode. The benefits of each mode are multiplied in an immersive setup.

Calibration

Which Interface, Monitor Controller, and Room Calibration Tools?

Regardless of your DAW choice, having an audio interface and multichannel monitor controller is indispensable for routing audio from the DAW or renderer app to your monitors. Interfaces must provide sufficient physical I/O for your speaker setup, while immersive monitor controllers must fulfill two critical functions: managing volume, muting, and soloing for all speakers in the system, and calibrating monitors for level, frequency response, and timing. You can opt for a monitor controller that integrates both functions into one device, or you might prefer room calibration tools and monitor control from different manufacturers. 

You'll need an audio interface with enough outputs to feed at least 12 speakers, along with additional outputs for alternate monitors, headphones, and cue mixes. If you already own an interface with, say, 16 analog outputs or a combination of analog and digital outputs like ADAT optical, you might just keep your current interface or simply add an 8-channel D/A converter to meet the required number of outputs. Next, you'll add a monitor controller, which could be software like GroundControl Sphere from Ginger Audio, or one of the many hardware monitor controllers available from Grace Design, Dangerous Music, SPL, or Trinnov.
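As a rough planning aid, the output count can be worked out from the layout name. The sketch below assumes the usual reading of a layout string such as 7.1.4 (ear-level speakers, LFE channels, overhead speakers); the function name and the number of extra outputs are illustrative assumptions, not part of any product's software.

```python
def speaker_outputs(layout: str) -> int:
    """Count discrete speaker feeds in a layout string such as '7.1.4'.

    The fields are ear-level speakers, LFE channels, and overhead speakers.
    """
    fields = layout.split(".")
    if len(fields) not in (2, 3) or not all(f.isdigit() for f in fields):
        raise ValueError(f"unrecognized layout: {layout!r}")
    return sum(int(f) for f in fields)


# Assumption for illustration: one stereo headphone feed plus one stereo
# alternate monitor pair; adjust to your own room.
EXTRA_OUTPUTS = 4

for layout in ("7.1.4", "9.1.6"):
    speakers = speaker_outputs(layout)
    total = speakers + EXTRA_OUTPUTS
    print(f"{layout}: {speakers} speaker feeds, {total} outputs with extras")
```

With these assumptions, a 7.1.4 room needs 12 speaker feeds, so a 16-output interface leaves four outputs for headphones and alternate monitors.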

Below: The RME Fireface UFX III monitor controller. Source: RME.

High-end monitor controllers often come with a hefty price tag. However, if you're looking to upgrade your audio interface or need one that includes monitor control for immersive speaker systems, many manufacturers now offer suitable options. Companies such as Antelope, Apogee, Audient, Avid, Merging Technologies, and RME provide such devices with prices starting under $3,000. Whether you choose to upgrade your interface or add a monitor controller to your current interface, you will also need to apply speaker calibration to your monitors.

Speaker calibration is essential for immersive mix setups.

  1. First, all twelve or more speakers need to be level-matched to ensure a balanced soundfield.
  2. Second, each speaker must present a similar frequency response to provide solid localization.
  3. Third, a sound played by each speaker must reach the listener simultaneously to reduce comb-filtering and time-smearing effects caused by poor time alignment.

If all three of these factors are properly normalized, mixes will translate successfully to other calibrated listening rooms. Otherwise, you can only speculate on how your mix might sound in another space.
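The third factor, time alignment, comes down to simple arithmetic when the speakers sit at different distances from the listening position: closer speakers are delayed so their sound arrives together with that of the farthest speaker. The following is a minimal sketch of that idea, assuming a speed of sound of about 343 m/s; the speaker names and distances are made-up example values, and real calibration tools measure arrival times acoustically rather than from tape-measure distances.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature


def alignment_delays_ms(distances_m):
    """Return the delay (ms) to add to each speaker so all arrivals coincide.

    distances_m maps a speaker name to its distance from the listening
    position in meters. The farthest speaker gets zero delay.
    """
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND_M_S * 1000.0
        for name, distance in distances_m.items()
    }


# Hypothetical distances for three of the speakers in a small room.
example = {"Left": 2.00, "Center": 1.90, "Top Front Left": 1.60}
for name, delay in alignment_delays_ms(example).items():
    print(f"{name}: delay by {delay:.2f} ms")
```

A difference of roughly 34 cm in path length corresponds to about 1 ms, which is why even modest placement differences call for per-speaker delay.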

Calibration features are becoming standard in many high-end interfaces and monitor controllers, offering a convenient option. If you already have an interface and monitor control solution and simply wish to add room calibration, software like Sonarworks SoundID Multichannel offers measurement and correction capabilities compatible with any DAW. Additionally, it can export calibration settings to certain hardware devices, monitor control apps, and even directly to some speakers. Explore the interfaces and monitor controllers available, along with the calibration features offered by each system. This will help you assemble a monitor chain that meets the functionality needs of your room.

"Room calibration is essential for immersive mixes to successfully translate to other calibrated mixing or listening rooms. Otherwise, you can only speculate on how your mix might sound in another space."

How to Calibrate an Immersive Speaker Setup

For loudness calibration, Dolby recommends that every speaker be calibrated to generate the same sound pressure level at the listening position, while the LFE subwoofer plays 10 dB louder in-band. A pink noise generator along with an SPL meter or measurement software can be used to measure and calibrate each speaker’s volume. For large mix rooms, pink noise played at -20 dBFS should generate 85 dB SPL (C-weighted, slow response) from each speaker. The LFE subwoofer should output 89 to 91.5 dB SPL for its intended frequency range. 85 dB SPL may be too loud for home studios and modest production rooms, so you may wish to decrease these SPL targets by 6 dB. Volume settings for each speaker can be made directly on some speakers or amplifiers, or stored in the monitor controller or calibration software.
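Once each speaker has been measured with pink noise, the level trims follow directly from the difference between the target and the reading. Here is a minimal sketch, assuming the 85 dB SPL target above (or 79 dB SPL when reduced by 6 dB for a smaller room); the function name and the measured values are hypothetical, and the LFE subwoofer is left out because it has its own target range.

```python
def level_trims_db(measured_spl, target_spl=85.0):
    """Return the gain change (dB) each speaker needs to hit the target SPL.

    measured_spl maps a speaker name to its C-weighted, slow-response
    reading at the listening position. Positive trims mean "turn up".
    """
    return {
        name: round(target_spl - reading, 1)
        for name, reading in measured_spl.items()
    }


# Hypothetical readings taken with pink noise at -20 dBFS.
readings = {"Left": 86.2, "Center": 84.5, "Top Front Left": 83.0}

for name, trim in level_trims_db(readings).items():
    print(f"{name}: {trim:+.1f} dB")

# Smaller room: the same readings against a target lowered by 6 dB.
small_room = level_trims_db(readings, target_spl=85.0 - 6.0)
```

The trims can then be entered on the speakers, in the monitor controller, or in the calibration software, wherever your system stores per-channel levels.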

Frequency response calibration for each speaker is best accomplished with software or hardware measurement tools, although calibration is possible using pink noise and an RTA. Some of the previously mentioned hardware monitor controllers provide measurement tools, and software like SoundID Reference Measure, Room EQ Wizard, and Smaart provide solutions to measure the loudness, frequency response, and timing of every speaker. They even allow you to port their calibration results to certain monitor controllers and interfaces.

Below: Multichannel Interface for the FLUX::Analyzer stand-alone application. Source: FLUX:: Audio.

Room Acoustics for Immersive Studios

Electronic room correction is essential, but first a room must be optimized with acoustic treatment. Professionals should already understand the importance of acoustic treatment for stereo rooms, and multichannel rooms pose a more complex challenge because sound emanates from multiple points in the room. We’ve seen so far how level matching, frequency response calibration, and time alignment are crucial for achieving an accurate monitor system. Now let's delve into some of the key considerations for acoustic treatment.

Immersive mix rooms will require similar considerations and treatment to stereo rooms, but sound is now coming from all four walls and the ceiling and possibly reflecting off many more surfaces. Reflection control is the obvious problem, but just as important are Speaker Boundary Interference Response (SBIR) and space loading considerations. Every speaker in the room will interact with the nearby surfaces, and we have 12 or more speakers to worry about. 

One of the most detrimental effects of placing a speaker near a wall is SBIR, which occurs when low frequencies from a speaker reflect off a nearby wall or ceiling and recombine with the direct sound coming from the front of the speaker. This leads to the cancellation of a specific and narrow range of low frequencies. SBIR can be addressed in one of three ways: (1) by soffit mounting (flush mounting) speakers to avoid any low frequency reflections from the wall behind them, (2) by placing speakers close to the front wall and treating it to absorb low frequencies down to about 150 Hz, or (3) by positioning speakers 2 meters or more away from any wall so that the cancellation is pushed down to roughly 40 Hz or lower.

Above: The Mastering Academy in Hamburg explains the extensive acoustic treatment by Dennis Busch it received to ensure it meets the necessary requirements of an Atmos-ready studio.

One or more of these solutions should be implemented, even in a small control room measuring 3.5m x 3m x 2.4m. Option two is usually the most effective solution. Simply mount the speakers as close to the wall or ceiling as is practical and treat the area behind and around the speaker with thick enough absorbers to reach down to 150 Hz or even a bit lower. If we can absorb the low frequency reflection, we can mitigate the notch created by the SBIR.

Here's an example illustrating SBIR: Without any acoustic treatment, positioning a speaker with its face 60cm (24”) from a wall causes a problematic dip at around 140 Hz. However, relocating the speaker so that its face is only 30cm (12”) from the wall raises the cancellation notch to 280 Hz — a more manageable issue. We can see that speakers mounted close to walls and ceilings need at least 10 to 15cm (4” - 6”) of acoustic treatment behind them to absorb a reasonable amount of energy down to 125 Hz, effectively reducing the SBIR notch. Therefore, a practical rule of thumb is to treat the walls behind and around all speakers with at least 10cm (4”) of absorption and position the speakers as close to the wall as possible. This may be difficult in small rooms, but do not skimp on this treatment or your low end will suffer.
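The figures in this example are consistent with the common quarter-wavelength approximation, in which the first cancellation notch falls where the speaker-to-wall distance equals one quarter of the wavelength. The sketch below assumes that approximation and a speed of sound of about 343 m/s; the function name is illustrative, and real rooms will deviate from the estimate depending on speaker directivity and wall construction.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature


def sbir_notch_hz(distance_m):
    """Estimate the first SBIR cancellation frequency for a speaker whose
    face is distance_m from the wall behind it (quarter-wavelength rule)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return SPEED_OF_SOUND_M_S / (4.0 * distance_m)


for distance in (0.60, 0.30):
    print(f"{distance * 100:.0f} cm from wall: notch near {sbir_notch_hz(distance):.0f} Hz")
```

Halving the distance doubles the notch frequency, which moves the problem into a range that moderately thick absorbers can reach.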

Another concern is space loading, which arises when speakers are placed close to a solid boundary such as a wall, ceiling, or corner. This proximity boosts the bass output, particularly below 250 Hz. Fortunately, this issue can be resolved by adjusting the frequency response of the speaker(s) using EQ. Space loading stands out as one of the few frequency response problems of a speaker in a room that can be tackled with an EQ adjustment. In fact, many active speakers offer a low-shelf cut control specifically for this purpose.

It's crucial that the decay time of a mixing or production room is consistent across all frequencies, and the decay time in pop music mixing rooms should fall between 150ms and 250ms. Since multichannel systems involve speakers facing various directions, it's essential to evenly distribute broadband absorption throughout the room. Thin absorbers (< 10cm) will only absorb mid and high frequencies, leading to muddy bass. Thicker absorbers will control a broad range of frequencies, from low to high, yielding a more frequency-balanced decay time.
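If your measurement software reports decay time per frequency band, a quick check against the 150 ms to 250 ms window shows where treatment is falling short. This is a minimal sketch with hypothetical band values chosen to mimic a room treated only with thin absorbers; the function name and data are assumptions for illustration.

```python
TARGET_MIN_S = 0.150  # lower bound of the decay window quoted above
TARGET_MAX_S = 0.250  # upper bound of the decay window quoted above


def check_decay(decay_by_band_s):
    """Return (bands outside the target window, spread between bands).

    decay_by_band_s maps a band center frequency in Hz to the measured
    decay time in seconds.
    """
    out_of_range = {
        band: t
        for band, t in decay_by_band_s.items()
        if not TARGET_MIN_S <= t <= TARGET_MAX_S
    }
    values = list(decay_by_band_s.values())
    spread = max(values) - min(values)
    return out_of_range, spread


# Hypothetical measurements: mids and highs are controlled, bass is not.
measured = {125: 0.42, 250: 0.30, 500: 0.22, 1000: 0.20, 2000: 0.19, 4000: 0.18}

problem_bands, spread = check_decay(measured)
for band, t in problem_bands.items():
    print(f"{band} Hz: {t * 1000:.0f} ms is outside the target window")
print(f"Spread between longest and shortest band: {spread * 1000:.0f} ms")
```

In this example only the 125 Hz and 250 Hz bands fall outside the window, the pattern that thicker broadband absorption is meant to correct.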

Diffusers aren't effective or necessary except in the largest multichannel rooms. This is because sound sources are distributed throughout the room, and the soundfield should primarily represent that of the playback system, without the room contributing to the sound of the playback system.

Another important consideration for multichannel rooms is to carpet the floor as much as possible. This is because overhead speakers direct sound toward the floor, and side speakers may also produce floor-bounce, which needs to be taken into account.

System Configurations

For different budgets

Entry Level Setup

A 7.1.4 setup focused on 2-way monitors starting at $14,000:

  • DAW with or without Dolby Renderer or Sony Walkmix 
  • Audio interface: MOTU 24Ao with Ginger Audio Sphere software monitor control ($1,400)
  • Speakers: HEDD TYPE 07 MK2 x 7, TYPE 05 MK2 x 4, BASS 12 x 1 ($14,000)
  • Cabling and stands ($1,000)
  • Add-ons: SoundID Multichannel ($549)

Mid-Level Setup

A 7.1.4 setup featuring 3-way monitors starting at $24,000:

  • DAW with or without Dolby Renderer or Sony Walkmix 
  • Audio interface: Audient Oria with SoundID ($3,150)
  • Upgraded interface: RME UFX III + M1610 ($6,000)
  • Speakers: HEDD TYPE 30 MK2 x 3, TYPE 20 MK2 x 4, TYPE 07 x 4, BASS 12 x 1 ($24,000)
  • Cabling and stands ($2,000+)

Pro-Level Setup

A 9.1.6 setup featuring 3-way monitors and our custom-built tower starting at $100,000:

  • DAW with or without Dolby Renderer or Sony Walkmix 
  • Audio interface: Avid MTRX Studio ($4,999) or Apogee Symph ($5,995)
  • Upgraded interface: Grace Design M908 ($9,585)
  • Speakers: HEDD Tower Mains x 3, TYPE 30 MK2 x 6, TYPE 20 MK2 x 6, BASS 12 x 4 ($95,000)
  • Cabling and stands ($3,000+)

Conclusion

Setting up a multichannel audio system may at first seem challenging, but by considering proper acoustic treatment, speaker alignment, and calibration, you can create an exceptional mixing environment that translates well to other multichannel playback systems.

The consumer’s experience of immersive music, even more so than stereo, can differ greatly from what we hear in a professional mixing environment. Hence, it's crucial to establish an accurate and consistent mixing environment to create a speaker mix and then evaluate your mixes as both binaural and spatial audio versions. This process, especially when starting with multichannel mixing, offers valuable insights into producing mixes that translate effectively in calibrated listening environments as well as on consumer playback systems.

We can expect immersive music technology to develop and change quickly, but establishing a trustworthy monitor environment will remain a valuable asset regardless of changes in consumer playback systems. Meanwhile, be sure to invest your time in listening to commercial immersive releases through both speakers and headphones. Just like with stereo mixing, experience is your best guide to creating effective mixes and you will likely have to adapt some of your old habits to work effectively in this new immersive environment.

Adam Kagan is a mixing and mastering engineer and educator in Los Angeles. He also provides technical and studio design services to studios around the world. Adam is credited on dozens of gold, platinum, and Grammy-nominated albums and regularly contributes to TapeOp Magazine.