Explanation
The technique of placing sounds in three-dimensional space to create a realistic audio environment. Sound spatialization encompasses the full pipeline from capture or creation of spatial audio to real-time processing and final delivery through headphones or speakers.
Real-world example
Making voices come from the direction of the characters in a VR film.
Practical applications
- VR cinema: voices and effects that follow on-screen characters
- Video games: locating enemies by sound
- Virtual tours: authentic ambient sound of the location
- Collaboration: colleagues' voices coming from their avatar's position
Audio spatialization pipeline
Capture / Creation
- Ambisonics recording (360° microphone)
- Sound design with manual source placement
- Acoustic simulation of the environment
- Spatialized sound libraries
Example: Recording the ambiance of a train station with a Zoom H3-VR microphone
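To make the capture stage more concrete, the sketch below shows how a single mono sample can be encoded into first-order Ambisonics, the four-channel format that 360° microphones typically produce. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization); the function name `encode_first_order` and its parameters are hypothetical, chosen for illustration only.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics.

    Assumes the AmbiX convention: ACN channel order (W, Y, Z, X)
    and SN3D normalization. Azimuth is measured counter-clockwise
    from straight ahead (positive = left), elevation upward from
    the horizon.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left/right axis
    z = sample * math.sin(el)                 # up/down axis
    x = sample * math.cos(az) * math.cos(el)  # front/back axis
    return (w, y, z, x)

# A source directly to the listener's left, at ear height:
# all directional energy lands in the Y channel.
w, y, z, x = encode_first_order(1.0, azimuth_deg=90.0, elevation_deg=0.0)
```

Because the direction is stored in the channel weights rather than baked into a left/right mix, the whole sound field can later be rotated to follow the user's head before it is decoded for headphones.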
Real-time processing
- Spatial audio engine or middleware (Unity, Wwise, FMOD)
- Propagation, reverb, and occlusion calculations
- Binaural rendering via HRTF
- Optimization to avoid CPU overload
Example: A bullet ricochets, and its sound reflects off the virtual walls
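Two of the simplest cues computed at this stage can be sketched in a few lines: a distance-based gain and the interaural time difference (ITD), the small delay between the two ears that helps the brain place a sound to the left or right. This is a deliberately crude stand-in for full HRTF rendering, not how any particular engine implements it. It assumes the inverse-distance law for attenuation and Woodworth's spherical-head formula for the ITD; the head radius, the function names, and the 2D coordinate layout (x to the right, y forward) are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, an assumed average head radius

def distance_gain(distance_m, ref_distance_m=1.0):
    """Inverse-distance attenuation, clamped so the gain never exceeds 1."""
    return min(1.0, ref_distance_m / max(distance_m, 1e-6))

def interaural_time_difference(azimuth_rad):
    """ITD in seconds from Woodworth's spherical-head formula.

    Positive azimuth means the source is to the right, so a positive
    result means the sound reaches the right ear first. Angles behind
    the listener are folded onto the front, since this simple model
    cannot distinguish front from back.
    """
    a = abs(azimuth_rad)
    if a > math.pi / 2:
        a = math.pi - a
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (a + math.sin(a))
    return math.copysign(itd, azimuth_rad)

def spatial_cues(source_x, source_y):
    """Return (gain, itd_seconds) for a listener at the origin facing +y."""
    distance = math.hypot(source_x, source_y)
    azimuth = math.atan2(source_x, source_y)  # 0 = straight ahead
    return distance_gain(distance), interaural_time_difference(azimuth)

# A source 2 m directly to the right: half gain, ITD of about 0.66 ms.
gain, itd = spatial_cues(2.0, 0.0)
```

A real HRTF adds direction-dependent filtering on top of these cues, which is what lets listeners tell front from back and above from below.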
Playback
- Built-in or external headphones
- External speakers if the VR headset has no built-in audio
- Adjustment to the user's HRTF profile
- Tight synchronization with the visual output
Example: The Valve Index's built-in off-ear speakers, which sit just beside the ears without touching them
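One contributor to audio-visual synchronization is the latency added by the audio output buffer. The sketch below estimates that delay and expresses it in display frames. The function names are hypothetical, and the buffer size, sample rate, and refresh rate in the usage lines are example values rather than recommendations; real systems add further delays (processing, drivers, wireless links) that are not modeled here.

```python
def buffer_latency_ms(buffer_frames, sample_rate_hz):
    """Delay introduced by one audio buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz

def latency_in_display_frames(latency_ms, display_hz):
    """Express an audio delay as a number of display refresh periods."""
    return latency_ms * display_hz / 1000.0

# Example values: a 256-frame buffer at 48 kHz, shown on a 90 Hz display.
latency = buffer_latency_ms(256, 48_000)            # about 5.33 ms
frames = latency_in_display_frames(latency, 90.0)   # about half a frame
```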
VR scenario
In a VR recreation of a historical event, you stand in the middle of a crowd. Conversations buzz around you, each one localized. An orator addresses the crowd from a balcony to your left; the voice comes from that direction and reverberates off the building facades. A horse passes behind you, and you hear it approach before you see it. Audio immersion is just as important as visual immersion.
Why it matters in professional VR
- Full immersion: VR without spatial audio is like a silent film
- Realism: our brain is highly sensitive to audio inconsistencies
- Natural guidance: directing attention through sound
- Social presence: hearing where people are speaking from creates connection

