Google's goofy but cool Cardboard virtual-reality platform is getting an audio upgrade. On Wednesday, Cardboard product manager Nathan Martz announced via the Google Developers Blog that the software development kits (SDKs) for both the native Android (Java) and Unity implementations of the VR headset API would be getting support for audio spatialization. Now, Cardboard apps will be able to "produce sound the same way humans actually hear it."
What does that actually mean? Martz gives two illustrations:
1) The SDK combines the physiology of a listener's head with the positions of virtual sound sources to determine what users hear. For example: sounds that come from the right will reach a user's left ear with a slight delay, and with fewer high-frequency elements (which are normally dampened by the skull).
2) The SDK lets you specify the size and material of your virtual environment, both of which contribute to the quality of a given sound. So you can make a conversation in a tight spaceship sound very different than one in a large, underground (and still virtual) cave.
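The first of those effects, the interaural time difference, is simple enough to sketch. The following is a toy model, not the Cardboard SDK's implementation: it uses Woodworth's classic spherical-head approximation, and the head-radius constant is an assumed average, not anything from Google's code.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly room temperature
HEAD_RADIUS = 0.0875     # m; an assumed average adult head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the extra time (in
    seconds) a sound takes to reach the far ear, for a source at the
    given azimuth (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
```

For a source 90 degrees to the listener's right, this gives a delay of about 0.66 milliseconds at the left ear, which matches the "less than a millisecond" scale discussed below.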
Spatialized sound is a tricky thing, and it is mostly neglected in conventional audio processing. Mostly we don't care, either: staring at a screen isn't much of an immersive environment to begin with, and cheap stereo effects will very often do the job just fine.
Screenshot from Google's sample app for audio spatialization.
Virtual reality naturally demands more. The visual landscape is, after all, only one part of the total possible 'scape. The Google documentation for Cardboard offers this advice: "Consider using environmental audio to make the application more realistic, and to draw the user's attention to various areas of the app. Audio provides a way of communicating the entire scene to the user simultaneously, without requiring the user to move their head to look around to discover their surroundings."
Spatializing sound is computationally expensive, however.
In the real world, sound arrives at each ear at very slightly different times — maybe off by less than a millisecond. Sound also responds to environments, reflecting more off some surfaces and being absorbed more by others. Getting things computationally right for an immersive audioscape means calculating lots of different delays all at once, some of which might be just a few dozen samples long, but still long enough to matter.
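To see how short those delays are in digital terms, it helps to convert them into samples at a typical audio rate. This is an illustrative sketch, not SDK code; the naive whole-sample shift below is cruder than what a real spatializer does (those interpolate to sub-sample precision), but it shows the scale involved.

```python
SAMPLE_RATE = 44100  # Hz; standard CD-quality audio

def delay_in_samples(delay_seconds, sample_rate=SAMPLE_RATE):
    """Convert a delay in seconds to the nearest whole sample count."""
    return round(delay_seconds * sample_rate)

def apply_itd(mono, delay_samples):
    """Render a source on the listener's right as a stereo pair: the
    left channel lags the right by delay_samples, with zero-padding so
    both channels stay the same length."""
    left = [0.0] * delay_samples + list(mono)
    right = list(mono) + [0.0] * delay_samples
    return left, right
```

A 0.66-millisecond interaural delay at 44.1 kHz works out to 29 samples, which is exactly the "few dozen samples" scale mentioned above.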
Martz offers two particular optimizations. First, the SDK allows audio processing to run on a thread separate from the rest of the work being handled by a smartphone's CPU, which means it isn't part of the normal stream of interwoven instructions being executed at any given moment; it happens off to the side. Second, developers can prioritize different sounds within an audioscape, allocating more or less processing power to the sounds that matter most.
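Both ideas can be caricatured in a few lines. This is a hypothetical sketch of the pattern, not the Cardboard SDK's API: a dedicated worker thread drains a priority queue of sounds, spending a fixed per-buffer budget on the most important sources first and dropping the rest.

```python
import threading
import queue

# Sounds wait here; lower priority number = more important, since
# PriorityQueue pops the smallest entry first.
sound_queue = queue.PriorityQueue()

def submit_sound(priority, name):
    """Queue a sound for mixing with the given priority."""
    sound_queue.put((priority, name))

def audio_worker(budget, mixed, done):
    """Run on a separate thread: mix the highest-priority sounds until
    the per-buffer processing budget is spent, dropping the rest."""
    while not sound_queue.empty() and budget > 0:
        _, name = sound_queue.get()
        mixed.append(name)
        budget -= 1
    done.set()

mixed, done = [], threading.Event()
submit_sound(0, "dialogue")
submit_sound(2, "ambient hum")
submit_sound(1, "footsteps")

# Audio runs off to the side, on its own thread, as described above.
threading.Thread(target=audio_worker, args=(2, mixed, done), daemon=True).start()
done.wait(timeout=2)
```

With a budget of two, only "dialogue" and "footsteps" make the mix; the low-priority ambient hum is dropped rather than starving the sounds the developer flagged as important.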