Dynamic Geometric-Based Reverb
  • The study of the propagation of sound with rays is known as geometric acoustics, or ray acoustics
  • Since light and sound can both be represented as waves or rays, we can apply concepts from light rendering to audio
  • Applying ray tracing to audio allows us to replicate the effect known as reverb
  • This allows for more realistic sounding virtual environments
Reverberation
  • Also called reverb
  • A collection of sound waves
  • Based on environmental factors
    • Volume and geometry of the room​
    • Material absorption factors
      • How much energy a material absorbs​
    • Scattering factors​
      • Different materials scatter energy differently​
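As a rough illustration of how these environmental factors might be stored per material, here is a hypothetical C++ sketch; the struct and names are illustrative, not the project's actual data layout:

```cpp
// Hypothetical per-material acoustic properties (illustrative only).
struct AcousticMaterial
{
    float absorption; // fraction of incoming energy absorbed per bounce, 0..1
    float scattering; // fraction of reflected energy that is diffused, 0..1
};

// Energy left in the specular reflection after one bounce off this material.
float ReflectedEnergy(float incomingEnergy, const AcousticMaterial& m)
{
    return incomingEnergy * (1.0f - m.absorption) * (1.0f - m.scattering);
}
```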
 
Below is an example of reverb:
Piano scale, no reverb (audio clip)
Piano scale, with reverb (audio clip)
Common Techniques
  • Algorithmic
    • Can sound realistic
    • Customizable parameters
      • Delay, room size, density, frequency filtering, etc.​
    • Does not represent a real place​
  • Pre-Recorded Impulse Response​ (IR)
    • Recorded in a real place
      • Captures scattering, material absorptions, and the size of the room​
    • Normally produced by a clap, popping a balloon, a gunshot, etc.​
    • Requires equipment to be taken to a location
    • Uses convolution to apply the IR to the sound
Impulse Response
  • Reaction of a dynamic system to an external change
  • It determines how much reverb is added to a signal
  • Made of 3 parts
    • Direct path​
      • Direct unblocked path from the source to the listener​
    • Early or specular reflections​
      • When the sound bounces perfectly off an object​
    • Late or diffuse reflections​
      • These reflections are spawned when sound hits an object and is scattered​
      • Diffuse reflections are not calculated in this model
        • Spawning new rays at every bounce would grow the total number of rays exponentially

Direct Path
Early Reflections
Late Reflections

Setup of Project

  • Before getting into the implementation of impulse response generation and convolution, a few things need to be set up
  • Goal of the thesis artifact
    • Play 3D audio and apply audio effects​
    • Generate an impulse response
      • Ray tracing​
      • Compute shaders for parallelization of generating rays
    • Apply an impulse response to an audio signal​
      • Fast Fourier Transform and its inverse​
      • Convolution
    • Goal:​
      • Audio with reverb that matches the virtual environment​
Audio Engine
  • This project was done in my personal C++ game engine
  • XAudio2 API
    • Made for game developers​
    • The engine already uses a DirectX11 renderer, so XAudio2 seemed like the most reasonable choice
  • X3DAudio​
    • Extension to XAudio2​
    • Allows the use of listeners and emitters
  • XAPO ​
    • These are Cross-Platform Audio Processing Objects​
    • This allows us to define our own audio effects
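Below is a minimal sketch of what a custom XAPO effect class looks like, built on the CXAPOBase helper from <xapobase.h>; the class name and members are illustrative, not the project's actual code:

```cpp
#include <xapobase.h> // CXAPOBase helper (link against xapobase.lib)

// Minimal custom XAPO sketch; the effect name and members are hypothetical.
class ConvolutionReverbXAPO : public CXAPOBase
{
public:
    ConvolutionReverbXAPO() : CXAPOBase(&s_regProps) {}

    // XAudio2 calls this on the audio thread for every audio frame.
    void STDMETHODCALLTYPE Process(
        UINT32 inputCount,
        const XAPO_PROCESS_BUFFER_PARAMETERS* pInput,
        UINT32 outputCount,
        XAPO_PROCESS_BUFFER_PARAMETERS* pOutput,
        BOOL isEnabled) override
    {
        // Read samples from pInput[0].pBuffer, apply the effect
        // (here, convolution with the impulse response), and write
        // the processed samples to pOutput[0].pBuffer.
    }

private:
    // Effect name, GUID, and channel counts; defined elsewhere.
    static XAPO_REGISTRATION_PROPERTIES s_regProps;
};
```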
Ray Tracing
  • We use rays to propagate the sound energy across a space
  • Ray versus voxel grid
    • The world geometry is made of 1 x 1 x 1 blocks in a voxel grid pattern​
    • There is an algorithm that computes ray-voxel grid intersections very quickly
    • It exploits the fact that the ray crosses voxel boundaries at set intervals
Below shows the concept on a 2D grid, but it can be applied in 3D as well; a 3D traversal sketch follows this section.
  • When the ray hits one of the walls, floor, or ceiling, it reflects
    • We also keep track of a few variables​
      • Increment the number of bounces​
      • Compute the reflected energy
      • Add the ray length to the total path length
  • Ray versus sphere
    • The listening and emitting points are infinitely small​
      • Almost impossible to hit with a ray​
    • Spheres are used instead to give volume and increase the chance of the ray hitting​
  • When the ray hits the sphere​
    • We record the number of bounces, the remaining energy, and the ray's total path length (including the final segment to the sphere)
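The fast grid traversal referred to above is, under my reading, in the spirit of the Amanatides & Woo algorithm; the sketch below is a generic version of that idea, not the project's exact code, and isSolid() is a hypothetical world query:

```cpp
#include <cmath>
#include <limits>

struct Vec3 { float x, y, z; };

bool isSolid(int x, int y, int z); // hypothetical: is this 1 x 1 x 1 block filled?

void TraverseVoxels(Vec3 origin, Vec3 dir /* normalized */)
{
    const float inf = std::numeric_limits<float>::infinity();

    // Current voxel coordinates.
    int vx = (int)std::floor(origin.x);
    int vy = (int)std::floor(origin.y);
    int vz = (int)std::floor(origin.z);

    // Step direction on each axis.
    int stepX = dir.x >= 0.0f ? 1 : -1;
    int stepY = dir.y >= 0.0f ? 1 : -1;
    int stepZ = dir.z >= 0.0f ? 1 : -1;

    // tDelta: ray-parameter distance between successive voxel boundaries on
    // each axis -- the "set interval" at which the ray crosses the grid.
    float tDeltaX = dir.x != 0.0f ? std::fabs(1.0f / dir.x) : inf;
    float tDeltaY = dir.y != 0.0f ? std::fabs(1.0f / dir.y) : inf;
    float tDeltaZ = dir.z != 0.0f ? std::fabs(1.0f / dir.z) : inf;

    // tMax: ray-parameter distance to the first boundary on each axis.
    float tMaxX = dir.x != 0.0f ? ((stepX > 0 ? vx + 1 : vx) - origin.x) / dir.x : inf;
    float tMaxY = dir.y != 0.0f ? ((stepY > 0 ? vy + 1 : vy) - origin.y) / dir.y : inf;
    float tMaxZ = dir.z != 0.0f ? ((stepZ > 0 ? vz + 1 : vz) - origin.z) / dir.z : inf;

    // Walk one voxel at a time until a solid block is hit.
    while (!isSolid(vx, vy, vz)) {
        if (tMaxX <= tMaxY && tMaxX <= tMaxZ) { vx += stepX; tMaxX += tDeltaX; }
        else if (tMaxY <= tMaxZ)              { vy += stepY; tMaxY += tDeltaY; }
        else                                  { vz += stepZ; tMaxZ += tDeltaZ; }
    }
    // On a hit: reflect the ray, increment the bounce count, attenuate the
    // energy, and add this segment's length to the total path length.
}
```

Along each segment, the listener sphere is tested with a standard ray-sphere intersection; on a hit, the bounce count, remaining energy, and total path length are recorded as described above.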
Compute Shaders
  • We can utilize the GPU for the extra computation power that is needed
  • The more rays traced, the more accurate the simulation
    • The current simulation uses 16,777,216 rays that each bounce up to 30 times
    • So many are needed because most of them will likely never hit the target
  • It's part of the programmable shader pipeline in DirectX11​
    • Because I have a DirectX11 renderer, it was a matter of just hooking it up to the system​
  • Need to precompile the shader​
    • The shader is long and can take several minutes to compile​
    • This allows us to start the program significantly faster
Fast Fourier Transform
  • Optimized algorithm to compute the Discrete Fourier Transform
  • Fast Fourier Transform (FFT) is used to transform a signal from the time domain to the frequency domain
  • Its inverse, the IFFT, is used to transform a signal from the frequency domain back into the time domain
  • We use the Cooley-Tukey method
    • Works best when the signal's length is a power of 2
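A textbook recursive radix-2 Cooley-Tukey implementation looks like the following; this is a sketch, and the project's version is likely iterative and more optimized:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT. a.size() must be a power of 2.
// Pass inverse = true for the IFFT, then divide every element by a.size().
void fft(std::vector<cplx>& a, bool inverse = false)
{
    const size_t n = a.size();
    if (n <= 1) return;

    // Decimation in time: split into even- and odd-indexed halves.
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i]  = a[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);

    // Combine the halves with twiddle factors e^(±2πik/n).
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (size_t k = 0; k < n / 2; ++k) {
        cplx t = std::polar(1.0, sign * 2.0 * pi * k / n) * odd[k];
        a[k]         = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}
```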
Impulse Response Generation
  • Once all the rays are collected, we still need to put them in a usable form
  • An impulse response is just the collection of those rays
    • It still needs to be sorted based on the time it would arrive​
  • The total path length of a ray divided by the speed of sound tells us when that ray's sound arrives at the listener
  • Weighting each arrival by the energy remaining in the ray allows us to mimic different types of environments
  • Each weighted arrival describes an impulse, and the collection of impulses summed together gives us the impulse response we were looking for
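Putting that together, a sketch of the assembly step might look like this; RayHit, its field names, and the 343 m/s speed of sound are illustrative, not the project's code:

```cpp
#include <vector>

// Illustrative record of one ray that reached the listener.
struct RayHit
{
    float totalPathLength; // meters traveled from emitter to listener
    float energy;          // energy remaining after all bounces, 0..1
};

std::vector<float> BuildImpulseResponse(const std::vector<RayHit>& hits,
                                        int sampleRate /* e.g. 44100 */)
{
    const float speedOfSound = 343.0f; // m/s in air at room temperature
    std::vector<float> ir;

    for (const RayHit& h : hits) {
        // Path length / speed of sound = when this impulse arrives.
        float arrivalSeconds = h.totalPathLength / speedOfSound;
        size_t sample = (size_t)(arrivalSeconds * sampleRate);
        if (ir.size() <= sample) ir.resize(sample + 1, 0.0f);
        // Writing into the buffer by arrival time sorts the impulses, and
        // summing accumulates energy that lands in the same sample.
        ir[sample] += h.energy;
    }
    return ir; // the impulse response: a sum of time-sorted impulses
}
```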
Programmatically Generated Impulse Response
Pre-generated Impulse Response from a Transit Center
Convolution
  • Convolution is a mathematical operation that combines two functions to produce a third
  • In DSP, we use it to map the Impulse Response to an audio signal
    • The type of convolution that is done is called Discrete Convolution
  • Reasons to use​
    • We are constantly getting new input, so the signal is effectively infinite in length
      • A new frame every 10 milliseconds​
    • Impulse Response is very long, but it is finite​
    • Because of the length differences, we can decompose the signals into different blocks
    • Uniform Partitioning
      • Treat each frame as an equal, uniform, block​
      • Break up the Impulse Response into these blocks
    • This is where FFT comes in​
      • Convolution in the time domain equals multiplication in the frequency domain​
        • This is from the convolution theorem​
      • Much faster than computing convolution in the time domain​
    • Considerations for Convolution​
      • It's expensive and you can't slow down the audio thread​
  • There are multiple algorithms to be considered that have pros and cons​​
    • Overlap save and overlap add algorithms​
      • Very similar approaches​
      • Both require 1 + N FFTs per frame, where N is the number of blocks considered in the new signal
        • About 1.9 ms per FFT​
      • Not really feasible for multiple audio sources
      • Even with threads, it's not guaranteed to be finished on time
      • Works best for offline computing of the signal
    • The algorithm I opted for was the Frequency Domain Delay Line (FDL) algorithm
      • Only computes 2 FFTs per frame​
        • Takes about 1.42 ms - 3.8 ms​​​
      • Allows for more frames of previous audio data to be used​
        • Was able to use up to 16 frames of data​
      • Main features​​​
        • List of blocks from previous frames​
        • Doesn't have to keep track of extra data​​
        • Has a buffer for adding and storing each FDL block and Impulse Response block filter
      • Drawbacks​
        • Needs to be threaded and operate at least 1 frame behind if we want multiple audio sources at once​
          • FFT computation time​
        • Still has a limit on the number of audio sources, but now more are allowed​​​​​
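To make the FDL concrete, here is a sketch of a uniform-partitioned convolver built on overlap-save blocks, reusing the fft() from the FFT section above; the frame size, names, and exact bookkeeping are assumptions, not the project's code:

```cpp
#include <complex>
#include <deque>
#include <vector>

using cplx = std::complex<double>;
using Spectrum = std::vector<cplx>;

void fft(std::vector<cplx>& a, bool inverse); // from the FFT section above

struct FDLConvolver
{
    size_t frameSize;                // N samples per audio frame
    std::vector<Spectrum> irFilters; // FFTs of the IR split into N-sample
                                     // blocks, each zero-padded to 2N
    std::deque<Spectrum> delayLine;  // spectra of recent input frames

    // Process one frame: exactly one forward FFT and one inverse FFT.
    std::vector<double> ProcessFrame(const std::vector<double>& prevFrame,
                                     const std::vector<double>& currFrame)
    {
        // 1) FFT of [previous | current] frame (overlap-save, length 2N),
        //    pushed onto the front of the delay line.
        std::vector<cplx> block(2 * frameSize);
        for (size_t i = 0; i < frameSize; ++i) {
            block[i]             = prevFrame[i];
            block[frameSize + i] = currFrame[i];
        }
        fft(block, false);
        delayLine.push_front(block);
        if (delayLine.size() > irFilters.size()) delayLine.pop_back();

        // 2) Multiply each delayed input spectrum by the matching IR block
        //    filter and accumulate (convolution theorem: multiplication in
        //    the frequency domain instead of time-domain convolution).
        std::vector<cplx> acc(2 * frameSize);
        for (size_t k = 0; k < delayLine.size(); ++k)
            for (size_t i = 0; i < 2 * frameSize; ++i)
                acc[i] += delayLine[k][i] * irFilters[k][i];

        // 3) One inverse FFT; keep only the last N samples (the first N are
        //    the aliased half that overlap-save discards).
        fft(acc, true);
        std::vector<double> out(frameSize);
        for (size_t i = 0; i < frameSize; ++i)
            out[i] = acc[frameSize + i].real() / (2.0 * frameSize); // 1/n scale
        return out;
    }
};
```

Each frame costs exactly one forward and one inverse FFT; the per-block work is only complex multiply-adds, which is what keeps the FDL cheap enough for real-time use.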
Number of Bounces and Rays
  • There are only 30 bounces because, at that point, the energy has dropped below a defined threshold
    • Energy < 20%
  • These impulse responses were recorded in the same place and have line of sight with the audio source​
  • The biggest noticeable difference between them is in the magnitudes
    • These magnitudes are the energy recorded​
262,144 Rays
1,048,576 Rays
4,194,304 Rays
16,777,216 Rays
Drawbacks
  • Computationally intensive
  • Impulse response generation cannot be done in real-time
    • There's a delay that freezes the simulation when you generate a new impulse response
    • The time for impulse response generation increases as you add more audio sources
  • The digital signal processing for each source's convolution takes its own thread to run its computations
Applications
  • Saving off the impulse response
    • Using trigger volumes, a new set of impulses can be loaded and applied dynamically in real-time​
    • Similar to the light probes used to bake lighting into a scene, impulse responses can be baked at set points and faded between
  • Quickly see how different sized spaces and materials affect sounds​
  • More accurate scenarios
    • Firefights, horror, stealth, etc.
Future Work
  • Implement ray versus convex objects
    • Have more interesting geometry​
    • Ability to see how the placement of objects affects the impulse response
  • Consider all frequency bands​
    • Currently, every band is treated exactly the same
    • Each band absorbs and scatters different amounts of energy
  • Consider other elements outside of reverberation such as occlusion and obstruction​
  • Handle diffuse reflections in a way that does not increase the number of rays exponentially
  • Optimizations to the HLSL shader code and the convolution algorithm