Convolution Reverberation

From Inspired Acoustics Knowledge Base

Convolution reverberation is the process of adding reverberation to audio material by means of a mathematical operation called convolution.

Reverberation software types

Before jumping into reverberation (or 'reverb' in its shortened form), it might be a good idea to discuss just what reverberation is. Reverberation is a natural sonic effect that occurs when some of the original sound waves are reflected from the internal surfaces of an enclosed space. Some of the reflected sound is in turn re-reflected throughout the space, again and again. As a result, when the original sound source stops producing sound, the perceived sound does not end abruptly but gradually dies away. A reverb unit is designed to simulate this effect mechanically, electronically or digitally, and this may be accomplished with either hardware or computer software.

There are three basic methods of injecting reverberation into a raw audio signal:

1. Mechanical hardware reverb units. A copy of the original electrical signal drives a transducer that converts the electrical signal into mechanical vibrations. The transducer agitates a spring, which begins to "resonate" on its own terms, according to the mechanical properties of that spring. At the other end of the spring, another transducer converts the movement of the spring back into an electrical signal. This signal is intentionally "blurred", with sharp peaks of the audio signal damped out, which generally gives the effect that people know as a continuous "echo." The resulting signal is mixed with the original signal. A "plate" reverb uses a metal plate instead of a spring, but the principle of operation is the same.

2. Analog electronic. A copy of the original signal feeds a "bucket brigade" delay line, in which diminishing delayed copies of the original signal feed further stages. The output of some or all of the stages is mixed with the original signal, as well as fed back into earlier stages. This usually means some signal degradation (particularly loss of high-frequency detail) towards the end of the decaying sound. This is not necessarily a problem, as the same thing happens in real reverberant spaces. However, analog electronic reverbs typically add much more unwanted hiss or graininess to the sound than digital units do. Analog electronic hardware and software make no attempt to mimic the impulse responses of actual acoustic spaces.

3. Digital. Delayed copies of the original signal are diminished and mixed together using complex algorithms to determine delay intervals and feedback intensity. The summed output of the digital processing is mixed with the original signal. Convolution reverb, either as software or incorporated into actual hardware, is an important subset of digital reverb units.
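As an illustration of the delay-and-feedback idea (not of any particular product's algorithm), here is a minimal Python sketch of a feedback comb filter, a classic building block of algorithmic reverbs; Schroeder-style reverberators, for example, combine several comb filters with all-pass filters. The function name `feedback_comb` and its parameters are hypothetical.

```python
def feedback_comb(x, delay, feedback, extra=0):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    Every trip around the loop adds another delayed copy of the input,
    scaled down by `feedback` (the "diminished delayed copies" above).
    `extra` appends silence so the decay can be heard after the input ends.
    """
    signal = list(x) + [0.0] * extra
    y = [0.0] * len(signal)
    for n, sample in enumerate(signal):
        echo = y[n - delay] if n >= delay else 0.0
        y[n] = sample + feedback * echo
    return y

# Feed a single click through the comb and look at the echoes.
out = feedback_comb([1.0], delay=3, feedback=0.5, extra=9)
print(out)  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

A single comb produces regularly spaced, exponentially decaying echoes; practical designs typically mix several combs with different delays so that the result does not sound obviously periodic.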

Convolution software uses an impulse response and a dry signal as input. An impulse response (IR) is what results when you feed an impulse to some system. It is the sonic signature of a microphone, loudspeaker, filter, concert hall, or anything else a sound might pass through. An impulse, or “spike,” is an extremely short transient that, like white noise, contains all frequencies at equal strength. In the digital realm, it is represented as a single click, one sample long, at full amplitude. The response is sampled at the audio sampling rate, typically 44,100, 48,000, 96,000 or 192,000 times per second. The reverberation that an impulse produces in an acoustic environment, recorded over a number of seconds (so that the IR contains the duration in seconds times the sampling rate samples), is that environment's "impulse response."
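To make the definition concrete, the sketch below models a hypothetical linear, time-invariant "room" (`black_box`, with a direct path and two invented reflections). Feeding it a one-sample unit impulse returns its IR, and convolving any dry signal with that IR reproduces what the system itself would output. The 2-second, 48 kHz figure at the end is only an example of the length arithmetic.

```python
def black_box(x):
    """Hypothetical LTI system: y[n] = x[n] + 0.6 * x[n - 2] + 0.3 * x[n - 5]."""
    out = [0.0] * (len(x) + 5)
    for n, s in enumerate(x):
        out[n] += s
        out[n + 2] += 0.6 * s
        out[n + 5] += 0.3 * s
    return out

def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

# 1. Measure: excite the system with a unit impulse (one sample at full amplitude).
ir = black_box([1.0])
print(ir)  # [1.0, 0.0, 0.6, 0.0, 0.0, 0.3]

# 2. Predict: the dry signal convolved with the IR matches the system's own output.
dry = [0.5, -1.0, 0.25, 2.0]
predicted, actual = convolve(dry, ir), black_box(dry)
print(max(abs(p - a) for p, a in zip(predicted, actual)))  # ~0 (rounding only)

# Length of a recorded IR: duration in seconds times the sampling rate.
ir_length = int(2.0 * 48_000)
print(ir_length)  # 96000
```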

Convolution, in practice, can be calculated in the time-domain (using the original convolution formula), in the frequency-domain (using the Fourier transform and then a complex multiplication and an inverse Fourier transform) or in the complex frequency domain using the Laplace transform. When calculating in the frequency domain, usually the fast Fourier transform (FFT) algorithm is used.
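A minimal pure-Python sketch comparing the time-domain and frequency-domain routes, using the sample values that appear later in this article (0, 3, 15, 512, -241, -235) and an invented four-sample IR. The recursive radix-2 FFT is a textbook version written for readability; a real plug-in would rely on an optimized FFT library.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is returned unscaled (divide by len(a) afterwards)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def convolve_time(e, w):
    """Time domain: evaluate y[k] = sum_i e[i] * w[k - i] directly."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

def convolve_fft(e, w):
    """Frequency domain: FFT, complex multiplication, inverse FFT.
    Both inputs are zero-padded to a power of two >= len(e) + len(w) - 1;
    without this padding the DFT would give a circular convolution."""
    out_len = len(e) + len(w) - 1
    n = 1
    while n < out_len:
        n *= 2
    E = fft(list(e) + [0.0] * (n - len(e)))
    W = fft(list(w) + [0.0] * (n - len(w)))
    y = fft([a * b for a, b in zip(E, W)], inverse=True)
    return [v.real / n for v in y[:out_len]]

e = [0, 3, 15, 512, -241, -235]
w = [1.0, 0.5, 0.25, 0.125]
print(convolve_time(e, w))
# [0.0, 3.0, 16.5, 520.25, 19.125, -225.625, -113.75, -88.875, -29.375]
print(max(abs(a - b) for a, b in zip(convolve_time(e, w), convolve_fft(e, w))))
# a tiny number: the two routes agree up to floating-point rounding
```

The FFT route needs on the order of n log n operations instead of len(e) times len(w) for the direct loop, which is why it wins for long impulse responses.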

The results of convolved sounds are completely dependent on the sources, more so than with most effects. The best way to get a sense of what convolution is like is to listen to an example. Imagine convolving a drum loop with a short, breathy flute sound. The result sounds a bit like someone playing a staccato rhythm on a flute.
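Why does the flute end up playing the drum rhythm? Convolution places a scaled copy of one signal at every sample of the other. In this toy sketch both signals are invented stand-ins: the "drum loop" is reduced to three hits of different strength and the "flute" to a three-sample envelope.

```python
def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

drum = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.0]  # three hits
flute = [0.2, 0.6, 0.3]                               # short "breathy" envelope

print(convolve(drum, flute))
# [0.2, 0.6, 0.3, 0.0, 0.1, 0.3, 0.15, 0.0, 0.2, 0.6, 0.3]
```

Each hit triggers one copy of the flute envelope, scaled by the strength of the hit: a staccato flute line in the drum's rhythm.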

Reverberation hardware and software available today can be categorized into four classes:

  • 1) Conventional reverberation hardware and software

These are traditional reverberation software and hardware systems, and have nothing to do with convolution. They are based on proprietary algorithms and methods that create the perception of reverberation on the sound. They often produce the lowest CPU load when used as a software plug-in in a digital audio workstation (DAW).

  • 2) Impulse response-trained conventional reverberation hardware and software

These units use proprietary reverberation algorithms and use real impulse responses to train several parameters of the algorithm. Such hardware or software is never capable of representing all the characteristics of a room impulse response, but it often provides results with low computational load. Manufacturers may claim that you are hearing audio in the actual room represented by its room impulse response, but you are not.

  • 3) Convolution-based hybrid reverberation software

Increasingly common nowadays, these programs are either based on convolution but do not convolve the full length of the Room Impulse Response, or use traditional reverberation algorithms in addition to convolution. They are often treated as convolution reverbs, although they are not exactly the same. They may use measured impulse responses to train proprietary algorithms that produce results similar to those of convolution, or they reduce the high demand on computer resources in other ways, for example by using hybrid algorithms, omitting channels, convolving with only parts of the Room Impulse Responses (e.g. the early reflections), using only one tail, etc.

  • 4) Convolution reverberation software

These programs perform fast convolution of the audio with the full length of the Room Impulse Response. As convolution requires a lot of computer resources, you can often distinguish such software from conventional, hybrid or simulation approaches by its relatively high CPU and memory bandwidth load. The subjective acoustic quality of convolution reverberation software depends on the quality of the room impulse responses it uses. This is in contrast to the other approaches mentioned above, where the processing algorithm itself is what you hear, because it is not transparent. In those cases you hear the algorithm and the data it uses together (where the two are separate at all).

Understanding convolution reverberation

To be able to reverberate a sound (which we will refer to as the dry sound from now on), we need an algorithm that can calculate the reverberated sound from our original recording and the Room Impulse Response we captured. This algorithm is called convolution.

In other words, convolution is the method for computing the response [result] of a linear time-invariant (LTI) causal system [the acoustic space, e.g. the concert hall] to a known excitation [e.g. our dry sound]. Unlike other methods, convolution-based calculations using measured impulse responses give results that are almost identical (to the extent that the measurement equipment is perfect) to what we would actually record in that very room. More generally, the impulse response characterizes the acoustic path from point A to point B in the hall; its Fourier transform is the room transfer function between those two points. But many other things, such as a piano case or a violin case, also have transfer functions that can be measured.

To understand exactly how convolution reverberation works, let us first examine the convolution formula (the mathematics) and then visualize it.

Let us represent the sounds as discrete values sampled in time. The continuous sound waves are converted into discrete amplitude values (quantization) at regular points in time (sampling). To the computer, a sound source would now look like this: (0, 3, 15, 512, -241, -235, etc.). Let us call this sequence e (excitation) and denote one of its values as e[k], where k means that we are talking about the k-th sample value. We start counting k from zero, so, for example, e[k = 0], or simply e[0], equals 0 in our case, while e[1] = 3, etc. Now let us represent the Room Impulse Response the same way and call it w (from the name 'weight function'). The response is called y. The convolution formula is then the sum of the excitation values multiplied by the time-reversed, shifted values of the impulse response. In other words, discrete convolution is defined as


y\left[ k\right] =\sum\limits_{i=0}^{k}e\left[ i\right] \cdot w\left[ k-i\right]
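The formula translates almost literally into code. This sketch treats e and w as finite, causal sequences (zero outside their stored range), which is why the output has len(e) + len(w) - 1 samples. The excitation values are the ones above; the three-sample impulse response is invented for illustration.

```python
def convolve_formula(e, w):
    """Evaluate y[k] = sum_{i=0}^{k} e[i] * w[k - i] term by term.

    Indices outside the stored range of e or w count as zero, so only
    len(e) + len(w) - 1 output samples can be nonzero.
    """
    def at(seq, n):
        return seq[n] if 0 <= n < len(seq) else 0

    length = len(e) + len(w) - 1
    return [sum(at(e, i) * at(w, k - i) for i in range(k + 1)) for k in range(length)]

e = [0, 3, 15, 512, -241, -235]  # excitation from the text
w = [4, 2, 1]                    # hypothetical 3-sample room impulse response
print(convolve_formula(e, w))    # [0, 12, 66, 2081, 75, -910, -711, -235]
```

Convolving a single unit sample with w returns w itself, which is exactly why w is called the impulse response.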

To visualize this, let us take a sound signal (excitation signal) containing 5 sound samples (see Figure 3 below) and an impulse response of 5 samples. Imagine these as short excerpts of my favorite Beethoven symphony and of a concert hall impulse response, respectively. There are no negative values in these examples, for easier viewing. The excitation signal is colored differently at each time value so that you can follow the convolution algorithm easily. The horizontal axis is time, the vertical axis is amplitude.
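The same construction can be followed numerically. The sketch below uses invented non-negative 5-sample values (not the ones in Figure 3) and builds the output the way the colored illustration suggests: every excitation sample launches a copy of the impulse response, scaled by that sample and delayed by its index, and the copies are summed. This input-side loop gives exactly the same y[k] as the formula; only the order of the additions differs.

```python
def convolve_shift_and_add(e, w):
    """Sum of scaled, shifted copies of w: copy i is e[i] * w, delayed by i samples."""
    y = [0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

e = [1, 2, 3, 2, 1]  # hypothetical excerpt of the dry "symphony"
w = [4, 3, 2, 1, 1]  # hypothetical excerpt of the hall's impulse response

for i, ei in enumerate(e):
    print(f"copy {i}:", [0] * i + [ei * wj for wj in w])
print("sum:", convolve_shift_and_add(e, w))
# sum: [4, 11, 20, 22, 19, 12, 7, 3, 1]
```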

Convolution reverberation methods

Convolution is very resource-intensive, especially in terms of CPU and memory bandwidth usage on computers. Therefore, as with many other algorithms, optimization methods have been introduced to speed up the convolution process and allow real-time or near-real-time use in audio applications.
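A back-of-the-envelope count shows why. Evaluating the convolution formula directly costs one multiply-accumulate (MAC) per impulse-response sample for every output sample. The figures below assume a hypothetical 3-second IR at 48 kHz, once for a single path and once for the four source-to-microphone paths of the two-source, two-microphone example later in this article; memory traffic and SIMD tricks are ignored.

```python
def direct_macs_per_second(ir_seconds, sample_rate, paths=1):
    """MACs per second for direct (time-domain) convolution:
    each output sample needs one MAC per IR sample, for every IR path."""
    ir_length = int(ir_seconds * sample_rate)
    return ir_length * sample_rate * paths

print(direct_macs_per_second(3.0, 48_000))           # 6912000000
print(direct_macs_per_second(3.0, 48_000, paths=4))  # 27648000000
```

Several billion operations per second for a single path is why the optimizations below matter.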

Optimizations for real-time convolution, and the resulting convolution-based reverberation types:

  • Direct convolution (FIR filtering)

The discrete convolution formula means nothing more than applying an FIR filter to the input, where the filter coefficients are the impulse response samples. Personal computers are often not capable of applying very long impulse responses directly as FIR filters because of the high computational load, so optimization is needed. This leads to frequency-domain convolution. The transformation to the frequency domain, the Discrete Fourier Transform (DFT), can be computed very quickly with the Fast Fourier Transform (FFT) algorithm. Convolution in the frequency domain is simply a multiplication, provided that both signals are zero-padded to at least the length of the result (otherwise the DFT computes a circular convolution), so after the multiplication the inverse Discrete Fourier Transform (IDFT) is applied to get the time-domain result. However, transforming a time-domain signal to the frequency domain requires all the time-domain samples to exist a priori, which means a significant latency: the delay caused by collecting the required samples. Although this approach is very effective computationally, it therefore cannot be applied to live audio.

A good compromise is to divide the incoming signal and the impulse response into parts, process these parts separately, and then combine their outputs. Compared with a single large transform, this increases the computational load but decreases the latency. This block-based 'fast convolution' is implemented by the partitioned convolution method. Partitioned convolution can be classified as follows:

  • Partitioned convolution or Fast Convolution (mixed time-domain and DFT, or fully DFT-based) with
    • Uniform partitions (blocks of equal, fixed length)
    • Non-uniform partitions (blocks of unequal, variable length)
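A minimal sketch of the uniform case, using the common overlap-save formulation with a frequency-domain delay line. The function names, block size and pure-Python FFT are illustrative choices, not taken from any particular product. Input is consumed in `block`-sized chunks as a real-time engine would, so the latency is one block rather than the whole IR, and the result matches direct convolution up to rounding. Non-uniform schemes refine this by using short partitions at the start of the IR (low latency) and longer ones further out (efficiency); that refinement is left out here.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (len(a) a power of two); inverse is unscaled."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def partitioned_convolve(x, h, block):
    """Uniformly partitioned overlap-save convolution (block: power of two).

    h is cut into partitions of `block` samples, each transformed once with
    FFT size 2 * block. Each input window's spectrum enters a delay line;
    partition p is multiplied with the spectrum from p blocks ago.
    """
    size = 2 * block
    parts = [h[p:p + block] for p in range(0, len(h), block)]
    H = [fft(list(part) + [0.0] * (size - len(part))) for part in parts]
    fdl = [[0j] * size for _ in H]   # spectra of the most recent input windows
    prev = [0.0] * block             # previous input block (overlap-save history)
    out_len = len(x) + len(h) - 1
    padded = list(x) + [0.0] * (out_len - len(x) + block)
    y = []
    for start in range(0, out_len, block):
        cur = padded[start:start + block]
        fdl = [fft(prev + cur)] + fdl[:-1]  # newest spectrum first
        acc = [sum(X[k] * Hp[k] for X, Hp in zip(fdl, H)) for k in range(size)]
        res = fft(acc, inverse=True)
        y.extend(v.real / size for v in res[block:])  # keep the alias-free half
        prev = cur
    return y[:out_len]

def convolve_direct(x, h):
    """Reference: direct time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [((7 * n) % 11 - 5) / 5 for n in range(37)]  # arbitrary test input
h = [0.9 ** n for n in range(21)]                # toy decaying "IR"
fast = partitioned_convolve(x, h, block=4)
slow = convolve_direct(x, h)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # rounding-level difference
```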

Other optimization methods also exist. The methods mentioned above produce mathematically correct output: given an impulse response and input data, the output is exactly their convolution (up to floating-point rounding). However, even fast convolution requires a significant amount of processing power, so several types of simplification have been introduced.

Tail reduction methods

A simple optimization is to use convolution only for the early reflection (ER) part of the impulse response and to simplify the processing of the often longer tail part; we will refer to these simplifications as tail reduction methods. The reduction produces reverberation that is not strictly correct, but it may often sound acceptable or even indistinguishable, although this cannot be guaranteed.

  • Processing of the room impulse response
    • ER is processed as is for all source positions
    • Tail processing is reduced by tail reduction methods
  • Tail reduction methods
    • Tail processed as a conventional reverb (feedback, all-pass filters, waveguides, etc.)
    • Tail processed by means of convolution, but only for a single source (fewer channels than required)
    • Tail processed by means of convolution, but on only a single channel, and use de-correlation methods for stereo imagery

Suppose we have two mono sound files, Dry1 and Dry2, two source positions A and B in a room, and two microphone positions MLeft and MRight, and let \circledast denote convolution. With Dry1 playing at A and Dry2 at B, we obtain the output as follows:


WetLeft=Dry_{1}\circledast IR_{A-MLeft}+Dry_{2}\circledast IR_{B-MLeft}


WetRight=Dry_{1}\circledast IR_{A-MRight}+Dry_{2}\circledast IR_{B-MRight}
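The two equations map directly onto code: one convolution per source-to-microphone path, summed per microphone. In this sketch the signals and the four impulse responses are short invented placeholders, and the dictionary key `(source, mic)` is a hypothetical stand-in for the subscript IR_{source-mic}; a real setup would load a measured IR for each path.

```python
def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

def mix(*signals):
    """Sample-wise sum; shorter signals are treated as zero-padded."""
    n = max(len(s) for s in signals)
    return [sum(s[k] for s in signals if k < len(s)) for k in range(n)]

# Hypothetical data: dry1 plays at position A, dry2 at position B.
dry1 = [1.0, 0.0, -0.5]
dry2 = [0.0, 0.8, 0.4]
IR = {
    ("A", "MLeft"):  [0.9, 0.3, 0.1, 0.05],
    ("B", "MLeft"):  [0.3, 0.6, 0.2, 0.05],
    ("A", "MRight"): [0.4, 0.5, 0.2, 0.1],
    ("B", "MRight"): [0.8, 0.2, 0.1, 0.05],
}

def wet(mic):
    """Wet_mic = Dry1 (*) IR_{A-mic} + Dry2 (*) IR_{B-mic}."""
    return mix(convolve(dry1, IR[("A", mic)]), convolve(dry2, IR[("B", mic)]))

wet_left, wet_right = wet("MLeft"), wet("MRight")
print(len(wet_left), len(wet_right))  # 6 6
```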

This follows from linearity: each microphone hears A and B at the same time in the same room, so its signal is the sum of the two convolved sources. If we now split every impulse response into an ER (early reflection) part and a tail (late reflection) part, we can rewrite the equations above as:


WetLeft=Dry_{1}\circledast ER_{A-MLeft}+DELAY\left( Dry_{1}\circledast TAIL_{A-MLeft}\right) +Dry_{2}\circledast ER_{B-MLeft}+DELAY\left( Dry_{2}\circledast TAIL_{B-MLeft}\right)


WetRight=Dry_{1}\circledast ER_{A-MRight}+DELAY\left( Dry_{1}\circledast TAIL_{A-MRight}\right) +Dry_{2}\circledast ER_{B-MRight}+DELAY\left( Dry_{2}\circledast TAIL_{B-MRight}\right)

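Note that the tail has to be delayed so that it starts right after the early part; this is why DELAY is introduced. If the tail is cut from the impulse response exactly where the ER part ends, the delay equals the length of the ER part and the split is exact.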
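A quick numerical check of the split, with an invented 8-sample IR cut after 3 samples. Because convolution is linear and the IR equals ER plus the tail shifted by len(ER), convolving with ER and with the delayed tail separately gives the same result as convolving with the whole IR; leaving out the DELAY does not. The helper names are hypothetical.

```python
def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

def delay(x, n):
    """DELAY operator: prepend n samples of silence."""
    return [0.0] * n + list(x)

def mix(*signals):
    """Sample-wise sum; shorter signals are treated as zero-padded."""
    n = max(len(s) for s in signals)
    return [sum(s[k] for s in signals if k < len(s)) for k in range(n)]

ir = [1.0, 0.6, 0.45, 0.3, 0.2, 0.12, 0.08, 0.05]  # hypothetical IR
dry = [0.5, -1.0, 0.75, 0.25]                       # hypothetical dry signal
er_length = 3                                       # where we cut ER from tail
er, tail = ir[:er_length], ir[er_length:]

full = convolve(dry, ir)
split = mix(convolve(dry, er), delay(convolve(dry, tail), len(er)))
wrong = mix(convolve(dry, er), convolve(dry, tail))  # DELAY forgotten

print(max(abs(a - b) for a, b in zip(full, split)))  # ~0: same result
print(max(abs(a - b) for a, b in zip(full, wrong)))  # clearly nonzero
```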

Some convolution reverb software may use only two tails instead of the four above. This leads to a different result, but some may find it convincing enough, as the tail part often contains so many reflections that it may sound like decaying random noise. Let us use the names TAIL_X-MLeft and TAIL_X-MRight for these new tails, where X indicates that each tail is shared by both source positions. Now the two channels can be calculated as follows:


WetLeft_{tail\_reduced}=Dry_{1}\circledast ER_{A-MLeft}+Dry_{2}\circledast ER_{B-MLeft}+DELAY\left( \left( Dry_{1}+Dry_{2}\right) \circledast TAIL_{X-MLeft}\right)


WetRight_{tail\_reduced}=Dry_{1}\circledast ER_{A-MRight}+Dry_{2}\circledast ER_{B-MRight}+DELAY\left( \left( Dry_{1}+Dry_{2}\right) \circledast TAIL_{X-MRight}\right)
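By linearity, (Dry1 + Dry2) convolved with one tail equals the sum of the two separate tail convolutions, so the shared tail costs one convolution per microphone instead of two. The sketch below (invented data, one microphone, hypothetical helper names) shows the consequence: the reduced form is exact only if both sources really have the same tail, and deviates as soon as the measured tails differ.

```python
def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

def delay(x, n):
    """DELAY operator: prepend n samples of silence."""
    return [0.0] * n + list(x)

def mix(*signals):
    """Sample-wise sum; shorter signals are treated as zero-padded."""
    n = max(len(s) for s in signals)
    return [sum(s[k] for s in signals if k < len(s)) for k in range(n)]

def wet_full(dry1, dry2, er_a, tail_a, er_b, tail_b):
    """One microphone, per-source tails: four convolutions."""
    d = len(er_a)  # assumes both ER parts were cut at the same length
    return mix(convolve(dry1, er_a), delay(convolve(dry1, tail_a), d),
               convolve(dry2, er_b), delay(convolve(dry2, tail_b), d))

def wet_tail_reduced(dry1, dry2, er_a, er_b, tail_x):
    """One microphone, one shared tail: sources are summed before the tail."""
    d = len(er_a)
    return mix(convolve(dry1, er_a), convolve(dry2, er_b),
               delay(convolve(mix(dry1, dry2), tail_x), d))

def max_diff(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

dry1, dry2 = [1.0, -0.5, 0.25], [0.3, 0.9]
er_a, er_b = [0.9, 0.4], [0.5, 0.7]
tail_a, tail_b = [0.2, 0.1, 0.05], [0.15, 0.12, 0.04]

shared = wet_tail_reduced(dry1, dry2, er_a, er_b, tail_a)
# Exact if both sources really had tail_a...
print(max_diff(wet_full(dry1, dry2, er_a, tail_a, er_b, tail_a), shared))  # ~0
# ...but with two different measured tails the shared-tail result deviates.
print(max_diff(wet_full(dry1, dry2, er_a, tail_a, er_b, tail_b), shared))  # > 0
```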

It is also possible to use only one tail instead of two, and later apply some stereophonic adjustment - decorrelation of the channels - to have a stereophonic experience, but this is not discussed here.

Another optimization method is to remove the crosstalk paths, namely A-MRight and B-MLeft, and to process only the acoustic paths A-MLeft and B-MRight. If we combine this channel reduction with the shared tails above, we can write


WetLeft_{channel\_and\_tail\_reduced}=Dry_{1}\circledast ER_{A-MLeft}+DELAY\left( Dry_{1}\circledast TAIL_{X-MLeft}\right)


WetRight_{channel\_and\_tail\_reduced}=Dry_{2}\circledast ER_{B-MRight}+DELAY\left( Dry_{2}\circledast TAIL_{X-MRight}\right)

Note, however, that these three methods produce different results, and only the first one is correct in theory. When they can be compared directly, the first method usually sounds the best and most natural. In terms of math:


WetLeft\neq WetLeft_{tail\_reduced}\neq WetLeft_{channel\_and\_tail\_reduced}

WetRight\neq WetRight_{tail\_reduced}\neq WetRight_{channel\_and\_tail\_reduced}
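The inequalities can be checked numerically. The sketch below uses invented data for the left microphone only and silences source A, so that only source B plays: the exact WetLeft still contains B, the tail-reduced version differs because the shared tail is not B's real tail, and the channel-and-tail-reduced version drops B from the left channel altogether. Variable names are hypothetical.

```python
def convolve(e, w):
    """Direct discrete convolution of two finite sequences."""
    y = [0.0] * (len(e) + len(w) - 1)
    for i, ei in enumerate(e):
        for j, wj in enumerate(w):
            y[i + j] += ei * wj
    return y

def delay(x, n):
    """DELAY operator: prepend n samples of silence."""
    return [0.0] * n + list(x)

def mix(*signals):
    """Sample-wise sum; shorter signals are treated as zero-padded."""
    n = max(len(s) for s in signals)
    return [sum(s[k] for s in signals if k < len(s)) for k in range(n)]

def max_abs(x):
    return max(abs(v) for v in x)

dry1 = [0.0, 0.0, 0.0]   # source A is silent in this test
dry2 = [1.0, 0.5, 0.25]  # only source B plays
er_a_left, er_b_left = [0.9, 0.3], [0.4, 0.6]
tail_a_left, tail_b_left, tail_x_left = [0.2, 0.1], [0.15, 0.12], [0.18, 0.1]
d = 2  # DELAY equals the length of the ER parts

wet_left = mix(convolve(dry1, er_a_left), delay(convolve(dry1, tail_a_left), d),
               convolve(dry2, er_b_left), delay(convolve(dry2, tail_b_left), d))
wet_left_tail_reduced = mix(convolve(dry1, er_a_left), convolve(dry2, er_b_left),
                            delay(convolve(mix(dry1, dry2), tail_x_left), d))
wet_left_channel_and_tail_reduced = mix(convolve(dry1, er_a_left),
                                        delay(convolve(dry1, tail_x_left), d))

print(max_abs(wet_left))  # nonzero: source B is heard by the left microphone
print(max_abs([a - b for a, b in zip(wet_left, wet_left_tail_reduced)]))  # nonzero
print(max_abs(wet_left_channel_and_tail_reduced))  # 0.0
```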
