I received the following e-mail from a mechanical engineering friend of mine. I thought some developers/users could cast some light on some of his questions... thanks!
In DSP/audio sampling of reverb (based on the fourier transform of the impulse response of the system), is it standard among the more professional softwares to use the energy decay envelope, or to use the highly complex/accurate sound pressure response?
Am I correct in assuming that the programming to solve the output (the inverse fourier transform of the fourier of the system response and the forcing function multiplied together) is a numerical solution as opposed to some sort of closed form approximation?
If it is a computational solution how is it done so fast?
The computational solutions that I program generally take over 20 min to run and generate the output. The source of this problem I think is my use of recursive functions run on several very large arrays.
Thanks for any explanation of how this is accomplished.
I\'ll see if I can help... the questions are a bit technical so the answers will probably need to be too.
>is it standard among the more professional
>softwares to use the energy decay envelope, or
>to use the highly complex/accurate sound
I\'m not sure what the exact definition is of the energy decay envelope, but in any event the term isn\'t often used with respect to convolution. If it is the decay envelope for the various frequencies across the spectrum, as measured by the fft at different time intervals, then this isn\'t used in any approach I\'ve seen; neither is the total energy content decay envelope. The second guess seems to be a good match to how it works - the recorded sound pressure response of a room to an impulsive sound (extremely short - theoretically infintessimally short) is called the impulse response, or simply the impulse. To perform the convolution, typically the impulse is transformed to the spectral domain (via fft normally), and multiplied point by point with the transform of an equal length window of the input signal. Before the transforms, both the impulse and input are zero padded to double their original length, so that there will be no aliasing in the time domain. The result is then inverse transformed, which puts it back in the time domain, convolution accomplished. For details a fairly good intro is Numerical Recipes in C which is free on the web.
>Am I correct in assuming that the programming to
> solve the output ... is a numerical solution as
>opposed to some sort of closed form approximation?
There may be closed form approximations, but they aren\'t typical as far as I know - probably because they would reduce accuracy a lot. There are methods that trade accuracy for speed, while still using the fft/mult/ifft technique.
>If it is a computational solution how is it
>done so fast?
>The computational solutions that I program
>generally take over 20 min to run and generate
The first thing to do is to use a fast fft. FFTW from fftw.org is free and extremely fast. It may be more than three times faster than what you\'re using. The standard technique for fast convolution is to first do some prep work. Start out by doubling the length of the impulse by appending zeros to it(zero padding), and then zero pad it further until you reach a power of two length. Then fft transform the impulse and SAVE it. This lets you reuse the transform again and again without recomputing it - the transform of the impulse will always be the same, so saving it alone can cut computation time by 33% or more. Also plan to always use the real fft/ifft rather than the complex fft/ifft - this saves somewhere around an additional 40%.
Now you can start convolving with the input.
Partitioning the input with lengths equal to the zero padded impulse length is the standard method. If you don\'t window, then you\'ll be stuck with doing an extremely long fft of the input and an equally long ifft of the multiplication result. So.. make simple rectangular windows of the input with lengths equal to the zero padded impulse\'s length. Transform each input window and do point by point multiply of the transformed input with the saved impulse transform. Inverse transform the result. This method is called overlap and save - with this method only the second half of the ifft result (now in the time domain) has any meaning. Thus, when you take your next input window, you must advance the window by only half the length of the window. Really I can\'t do this explanation justice so the best thing to do will be to look up \'overlap and save\'.
Beyond this, there are specialized algorithms for different applications. One thing to do is to make sure to *absolutely* avoid or minimize disk accesses during the transforms. Likewise if you really get serious, you minimize RAM access and do as much as possible in cache. Overlap and save or some otherwise decent algorithm generally takes care of the disk access issue for you, but if you\'re doing 100 convolutions at a time, maybe you\'ll need to keep it in mind. There\'s much more you can do to optimize, but this might help to start.
Thanks for the reply, but this leads to a few more questions...
How would a closed form solution be LESS accurate? It seems to me that as long as your equations are accurate, the closed form should always be more accurate than computational methods, and probably faster to deal with the arrays. It may be a mute point for audio cause from what I understand the system is nonlinear and so it must be done by computational methods but Im not sure about that.
How do Fourier Transform and Fast Fourier Transform differ?
That is are their integrals defined differently?
Thanks again for your explanations,
Hope we\'re not going too far off topic. Convolution is getting used quite a lot by musicians now, so that will be my excuse :;
>How would a closed form solution be LESS
My mistake. The discrete convolution operation as defined (and possibly the continuous version), is already in closed form. Both the discrete convolution and the discrete fft/ifft are closed, since they are summations with multiplies in the sum steps. I didn\'t know the exact meaning of \'closed form\'.
>It seems to me that as long as your equations
> are accurate, the closed form should always be
>more accurate than computational methods
I think you may be looking at the world from a continuous viewpoint. All audio processes, all digital processes in fact with maybe a few exceptions, use discrete data and discrete operations. So we don\'t need any numerical methods to approximate the discrete convolution or discrete fft (in fact the fft is discrete by definition). But then I could be misunderstanding the meaning of closed form still.
>How do Fourier Transform and Fast Fourier
The fourier transform I think generally means the continuous fourier transform, which is defined using an integral. The fast fourier transform is a special (fast algorithm) to compute the discrete fourier transform. The most obvious difference between the discrete fourier transform and continuous fourier transform is that the DFT uses a summation rather than an integral. Makes it in theory much easier to compute!
The fast fourier transform algorithm exploits properties of the natural exponential (one of the multiplicands in the DFT summation), so that it can divide the DFT into smaller steps that as a total take less time to calculate than the original DFT. By recursively dividing each already divided step into yet smaller steps, the FFT hugely reduces the number of calculations it performs. The benefit of this is that it can compute the DFT very very fast, and also the reduced operation count makes this algorithm extremely accurate.
Since audio convolution (typically) uses the FFT, a relatively insignificant multiply of transformed spectral data, and then the IFFT, it is both extremely fast and accurate. At least it\'s super fast compared to brute force convolution as carried out by definition. Which still doesn\'t help when a system goes down to its knees from fast convolution [img]images/icons/smile.gif[/img]
You asked about non-linearities - convolution can only model linear behavior, so when faced with replicating something non-linear like the behavior of a compressor or a tube amp, we throw up our hands and do it another way. Traditionally at least.. there\'s also dynamic convolution which models nonlinearities. I haven\'t investigated it yet so I don\'t know how it works. It seems you record a lot of impulses of the behavior of a system at different points (dynamics) of nonlinearity.
Im not EE or CompSci so Im probably not thinking or defining things along the same lines as you. When I say closed form Im thinking of a well defined symbolic function(s) that can be transformed by a solvable integral (by hand even if your patient enough, and have an integral table) to the frequency domain and back to the time domain.
I still dont quite grasp why the DFT is so much faster to compute, doesnt the pc end up with massive arrays to recursively multiply? Thats what my problem is, my code requires the processor to go through element of the frequency array multiplying by the exponent, which is also a function of the time array, summing the result along the way. That just give me the array for the FT of the forcing function, luckily the system response was kept fairly simple. Like what was I said above I ended up waiting nearly half an hour for the computer to run through maybe 20 lines of code
I think nonlinear system responses can be solved by convolution but not in a closed form (symbolically). Im working on a nonlinear differential equation right now. Im expected to write a program [img]images/icons/frown.gif[/img] that numerically solves the system. So I better get crackin...
Hey Alan, thanks for posting question on this forum,