The other day I had a discussion with Christian (main dev of LinuxSampler) about the best acceptable tradeoff between performance and accuracy when it comes to sample playback.
The reason I'm asking is that we want to maximize performance (i.e. higher polyphony, lower CPU usage etc.) while avoiding complicated sample playback algorithms that bring no tangible advantage in practice.
We would like to hear your opinions as very demanding sampler users (both in terms of audio quality and system performance); your feedback will help shape the direction of LinuxSampler's development.
Now to the question: is sample-accurate rendering needed?
Let me elaborate a bit:
Basically, as we know, most virtual instrument plugin standards like VST and others allow for sample accurate audio rendering.
This means, for example, that if you have a MIDI sequence in your sequencer and use a VST plugin to render the audio, you are not limited by the resolution of physical MIDI (which takes roughly 1msec to transmit a note-on/off message) but can get down to the resolution of a single sample, which at 44.1kHz is about 22.7usec = 0.023msec.
The problem with sample accurate rendering in software samplers (and software instruments in general) is this: since audio playback in PC environments works by sending a buffer of N samples to the soundcard, the most efficient way to play a sample is to keep complex routines and algorithms out of the innermost audio rendering loops. Sample accurate rendering requires some additional program logic, which can decrease the overall playback performance.
Let's take the example of playing a sample with a simple exponential envelope.
Assume a buffer of 64 samples (about 1.5msec worth of data at 44.1kHz).
If we support sample accurate rendering and want to start the playback of the sample at buffer offset 19, we need to insert a branch instruction (if .. then) and start rendering only once we reach sample offset 19. (This check need not sit within the inner loop, so the performance hit is not so big.)
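To make that concrete, here is a minimal sketch (illustrative only, not LinuxSampler's actual code; the function name and the constant placeholder sample value are made up) of a voice starting at a sample accurate offset inside one fragment:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: render one fragment for a voice whose note-on falls
// at 'startOffset' inside the buffer (0 for voices already playing).
std::vector<float> renderFragment(std::size_t bufferSize, std::size_t startOffset) {
    std::vector<float> out(bufferSize, 0.0f);
    // The branch lives outside the inner loop: the loop simply begins at
    // 'startOffset', so the samples before the note-on stay silent.
    for (std::size_t i = startOffset; i < bufferSize; ++i) {
        out[i] = 1.0f; // placeholder for the real sample fetch / interpolation
    }
    return out;
}
```

The cost here is just one extra loop-bound computation per voice per fragment, which is why a plain note-on start is cheap to make sample accurate.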
But enveloping (and filters, and any synthesis parameter in general) does need some special checks, which can sometimes be time consuming, thus decreasing the performance.
For example when doing sample accurate enveloping you have multiple ways to implement it.
One is to calculate the offset where the envelope starts, set the parameters, and then calculate new envelope values within the innermost audio rendering loop.
Or you could pre-render the envelope values for the whole audio buffer (64 samples in the above case) and then have the inner audio loop simply fetch those values from a table (this is the approach LinuxSampler currently uses).
Unfortunately this causes a performance hit, since you are moving lots of values back and forth in memory (even if they stay in the CPU's cache, it still eats some CPU).
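A minimal sketch of both strategies, assuming a simple exponential decay envelope (the level is multiplied by a coefficient once per sample); the function names and signatures are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strategy 1: advance the envelope inside the innermost loop.
void renderInline(float* out, const float* in, std::size_t n,
                  float level, float coeff) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * level;
        level *= coeff; // one extra multiply per sample in the hot loop
    }
}

// Strategy 2: pre-render the envelope for the whole fragment into a table,
// then have the inner loop just fetch the values from it.
void renderTable(float* out, const float* in, std::size_t n,
                 float level, float coeff) {
    std::vector<float> env(n);
    for (std::size_t i = 0; i < n; ++i) { env[i] = level; level *= coeff; }
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * env[i];
}
```

Both produce the same output; the difference is only where the per-sample multiply happens and whether an extra per-voice table is written to and read from memory each fragment.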
So our question was whether sample accurate rendering is really necessary in practice, or whether it's just overkill.
For example, take normal MIDI devices using regular MIDI cables. As we know, MIDI uses a serial connection running at 31.25kbit/s. Each MIDI message is made of one or more bytes (a MIDI note-on, for example, is 3 bytes), and each byte needs 10 bits on the wire, so transmitting a 3-byte message takes about 1msec.
So if you play a 7 note chord on a MIDI master keyboard connected to a MIDI device, the last note will start sounding about 7msec later than the first.
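For the curious, the back-of-the-envelope arithmetic behind those figures (a serial MIDI byte is 10 bits on the wire: one start bit, 8 data bits, one stop bit):

```cpp
#include <cassert>

// Serial MIDI bandwidth and the resulting event granularity.
const double bitsPerSecond   = 31250.0;                       // MIDI baud rate
const double msPerMidiByte   = 10.0 / bitsPerSecond * 1000.0; // ~0.32 msec/byte
const double msPerNoteOn     = 3.0 * msPerMidiByte;           // ~0.96 msec per note-on
const double msPerSevenChord = 7.0 * msPerNoteOn;             // ~6.7 msec chord smear
const double usPerSample     = 1e6 / 44100.0;                 // ~22.7 usec at 44.1kHz
```

So a single sample period is roughly 40 times finer than the granularity of one serial MIDI note-on.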
But that still seems acceptable, even for live playing, since all electronic instruments still use MIDI as a communication protocol (let's forget about FireWire, mLAN etc. for now since they are not widespread yet).
Of course, with serial MIDI communication you can hit the wall pretty fast, especially when playing a full (16 part) arrangement through a single MIDI cable. Timing gets sloppy, and you have to resort to tricks like moving non-rhythmic parts off the bar/beat marks so the drums, basslines etc. don't sound like they were played by bad musicians.
Now enter the PC:
When a sequencer like Cubase sends MIDI events to VST plugins, it does not have to go through the slow external MIDI protocol. It uses the computer's RAM, which is blazing fast, so the number of note events per second is limited only by the speed of the computer, and as we have seen, sample accurate rendering becomes easily achievable.
But can our ears hear it?
For live playing, a minimum of chord smearing (the 7msec delay in the example above) can hardly be heard, and MIDI devices are used live all the time.
The PC can process many simultaneous MIDI notes without delays. For example, take the 7 note chord example again, but this time using Cubase and a VST sampler.
If you look at the resulting audio, the notes will probably all start at exactly the same time, because the sequencer hands the VST plugin all the events for the current audio buffer (N samples, N = the current audio buffer size in the sequencer) along with their exact offsets.
So what if the VST sampler, instead of doing sample accurate rendering, quantized the MIDI events a bit?
For example, even with a relatively big audio buffer of 2048 samples (for offline rendering, or to achieve greater performance), the VST sampler could quantize MIDI events to offsets that are multiples of 64. This means you would lose sample accurate audio rendering, and all MIDI events would be quantized with a granularity of about 1.5msec, which is as good as MIDI devices playing a single note over the normal serial MIDI protocol.
The advantage of PC based sampling is that even with quantized events, if the sequencer sends a 10 note chord where all notes start at the same time, the notes in the resulting audio will still all start together, because RAM based MIDI transmission is effectively instantaneous.
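The quantization step itself would be trivial. A sketch (the function name is assumed) that snaps an event's offset within the audio buffer down to the previous 64-sample boundary; every note of a chord gets the same quantized offset, so the chord stays tight:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: snap a MIDI event's sample offset inside the audio
// buffer down to the previous 64-sample boundary (~1.5 msec at 44.1kHz).
std::size_t quantizeOffset(std::size_t offset, std::size_t grain = 64) {
    return (offset / grain) * grain;
}
```

With this, all per-voice bookkeeping (envelope starts, filter parameter changes, etc.) only ever happens at 64-sample boundaries, so the inner loops can stay simple.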
So after all these boring explanations, I pose my question again:
Is sample accurate playback really needed in most cases (please give an example where you cannot live without it), or is quantized sample playback with reasonably low quantization times (like the above 1.5msec, i.e. 64 samples) just fine?
The advantage of quantized sample playback is that you can use simpler algorithms, which means better performance.
I think that's probably the reason why Gigastudio is more performant than other software samplers.
Gigastudio cannot work as a VST plugin, and can therefore make the above assumptions and quantize incoming MIDI events to buffer boundaries.
Did anyone run tests with common VST/AU softsamplers to see if they are really sample accurate, whether this sample accurate mode can be turned off, and if so, what performance gains you got?
We would like to hear your opinions and experiences on the sample accurate / quantized sample playback rendering matter.
You could try the following, assuming you use a software sampler which provides sample accurate playback: make a sequence in your sequencer and render 2 audio files, one with the sample accurate events and one where the notes are quantized with a 1.5msec quantization value. Post the files here without saying which is which.
Then let people here on the forum try to figure out which one is the quantized one.
Thanks for listening and for your valuable inputs.
PS: for the curious, LinuxSampler is not a useless piece of open source code; it actually makes sound!
Below is a video clip taken at Musikmesse of the PMI Boesendorfer played on a Linux-based keyboard; too bad the 76-key keyboard cannot bring the Boesi to its full expressiveness.
PS2: porting the sampler to other operating systems is in the works, and I think a native VST/AU version is not too far away. Plus, many of the GIG v3 format features are already supported (except convolution).