Soundfont specification

Wavetable synthesis

Wavetable synthesis is fundamentally based on the periodic reproduction of a single-cycle waveform. The audio samples or waveforms are stored as structured data known as a wavetable. A sample represents the sampled sound of an instrument at a given pitch (but it can also be created mathematically).
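As a rough illustration of this idea (a minimal sketch, not the code of the MIDS synthesizer), the snippet below builds a mathematically created single-cycle sine table and reads it cyclically with a phase accumulator. The names `make_sine_table` and `render`, the table size of 256 and the truncating lookup are assumptions chosen for the example.

```python
import math

def make_sine_table(size=256):
    """Build one cycle of a sine wave (a mathematically created wavetable)."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def render(table, freq_hz, sample_rate, n_samples):
    """Reproduce the single-cycle table periodically at the requested pitch.

    Assumption: the lookup simply truncates the phase to an integer index;
    a real synthesizer would usually interpolate between table entries.
    """
    step = freq_hz * len(table) / sample_rate  # table positions per output sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase)])
        phase = (phase + step) % len(table)  # wrap around: periodic reproduction
    return out

table = make_sine_table()
tone = render(table, freq_hz=440.0, sample_rate=22050, n_samples=64)
```

Changing `freq_hz` changes only the step through the table, so the same stored cycle can be played at any pitch.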

The samples

For many instruments, the played sound consists of two parts: a transient waveform (initial sound or attack sound) followed by a periodic waveform (recurring sound, sustained sound). A piano has an initial sound that includes the hammer striking the string, then a decaying sound of the string vibrating.

Here is an example of a recorded Slap Bass sound. You can listen to the recording as an uncompressed wav file or as a compressed mp3 file:

Slap Bass sound recording

We can clearly distinguish the two parts of the sound:

The transient (attack) part (listen as wav or mp3)
The periodic (sustained) part (listen as wav or mp3)

To make the instrument sound rendered by the synthesizer as realistic as possible, both parts are generally stored (in the wavetable) and used.

The timbre of an instrument is composed of multiple harmonic or non-harmonic partials (individual sine waves) of different frequencies and amplitudes that generally change over time. But the timbre is not the only important component of an instrument sound; the amplitude variation is another one. Sound synthesis techniques often employ an envelope generator that controls the amplitude of the played sound at any point in its duration. One commonly used envelope generator is the ADSR envelope (Attack, Decay, Sustain, Release), whose four parameters divide the envelope into four parts.
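The four parts of an ADSR envelope can be sketched as a gain function of time. This is only an illustration under stated assumptions: all segments are linear, times are in seconds, and the names `held_level` and `adsr_gain` and the default values are invented for the example rather than taken from any synthesizer.

```python
def held_level(t, attack, decay, sustain):
    """Gain while the note is held: attack ramp, decay ramp, then sustain level."""
    if t < attack:
        return t / attack                      # attack: rise from 0 to 1
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay  # decay: fall to sustain
    return sustain                             # sustain: constant level

def adsr_gain(t, note_len, attack=0.01, decay=0.05, sustain=0.7, release=0.1):
    """Envelope gain (0..1) at time t; the note is released at note_len.

    Assumption: linear segments. Real envelope generators often use
    exponential curves instead.
    """
    if t < 0.0:
        return 0.0
    if t < note_len:
        return held_level(t, attack, decay, sustain)
    elapsed = t - note_len
    if elapsed >= release:
        return 0.0
    # release: fade from the level reached at note-off down to silence
    start = held_level(note_len, attack, decay, sustain)
    return start * (1.0 - elapsed / release)
```

Multiplying each output sample by `adsr_gain(n / sample_rate, note_len)` shapes the amplitude of the played sound over its duration.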

Soundfont file Format

SoundFont is a brand name that refers to a file format and its associated technology, developed in the early 1990s by E-mu Systems and Creative Labs.

A SoundFont file contains one or more sampled audio waveforms (samples) which can be re-synthesized at different pitches by using an interpolation technique. Each sampled waveform may be associated with a range of pitches.
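To make the idea of re-synthesizing a sample at another pitch concrete, here is a minimal sketch using linear interpolation. Linear interpolation is an assumption chosen for simplicity (other interpolation schemes exist), and the function names `resample_linear` and `semitone_ratio` are hypothetical.

```python
def semitone_ratio(semitones):
    """Playback-speed ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)

def resample_linear(samples, ratio, n_out):
    """Read `samples` at `ratio` times the original speed.

    ratio 2.0 plays one octave up, 0.5 one octave down. Values between two
    stored points are estimated by linear interpolation (an assumption).
    """
    out = []
    pos = 0.0
    for _ in range(n_out):
        i = int(pos)
        if i + 1 >= len(samples):
            break  # ran past the end of a non-looped sample
        frac = pos - i
        out.append(samples[i] + frac * (samples[i + 1] - samples[i]))
        pos += ratio
    return out

# Play a tiny ramp one octave lower: each gap is filled with a midpoint.
slowed = resample_linear([0, 10, 20, 30], 0.5, 6)
```

A sample recorded at one pitch can cover a range of notes this way, with `semitone_ratio` giving the ratio for each note of the range.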

A SoundFont file contains uncompressed samples in PCM format (similar to wav files) that are organized and used to create virtual instruments. It also contains other music synthesis parameters (such as vibrato effect, velocity, etc.). The quality of the produced instrument sound depends on the quality of the samples and on the parameter settings.

MIDS Soundfont

Once again, as my primary goal was to render music using a minimum of CPU time, I've made some choices and compromises. These choices of course have an impact on the capabilities of the synthesizer and also on the quality of the produced waveforms.

I have studied version 2.0 of the SoundFont file format, the interpolation techniques used for pitching samples, and some of the techniques used for advanced features (such as vibrato). The advanced features need a lot of computation, which increases the execution time. On the other hand, I estimated in my first approach that the attack part of the sound could be replaced by the attack part of the envelope of the sound (in fact that was not as good an idea as I thought, and after finishing the latest version of my synthesizer I slightly changed the way samples are played).

The samples used in my MIDS soundfont format consist of only one part. That means each waveform corresponds to a sample that can be played in a loop (or not). When a sample is defined as looped, it is played in a loop regardless of the waveform it contains.

So, if you want to use samples that contain an attack part (transient part), the best way is to use a long sample and to set the release time of the ADSR envelope so that the played note is cut off according to its duration.

My synthesizer supports samples in PCM format (mono, 16 bits, sample rate = 22050 Hz) with a maximum size of about 64 KB. That corresponds to a maximum duration of about 1.5 s.
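The 1.5 s figure follows directly from the format. The short calculation below shows the arithmetic, assuming that "about 64 KB" means 65536 bytes (the exact limit is not stated here); the function name `max_duration_seconds` is made up for the example.

```python
def max_duration_seconds(max_bytes=64 * 1024, bits_per_sample=16,
                         channels=1, sample_rate=22050):
    """Longest sample that fits in max_bytes for the given PCM format.

    Assumption: 64 KB is taken as 65536 bytes.
    """
    bytes_per_frame = (bits_per_sample // 8) * channels  # 2 bytes for 16-bit mono
    frames = max_bytes // bytes_per_frame                # 32768 sample points
    return frames / sample_rate

print(round(max_duration_seconds(), 2))  # 1.49
```

In other words, 65536 bytes hold 32768 16-bit points, and 32768 / 22050 is just under 1.5 seconds.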

Example of long sample cut using ADSR (listen as wav or mp3)

Slap Bass sound recording

Example of short looped sample, without attack part, using ADSR (listen as wav or mp3)

Slap Bass sound recording

As you can see (and hear!), the produced sounds are not the same. The instrument sounds more realistic with the first sample.

What kind of sample should you use? The answer depends on many things:

  • Samples with an attack part are more realistic but have a limited duration (1.5 s) and may use up to about 64 KB each.
  • Looped samples use less memory and the note duration is not limited, but the sound is less realistic.
  • Are you planning to produce game music, or music of good quality?

Thus you will probably have to experiment a little before getting the sound you want. However, I've always found a good solution for all the songs I've converted.

Even though this point is described in the tutorials, I think it is useful to repeat it in this section. In my MIDS soundfont editor, to let samples that are not played in a loop be cut off smoothly (according to the note duration), you must check "Enable Note Off", then set the sustain value or the decay min value to 100%, and finally adjust the release time, as illustrated by the example below:

Enabling note off

If you uncheck "Enable Note Off" (and "Loop samples" is not checked), the sample will be played for its full duration.

If you check "Enable Note Off" (and "Loop samples" is not checked) but do not correctly set the release time (as described above), the sample will be cut off abruptly, producing an audible transient sound (a "click").
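The difference between an abrupt cut and a smooth release can be sketched as follows. This is not the code used by the MIDS editor or synthesizer; it is a minimal illustration with a linear fade, and the name `cut_with_release` and its parameters (expressed in sample points) are assumptions.

```python
def cut_with_release(samples, note_off_index, release_len):
    """Stop a non-looped sample at note_off_index.

    With release_len == 0 the sound is simply truncated, which leaves a jump
    to silence (the "click"). With release_len > 0 the tail is faded out
    linearly over release_len points (assumption: linear fade).
    """
    out = list(samples[:note_off_index])
    tail = samples[note_off_index:note_off_index + release_len]
    for k, s in enumerate(tail):
        gain = 1.0 - k / release_len  # goes from 1 towards 0 across the tail
        out.append(s * gain)
    return out

abrupt = cut_with_release([1.0] * 10, 4, 0)  # ends at full level: audible click
smooth = cut_with_release([1.0] * 10, 4, 4)  # ends near zero: no click
```

With the fade, the last emitted value is close to zero, so there is no sudden discontinuity when the note stops.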

The way samples are stored in a MIDS Soundfont is quite similar to that of the SoundFont 2.0 format from Creative Labs:

All sample points are surrounded by continuous audio data. Sample data points are provided beyond the last point (loop end point): this data is identical to the data at the beginning point (start of the loop). Sample data points are also provided before the first point (start of the loop): this data is identical to the data at the ending point (loop end point). Four valid data points are present before the loop start and after the loop end, as illustrated by the picture below:

Sample storage

The sample data points are the blue ones (within the blue area), the four red points on each side are the duplicated sample data points (within the yellow area).
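The storage layout described above can be sketched in a few lines. This is an illustration of the principle only, not the editor's actual code; the name `add_guard_points` is hypothetical and the sample is represented as a plain Python list.

```python
GUARD = 4  # duplicated points on each side, as stated in the text

def add_guard_points(loop):
    """Surround one loop with wrapped copies of its own data.

    The points placed before the loop start are copies of the last points of
    the loop; the points placed after the loop end are copies of the first.
    """
    if len(loop) < GUARD:
        raise ValueError("loop must contain at least GUARD points")
    return loop[-GUARD:] + loop + loop[:GUARD]

stored = add_guard_points([1, 2, 3, 4, 5, 6])
# stored == [3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
```

One likely benefit of this layout is that an interpolator reading a few neighbouring points near the loop boundaries always finds valid data, without special wrap-around handling in the inner loop.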

You do not need to worry about this; the Soundfont editor does the job for you. The only thing you have to do is define the sample properly so that it can be looped without any transient sound. That means you have to find the starting point and the ending point of the sample. Such loop points are often included in the samples found in soundfonts; you can edit them using the Vienna Soundfont Studio from Creative.

For waveform editing you can use any audio editor, although some editors have more powerful and useful features than others.

You can download the MIDS soundfont specification from the downloads area.