Lecture Notes: Sound Processing

Computer Science 1050
Introduction to Computer Science: Multimedia

Note: These notes are based on the `minim` package, a library commonly used with Processing 2. It is still supported in Processing 3, but a new sound library has since been developed for Processing 3. For a tutorial on digital audio in general, and on the use of the `processing.sound` library, see this tutorial from the `processing.org` website.

To work with audio, we will use a package named minim, which is included in the standard Processing distribution. These notes explore only the tip of the iceberg of its capabilities. For more detailed information, see:

The minim project home page
Their Quickstart Guide (some of which I'll be borrowing from today)
Their formal documentation

Overview of Digital Audio

In the physical world, sound is composed of waves, the vibrations of which are detected by the ear. As a point of reference, humans have a typical hearing range that can detect wave frequencies anywhere from 20 to 20000 vibrations per second, and the tone known as "Middle C" on a piano has frequency approximately 261.6 Hz (i.e., vibrations per second).

On computers, analog signals are converted to digital form by rapidly taking samples of the vibration level. (Read more on Wikipedia.) By convention, CD-quality sound uses 44100 samples per second (although other sampling rates can be used), and the amplitude of each sample is often stored as a 16-bit number, interpreted as a magnitude from -32768 to +32767. If the audio is recorded in stereo, there is a separate left track and right track, with each sample consisting of a left and a right magnitude.
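
To make the sampling process concrete, here is a minimal sketch in standalone Java (not a Processing sketch, and not part of minim). It samples a pure sine tone at the CD-quality rate and rounds each level to a 16-bit magnitude. The class name `SamplingDemo` and the method `sampleSine` are made up for this illustration, and the tone is assumed to be at full volume.

```java
// Illustrative only: SamplingDemo and sampleSine are hypothetical names.
class SamplingDemo {
    static final int SAMPLE_RATE = 44100;     // CD-quality samples per second
    static final double MIDDLE_C_HZ = 261.6;  // approximate frequency of Middle C

    // Sample a full-volume sine wave of the given frequency and
    // round each sample to a signed 16-bit magnitude.
    static short[] sampleSine(double frequencyHz, int sampleCount) {
        short[] samples = new short[sampleCount];
        for (int n = 0; n < sampleCount; n++) {
            double t = (double) n / SAMPLE_RATE;                     // time of sample n, in seconds
            double level = Math.sin(2 * Math.PI * frequencyHz * t);  // vibration level in [-1, 1]
            samples[n] = (short) Math.round(level * 32767);          // scale to the 16-bit range
        }
        return samples;
    }

    public static void main(String[] args) {
        short[] oneSecond = sampleSine(MIDDLE_C_HZ, SAMPLE_RATE);
        System.out.println("samples in one second: " + oneSecond.length);
        System.out.println("samples per vibration: " + SAMPLE_RATE / MIDDLE_C_HZ);
    }
}
```

At this rate, each vibration of Middle C is described by roughly 169 samples, which is why the digital version can follow the shape of the wave closely.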

When stored as a file, some audio formats (e.g. WAV, AIFF, AU) keep the raw data uncompressed (perhaps with some meta information included in the file as well). Many other formats use compression, in some cases at the cost of some loss in quality (e.g. MP3, AAC). In either case, a sound file must typically be converted back to the traditional sample representation for playback.

Getting Started with `minim`

While minim ships as part of the Processing distribution, to use it you must explicitly import one or more of its packages. You may then create a "Minim" instance. The first class we will explore is named AudioPlayer; it manages the storage and playback of an audio clip. Our first example comes from the Minim Quickstart Guide:

   import ddf.minim.*;
   
   Minim minim;
   AudioPlayer song;
   
   void setup()
   {
     size(100, 100);
   
     minim = new Minim(this);
   
     // this loads mysong.wav from the data folder
     song = minim.loadFile("mysong.wav");
     song.play();
   }

   void draw() { }  // necessary for interactive playback of audio

In this example, variable song represents an AudioPlayer. That player effectively manages the playback of the audio clip and manages a virtual "playhead" that represents a position within the clip. In addition to play(), an AudioPlayer supports functions pause(), rewind(), skip(ms) to skip forward or backwards a certain number of milliseconds (with negative being backward), and cue(ms) which sets the playhead to the absolute number of milliseconds from the start of the clip. In addition, the length() function returns the length of the clip in milliseconds, and position() returns the current position of the playhead in milliseconds.
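
Since the playhead is measured in milliseconds while the audio itself is stored as samples, it can help to convert between the two. The following standalone Java sketch shows the arithmetic. The class `PlayheadMath` and its methods are hypothetical helpers, not part of minim, and the sample rate is passed in as an assumption (44100 for CD-quality audio).

```java
// Illustrative only: PlayheadMath is a hypothetical helper, not a minim class.
class PlayheadMath {
    // Index (within one channel) of the sample under a playhead at positionMs.
    static long sampleIndexAt(int positionMs, int sampleRate) {
        return (long) positionMs * sampleRate / 1000;
    }

    // Format a time in milliseconds as minutes:seconds for an on-screen readout.
    static String formatTime(int ms) {
        int totalSeconds = ms / 1000;
        return String.format("%d:%02d", totalSeconds / 60, totalSeconds % 60);
    }

    public static void main(String[] args) {
        System.out.println(sampleIndexAt(1500, 44100));  // playhead 1.5 seconds into the clip
        System.out.println(formatTime(83500));           // a playhead 83.5 seconds in
    }
}
```

Within a sketch, values such as song.position() and song.length() could be passed to a function like formatTime to display progress through the clip.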

To examine individual samples, there are two approaches:

By default, the AudioPlayer groups a series of nearby samples into larger buffers which are transmitted to the audio output as a batch. The number of samples in each buffer is reported by bufferSize(). Using the AudioPlayer object, we can only access those samples that are currently in the buffer. Individual samples for the left channel can then be read by syntax such as song.left.get(i) for index i within the buffer (and similarly for right channel). It is worth noting that you can read samples in the buffer, but cannot alter them in such a fashion.
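
Two small calculations come up often when working with the buffer: how much time one buffer covers, and where to draw a sample on the screen. The standalone Java sketch below illustrates both. It assumes, as described in the Minim Quickstart Guide, that a call such as song.left.get(i) returns a float between -1 and 1. The class `BufferMath`, its method names, and the buffer size of 1024 are chosen only for illustration.

```java
// Illustrative only: BufferMath is a hypothetical helper, not a minim class.
class BufferMath {
    // Duration, in milliseconds, of one buffer holding bufferSize samples.
    static double bufferDurationMs(int bufferSize, double sampleRate) {
        return 1000.0 * bufferSize / sampleRate;
    }

    // Map a sample in [-1, 1] to a y coordinate in a window of the given
    // height, with +1 at the top edge and -1 at the bottom edge.
    static float sampleToY(float sample, int height) {
        return (1 - sample) * height / 2f;
    }

    public static void main(String[] args) {
        // An assumed buffer of 1024 samples at the CD-quality rate.
        System.out.println(bufferDurationMs(1024, 44100) + " ms per buffer");
        // A silent sample lands in the middle of a 100-pixel-high window.
        System.out.println(sampleToY(0f, 100));
    }
}
```

A buffer of that size covers only a few hundredths of a second, so a drawing based on the buffer shows a brief snapshot of the waveform rather than the whole clip.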

To have access to an entire audio clip and to edit the samples, you may use the AudioSample class. An example of such code follows:

   import ddf.minim.*;
   
   Minim minim;
   AudioSample song;
   
   void setup()
   {
     size(100, 100);
   
     minim = new Minim(this);
   
     // this loads mysong.wav from the data folder as AudioSample
     song = minim.loadSample("mysong.wav", 2048);

     float[] leftChannel = song.getChannel(AudioSample.LEFT);
     float[] rightChannel = song.getChannel(AudioSample.RIGHT);

     // ...can examine or edit those arrays...

     song.trigger();  // start playing the song
   }

   void draw() { }  // necessary for interactive playback of audio
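
As a sketch of the kind of editing that such arrays allow, the standalone Java code below makes a channel quieter or louder and reverses it. It works on any float array whose values lie between -1 and 1, which is assumed here to be the form of the arrays returned by getChannel. The class `SampleEdits` and its methods are hypothetical helpers, not part of minim.

```java
// Illustrative only: SampleEdits is a hypothetical helper, not a minim class.
class SampleEdits {
    // Multiply every sample by gain, clamping results to the range [-1, 1].
    static void scale(float[] samples, float gain) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] = Math.max(-1f, Math.min(1f, samples[i] * gain));
        }
    }

    // Reverse the samples in place, so the clip would play backwards.
    static void reverse(float[] samples) {
        for (int i = 0, j = samples.length - 1; i < j; i++, j--) {
            float tmp = samples[i];
            samples[i] = samples[j];
            samples[j] = tmp;
        }
    }

    public static void main(String[] args) {
        float[] channel = { 0.5f, -0.25f, 1f };  // a tiny stand-in for real audio data
        scale(channel, 0.5f);                    // half the original volume
        reverse(channel);
        System.out.println(java.util.Arrays.toString(channel));
    }
}
```

In a sketch, the same loops could be applied to leftChannel and rightChannel before calling trigger().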

Be warned, however, that these arrays can be huge. At 44100 samples per second, even a 1-minute audio clip contains over 2.6 million samples per channel.
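
The size estimate can be checked with a little arithmetic, sketched here in standalone Java. The class `ClipSize` and its methods are hypothetical names for this illustration, and the memory figure assumes each sample is held as a 4-byte float, as in the float arrays used in these notes.

```java
// Illustrative only: ClipSize is a hypothetical helper, not a minim class.
class ClipSize {
    // Number of samples in one channel of a clip of the given duration.
    static long samplesPerChannel(double seconds, int sampleRate) {
        return Math.round(seconds * sampleRate);
    }

    // Bytes needed to hold that many samples per channel as 4-byte floats.
    static long bytesAsFloats(long samplesPerChannel, int channels) {
        return samplesPerChannel * channels * Float.BYTES;
    }

    public static void main(String[] args) {
        long perChannel = samplesPerChannel(60, 44100);  // a 1-minute clip
        System.out.println(perChannel + " samples per channel");
        System.out.println(bytesAsFloats(perChannel, 2) + " bytes for stereo");
    }
}
```

One minute of audio works out to 60 × 44100 = 2,646,000 samples per channel, or about 21 million bytes for a stereo clip stored as floats.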

Also, the AudioSample class provides more limited control of playback, supporting only a trigger() function to start play, and a stop() function to stop play.

Michael Goldwasser

CSCI 1050, Spring 2016
Last modified: Tuesday, 26 April 2016

Saint Louis University

Dept. of Math & Computer Science