Sonic data analysis with SoX

In this post I’ll be using SoX to export audio as numeric data that can be processed outside the audio realm. The example here is a brief recording that will be plotted with GNUPlot. SoX and other programs can already produce spectrograms, which are frequency domain images of audio files. Here I’ll use GNUPlot to create an image of an audio file in the time domain instead.

First, I’ll use this command to convert the wav audio file into a text-based dat file. SoX picks the output format from the file extension.

sox "/home/local/Desktop/Export to Data/44k_Mono.wav" "/home/local/Desktop/Export to Data/44k_Mono.dat"

Here are the first few lines of the resulting dat file.

; Sample Rate 44100
; Channels 1
0 0
2.2675737e-05 0
4.5351474e-05 -3.0517578e-05
6.8027211e-05 0
9.0702948e-05 0
0.00011337868 0

The first two lines contain the sample rate and the number of channels in the audio. The remaining lines contain two columns of data: the first is the timestamp in seconds and the second is the audio level at that point in time. The example above uses 44100 samples per second, so each row advances in time by 1/44100 of a second, or about 22.7 microseconds.
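
As a quick check of that step size, the sample period is just the reciprocal of the sample rate. Here is a minimal sketch, assuming awk is available at the terminal. The function name sample_period is made up for illustration. The result matches the second timestamp in the sample above (2.2675737e-05).

```shell
# Hypothetical helper: print the sample period in seconds for a sample rate in Hz.
sample_period() {
    awk -v rate="$1" 'BEGIN { printf "%.4e\n", 1 / rate }'
}

sample_period 44100   # prints 2.2676e-05 (seconds)
```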

For our purposes, we are only interested in the second column. So we’ll need to remove the first two rows (the header lines) and the first column from the dat file, saving the result as the 44k_Mono.txt file used in the plot command below.
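
The post doesn’t show how this cleanup was done, so the following is only one possible way, sketched with standard awk. It skips the header lines that start with a semicolon and keeps only the second column. The function name dat_to_levels is made up for illustration, and the .txt output name is taken from the plot command further down.

```shell
# Hypothetical helper: read a SoX dat file on stdin, write one audio level per line.
dat_to_levels() {
    # Skip header lines beginning with ';' and print only the second column.
    awk '!/^[[:space:]]*;/ && NF >= 2 { print $2 }'
}

# Usage, with the paths from this post:
# dat_to_levels < "/home/local/Desktop/Export to Data/44k_Mono.dat" \
#               > "/home/local/Desktop/Export to Data/44k_Mono.txt"
```

With only one column left, GNUPlot uses the row number as the x value, so the horizontal axis counts samples rather than seconds.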

Now open GNUPlot by entering “gnuplot” at the terminal. From there, enter the following commands to create a simple plot of the audio data.

set terminal pngcairo background rgb 'black'
set output "/home/local/Desktop/Export to Data/44k_Mono.png"
set xlabel 'Time' tc rgb 'white'
set ylabel 'Amplitude' tc rgb 'white'
plot "/home/local/Desktop/Export to Data/44k_Mono.txt" with lines linecolor rgb "green"
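
The same plot can also be produced without an interactive session. The sketch below relies on gnuplot reading commands from standard input when they are piped to it. The helper name plot_script is made up for illustration, and it only prints the commands for a given input and output path.

```shell
# Hypothetical helper: print the gnuplot commands for one input file ($1)
# and one output image ($2). Paths must not contain double quotes, because
# they are embedded in double-quoted gnuplot strings.
plot_script() {
    cat <<EOF
set terminal pngcairo background rgb 'black'
set output "$2"
set xlabel 'Time' tc rgb 'white'
set ylabel 'Amplitude' tc rgb 'white'
plot "$1" with lines linecolor rgb "green"
EOF
}

# Usage (requires gnuplot to be installed):
# plot_script "/home/local/Desktop/Export to Data/44k_Mono.txt" \
#             "/home/local/Desktop/Export to Data/44k_Mono.png" | gnuplot
```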

SoX provides a way to export audio as numeric data. This fairly advanced feature opens the recording up to analysis well beyond the audio realm. Sonic locating and triangulation are possible uses, though these applications require a high degree of time domain accuracy. Even so, the information gathered from a simple microphone can be used for data logging. Having the audio data available is the key.
