In music theory, a note represents a musical sound and describes the sound's pitch and duration.
To track the different notes in an audio file, we can use the onset_detect() function provided by librosa to get the frames for all the onsets in the audio.
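For instance, here is a minimal sketch of this step. It assumes the trumpet example clip that librosa can fetch with librosa.example('trumpet') (available in librosa 0.8 and later):

import librosa

# Load librosa's bundled trumpet example clip (downloaded and cached on first use)
y, sr = librosa.load(librosa.example('trumpet'))

# onset_detect() returns the frame indices at which new events (notes) begin
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
print(onset_frames)                                  # onset positions as frame indices
print(librosa.frames_to_time(onset_frames, sr=sr))   # the same onsets in seconds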
We also need the chroma_stft() function provided by librosa. This function returns the chromagram of a short-time Fourier transform (STFT) representation of an audio signal. The chromagram represents the energy distribution of the 12 pitch classes over time. The mathematical equation for the short-time Fourier transform is as follows:

X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-j\omega n}

In this equation:

- x[n] is the input audio signal.
- w[n - m] is the window function centered on frame m.
- \omega is the angular frequency.
- X(m, \omega) is the STFT value at frame m and frequency \omega.
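As a quick illustration (a sketch, again assuming the trumpet example clip), chroma_stft() returns a 12 × n_frames array with one row per pitch class:

import librosa

y, sr = librosa.load(librosa.example('trumpet'))   # assumed example clip
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
print(chroma.shape)    # (12, n_frames): 12 pitch classes tracked over time
print(chroma[:, 0])    # energy of each pitch class in the first frame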
Once we have the chroma values, we can take the frames we obtain from the onset_detect() function and pick the maximum chroma value at each of these frames, and we can use the frames_to_time() function to calculate the duration of each note. This way, we have each note's pitch and time duration.
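The sketch below condenses this idea into a vectorized form; it assumes the same trumpet example clip, and the full, step-by-step version follows next:

import librosa

y, sr = librosa.load(librosa.example('trumpet'))
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)

# Index (0-11) of the strongest pitch class at each onset frame
pitches = chroma[:, onset_frames].argmax(axis=0)

# Onset positions in seconds; the gaps between them give each note's duration
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(pitches)
print(onset_times)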
The following code tracks the notes in a trumpet audio clip:
import librosa

# Loading the audio file
audio_file = '../trumpet.ogg'
y, sr = librosa.load(audio_file)

# Extracting the chroma features and onsets
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)

first = True
notes = []
for onset in onset_frames:
    chroma_at_onset = chroma[:, onset]
    note_pitch = chroma_at_onset.argmax()
    # For all other notes
    if not first:
        note_duration = librosa.frames_to_time(onset, sr=sr)
        notes.append((note_pitch, onset, note_duration - prev_note_duration))
        prev_note_duration = note_duration
    # For the first note
    else:
        prev_note_duration = librosa.frames_to_time(onset, sr=sr)
        first = False

print("Note pitch \t Onset frame \t Note duration")
for entry in notes:
    print(entry[0], '\t\t', entry[1], '\t\t', entry[2])
Lines 4–5: We load the audio file we will use for this task. The trumpet clip is available for free and ships with the librosa library as one of its example audio files.
Lines 8–9: We extract the chroma features and onset frames from the audio file. The onset_detect() function returns an array of frame indices where a new musical note starts.
Lines 13–15: Here, we go through the onset frames and pick the index of the maximum chroma value at each frame. The note_pitch value is the pitch class with the highest chroma energy at that frame, meaning it gives us one of the 12 musical notes.
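For example, to turn that index into a readable note name, we can map it onto the pitch classes in the order librosa's chroma bins use (which starts at C); the note_names list here is our own helper, not part of librosa:

# Hypothetical helper: chroma bin 0 corresponds to C, bin 1 to C#, and so on
note_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

note_pitch = 9                   # e.g. the argmax index found at an onset
print(note_names[note_pitch])    # prints 'A'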
Lines 17–24: If this is the first note we track, we convert its onset frame to time using the librosa.frames_to_time() function and set the first flag to False. If it is not the first note, we convert the current onset frame to time and subtract the previous onset time from it; the difference is stored as the note's duration.
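Because each stored duration is simply the gap between consecutive onset times, the same bookkeeping can be sketched with NumPy's diff (assuming the trumpet example clip again):

import numpy as np
import librosa

y, sr = librosa.load(librosa.example('trumpet'))
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# The difference between consecutive onset times matches the duration
# value appended for each note in the loop above
note_durations = np.diff(onset_times)
print(note_durations)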
In the output, we get a table of all the notes in the audio. We can see that we have tracked the different notes and can distinguish them. The output also gives us the onset frame, which tells us exactly at which frame each note starts. Lastly, we also get each note's duration in seconds.
In conclusion, note tracking with librosa is a powerful and versatile way to analyze and extract valuable information from audio data. Throughout this task, we explored how librosa can be used to detect onsets, identify pitch classes, and measure note durations.
Librosa's note-tracking capabilities have applications in music transcription, speech analysis, and environmental monitoring, where they support music interpretation, speech recognition, and acoustic event detection. This makes librosa a valuable tool across multiple industries, supporting better data analysis and decision-making.