Automatic Music Transcription Using Machine Learning Techniques
Oral Defence Date:
Monday, July 9, 2018 - 13:30
Profs. Bill Hsu and Arno Puder
Automatic music transcription systems take music recordings as input and produce notated scores or MIDI files as output. Given the tremendous number of audio recordings available today, there are not enough music experts to transcribe all unnotated recordings, such as improvised jazz solos and folk music from oral traditions. In this project, the author created an automatic music transcription system called Tranz using deep learning techniques. Tranz was designed to transcribe solo piano music. It uses a variable-Q transform (VQT) to extract features from the input audio. Four types of neural networks are implemented: a deep neural network (DNN) acoustic model, a convolutional neural network (CNN) acoustic model, Ted’s architecture using DNNs, and Ted’s architecture using CNNs. The DNN and CNN acoustic models are re-implementations of acoustic models proposed by Siddharth Sigtia. Ted’s architecture is a new neural network architecture proposed in this project. It uses a neural network acoustic model to estimate fundamental frequencies, an onset detector neural network to predict note onsets, and a time delay neural network to combine the outputs of the acoustic model and the onset detector into a transcription. The networks in the acoustic model and the onset detector of Ted’s architecture can be either DNNs or CNNs. Valentin Emiya’s MIDI Aligned Piano Sounds (MAPS) dataset was used for training and testing. MAPS contains both synthesized and real piano recordings; the synthesized recordings were used as the training set and the real recordings as the test set. The neural networks were implemented in Python with TensorFlow. The output of the Tranz system is a MIDI file.
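The combination stage of Ted’s architecture can be illustrated with a small sketch. The abstract states only that a time delay neural network combines the acoustic model and onset detector outputs; everything else here is an assumption for illustration: the hypothetical helpers `stack_context` and `time_delay_layer`, a context of ±2 frames, zero padding at clip boundaries, a single sigmoid layer, and random untrained weights. It is written in plain Python rather than TensorFlow and is not the project's implementation. It shows the defining property of a time-delay layer: the same weights are applied at every frame to a window of neighbouring frames.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # one row per frame, one column per pitch


def stack_context(acoustic: Matrix, onset: Matrix, context: int) -> Matrix:
    """Build one input vector per frame t by concatenating the acoustic-model
    and onset-detector outputs of frames t-context .. t+context.
    Frames outside the clip are zero-padded (an assumption of this sketch)."""
    n_frames, n_pitches = len(acoustic), len(acoustic[0])
    pad = [0.0] * (2 * n_pitches)
    stacked: Matrix = []
    for t in range(n_frames):
        row: List[float] = []
        for u in range(t - context, t + context + 1):
            if 0 <= u < n_frames:
                row.extend(acoustic[u] + onset[u])
            else:
                row.extend(pad)
        stacked.append(row)
    return stacked


def time_delay_layer(stacked: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """Apply the same weights at every frame (weight sharing over time, which
    defines a time-delay layer), then a sigmoid to get per-pitch probabilities."""
    output: Matrix = []
    for x in stacked:
        probs = []
        for w_row, b in zip(weights, bias):
            z = sum(w * xi for w, xi in zip(w_row, x)) + b
            probs.append(1.0 / (1.0 + math.exp(-z)))
        output.append(probs)
    return output


# Toy usage with random, untrained values (a piano model would output 88 pitches).
rng = random.Random(0)
n_frames, n_pitches, context = 6, 4, 2
acoustic = [[rng.random() for _ in range(n_pitches)] for _ in range(n_frames)]
onset = [[rng.random() for _ in range(n_pitches)] for _ in range(n_frames)]
in_dim = (2 * context + 1) * 2 * n_pitches
weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_pitches)]
bias = [0.0] * n_pitches
piano_roll = time_delay_layer(stack_context(acoustic, onset, context), weights, bias)
```

Each row of `piano_roll` holds per-pitch note probabilities for one frame; thresholding it would give a frame-level transcription that could then be converted to MIDI notes. In a trained system the weights would be learned, and several such layers could be stacked.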
Tranz can produce good transcriptions of complex piano performances, with accuracies comparable to or better than those of published systems.
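The abstract does not say which accuracy measure was used. As an assumption, the sketch below computes frame-level measures that are commonly reported for piano transcription: precision, recall, F-measure, and accuracy defined as TP / (TP + FP + FN), where each (frame, pitch) cell is counted as a true positive, false positive, or false negative. The function name `frame_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (frame index, MIDI pitch number)


def frame_metrics(reference: Set[Cell], estimate: Set[Cell]) -> Dict[str, float]:
    """Frame-level precision, recall, F-measure and accuracy TP / (TP + FP + FN)."""
    tp = len(reference & estimate)  # active in both reference and estimate
    fp = len(estimate - reference)  # predicted but not in the reference
    fn = len(reference - estimate)  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


reference = {(0, 60), (1, 60), (1, 64)}
estimate = {(0, 60), (1, 64), (2, 67)}
scores = frame_metrics(reference, estimate)
print(scores["accuracy"])  # 0.5
```

Here two cells match, one predicted cell is spurious, and one reference cell is missed, so accuracy is 2 / 4 = 0.5 while precision, recall, and F-measure are all 2/3.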
Automatic Music Transcription, Piano, Variable-Q Transform, VQT, Deep Neural Network, Convolutional Neural Network, Deep Learning