Abstract:
As a challenging audio-processing task, automatic music transcription (AMT), which aims to convert raw audio into a symbolic representation, has attracted increasing attention in recent years. Nowadays, music recordings are usually stereo audio files. Many previous studies simply average the stereo signal into a mono signal during preprocessing, which sacrifices useful information. In this paper, we design a stereo feature enhancement (SFE) module based on the self-attention mechanism to make full use of stereo information. Moreover, the temporal convolutional network (TCN) has recently proven highly effective at processing temporal data, overcoming some drawbacks of existing temporal modeling methods such as HMMs, RNNs, and LSTMs. Inspired by this, we propose a temporal convolutional module (TCM) suited to extracting the temporal context of music. Our proposed network is validated on the MAPS dataset for music transcription and achieves promising performance.
Published in: IEEE Signal Processing Letters (Volume 28)