Rhythm patterns mapped into the latent space. You can see that they cluster by genre.

Overview - What's impressive?

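Releases a large dataset of rhythm patterns extracted from MIDI files (more than 30,000 patterns).
Trains generative models of rhythm patterns with an autoencoder, a VAE, and ACAI (adversarially constrained autoencoder interpolation), and compares the three.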

Abstract

This paper addresses the issue of long-scale correlations that is characteristic for symbolic music and is a challenge for modern generative algorithms. It suggests a very simple workaround for this challenge, namely, generation of a drum pattern that could be further used as a foundation for melody generation. The paper presents a large dataset of drum patterns alongside with corresponding melodies. It explores two possible methods for drum pattern generation. Exploring a latent space of drum patterns one could generate new drum patterns with a given music style. Finally, the paper demonstrates that a simple artificial neural network could be trained to generate melodies corresponding with these drum patters used as inputs. Resulting system could be used for end-to-end generation of symbolic music with song-like structure and higher long-scale correlations between the notes.

Motivation

If a dataset pairs drum parts with melodies, it should be possible not only to generate drum patterns but also to generate melodies from those drum patterns.

Dataset

First, the authors build a large dataset of rhythm patterns extracted from MIDI files (more than 30,000 patterns).

Extract only the rhythm parts from MIDI data collected online.
Filter out junk data using several conditions.

Each pattern is represented as a matrix of 14 drum types (kick, snare, ...) × 2 bars at 16th-note resolution (32 steps).

Matrix representation of a rhythm
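To make the 14 × 32 representation concrete, here is a minimal sketch that turns a list of drum hits into a binary matrix. The row indices (`KICK`, `SNARE`, `CLOSED_HAT`) and the helper names are assumptions for illustration; the post does not say which row corresponds to which drum.

```python
# Sketch: encode drum hits as a 14 x 32 binary matrix (drum type x 16th-note step).
N_INSTRUMENTS = 14
STEPS_PER_BAR = 16                  # 16th-note resolution
N_BARS = 2
N_STEPS = STEPS_PER_BAR * N_BARS    # 32 steps


def hits_to_matrix(hits):
    """Build the matrix from (instrument_index, step_index) pairs."""
    matrix = [[0] * N_STEPS for _ in range(N_INSTRUMENTS)]
    for instrument, step in hits:
        if not (0 <= instrument < N_INSTRUMENTS and 0 <= step < N_STEPS):
            raise ValueError(f"hit out of range: {(instrument, step)}")
        matrix[instrument][step] = 1
    return matrix


def render(matrix):
    """Text view of the matrix: 'x' for a hit, '.' for silence."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in matrix)


# Hypothetical row assignment, not taken from the paper.
KICK, SNARE, CLOSED_HAT = 0, 1, 2

hits = [(KICK, s) for s in range(0, N_STEPS, 4)]          # kick on every quarter note
hits += [(SNARE, s) for s in range(4, N_STEPS, 8)]        # snare on beats 2 and 4
hits += [(CLOSED_HAT, s) for s in range(0, N_STEPS, 2)]   # hi-hat on every 8th note

pattern = hits_to_matrix(hits)
print(render(pattern))
```

Flattening this matrix gives a 448-element binary vector, which is a convenient input shape for a feedforward model.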

Architecture

Embedding rhythm patterns

Three models are tried: an autoencoder, a VAE, and ACAI (the part labeled "smoothing" in the figure below).
The latent vector has 4 dimensions.
The layers are simple feedforward layers with ReLU.
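As a rough picture of the shapes involved, the following is a minimal sketch of the forward pass of a plain feedforward autoencoder with a 4-dimensional latent code. Only the latent size and the use of feedforward layers with ReLU come from the post; the hidden width of 64, the sigmoid output, and the weight initialization are assumptions. The weights are random and untrained, and the extra terms that a VAE or ACAI would add are not shown.

```python
import math
import random

INPUT_DIM = 14 * 32   # flattened drum matrix
HIDDEN_DIM = 64       # assumption: the post does not state layer widths
LATENT_DIM = 4        # stated in the post


def init_layer(rng, n_in, n_out):
    """Random weights (He-style scale) and zero biases for one dense layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


rng = random.Random(0)
enc1 = init_layer(rng, INPUT_DIM, HIDDEN_DIM)
enc2 = init_layer(rng, HIDDEN_DIM, LATENT_DIM)
dec1 = init_layer(rng, LATENT_DIM, HIDDEN_DIM)
dec2 = init_layer(rng, HIDDEN_DIM, INPUT_DIM)


def encode(x):
    """448-dim pattern -> 4-dim latent code."""
    return dense(enc2, relu(dense(enc1, x)))


def decode(z):
    """4-dim latent code -> 448 per-cell hit probabilities (assumed sigmoid output)."""
    return sigmoid(dense(dec2, relu(dense(dec1, z))))


# A random sparse pattern stands in for a real drum matrix.
x = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(INPUT_DIM)]
z = encode(x)
x_hat = decode(z)
print(len(z), len(x_hat))
```

In practice one would train such a model with a deep learning framework; the pure-Python version here is only meant to show the 448 → 4 → 448 bottleneck.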

Latent space of rhythms

Model for generating melodies from rhythms

A melody is generated from the drum pattern together with embedding vectors that specify the instrument playing the melody, the tone (??), and the octave.

Model for generating melodies from rhythms

Results

Mapping rhythm patterns into the latent space with the trained model

Reducing the 4-dimensional latent vectors to 2 dimensions with t-SNE gives a 2D map.

Latent space of rhythms

Sampling from the latent spaces of the trained VAE, autoencoder, and ACAI models

Generated drum patterns
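Sampling from a latent space can be sketched as: draw a 4-dimensional latent vector, decode it to per-cell hit probabilities, and binarize. The standard normal prior matches the usual VAE setup; the post does not say how latent points were chosen for the plain autoencoder or ACAI, so treat that as an assumption. `toy_decode` is a placeholder, not a trained decoder, and the 0.5 threshold is also an assumption.

```python
import math
import random

LATENT_DIM = 4
N_INSTRUMENTS, N_STEPS = 14, 32


def sample_pattern(decode, rng, threshold=0.5):
    """Draw z ~ N(0, I), decode it, and binarize into a 14 x 32 matrix."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    probs = decode(z)
    flat = [1 if p >= threshold else 0 for p in probs]
    return [flat[i * N_STEPS:(i + 1) * N_STEPS] for i in range(N_INSTRUMENTS)]


def toy_decode(z):
    """Placeholder decoder (NOT a trained model): maps latent values to probabilities."""
    return [
        1.0 / (1.0 + math.exp(-z[i % LATENT_DIM]))
        for i in range(N_INSTRUMENTS * N_STEPS)
    ]


sampled = sample_pattern(toy_decode, random.Random(0))
print(sum(sum(row) for row in sampled), "hits in the sampled pattern")
```

With a trained decoder passed in place of `toy_decode`, the same function would produce candidate drum patterns.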

Checking how many randomly generated rhythms survive the filter that was used in preprocessing when the training data was collected

The patterns used as training data (empirical patterns) naturally score highest, and ACAI gives the next best result. It is surprising that the VAE does poorly.

Results of filtering the generated rhythms
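The evaluation idea, measuring the fraction of generated patterns that survive the preprocessing filter, can be sketched as follows. The post only says that "several conditions" were used, so `passes_filter` and its thresholds are hypothetical stand-ins, not the paper's actual criteria.

```python
N_INSTRUMENTS, N_STEPS = 14, 32


def passes_filter(matrix, min_hits=4, min_instruments=2):
    """Hypothetical filter: require a minimum number of hits and active drums."""
    total_hits = sum(sum(row) for row in matrix)
    active_instruments = sum(1 for row in matrix if any(row))
    return total_hits >= min_hits and active_instruments >= min_instruments


def survival_rate(patterns, keep=passes_filter):
    """Fraction of patterns that pass the filter (0.0 for an empty input)."""
    patterns = list(patterns)
    if not patterns:
        return 0.0
    return sum(1 for p in patterns if keep(p)) / len(patterns)


def blank():
    return [[0] * N_STEPS for _ in range(N_INSTRUMENTS)]


empty = blank()                       # no hits at all: rejected

single = blank()                      # four hits, but only one drum: rejected
for step in (0, 8, 16, 24):
    single[0][step] = 1

groove = blank()                      # four hits across two drums: accepted
for step in (0, 16):
    groove[0][step] = 1
for step in (8, 24):
    groove[1][step] = 1

rate = survival_rate([empty, single, groove])
print(f"{rate:.2f}")                  # one of the three toy patterns passes
```

Running the same function over the training set and over samples from each model gives comparable survival rates per model.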

Further Thoughts

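The paper is written rather sloppily...
I looked for examples of melodies generated from drums... are they not published?
I don't understand ACAI yet, so I'll read that paper next.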

Links

The rhythm dataset collected for this research

altsoph/drum_space

I collected 33K non-trivial unique drum beats from midis available online, used a neural network with to build a space of latent representation of these beats and to generate new ones. Finally, I mapped all of them into 2D space using the t-SNE algorithm.

github.com

An interactive visualization that maps the rhythm patterns in the dataset into the latent space using the trained model

altsoph.com

This is an interactive explorer of the drum beats latent space. I collected 33K non-trivial unique drum beats from midis available online. Then I used a neural network with VAE architecture to build a 4D-space of latent representation of these beats and to generate new ones.


Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Autoencoders provide a powerful framework for learning compressed representations by encoding all of the information needed to reconstruct a data point in a latent code. In some cases, autoencoders can "interpolate": By decoding the convex combination of the latent codes for two datapoints, the autoencoder can produce an output which semantically mixes characteristics from the datapoints.

arxiv.org

Generating drum patterns and melodies - Artificial Neural Networks Jamming on the Beat
