Dual Diffusion

Generative Diffusion SNES/SFC Music Model

DualDiffusion is a generative diffusion model for video game music. The model is still a work in progress. More information about the project and source code is available here.

The samples below were generated by the current model within one week of training on a single desktop GPU. The current model is being trained on video game music from the mid-1990s to the present day. The dataset is mostly instrumental, and there are no captions, conditioning, or metadata for lyrics/vocals.

400k steps
265k steps
165k steps
160k steps

Below are samples generated by an older model trained exclusively on SNES/SFC music. Captions indicate which game(s) were used as conditioning for each sample; "miscellaneous" or "et al" samples use too many (low-weighted) games to list individually. Samples are in 32 kHz stereo to match the capabilities of the SPC700.