DualDiffusion is a generative diffusion model for video game music. The model is still a work in progress; more information about the project, along with the source code, is available here.
The samples below were generated by the current model after 1 week of training on a single desktop GPU. This model is trained on video game music from the mid-1990s to the present day; the dataset is mostly instrumental, with no captions, conditioning, or metadata for lyrics or vocals.
Below are samples generated by an older model trained exclusively on SNES/SFC music. Captions indicate which game(s) were used as conditioning for each sample; "miscellaneous" or "et al." samples draw on too many low-weighted games to list individually. Samples are 32 kHz stereo to match the output capabilities of the SNES's SPC700 audio chip.
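As a rough illustration of the 32 kHz stereo target mentioned above, here is a minimal sketch of resampling audio to that rate. This is not code from the project: the function name and the use of SciPy's polyphase resampler are my own assumptions for illustration.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

SPC700_RATE = 32000  # the SNES SPC700 outputs at 32 kHz


def to_32khz_stereo(audio: np.ndarray, sr: int) -> tuple[np.ndarray, int]:
    """Resample audio to 32 kHz and return it as stereo, shape (samples, 2).

    `audio` is either mono, shape (samples,), or stereo, shape (samples, 2).
    Hypothetical helper; not part of DualDiffusion itself.
    """
    if audio.ndim == 1:
        # Duplicate a mono signal into both channels.
        audio = np.stack([audio, audio], axis=-1)
    # Reduce the resampling ratio to lowest terms for resample_poly.
    g = gcd(SPC700_RATE, sr)
    resampled = resample_poly(audio, SPC700_RATE // g, sr // g, axis=0)
    return resampled.astype(np.float32), SPC700_RATE


# Example: 1 second of a 440 Hz mono sine at 44.1 kHz -> 32 kHz stereo.
sr = 44100
t = np.arange(sr) / sr
mono = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
stereo_32k, out_sr = to_32khz_stereo(mono, sr)
print(stereo_32k.shape, out_sr)  # (32000, 2) 32000
```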