I have been improving two new AI music models, qa_mdt and AudioSR, for a small project I have in mind.

For the first time in a long time I'm writing code, including revising other people's sources. A year ago, working with code in GPT was like working with a schoolboy: I could write prototypes quickly, but everything had to be double-checked and handed to a real programmer for revision and rewriting.

Today GPT-4o is already a full-fledged assistant, much like a regular developer: it explains details, understands poorly formulated tasks, keeps track of changes, and writes decent code. Problems arise when the available information is incomplete, but they are solved by uploading and analyzing the specific files.

Having a little experience with image generators (in fact, I can only read Python code and roughly understand how the different functions and components of neural networks work, samplers for example), I finished the small features I needed for sound generation in a week. Then I sliced and normalized audio files, configured the software on a Runpod server, assigned tokens to the samples, and uploaded them to the database for further fine-tuning of the base model (the authors, by the way, write that this can be complex and they don't recommend it). Well, not me but GPT, of course: I just wrote "I want to do all this, help me please". And it works.

For example, I found that generation in QA_MDT always uses a fixed seed. I went into the files and made the seed random, as is usually done in Stable Diffusion (yes, technically this is SD-based generation of sound-frequency pictures; see the paper).
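To illustrate why a fixed seed always yields the same result, here is a minimal sketch using only Python's standard library. It is not code from QA_MDT: `resolve_seed` and `initial_noise` are hypothetical helpers, and the short list of Gaussian values merely stands in for the starting noise a diffusion sampler would draw.

```python
import random
from typing import List, Optional


def resolve_seed(seed: Optional[int] = None, max_seed: int = 999_999) -> int:
    """Return the given seed, or draw a random one when none is supplied."""
    if seed is None:
        seed = random.randint(0, max_seed)
    return seed


def initial_noise(seed: int, size: int = 4) -> List[float]:
    """Stand-in for a sampler's starting noise: a seeded Gaussian draw."""
    rng = random.Random(seed)  # local generator, leaves global RNG state alone
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed reproduces the same starting noise (and so the same output).
assert initial_noise(42) == initial_noise(42)
# A different seed gives a different starting point.
assert initial_noise(42) != initial_noise(43)

seed = resolve_seed()  # no seed passed, so a random one is chosen
print(f"Using seed = {seed}")
```

Printing the chosen seed matters in practice: when a random generation turns out well, the logged seed is what lets you reproduce it later.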
I then exposed the seed and other parameters in the interface for easier control and model testing.

Modifications

Example modification of infer_mos5.py

    import re

    def sanitize_filename(name: str) -> str:
        # NOTE: the regex pattern was lost in the original post; this character
        # class (characters that are invalid in filenames) is an assumption.
        return re.sub(r'[\\/*?:"<>|]', '_', name).replace(" ", "_")

    def infer(dataset_key, configs, config_yaml_path, exp_group_name, exp_name,
              seed=0, output_filename=None, prompt=None):
        # If pipeline.py already supplied a name, use it; otherwise build one
        if output_filename is None:
            sanitized_prompt = sanitize_filename(prompt)
            output_filename = f"{sanitized_prompt}_{seed}.wav"

        print(f"Infer: Will save to {output_filename}")

        # latent_diffusion, val_loader and the sampling parameters are set up
        # by the existing code in this file
        latent_diffusion.generate_sample(
            val_loader,
            unconditional_guidance_scale=guidance_scale,
            ddim_steps=ddim_sampling_steps,
            n_gen=n_candidates_per_samples,
            name=output_filename
        )
        ...

Change ddpm.py to save output files with different names instead of 'awesome.wav'

    # In ddpm.py: fall back to the old default only when no name is passed
    if name is None:
        name = "awesome.wav"
    self.save_waveform(waveform, savepath="./", name=name)

    # In the caller: pass the generated filename through
    output_filename = f"{sanitized_prompt}_{seed}.wav"
    latent_diffusion.generate_sample(
        batchs=batchs,
        ddim_steps=ddim_steps,
        unconditional_guidance_scale=guidance_scale,
        name=output_filename,
        # ...
    )

Example modification of pipeline.py for random seed and other parameters

    import random

    # Method of MOSDiffusionPipeline
    def __call__(self, prompt: str, seed: int = None, cfg: float = None, steps: int = None):
        # If no seed is given, pick a random one
        if seed is None:
            seed = random.randint(0, 999999)
            print(f"Using seed = {seed}")

        # Build the filename from the parameters
        cfg_str = f"cfg{cfg}" if cfg is not None else "cfg?"
        steps_str = f"steps{steps}" if steps is not None else "steps?"
        # sanitize_filename is the helper added to infer_mos5.py (import it here)
        sanitized_prompt = sanitize_filename(prompt)
        filename = f"{sanitized_prompt}_{cfg_str}_{steps_str}_{seed}.wav"

        # Add cfg / steps to self.configs
        # NOTE: the config keys were lost in the original post; the nested
        # path below is an assumption based on the AudioLDM-style config layout.
        eval_params = self.configs["model"]["params"]["evaluation_params"]
        if cfg is not None:
            eval_params["unconditional_guidance_scale"] = cfg
        if steps is not None:
            eval_params["ddim_sampling_steps"] = steps

        dataset_key = self.build_dataset_json_from_prompt(prompt)
        infer(
            dataset_key=dataset_key,
            configs=self.configs,
            config_yaml_path=self.config_yaml,
            exp_group_name="qa_mdt",
            exp_name="mos_as_token",
            seed=seed,
            output_filename=filename,  # Set filename
            prompt=prompt
        )
        return filename

    pipe = MOSDiffusionPipeline()
    result = pipe("A modern synthesizer creating futuristic soundscapes.", seed=1234, cfg=10.0, steps=100)
    print(f"Generated file: {result}")

Example of inference with parameters

    from qa_mdt.pipeline import MOSDiffusionPipeline

    pipe = MOSDiffusionPipeline()
    filename = pipe(prompt="smoke_on_water", seed=42, cfg=10.0, steps=100)
    print("Generation done. File:", filename)

Which results in something like this:

    Infer: Will save to smoke_on_water_cfg10.0_steps100_42.wav
    Waveform saved at -> smoke_on_water_cfg10.0_steps100_42.wav

I also started to implement negative prompts and weight changes, following https://huggingface.co/blog/audioldm2.

How to run QA_MDT (OpenMusic) on Runpod

Minimum requirements: 36-48 GB of video RAM, 30 GB for the system, and 100 GB of SSD storage for files.

Installation Instructions

    # 0. Docker commands
    bash -c "apt update;apt install -y wget;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;apt-get install -y magic-wormhole;apt-get install -y nano;apt-get install -y curl;apt-get install -y git;apt-get install -y git-lfs;apt-get install -y ffmpeg;apt-get install -y unzip;cd home;curl -O https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh;chmod 777 Anaconda3-2024.10-1-Linux-x86_64.sh;cd ..;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo YOUR_PUBLIC_KEY > authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity"

    # 1. Conda
    cd /home
    bash Anaconda3-2024.10-1-Linux-x86_64.sh
    source ~/.bashrc
    conda create -n oml python=3.11 -y && conda activate oml

    # 2. QA_MDT install
    git clone https://huggingface.co/jadechoghari/openmusic qa_mdt
    ls -ltr
    pip install diffusers
    pip install matplotlib
    pip install pandas
    pip install einops
    pip install h5py
    pip install gdown
    pip install xformers==0.0.26.post1
    pip install torchlibrosa==0.0.9 librosa==0.9.2
    pip install -q pytorch_lightning==2.1.3 torchlibrosa==0.0.9 librosa==0.9.2 ftfy==6.1.1 braceexpand
    pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
    pip install -r qa_mdt/requirements.txt

    # 3. Now create a file with the inference command
    nano new.py
    # from qa_mdt.pipeline import MOSDiffusionPipeline
    #
    # pipe = MOSDiffusionPipeline()
    # pipe("A modern synthesizer creating futuristic soundscapes.")

    # 4. Run
    python new.py

Result: [audio sample]

AudioSR Restoration

I also ran AudioSR locally on my MacBook, solving minor technical problems along the way (as usual with open source). Comparing the restored files with the originals, I noticed more emphasis on high frequencies. To be fair, it is good enough to restore drums and voice, and to make low-quality and generated samples a bit better.

Installation instructions

    apt-get update
    apt-get install magic-wormhole
    apt-get install nano
    apt-get install curl
    apt-get install git
    apt-get install ffmpeg
    curl -O https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
    chmod 777 Anaconda3-2024.10-1-Linux-x86_64.sh
    bash Anaconda3-2024.10-1-Linux-x86_64.sh
    source ~/.bashrc
    conda -V
    conda create -n audiosr python=3.9; conda activate audiosr
    git clone https://github.com/haoheliu/versatile_audio_super_resolution/
    pip3 install audiosr==0.0.7
    # example/music.wav lives inside the cloned repository
    cd versatile_audio_super_resolution
    audiosr -i example/music.wav
    audiosr -i INPUT_AUDIO_FILE

Result before/after: [audio samples]

The progress over the year is awesome. But I wanted to train my own model, and so I did.
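For the dataset preparation mentioned earlier (slicing and normalizing audio before fine-tuning), a minimal sketch might look like the following. It assumes mono audio already loaded as a list of floats in the range [-1, 1]; the function names, the clip length, and the 0.95 target peak are illustrative assumptions, not values from the actual preparation scripts.

```python
from typing import List


def peak_normalize(samples: List[float], target_peak: float = 0.95) -> List[float]:
    """Scale samples so the loudest one reaches target_peak; silence is returned as is."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def slice_audio(samples: List[float], sample_rate: int, clip_seconds: float) -> List[List[float]]:
    """Cut a mono signal into consecutive fixed-length clips, dropping a short tail."""
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# Toy signal: 10 samples at a made-up rate of 4 Hz, cut into 1-second clips.
signal = [0.1, -0.2, 0.4, 0.05, 0.0, 0.3, -0.5, 0.25, 0.1, 0.1]
clips = [peak_normalize(c) for c in slice_audio(signal, sample_rate=4, clip_seconds=1.0)]
print(len(clips), [round(max(abs(s) for s in c), 2) for c in clips])
```

Normalizing each clip separately, as here, evens out loudness across the dataset; normalizing the whole file before slicing would instead preserve the relative dynamics between clips. Which is preferable depends on the training setup.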













