MuLan
MuLan is a joint audio-text embedding model that links music recordings to natural language descriptions using a dual-encoder architecture.
Developed by Google Research, MuLan maps unaligned audio and text into a shared 128-dimensional embedding space. The model uses two distinct towers (a ResNet-50 for audio and a BERT-base for text) trained with a contrastive objective on roughly 44 million music clips totaling 370,000 hours of audio. Because the towers are trained contrastively on weakly paired data, MuLan supports zero-shot music tagging and cross-modal retrieval without task-specific labeled data. This technology powers advanced music understanding tasks, allowing systems to identify fine-grained genres or moods (e.g., 'lo-fi hip hop for studying') directly from raw waveforms.
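To illustrate how such a shared embedding space supports zero-shot tagging, the sketch below scores candidate tag descriptions against an audio clip by cosine similarity and picks the best match. It is a minimal sketch, not MuLan's implementation: embed_audio and embed_text are hypothetical placeholders standing in for the real ResNet-50 and BERT towers, and the 128-dimensional unit-norm embeddings mirror the shared space described above.

```python
# Minimal sketch of MuLan-style zero-shot tagging in a shared embedding
# space. The encoder functions are hypothetical placeholders; the real
# model uses a ResNet-50 audio tower and a BERT-base text tower.
import numpy as np

EMBED_DIM = 128  # dimensionality of MuLan's shared embedding space


def embed_audio(waveform: np.ndarray) -> np.ndarray:
    """Placeholder audio tower: returns a deterministic unit-norm vector."""
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % 2**32)
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)


def embed_text(description: str) -> np.ndarray:
    """Placeholder text tower: returns a deterministic unit-norm vector."""
    rng = np.random.default_rng(abs(hash(description)) % 2**32)
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)


def zero_shot_tag(waveform: np.ndarray, candidate_tags: list[str]) -> str:
    """Return the tag whose text embedding is closest to the audio embedding."""
    audio_vec = embed_audio(waveform)
    tag_vecs = np.stack([embed_text(t) for t in candidate_tags])
    # Dot product equals cosine similarity because all vectors are unit-norm.
    scores = tag_vecs @ audio_vec
    return candidate_tags[int(np.argmax(scores))]


clip = np.zeros(16000, dtype=np.float32)  # stand-in for a 1-second waveform
tags = ["lo-fi hip hop for studying", "uplifting orchestral score"]
print(zero_shot_tag(clip, tags))
```

Because any natural language string can serve as a candidate tag, the same similarity search also supports free-form cross-modal retrieval: embed a text query once, then rank a library of precomputed audio embeddings against it.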