# CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning

📑 [**Read the full paper (to be presented at ISMIR 2025)**](...TODO)

**CultureMERT-TA-95M** is a 95M-parameter music foundation model adapted to diverse musical cultures through **task arithmetic**. Instead of direct continual pre-training on a multi-cultural mixture, as in [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M), this model merges multiple **single-culture adapted** variants of [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M), each continually pre-trained via our two-stage strategy on a distinct musical tradition:

| Dataset | Music Tradition | Hours Used |
|-----------------|-----------------------------|------------|
| [*Hindustani*](https://dunya.compmusic.upf.edu/hindustani/) | North Indian classical | 200h |
| [*Carnatic*](https://dunya.compmusic.upf.edu/carnatic/) | South Indian classical | 200h |

> 🧪 The final model was merged using a scaling factor of **λ = 0.2**, which yielded the best overall performance across all task arithmetic variants evaluated.

🔀 This is an alternative variant of [**CultureMERT-95M**](https://huggingface.co/ntua-slp/CultureMERT-95M), where culturally specialized models are merged in weight space via task arithmetic to form a unified multi-cultural model. It builds on the same two-stage continual pre-training strategy, applied individually to each musical tradition before merging; the merging rule itself is sketched below.
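> 🛠️ A minimal sketch of the weight-space merging rule, θ_merged = θ_base + λ · Σᵢ (θᵢ − θ_base), with λ = 0.2. The local checkpoint paths below are hypothetical placeholders for the single-culture adapted models; this is an illustration of the technique, not the released training code.

```python
# Task-arithmetic merging: add the scaled sum of task vectors
# (adapted weights minus base weights) back onto the base model.
import torch
from transformers import AutoModel

BASE_ID = "m-a-p/MERT-v1-95M"
ADAPTED_PATHS = [          # hypothetical paths to single-culture adapted checkpoints
    "./mert-makam",
    "./mert-hindustani",
    "./mert-carnatic",
    "./mert-lyra",
]
LAM = 0.2  # scaling factor lambda; 0.2 performed best per the model card

base = AutoModel.from_pretrained(BASE_ID, trust_remote_code=True)
base_state = base.state_dict()
merged = {k: v.clone().float() for k, v in base_state.items()}

with torch.no_grad():
    for path in ADAPTED_PATHS:
        adapted = AutoModel.from_pretrained(path, trust_remote_code=True)
        for k, v in adapted.state_dict().items():
            # Task vector for this culture: adapted weights minus base weights.
            merged[k] += LAM * (v.float() - base_state[k].float())

base.load_state_dict({k: v.to(base_state[k].dtype) for k, v in merged.items()})
base.save_pretrained("./CultureMERT-TA-95M")
```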
---

We follow the exact same evaluation protocol as in [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M). Below are the evaluation results, along with comparisons to both [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M) and the original [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M):

## ROC-AUC / mAP

| Model | Turkish-makam | Hindustani | Carnatic | Lyra | FMA-medium | MTAT | **Avg.** |
|-------|---------------|------------|----------|------|------------|------|----------|
| **CultureMERT-TA-95M** | 76.9% / 45.4% | 74.2% / 45.0% | 82.5% / 32.1% | 73.0% / **45.3%** | **59.1%** / **38.2%** | 35.7% / 21.5% | 52.4% |

📈 **CultureMERT-TA-95M** performs comparably to [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M) on non-Western datasets, while surpassing it on *Lyra* and Western benchmarks. It also outperforms [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) on Western tasks (MTAT and FMA-medium) by an average margin of **+0.7%** across all metrics.
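> 📝 For reference, a minimal sketch of how a paired "ROC-AUC / mAP" cell can be computed for a multi-label tagging task with scikit-learn, assuming macro averaging over tags and using random stand-in data; the paper's exact probing setup may differ.

```python
# Paired ROC-AUC / mAP metrics for multi-label auto-tagging.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 20))  # 200 clips, 20 binary tags
y_score = rng.random(size=(200, 20))         # per-tag scores from a probe

roc_auc = roc_auc_score(y_true, y_score, average="macro")
mean_ap = average_precision_score(y_true, y_score, average="macro")
print(f"ROC-AUC: {roc_auc:.1%} / mAP: {mean_ap:.1%}")  # same format as the table cells
```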
---