# CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning

📄 [**Read the full paper (to be presented at ISMIR 2025)**](...TODO)

**CultureMERT-TA-95M** is a multi-culturally adapted 95M-parameter music foundation model, created by merging single-culture adaptations of [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) through **task arithmetic**. Instead of continual pre-training on a multi-cultural data mix, as done for [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M), it merges several single-culture models in weight space, each continually pre-trained via our two-stage strategy on a distinct musical tradition:

| Dataset | Music Tradition | Hours Used |
|-----------------|-----------------------------|------------|
| [*Lyra*](https://github.com/pxaris/lyra-dataset) | Greek traditional/folk | 50h |
| [*Turkish-makam*](https://dunya.compmusic.upf.edu/makam/) | Turkish/Ottoman classical | 200h |
| [*Hindustani*](https://dunya.compmusic.upf.edu/hindustani/) | North Indian classical | 200h |
| [*Carnatic*](https://dunya.compmusic.upf.edu/carnatic/) | South Indian classical | 200h |

> The final model was merged using a scaling factor of **λ = 0.2**, which yielded the best performance across all variants tested.
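
The merge itself is standard task arithmetic: each single-culture model defines a task vector (its weight delta from the base MERT-v1-95M), and the base receives the λ-scaled sum of these deltas. Below is a minimal PyTorch sketch of this merge; the four single-culture checkpoint paths are hypothetical placeholders, not published model ids.

```python
import torch
from transformers import AutoModel

LAMBDA = 0.2  # merging scale λ reported above

# The base model is public; the single-culture checkpoint paths are hypothetical placeholders.
base = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
culture_ckpts = ["./mert-lyra", "./mert-makam", "./mert-hindustani", "./mert-carnatic"]

base_state = base.state_dict()
merged_state = {name: w.clone() for name, w in base_state.items()}

for ckpt in culture_ckpts:
    adapted_state = AutoModel.from_pretrained(ckpt, trust_remote_code=True).state_dict()
    for name, w in adapted_state.items():
        if not torch.is_floating_point(w):
            continue  # leave any integer buffers untouched
        # Task vector: adapted weights minus base weights, scaled by λ and accumulated.
        merged_state[name] += LAMBDA * (w - base_state[name])

base.load_state_dict(merged_state)  # `base` now holds the merged multi-cultural model
```

A smaller λ keeps the merged weights closer to the base model; λ = 0.2 is simply the best-performing setting among those tested here.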

🔄 This is an alternative variant of [**CultureMERT-95M**](https://huggingface.co/ntua-slp/CultureMERT-95M), where culturally specialized models are merged in weight space to form a unified multi-cultural model.

---

# 📊 Evaluation

We follow the same evaluation protocol as [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M). Below are the evaluation results, with comparisons to both [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M) and the original [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M):

## ROC-AUC / mAP

| Model | Turkish-makam | Hindustani | Carnatic | Lyra | FMA-medium | MTAT | **Avg.** |
|--------------------|:-------------:|:----------:|:--------:|:----:|:---:|:----:|:--------:|
| **MERT-v1-95M** | 83.2% / 53.3% | 82.4% / 52.9% | 74.9% / 39.7% | 85.7% / 56.5% | 90.7% / 48.1% | 89.6% / 35.9% | 66.1% |
| **CultureMERT-95M** | **89.6%** / 60.6% | **88.2%** / **63.5%** | **79.2%** / 43.1% | 86.9% / 56.7% | 90.7% / 48.1% | 89.4% / 35.9% | **69.3%** |
| **CultureMERT-TA-95M** | 89.0% / **61.0%** | 87.5% / 59.3% | 79.1% / **43.3%** | **87.3%** / **57.3%** | **90.8%** / **49.1%** | 89.6% / **36.4%** | 69.1% |

## Micro-F1 / Macro-F1

| Model | Turkish-makam | Hindustani | Carnatic | Lyra | FMA-medium | MTAT | **Avg.** |
|--------------------|:-------------:|:----------:|:--------:|:----:|:---:|:----:|:--------:|
| **MERT-v1-95M** | 73.0% / 38.9% | 71.1% / 33.2% | 80.1% / 30.0% | 72.4% / 42.6% | 57.0% / 36.9% | 35.7% / 21.2% | 49.3% |
| **CultureMERT-95M** | **77.4%** / **45.8%** | **77.8%** / **50.4%** | **82.7%** / **32.5%** | **73.1%** / 43.1% | 58.3% / 36.6% | 35.6% / **22.9%** | **52.9%** |
| **CultureMERT-TA-95M** | 76.9% / 45.4% | 74.2% / 45.0% | 82.5% / 32.1% | 73.0% / **45.3%** | **59.1%** / **38.2%** | 35.7% / 21.5% | 52.4% |

**CultureMERT-TA-95M** performs comparably to [CultureMERT-95M](https://huggingface.co/ntua-slp/CultureMERT-95M) on the non-Western tasks and even surpasses it on *Lyra* and the Western benchmarks. Notably, it also outperforms [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) on Western tasks, with an average improvement of **+0.7%**.
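
For reference, the four reported quantities are standard multi-label auto-tagging metrics. A minimal scikit-learn sketch on synthetic data follows; the fixed 0.5 decision threshold for the F1 scores is an illustrative assumption, not necessarily the paper's protocol.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(500, 30))  # binary tag matrix: clips x tags
y_score = rng.random((500, 30))              # per-tag prediction scores in [0, 1]

roc_auc = roc_auc_score(y_true, y_score, average="macro")            # ROC-AUC
mean_ap = average_precision_score(y_true, y_score, average="macro")  # mAP

# Thresholded predictions for the F1 variants (0.5 is an assumed cut-off).
y_pred = (y_score >= 0.5).astype(int)
micro_f1 = f1_score(y_true, y_pred, average="micro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
```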

---

# 🔧 Model Usage
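
A minimal sketch for extracting representations with 🤗 Transformers, assuming the model id `ntua-slp/CultureMERT-TA-95M` and following the usage pattern of the parent [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M); the time-averaged pooling at the end is an illustrative choice, not a prescribed one.

```python
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

model = AutoModel.from_pretrained("ntua-slp/CultureMERT-TA-95M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "ntua-slp/CultureMERT-TA-95M", trust_remote_code=True
)

# 5 seconds of dummy audio at the extractor's expected sampling rate (24 kHz for MERT).
sr = processor.sampling_rate
audio = torch.zeros(5 * sr)

inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Embedding layer plus 12 transformer layers: [13, batch, time, 768].
all_layers = torch.stack(outputs.hidden_states)
clip_embedding = all_layers.mean(dim=2)  # illustrative time-averaged clip-level features
```

Different layers tend to capture different musical attributes, so downstream probes often learn a weighted combination across layers rather than relying on a single one.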