open-llm-leaderboard/open_llm_leaderboard · Announcement: Flagging merged models with incorrect metadata

Open LLM Leaderboard org Jan 3, 2024

Hi!
As some users removed the merge tag from their model's metadata to appear in the main view of the leaderboard, we are adding a mechanism to automatically flag all the models identified as merges where the metadata is incorrect.

If your model is a merge and you want to remove its flag, you just need to add the following in its model card.

tags:
- merge

The leaderboard is rebuilt every hour, and re-reads this info each time.
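For anyone who wants to check their card before the next rebuild, here is a minimal sketch of testing whether a README's front matter carries the tag. It is not the leaderboard's own code: `front_matter_tags` and `has_merge_tag` are hypothetical helpers, and the hand-rolled parser assumes the block-style list shown above (one `- tag` per line) between `---` delimiters. A real tool would use a proper YAML library.

```python
def front_matter_tags(readme_text: str) -> list:
    """Return the entries of the `tags:` list in a model card's front matter.

    Assumption: the front matter is delimited by `---` lines and the tags
    are written as a block-style list, one `- tag` per line.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at all
    tags, in_tags = [], False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front matter
        if stripped.startswith("tags:"):
            in_tags = True
            continue
        if in_tags:
            if stripped.startswith("- "):
                tags.append(stripped[2:].strip().strip("'\""))
            elif stripped:
                in_tags = False  # any other key ends the tags list
    return tags


def has_merge_tag(readme_text: str) -> bool:
    """True if the card's front matter lists the `merge` tag."""
    return "merge" in front_matter_tags(readme_text)


# Example card text (made up for illustration).
card = """---
license: apache-2.0
tags:
- merge
---
# My model
"""
print(has_merge_tag(card))
```

Inline lists such as `tags: [merge]` are not handled by this sketch.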

kyujinpy

Jan 3, 2024

Hello!

https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct
I added the merge tag in my readme!

Could it be recovered?
Thanks! :)

clefourrier

Open LLM Leaderboard org Jan 3, 2024

Hi, well done!
It's been updated automatically when the leaderboard restarted, see below :)

kyujinpy

Jan 3, 2024

Oh, I see..!
Thank you for your reply!

JusticeDike

Jan 3, 2024 (edited Jan 3, 2024)

Nice work!
But what are the criteria for "merge"?
If the criterion is that a new model is created using two or more existing models, then it seems fair that the following models should also be tagged with "merge".

https://huggingface.co/DopeorNope/SOLARC-MOE-10.7Bx6
"MOE" is also possible via "merge". (https://github.com/cg123/mergekit/blob/mixtral/moe.md)
https://huggingface.co/DopeorNope/SOLARC-MOE-10.7Bx4
https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0
This model was sliced and merged from Mistral 7B (https://arxiv.org/abs/2312.15166)

Also, should models fine-tuned using merged models be tagged with "merge"?

clefourrier

Open LLM Leaderboard org Jan 4, 2024

Hi, good question about edge cases!

Are MOE merges?
We considered that merges are models which combine several other models in a way that does not keep the individual weights of the original models (like fusions).
For this reason, I would not consider MOEs to be merges (if they keep the individual weights separate), but I'm open to discussion on this. If I had to choose, I would probably suggest using a different tag for MOEs.
Should the fine-tune of a merged model be tagged with merge?
Imo yes

Mikael110

Jan 4, 2024 (edited Jan 4, 2024)

I personally disagree with point one. My definition of a merge matches that of JusticeDike. If two or more separate models are combined to form a new model, in any way, then it's a merge as far as I'm concerned. And I feel like this is the most common understanding of the term. I would also argue that, for the sake of consistency, everything produced by Mergekit should be considered a merge, regardless of which technique is used. And most (all?) of the current MoE models (other than Mixtral and its finetunes) have been created using Mergekit.

There's also the fact that Mergekit MoEs can (and often do) contain models that are themselves merges. For instance, Mixtral MOE 2x10.7B, which is currently third on the filtered leaderboard, contains two merged models.

I agree with point two. If finetuning a merged model removed the merge label then it would be trivial to cheat the system.
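One way to read point two is that "merge" propagates along a model's lineage: if any ancestor is a merge, the descendant is too. A minimal sketch, assuming you already have a mapping from each model to its base models. The model names and the `is_merge_derived` helper are hypothetical, and this is not how the leaderboard actually decides anything:

```python
def is_merge_derived(model, parents, known_merges, _seen=None):
    """Return True if `model` is a known merge or descends from one.

    parents:      dict mapping a model name to the list of its base models
    known_merges: set of model names already identified as merges
    """
    seen = set() if _seen is None else _seen
    if model in known_merges:
        return True
    if model in seen:
        return False  # already visited; guards against cyclic metadata
    seen.add(model)
    return any(
        is_merge_derived(parent, parents, known_merges, seen)
        for parent in parents.get(model, [])
    )


# Hypothetical lineage: a fine-tune of a fine-tune of a merge.
parents = {
    "example/finetune-v2": ["example/finetune-v1"],
    "example/finetune-v1": ["example/merged-model"],
    "example/merged-model": ["example/base-a", "example/base-b"],
}
known_merges = {"example/merged-model"}

print(is_merge_derived("example/finetune-v2", parents, known_merges))
print(is_merge_derived("example/base-a", parents, known_merges))
```

With a rule like this, fine-tuning on top of a merge cannot drop the label, which is the loophole described above.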

clefourrier changed discussion title from Flagging merged models with incorrect metadata to Announcement: Flagging merged models with incorrect metadata Jan 5, 2024

Mihaiii

Jan 6, 2024 (edited Jan 7, 2024)

Are frankenmerges considered merged models?

IMO they shouldn't, because a frankenmerge only involves duplicating some layers of a model, without the involvement of another model.

I'm asking because I have a pretty decent frankenmerge model myself and I created it using mergekit, but I could also have duplicated the layers myself with some custom code.

For clarity/context, these are frankenmerges: https://github.com/cg123/mergekit?tab=readme-ov-file#passthrough

83 hidden messages

clefourrier

Open LLM Leaderboard org Jan 24, 2024

Going to close this discussion, as it's become quite long, and I assume it's going to send notifs to anyone who took part in it ^^.

If you think there is a problem with the tagging/flagging of your model, please open a specific discussion for it!

clefourrier changed discussion status to closed Jan 24, 2024

Walmart-the-bag

Apr 17, 2024

@clefourrier I found some other models near the top of the leaderboard that are very likely to be merges, yet they are not tagged as such or flagged. Can you please flag them so that their authors will have to correct tags before re-appearing on a leaderboard? If I do it myself it will probably just open 40 different discussions and it will be a mess to manage.

merge of kyujinpy/Sakura-SOLAR-Instruct (a merge in itself) and jeonsworld/CarbonVillain-en-10.7B-v1 (a merge too)
https://huggingface.co/datasets/open-llm-leaderboard/details_DopeorNope__SOLARC-M-10.7B
https://huggingface.co/DopeorNope/SOLARC-M-10.7B

MoE containing merge kyujinpy/Sakura-SOLAR-Instruct
https://huggingface.co/datasets/open-llm-leaderboard/details_DopeorNope__SOLARC-MOE-10.7Bx6
https://huggingface.co/DopeorNope/SOLARC-MOE-10.7Bx6

MoE containing merge kyujinpy/Sakura-SOLAR-Instruct
https://huggingface.co/datasets/open-llm-leaderboard/details_DopeorNope__SOLARC-MOE-10.7Bx4
https://huggingface.co/DopeorNope/SOLARC-MOE-10.7Bx4

Merge of VAGOsolutions/SauerkrautLM-SOLAR-Instruct and kyujinpy/Sakura-SOLAR-Instruct (a merge in itself)
https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModelv2
https://huggingface.co/gagan3012/MetaModelv2

merge of jeonsworld/CarbonVillain-en-10.7B-v4 and jeonsworld/CarbonVillain-en-10.7B-v2
https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModelv3
https://huggingface.co/gagan3012/MetaModelv3

SEE EDIT BELOW

Not sure what merge method was used for this, but the model card suggests it's a merge

Merges:
Fan in: 0:2
Fan out: -4:
Intermediary layers: 1/1/1/0/1/1/0/1/0/1/1/0/1/1/0 use the On/Off as a way of regularise.

https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNA-SOLAR-10.7B-Instruct-v1.0
https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0

SEE EDIT BELOW

Fine-tune of UNA-SOLAR-10.7B-Instruct-v1.0 which is likely a merge
https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNA-POLAR-10.7B-InstructMath-v2
https://huggingface.co/fblgit/UNA-POLAR-10.7B-InstructMath-v2

according to https://github.com/KyujinHan/Sakura-SOLAR-DPO, this model is based on a model that is a merge (kyujinpy/Sakura-SOLAR-Instruct)
https://huggingface.co/datasets/open-llm-leaderboard/details_kyujinpy__Sakura-SOLRCA-Math-Instruct-DPO-v2
https://huggingface.co/kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2

according to https://github.com/KyujinHan/Sakura-SOLAR-DPO, this model is based on a model that is a merge (kyujinpy/Sakura-SOLAR-Instruct)
https://huggingface.co/datasets/open-llm-leaderboard/details_kyujinpy__Sakura-SOLAR-Instruct-DPO-v2
https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct-DPO-v2

according to https://github.com/KyujinHan/Sakura-SOLAR-DPO, this model is based on a model that is a merge (kyujinpy/Sakura-SOLAR-Instruct)
https://huggingface.co/datasets/open-llm-leaderboard/details_kyujinpy__Sakura-SOLRCA-Math-Instruct-DPO-v1
https://huggingface.co/kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1

according to https://github.com/KyujinHan/Sakura-SOLAR-DPO, this model is based on a model that is a merge (kyujinpy/Sakura-SOLAR-Instruct)
https://huggingface.co/datasets/open-llm-leaderboard/details_kyujinpy__Sakura-SOLRCA-Instruct-DPO
https://huggingface.co/kyujinpy/Sakura-SOLRCA-Instruct-DPO

looks like a merge of https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0 and https://huggingface.co/VAGOsolutions/SauerkrautLM-SOLAR-Instruct
https://huggingface.co/fblgit/LUNA-SOLARkrautLM-Instruct

it's based on zyh3826/GML-Mistral-merged-v1 which is a merge of quantum-v0.01 and mistral-7b-dpo-v5
https://huggingface.co/CultriX/MistralTrix-v1

merge of merges (cookinai/CatMacaroni-Slerp and viethq188/LeoScorpius-7B)
https://huggingface.co/datasets/open-llm-leaderboard/details_samir-fama__SamirGPT-v1
https://huggingface.co/samir-fama/SamirGPT-v1

merge of merges (cookinai/CatMacaroni-Slerp and shadowml/Marcoro14-7B-slerp)
https://huggingface.co/datasets/open-llm-leaderboard/details_samir-fama__FernandoGPT-v1
https://huggingface.co/samir-fama/FernandoGPT-v1

it's based on Q-bert/MetaMath-Cybertron-Starling which is a merge of Q-bert/MetaMath-Cybertron and berkeley-nest/Starling-LM-7B-alpha
https://huggingface.co/datasets/open-llm-leaderboard/details_perlthoughts__Marcoroni-8x7B-v3-MoE
https://huggingface.co/perlthoughts/Marcoroni-8x7B-v3-MoE

fine tune over go-bruins, which is based on Q-bert/MetaMath-Cybertron-Starling and therefore a merge of Q-bert/MetaMath-Cybertron and berkeley-nest/Starling-LM-7B-alpha
https://huggingface.co/datasets/open-llm-leaderboard/details_rwitz__go-bruins-v2
https://huggingface.co/rwitz/go-bruins-v2

DPO fine tune over Q-bert/MetaMath-Cybertron-Starling, therefore a merge of Q-bert/MetaMath-Cybertron and berkeley-nest/Starling-LM-7B-alpha
https://huggingface.co/datasets/open-llm-leaderboard/details_rwitz__go-bruins
https://huggingface.co/rwitz/go-bruins

fine tune of kyujinpy/Sakura-SOLAR-Instruct, which in itself is a merge
https://huggingface.co/datasets/open-llm-leaderboard/details_Walmart-the-bag__Solar-10.7B-Cato
https://huggingface.co/Walmart-the-bag/Solar-10.7B-Cato

looks like a merge of mistral base, neural-chat and marcoroni
https://huggingface.co/datasets/open-llm-leaderboard/details_aqweteddy__mistral_tv-neural-marconroni
https://huggingface.co/aqweteddy/mistral_tv-neural-marconroni

it's based on https://huggingface.co/viethq188/LeoScorpius-7B-Chat-DPO which has been already flagged for dataset contamination in https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/474
LeoScorpius-7B that the LeoScorpius-7B-Chat-DPO is based on is a merge of AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling
https://huggingface.co/datasets/open-llm-leaderboard/details_NExtNewChattingAI__shark_tank_ai_7_b
https://huggingface.co/NExtNewChattingAI/shark_tank_ai_7_b

merge of fblgit/una-cybertron-7b-v2-bf16 and meta-math/MetaMath-Mistral-7B
https://huggingface.co/datasets/open-llm-leaderboard/details_Q-bert__MetaMath-Cybertron
https://huggingface.co/Q-bert/MetaMath-Cybertron

merge of teknium/OpenHermes-2.5-Mistral-7B, Intel/neural-chat-7b-v3-3, meta-math/MetaMath-Mistral-7B, and openchat/openchat-3.5-1210
https://huggingface.co/OpenPipe/mistral-ft-optimized-1227
https://huggingface.co/datasets/open-llm-leaderboard/details_OpenPipe__mistral-ft-optimized-1227

merge between Chupacabra 7b v2.04 and dragon-mistral-7b-v0
https://huggingface.co/datasets/open-llm-leaderboard/details_perlthoughts__Falkor-7b
https://huggingface.co/perlthoughts/Falkor-7b

fine-tune of v1olet/v1olet_marcoroni-go-bruins-merge-7B, which is itself a merge of AIDC-ai-business/Marcoroni-7B-v3 and rwitz/go-bruins-v2. There are a few generations of merges in this one.
https://huggingface.co/datasets/open-llm-leaderboard/details_v1olet__v1olet_merged_dpo_7B
https://huggingface.co/v1olet/v1olet_merged_dpo_7B

merge of OpenHermes-2.5-neural-chat-7b-v3-1 and Bruins-V2
https://huggingface.co/datasets/open-llm-leaderboard/details_Ba2han__BruinsV2-OpHermesNeu-11B
https://huggingface.co/Ba2han/BruinsV2-OpHermesNeu-11B

merge of kyujinpy/Sakura-SOLAR-Instruct and Weyaxi/SauerkrautLM-UNA-SOLAR-Instruct, both of which are also merges.
https://huggingface.co/datasets/open-llm-leaderboard/details_DopeorNope__You_can_cry_Snowman-13B
https://huggingface.co/DopeorNope/You_can_cry_Snowman-13B

merge of Q-bert/MetaMath-Cybertron-Starling and maywell/Synatra-7B-v0.3-RP
https://huggingface.co/datasets/open-llm-leaderboard/details_DopeorNope__You_can_cry_Snowman-13B
https://huggingface.co/PistachioAlt/Synatra-MCS-7B-v0.3-RP-Slerp

merge of meta-math/MetaMath-Mistral-7B and fblgit/una-cybertron-7b-v2-bf16
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__MetaMath-una-cybertron-v2-bf16-Ties
https://huggingface.co/Weyaxi/MetaMath-una-cybertron-v2-bf16-Ties

Merge of teknium/OpenHermes-2.5-Mistral-7B and Intel/neural-chat-7b-v3-2 using ties merge.
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__OpenHermes-2.5-neural-chat-7b-v3-2-7B
https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-2-7B

Model merge between Chupacabra, openchat, and dragon-mistral-7b-v0.
https://huggingface.co/datasets/open-llm-leaderboard/details_perlthoughts__Falkor-8x7B-MoE
https://huggingface.co/perlthoughts/Falkor-8x7B-MoE

merge of Chronos-70b-v2 and model 007 at a ratio of 0.3 using the SLERP method
https://huggingface.co/elinas/chronos007-70b
https://huggingface.co/datasets/open-llm-leaderboard/details_elinas__chronos007-70b

merge of meta-math/MetaMath-Mistral-7B and mlabonne/NeuralHermes-2.5-Mistral-7B
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__MetaMath-NeuralHermes-2.5-Mistral-7B-Linear
https://huggingface.co/Weyaxi/MetaMath-NeuralHermes-2.5-Mistral-7B-Linear

Merge of meta-math/MetaMath-Mistral-7B and Intel/neural-chat-7b-v3-2 using ties merge.
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__MetaMath-neural-chat-7b-v3-2-Ties
https://huggingface.co/Weyaxi/MetaMath-neural-chat-7b-v3-2-Ties

fine tune of Mistral-7B-Instruct-v0.2 and cookinai/CatMacaroni-Slerp merge
https://huggingface.co/datasets/open-llm-leaderboard/details_diffnamehard__Mistral-CatMacaroni-slerp-uncensored
https://huggingface.co/diffnamehard/Mistral-CatMacaroni-slerp-uncensored-7B

merge of Intel/neural-chat-7b-v3-1 and teknium/OpenHermes-2.5-Mistral-7B
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__neural-chat-7b-v3-1-OpenHermes-2.5-7B
https://huggingface.co/Weyaxi/neural-chat-7b-v3-1-OpenHermes-2.5-7B

merge of meta-math/MetaMath-Mistral-7B and mlabonne/NeuralHermes-2.5-Mistral-7B
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__MetaMath-NeuralHermes-2.5-Mistral-7B-Ties
https://huggingface.co/Weyaxi/MetaMath-NeuralHermes-2.5-Mistral-7B-Ties

merge of teknium/OpenHermes-2-Mistral-7B and Open-Orca/Mistral-7B-SlimOrca
https://huggingface.co/datasets/open-llm-leaderboard/details_Walmart-the-bag__Misted-7B
https://huggingface.co/Walmart-the-bag/Misted-7B

merge of garage-bAInd/Platypus2-70B and augtoma/qCammel-70-x
https://huggingface.co/datasets/open-llm-leaderboard/details_garage-bAInd__Camel-Platypus2-70B
https://huggingface.co/garage-bAInd/Camel-Platypus2-70B

merge of HuggingFaceH4/zephyr-7b-alpha and Open-Orca/Mistral-7B-OpenOrca
https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__OpenOrca-Zephyr-7B
https://huggingface.co/Weyaxi/OpenOrca-Zephyr-7B

seems to be a merge of Intel/neural-chat-7b-v3-1, migtissera/SynthIA-7B-v1.3, bhenrym14/mistral-7b-platypus-fp16, jondurbin/airoboros-m-7b-3.1.2, teknium/CollectiveCognition-v1.1-Mistral-7B and uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__speechless-mistral-7b-dare-0.85
https://huggingface.co/uukuguy/speechless-mistral-7b-dare-0.85

EDIT: I have low confidence that the 2 models below are merges of two different models. They might be frankenmerges, where layers of one model are merged with each other. fblgit's description is not clear enough for me.

fblgit/UNA-SOLAR-10.7B-Instruct-v1.0
fblgit/UNA-POLAR-10.7B-InstructMath-v2

I have never had a 'merge' tag in misted-7b. That is a false flag.

Walmart-the-bag

Apr 17, 2024

All commits are here.

https://huggingface.co/Walmart-the-bag/Misted-7B/commits/main

clefourrier

Open LLM Leaderboard org Apr 18, 2024

@Walmart-the-bag This is precisely why your model was flagged: according to your model card, your model is a merge

base_model: teknium/OpenHermes-2-Mistral-7B
models:
      - model: teknium/OpenHermes-2-Mistral-7B
      - model: Open-Orca/Mistral-7B-SlimOrca
merge_method: slerp

yet it is not indicated in the metadata of your model, which should include "merge" as a tag.
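The mismatch described here (a card that reads like a merge recipe but has no `merge` tag) could be approximated as follows. This is only a guess at the idea, not the leaderboard's real implementation: it assumes a card "looks like a merge" when it contains a `merge_method:` key, as in the snippet above, and `should_flag` is a made-up name.

```python
import re

# Assumption: a `merge_method:` key anywhere in the card text signals a merge.
MERGE_HINT = re.compile(r"^\s*merge_method\s*:", re.MULTILINE)


def should_flag(card_text: str, tags: list) -> bool:
    """Flag cards that look like merges but do not declare the `merge` tag."""
    looks_like_merge = MERGE_HINT.search(card_text) is not None
    return looks_like_merge and "merge" not in tags


# Card excerpt quoted in this thread.
card_text = """base_model: teknium/OpenHermes-2-Mistral-7B
models:
      - model: teknium/OpenHermes-2-Mistral-7B
      - model: Open-Orca/Mistral-7B-SlimOrca
merge_method: slerp
"""

print(should_flag(card_text, tags=[]))         # merge recipe, no tag
print(should_flag(card_text, tags=["merge"]))  # tag present
```

Under this heuristic, adding `merge` to the tags is enough to clear the flag on the next rebuild, which matches the fix described at the top of the thread.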

picAIso

May 31, 2024

hi, can someone please explain why picAIso/TARS-8B was flagged despite having the merge tag attached?

clefourrier

Open LLM Leaderboard org Jun 3, 2024

Hi @picAIso !
If it's about this tag and the model was submitted before having the tag attached, it was flagged then, but it should be updated automatically.

hyokwan

Jun 24, 2024

https://huggingface.co/hyokwan/hkcode-solar-youtube-merged
I added the merge tag in my readme!

Could it be recovered?
Thanks! :)

clefourrier

Open LLM Leaderboard org Jun 24, 2024

Hi! It should be updated automatically, feel free to ping us again if it's not OK in a couple days