Upload folder using huggingface_hub
- README.md +57 -0
- config.json +38 -0
- mergekit_config.yml +6 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +0 -0
- quantization_config.json +0 -0
- recipe.txt +26 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +0 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
license: other
license_name: mrl
language:
- en
tags:
- chat
pipeline_tag: text-generation
library_name: transformers
---
# Monstral 123B v2
A Mistral-Large merge

This model is a hybrid merge of Behemoth 1.2, Tess, and Magnum V4. The intention was to do a three-way slerp merge, which is technically not possible. To simulate the effect of a menage-a-slerp, I slerped B1.2 with Tess, then separately slerped B1.2 with Magnum. I then did a model stock merge of those two slerps using B1.2 as the base. Somehow, it worked out spectacularly well. Sometimes dumb ideas pay off.
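For the curious: mergekit's slerp method interpolates each pair of tensors along the arc between them rather than linearly. A minimal sketch of that interpolation (not mergekit's actual code; plain-Python lists stand in for flattened weight tensors):

```python
import math

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Sketch of the per-tensor blend a slerp merge performs: at t=0 you
    get vector `a`, at t=1 vector `b`, in between a point on the arc.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # cosine of the angle between the two vectors, clamped for acos
    dot = sum(x * y for x, y in zip(a, b)) / max(norm_a * norm_b, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < eps:
        # nearly parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * x + s1 * y for x, y in zip(a, b)]
```

The model stock step then averages the two slerp results against the base, which is what lets the three-way blend work despite slerp itself only taking two endpoints.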
Mergefuel:
- TheDrummer/Behemoth-123B-v1.2
- anthracite-org/magnum-v4-123b
- migtissera/Tess-3-Mistral-Large-2-123B

See recipe.txt for full details.
Improvements over Monstral v1: Drummer's 1.2 tune of Behemoth is a marked improvement over the original, and the addition of Tess to the mix really makes the creativity pop. I seem to have dialed out the rapey Magnum influence without stripping the model of its ability to get mean and/or dirty when the situation actually calls for it. The RP output of this model shows far more flowery and "literary" description of scenes and activities. It's more colorful and vibrant. Repetition is dramatically reduced, as is slop (though to a lesser extent). The annoying tendency to double-describe things with "it was X, almost Y" is virtually gone. Do you like a slow-burn story that builds over time? Well, good fucking news, because v2 excels at that.

The only complaint I've received is occasional user impersonation with certain cards. I've not seen this myself on any of my cards, so I have to assume it's down to the specific formatting of specific cards. I don't want to say it's a skill issue, but...

This model is uncensored and perfectly capable of generating objectionable material. I have not observed it injecting NSFW content into SFW scenarios, but no guarantees can be made. As with any LLM, no factual claims made by the model should be taken at face value. You know that boilerplate safety disclaimer that most professional models have? Assume this has it too. This model is for entertainment purposes only.

GGUFs: https://huggingface.co/MarsupialAI/Monstral-123B-v2_GGUF

# Prompt Format
Metharme seems to work flawlessly. In theory, Mistral V3 or possibly even ChatML should work to some extent, but meth was providing such high-quality output that I couldn't even be bothered to test the others. Just do meth, kids.
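If you're building prompts by hand rather than through a frontend, a rough sketch of the Metharme shape (the `<|system|>`/`<|user|>`/`<|model|>` tags; SillyTavern's Metharme preset assembles something equivalent for you):

```python
def build_metharme_prompt(system, turns):
    """Assemble a Metharme-style prompt string.

    `turns` is a list of (user_msg, model_msg) pairs; pass None as the
    last model_msg to leave the final <|model|> tag open so the model
    writes the next reply. A hand-rolled sketch, not an official spec.
    """
    prompt = f"<|system|>{system}"
    for user_msg, model_msg in turns:
        prompt += f"<|user|>{user_msg}<|model|>"
        if model_msg is not None:
            prompt += model_msg
    return prompt

# Hypothetical example persona and opening line:
prompt = build_metharme_prompt(
    "Enter RP mode. You are playing the character Alice.",
    [("Hello there.", None)],
)
```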
If you really want to kick it up a notch, use Konnect's Methception prompt. It's available as an all-in-one SillyTavern preset, and as an abridged plaintext prompt to use as a sysprompt or character card insertion. https://huggingface.co/Konnect1221/Methception-Llamaception-SillyTavern-Preset

# Braggadocio
As of 1/14/25, this model is #4 on the UGI leaderboard overall, and #2 for open-weight models (just behind a 405b finetune). Imagine how well it would score if I knew what I was doing.
config.json
ADDED
@@ -0,0 +1,38 @@
{
    "_name_or_path": "I:\\raw\\behemoth12",
    "architectures": [
        "MistralForCausalLM"
    ],
    "attention_dropout": 0.0,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "head_dim": 128,
    "hidden_act": "silu",
    "hidden_size": 12288,
    "initializer_range": 0.02,
    "intermediate_size": 28672,
    "max_position_embeddings": 131072,
    "model_type": "mistral",
    "num_attention_heads": 96,
    "num_hidden_layers": 88,
    "num_key_value_heads": 8,
    "rms_norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "sliding_window": null,
    "tie_word_embeddings": false,
    "torch_dtype": "float16",
    "transformers_version": "4.46.1",
    "use_cache": true,
    "vocab_size": 32768,
    "quantization_config": {
        "quant_method": "exl3",
        "version": "0.0.2",
        "bits": 1.4,
        "head_bits": 4,
        "calibration": {
            "rows": 100,
            "cols": 2048
        },
        "out_scales": "auto"
    }
}
mergekit_config.yml
ADDED
@@ -0,0 +1,6 @@
models:
  - model: I:\raw\monstral2m
  - model: I:\raw\monstral2t
merge_method: model_stock
base_model: I:\raw\behemoth12
dtype: float16
model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f2362e1a50dd11d69c106058f3df9cea22013c8a528db9434ababc29b6ae9720
size 8561310976
model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56adf0c05c088e8da233e6ae4dfb7ddbb05a24d629353271916506e05cf4d237
size 8499741600
model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d062920c1187c50440287a114816e65d7b7b3573bfef2478271f8dcf2271a750
size 5284928512
model.safetensors.index.json
ADDED
The diff for this file is too large to render.

quantization_config.json
ADDED
The diff for this file is too large to render.
recipe.txt
ADDED
@@ -0,0 +1,26 @@
models:
  - model: behemoth12
  - model: tess123
merge_method: slerp
base_model: behemoth12
parameters:
  t: [0.1, 0.3, 0.5, 0.3, 0.1]
dtype: float16
name: btess
---
models:
  - model: behemoth12
  - model: magnum123b_v4
merge_method: slerp
base_model: behemoth12
parameters:
  t: [0.1, 0.3, 0.5, 0.3, 0.1]
dtype: float16
name: bmag
---
models:
  - model: btess
  - model: bmag
merge_method: model_stock
base_model: behemoth12
dtype: float16
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
    "bos_token": {
        "content": "<s>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "eos_token": {
        "content": "</s>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "unk_token": {
        "content": "<unk>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
size 587583
tokenizer_config.json
ADDED
The diff for this file is too large to render.