README.md · kalle07/embedder_collection at 33716d6dce1e49c6af5887805fa3c9564ec79742

File size: 7,152 Bytes

a2a30ce
 
 
 
db490d7
 
 
 
 
ca67a0d
db490d7
f10f3d2
 
 
1807ea4
db490d7
7faf56a
8bb0749
db490d7
707634e
 
 
6c6b419
 
806cfb4
 
81d7818
9ba4185
 
21ae04b
57fe948
0053fa9
cc26a33
0ea6c62
559c1a0
 
 
 
80f0687
0ea6c62
da8a392
0053fa9
81d7818
3e36a38
7ba2d01
 
da8a392
 
19c08a9
13adbac
261a825
d1761d2
19c08a9
d1761d2
0ea6c62
559c1a0
 
 
0ea6c62
0053fa9
3dd8202
19c08a9
81d7818
 
 
da8a392
d1761d2
3e71bef
81d7818
 
 
7ba2d01
81d7818
7ba2d01
81d7818
7ba2d01
81d7818
7ba2d01
81d7818
7ba2d01
81d7818
7ba2d01
eb336ec
7ba2d01
0053fa9
13adbac
19c08a9
81d7818
 
2d290e4
 
225a69e
3e36a38
2d290e4
3e36a38
 
7951727
268611f
 
 
8d5c583
7951727
81d7818
3e36a38
6abb2f6
3e36a38
7951727
6abb2f6
 
a0643c9
3e36a38
6abb2f6
3e36a38
7951727
 
a0643c9
 
81d7818
225a69e
7a11956
 
da8a392
225a69e
3dd8202
0053fa9
 
a0643c9
0053fa9
a0643c9
0053fa9
33ed8b1
e3703b0
 
 
 
a1b1441
 
 
 
0ea6c62
bf9e7bf
 
 
 
 
 
 
 
 
 
 
 
 
 
31f3d48
e467d8b
a1b1441
 
 
 
 
 
5942394
806cfb4

---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- embedder
- embedding
- models
- GGUF
- Bert
- Nomic
- Gist
- BGE
- text-embeddings-inference
- RAG
misc:
- text-embeddings-inference
language:
- en
- de
architecture:
- GIST
---

# <b>All models tested with ALLM(AnythingLLM) with LM as server, all models should be work with ollama</b>
<b> GPT4All has only one model (nomic), but the setup for local documents described below is the same</b><br>

(sometimes the results are more truthful if the “chat with document only” option is used)<br>
give me a ❤️, if you like  ;)<br>
<br>
<b>My short impression:</b>
<ul style="line-height: 1;">
<li>nomic-embed-text</li>
<li>mxbai-embed-large</li>
<li>mug-b-1.6</li>
<li>Ger-RAG-BGE-M3 (german)</li>
<li>german-roberta</li>
</ul>
Working well, all other its up to you! (jina and qwen based not yet supported)
<br>
<br>
...

# Short hints for using (Example for a large context with many expected hits):
Set your (Max Tokens)context-lenght 16000t main-model, set your embedder-model (Max Embedding Chunk Length) 1024t,set (Max Context Snippets) 14, 
but in ALLM its cutting all in 1024 character parts, so aprox two times or bit more ~20!
<br>

-> Ok what that mean!<br>
You can receive 14-snippets a 1024t (14336t) from your document ~10000words and 1600t left for the answer ~1000words (2 pages)
<br>
You can play and set for your needs, eg 8-snippets a 2048t, or 28-snippets a 512t ...
<ul style="line-height: 1;">
<li>8000t (~6000words) ~0.8GB VRAM usage</li>
<li>16000t (~12000words) ~1.5GB VRAM usage</li>
<li>32000t (~24000words) ~3GB VRAM usage</li>
</ul>
<br>
...
<br>

# How embedding and search works:

You have a txt/pdf file maybe 90000words(~300pages) a book. You ask the model lets say "what is described in chapter called XYZ in relation to person ZYX". 
Now it searches for keywords or similar semantic terms in the document. if it has found them, lets say word and meaning around “XYZ and ZYX” , 
now a piece of text 1024token around this word “XYZ/ZYX” is cut out at this point. 
This text snippet is then used for your answer. <br>
<ul style="line-height: 1;">
<li>If, for example, the word “XYZ” occurs 100 times in one file, not all 100 are found.</li>

<li>If only one snippet corresponds to your question all other snippets can negatively influence your answer because they do not fit the topic (usually 4 to 32 snippet are fine)</li>

<li>If you expect multible search results in your docs try 16-snippets or more, if you expect only 2 than dont use more!</li>

<li>If you use snipets-size ~1024t you receive more content, if you use ~256t you receive more facts.</li>

<li>A question for "summary of the document" is most time not useful, if the document has an introduction or summaries its searching there if you have luck.</li>

<li>If a book has a table of contents or a bibliography, I would delete these pages as they often contain relevant search terms but do not help answer the user's question.</li>

<li>If the documents small like 10-20 Pages, its better you copy the whole text inside the prompt, some options called "pin".</li>
</ul>
<br>
...
<br>

# Nevertheless, the <b>main model is also important</b>! 
Especially to deal with the context length and I don't mean just the theoretical number you can set.
Some models can handle 128k or 1M tokens, but even with 16k or 32k input the response with the same snippets as input is worse than with other well developed models.<br>
<br>
...
# Important -> The Systemprompt (some examples):
<li> The system prompt is weighted with a certain amount of influence around your question. You can easily test it once without or with a nonsensical system prompt.</li>

"You are a helpful assistant who provides an overview of ... under the aspects of ... . 
You use attached excerpts from the collection to generate your answers! 
Weight each individual excerpt in order, with the most important excerpts at the top and the less important ones further down. 
The context of the entire article should not be given too much weight.  
Answer the user's question!  
After your answer, briefly explain why you included excerpts (1 to X) in your response and justify briefly if you considered some of them unimportant!"<br>
<i>(change it for your needs, this example works well when I consult a book about a person and a term related to them, the explanation part was just a test for myself)</i><br>

or:<br>

"You are an imaginative storyteller who crafts compelling narratives with depth, creativity, and coherence. 
Your goal is to develop rich, engaging stories that captivate readers, staying true to the themes, tone, and style appropriate for the given prompt.
You use attached excerpts from the collection to generate your answers!
When generating stories, ensure the coherence in characters, setting, and plot progression. Be creative and introduce imaginative twists and unique perspectives."<br>

or:<br>

"You are are a warm and engaging companion who loves to talk about cooking, recipes and the joy of food. 
Your aim is to share delicious recipes, cooking tips and the stories behind different cultures in a personal, welcoming and knowledgeable way."<br>
<br>
...<br>
usual models works well:<br>
llama3.1, llama3.2, qwen2.5, deepseek-r1-distill, SauerkrautLM-Nemo(german) ... <br>
(llama3 or phi3.5 are not working well) <br>

btw. <b>Jinja</b> templates very new ... the usual templates with usual models are fine, but merged models have a lot of optimization potential (but dont ask me iam not a coder)<br>

...
<br>
<br>
" on discord (sevenof9) "
<br>
...
<br>
One hint for fast search on 10000s of PDF (its only indexing not embedding) you can use it as a simple way to find your top 5-10 articles or books, you can then make these available to a Ki.<br>
Jabref - https://www.jabref.org/ <br>
or<br>
docfetcher - https://docfetcher.sourceforge.io/en/index.html (yes old but very useful)
<br><br>
# (ALL licenses and terms of use go to original author)

...

<ul style="line-height: 1;">
<li>avemio/German-RAG-BGE-M3-MERGED-x-SNOWFLAKE-ARCTIC-HESSIAN-AI (German, English)</li>
<li>maidalun1020/bce-embedding-base_v1 (English and Chinese)</li>
<li>maidalun1020/bce-reranker-base_v1 (English, Chinese, Japanese and Korean)</li>
<li>BAAI/bge-reranker-v2-m3 (English and Chinese)</li>
<li>BAAI/bge-reranker-v2-gemma (English and Chinese)</li>
<li>BAAI/bge-m3 (English and Chinese)</li>
<li>avsolatorio/GIST-large-Embedding-v0 (English)</li>
<li>ibm-granite/granite-embedding-278m-multilingual (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese)</li>
<li>Labib11/MUG-B-1.6 (?)</li>
<li>mixedbread-ai/mxbai-embed-large-v1 (multi)</li>
<li>nomic-ai/nomic-embed-text-v1.5 (English, multi)</li>
<li>Snowflake/snowflake-arctic-embed-l-v2.0 (English, multi)</li>
<li>intfloat/multilingual-e5-large-instruct (100 languages)</li>
<li>T-Systems-onsite/german-roberta-sentence-transformer-v2</li>
<li>mixedbread-ai/mxbai-embed-2d-large-v1</li>
</ul>