---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- embedder
- embedding
- models
- GGUF
- Bert
- Nomic
- Gist
- Granite
- BGE
- Jina
- Qwen
- text-embeddings-inference
- RAG
- Rerank
- similarity
- PDF
- Parsing
- Parser
misc:
- text-embeddings-inference
language:
- en
- de
architecture:
---

# All models tested with ALLM (AnythingLLM) with LM-Studio as server; all models should work with ollama

The setup for local documents described below is almost the same. GPT4All has only one model (nomic), and koboldcpp does not have this built in right now, but it is in development.
(sometimes the results are more truthful if the “chat with document only” option is used)
BTW, the embedder is only one part of a good RAG setup.
⇨ give me a ❤️, if you like ;)

My short impression:

nomic-embed-text (up to 2048t context length)
mxbai-embed-large
mug-b-1.6
snowflake-arctic-embed-l-v2.0 (up to 8192t context length)
Ger-RAG-BGE-M3 (german, up to 8192t context length)
german-roberta
bge-m3 (up to 8192t context length)

These work well; all others are up to you! Some models are very similar! (Jina- and Qwen-based models can be added to LM-Studio manually: open the model's "gear wheel" and set "override domain type" below it.)
With the same settings, these embedders found the same 6-7 snippets out of 10 from a book. This means that only 3-4 snippets were different, but I didn't test it extensively.
Further tests have shown that the following models are suitable for complex tasks (German text, but it should be similar in English). Jina-DE and nomic were not that good.

GTE large
cross-en-de-es-roberta
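
The "6-7 of the same snippets out of 10" comparison mentioned above can be repeated for any two embedders by comparing the snippet IDs they return for the same question. This is only a minimal sketch: the ID lists are made-up placeholders, and `snippet_overlap` is a hypothetical helper, not part of AnythingLLM or LM-Studio.

```python
def snippet_overlap(ids_a, ids_b):
    """Count how many snippet IDs two retrieval runs have in common."""
    return len(set(ids_a) & set(ids_b))


# Placeholder data: top-10 snippet IDs from two embedders for one question.
top10_model_a = [3, 8, 12, 15, 21, 27, 30, 41, 44, 52]
top10_model_b = [3, 8, 12, 15, 21, 27, 33, 40, 47, 52]

shared = snippet_overlap(top10_model_a, top10_model_b)
different = len(top10_model_a) - shared
print(f"shared: {shared}, different: {different}")
```

A high overlap suggests the two embedders behave similarly on your documents, so switching between them will probably not change the answers much.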

1200t (~1000 words, ~5000 characters) ~0.1GB VRAM usage, this is approx. one page with small font
8000t (~6000 words) ~0.8GB VRAM usage
16000t (~12000 words) ~1.5GB VRAM usage
32000t (~24000 words) ~3GB VRAM usage
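
For context sizes between the listed values, a rough estimate can be read off by interpolating between the figures above. The sketch below assumes the relationship is roughly linear between the listed points and uses ~1.2 tokens per word (taken from the first row); both are simplifications, and the function names are hypothetical.

```python
# (tokens, GB VRAM) pairs copied from the list above.
VRAM_TABLE = [(1200, 0.1), (8000, 0.8), (16000, 1.5), (32000, 3.0)]


def estimate_tokens_from_words(n_words, tokens_per_word=1.2):
    """Rough token count; 1.2 tokens/word is an assumption from the first row."""
    return int(n_words * tokens_per_word)


def estimate_vram_gb(tokens):
    """Linearly interpolate VRAM usage between the listed points."""
    first_t, first_v = VRAM_TABLE[0]
    if tokens <= first_t:
        # Below the first point: scale proportionally from zero.
        return first_v * tokens / first_t
    for (t0, v0), (t1, v1) in zip(VRAM_TABLE, VRAM_TABLE[1:]):
        if tokens <= t1:
            return v0 + (v1 - v0) * (tokens - t0) / (t1 - t0)
    # Above the last point: continue with the slope of the last segment.
    (t0, v0), (t1, v1) = VRAM_TABLE[-2], VRAM_TABLE[-1]
    return v1 + (v1 - v0) * (tokens - t1) / (t1 - t0)


tokens = estimate_tokens_from_words(9000)
print(f"{tokens} tokens need roughly {estimate_vram_gb(tokens):.2f} GB VRAM")
```

Treat the result as a ballpark only; the token counter and VRAM calculator linked below give more precise numbers.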

https://quizgecko.com/tools/token-counter

https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

If, for example, the word “XYZ” occurs 50 times in one file, not all 50 occurrences are used for the answer; only the configured number of best-ranked snippets is used.
If only one snippet corresponds to your question, all other snippets can negatively influence the answer because they do not fit the topic (usually 4 to 32 snippets are fine).
If you expect multiple search results in your docs, try 16 snippets or more; if you expect only 2, then don't use more!
If you use a chunk length of ~2048 chars you receive more content; if you use ~512 chars you receive more facts, BUT a lower chunk length means more chunks and takes much longer.
Asking for a "summary of the document" is usually not useful; if the document has an introduction or summaries, the search will land there if you are lucky.
If a book has a table of contents or a bibliography, I would delete these pages as they often contain relevant search terms but do not help answer your question.
If the document is small, like 10-20 pages, it is better to copy the whole text into the chat; some tools have an option called "pin" for this.
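
The effect of chunk length and snippet count described above can be illustrated with a small sketch. It is not how AnythingLLM or LM-Studio work internally: the chunking is a plain character split, and `score` is a toy word-overlap measure standing in for the embedding similarity a real embedder would compute. All function names are hypothetical.

```python
import re


def chunk_text(text, chunk_chars):
    """Split text into consecutive chunks of at most chunk_chars characters."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def score(question, chunk):
    """Toy relevance: share of question words that appear in the chunk.
    A real setup would compare embedding vectors instead."""
    q_words = set(re.findall(r"\w+", question.lower()))
    c_words = set(re.findall(r"\w+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def top_snippets(question, chunks, k):
    """Return only the k best-ranked chunks, however many chunks match."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return ranked[:k]


# Placeholder document of 20480 characters.
document = "word " * 4096
print(len(chunk_text(document, 2048)), "chunks at ~2048 chars")
print(len(chunk_text(document, 512)), "chunks at ~512 chars")

# 50 passages mention "XYZ", but only k snippets reach the main model.
passages = [f"Passage {i} mentions XYZ." for i in range(50)]
print(len(top_snippets("What is XYZ?", passages, k=4)), "snippets used")
```

Shorter chunks give four times as many pieces to embed and rank here, which is why the smaller chunk length takes longer.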

main model is also important

⇨

The system prompt carries a certain amount of weight alongside your question. You can easily test this by running once without a system prompt and once with a nonsensical one.

(change it for your needs, this example works well when I consult a book about a person and a term related to them, the explanation part was just a test for myself)

Jinja

pdfplumber
fitz/PyMuPDF
Camelot

https://huggingface.co/kalle07/pdf2txt_parser_converter

docling (open source on GitHub)

https://github.com/docling-project/docling/tree/main/docs/examples

fitz

Parsemy PDF

https://github.com/genieincodebottle/parsemypdf

sevenof9

avemio/German-RAG-BGE-M3-MERGED-x-SNOWFLAKE-ARCTIC-HESSIAN-AI (German, English)
maidalun1020/bce-embedding-base_v1 (English and Chinese)
maidalun1020/bce-reranker-base_v1 (English, Chinese, Japanese and Korean)
BAAI/bge-reranker-v2-m3 (English and Chinese)
BAAI/bge-reranker-v2-gemma (English and Chinese)
BAAI/bge-m3 (English and Chinese)
avsolatorio/GIST-large-Embedding-v0 (English)
ibm-granite/granite-embedding-278m-multilingual (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese)
ibm-granite/granite-embedding-125m-english
Labib11/MUG-B-1.6 (?)
mixedbread-ai/mxbai-embed-large-v1 (multi)
nomic-ai/nomic-embed-text-v1.5 (English, multi)
Snowflake/snowflake-arctic-embed-l-v2.0 (English, multi)
intfloat/multilingual-e5-large-instruct (100 languages)
T-Systems-onsite/german-roberta-sentence-transformer-v2
T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer (English, German, Spanish)
mixedbread-ai/mxbai-embed-2d-large-v1
jinaai/jina-embeddings-v2-base-en
Qwen/Qwen3-Embedding-0.6B
HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5
thenlper/gte-large