---
base_model:
- jinaai/jina-code-embeddings-1.5b
base_model_relation: quantized
license: cc-by-nc-4.0
---

<p align="center">
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
</p>

<p align="center">
<b>The GGUF version of the code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>

# Jina Code Embeddings: A Small but Performant Code Embedding Model

## Intended Usage & Model Info

`jina-code-embeddings-1.5b-GGUF` is the **GGUF export** of our [jina-code-embeddings-1.5b](https://huggingface.co/jinaai/jina-code-embeddings-1.5b), built on [Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B).

The model supports code retrieval and technical QA across **15+ programming languages** and multiple domains, including web development, software development, machine learning, data science, and educational coding problems.

### Key Features
| Feature                | Jina Code Embeddings 1.5B GGUF |
|------------------------|--------------------------------|
| Base Model             | Qwen2.5-Coder-1.5B |
| Supported Tasks        | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
| Max Sequence Length    | 32768 tokens (**recommended ≤ 8192**) |
| Embedding Vector Dim   | **1536** |
| Matryoshka Dimensions  | 128, 256, 512, 1024, 1536 (**client-side slice**) |
| Pooling Strategy       | **MUST use `--pooling last`** (EOS) |

> **Matryoshka note:** `llama.cpp` always returns the full **1536-d** embedding for this model. To use 128/256/512/1024 dimensions, **slice client-side** (take the first *k* elements and re-normalize).
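
For example, a minimal slicing sketch (assuming NumPy and an already-computed full-length vector; `truncate_embedding` is an illustrative helper, not part of this repo):

```python
import numpy as np

def truncate_embedding(embedding, k):
    """Keep the first k Matryoshka dimensions and re-normalize to unit length."""
    v = np.asarray(embedding, dtype=np.float32)[:k]
    return v / np.linalg.norm(v)

full = np.random.rand(1536)            # stand-in for a real 1536-d embedding
small = truncate_embedding(full, 256)  # 256-d Matryoshka slice
print(small.shape)                     # (256,)
```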

---

## Task Instructions

Prefix inputs with task-specific instructions:

```python
INSTRUCTION_CONFIG = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query:\n",
        "passage": "Candidate code snippet:\n"
    },
    "qa": {
        "query": "Find the most relevant answer given the following question:\n",
        "passage": "Candidate answer:\n"
    },
    "code2code": {
        "query": "Find an equivalent code snippet given the following code snippet:\n",
        "passage": "Candidate code snippet:\n"
    },
    "code2nl": {
        "query": "Find the most relevant comment given the following code snippet:\n",
        "passage": "Candidate comment:\n"
    },
    "code2completion": {
        "query": "Find the most relevant completion given the following start of code snippet:\n",
        "passage": "Candidate completion:\n"
    }
}
```

Use the appropriate prefix for **queries** and **passages** at inference time.
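
For example, a small helper (hypothetical, not shipped with the model) that applies these prefixes before embedding:

```python
def build_inputs(task, queries, passages):
    """Prepend the task-specific instruction to each query and passage."""
    cfg = INSTRUCTION_CONFIG[task]
    return (
        [cfg["query"] + q for q in queries],
        [cfg["passage"] + p for p in passages],
    )

# Natural-language-to-code retrieval:
queries, passages = build_inputs(
    "nl2code",
    ["print hello world in python"],
    ['print("Hello World!")'],
)
```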

---

## Install `llama.cpp`

Follow the official instructions: **[https://github.com/ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)**

---

## Model files

Hugging Face repo (GGUF): **[https://huggingface.co/jinaai/jina-code-embeddings-1.5b-GGUF](https://huggingface.co/jinaai/jina-code-embeddings-1.5b-GGUF)**

Pick a file (e.g., `jina-code-embeddings-1.5b-F16.gguf`). You can either:

* **auto-download** by passing the **repo and file directly** to `llama.cpp`
* **use a local path** with `-m` (see the download sketch below)
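
If you prefer to fetch the file from Python, a sketch using `huggingface_hub` (assuming the package is installed; the filename is the one listed in the repo):

```python
from huggingface_hub import hf_hub_download

# Download into the local Hugging Face cache and get the resolved path;
# pass this path to llama.cpp via -m.
model_path = hf_hub_download(
    repo_id="jinaai/jina-code-embeddings-1.5b-GGUF",
    filename="jina-code-embeddings-1.5b-F16.gguf",
)
print(model_path)
```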

---

## A) CLI embeddings with `llama-embedding`

### Auto-download from Hugging Face (repo + file)

```bash
./llama-embedding \
  --hf-repo jinaai/jina-code-embeddings-1.5b-GGUF \
  --hf-file jina-code-embeddings-1.5b-F16.gguf \
  --pooling last \
  -p "Find the most relevant code snippet given the following query:
print hello world in python"
```

### Local file

```bash
./llama-embedding \
  -m /path/to/jina-code-embeddings-1.5b-F16.gguf \
  --pooling last \
  -p "Find the most relevant code snippet given the following query:
print hello world in python"
```

> Outputs a single **1536-d** vector to stdout. For smaller sizes, slice client-side as shown above.

---

## B) HTTP service with `llama-server`

### Auto-download from Hugging Face (repo + file)

```bash
./llama-server \
  --embedding \
  --hf-repo jinaai/jina-code-embeddings-1.5b-GGUF \
  --hf-file jina-code-embeddings-1.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

### Local file

```bash
./llama-server \
  --embedding \
  -m /path/to/jina-code-embeddings-1.5b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 32768 \
  --ubatch-size 8192 \
  --pooling last
```

> Tips: pass `-ngl <N>` to offload layers to the GPU. The maximum context is 32768 tokens, but keep inputs at or below 8192 tokens (matching `--ubatch-size 8192`) for best results.

---

## Query examples (HTTP)

### Native endpoint (`/embedding`)

```bash
curl -X POST http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{
    "content": [
      "Find the most relevant code snippet given the following query:\nprint hello world in python",
      "Candidate code snippet:\nprint(\"Hello World!\")"
    ]
  }'
```

### OpenAI-compatible (`/v1/embeddings`)

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Find the most relevant code snippet given the following query:\nprint hello world in python",
      "Candidate code snippet:\nprint(\"Hello World!\")"
    ]
  }'
```
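
The same request from Python, plus a cosine similarity between the query and the passage (a sketch assuming the `requests` package and a server started as above):

```python
import numpy as np
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": [
        "Find the most relevant code snippet given the following query:\nprint hello world in python",
        "Candidate code snippet:\nprint(\"Hello World!\")",
    ]},
)
query_vec, passage_vec = (np.array(d["embedding"]) for d in resp.json()["data"])

# Cosine similarity between the query and passage embeddings.
score = query_vec @ passage_vec / (np.linalg.norm(query_vec) * np.linalg.norm(passage_vec))
print(f"cosine similarity: {score:.4f}")
```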

---

## Training & Evaluation

See our technical report: **[https://arxiv.org/abs/2508.21290](https://arxiv.org/abs/2508.21290)**

---

## Contact

Join our Discord: **[https://discord.jina.ai](https://discord.jina.ai)**