Qodo-Embed-1
Qodo-Embed-1 is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. It is offered in two sizes: Lite (1.5B) and Medium (7B). The model is optimized for natural-language-to-code and code-to-code retrieval, making it highly effective for applications such as code search, retrieval-augmented generation (RAG), and contextual understanding of programming languages. It outperforms all previous open-source models on the CoIR and MTEB leaderboards, achieving best-in-class performance at a significantly smaller size than competing models.
Languages Supported:
- Python
- C++
- C#
- Go
- Java
- JavaScript
- PHP
- Ruby
- TypeScript
Model Information
- Model Size: 1.5B
- Embedding Dimension: 1536
- Max Input Tokens: 32k
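These properties can be checked directly on a loaded model. This is a minimal sketch, assuming the Sentence Transformers loading path shown in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qodo/Qodo-Embed-1-Lite")
# Output dimensionality of the pooled embeddings (expected: 1536)
print(model.get_sentence_embedding_dimension())
# Maximum sequence length configured for this Sentence Transformers wrapper
print(model.max_seq_length)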
Requirements
transformers>=4.39.2
flash_attn>=2.5.6
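For example, the dependencies can be installed with pip (sentence-transformers is only needed for the Sentence Transformers usage path below; flash_attn builds against a local CUDA toolchain, so installation details vary by environment):

pip install "transformers>=4.39.2" "flash_attn>=2.5.6" sentence-transformers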
Usage
Sentence Transformers
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Qodo/Qodo-Embed-1-Lite")
# Run inference
sentences = [
    'accumulator = sum(item.value for item in collection)',
    'result = reduce(lambda acc, curr: acc + curr.amount, data, 0)',
    'matrix = [[i*j for j in range(n)] for i in range(n)]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1536]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
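For natural-language-to-code search, the same model can embed a query and candidate snippets separately and rank the candidates by similarity. This is a small illustrative sketch reusing the model loaded above; the query string and snippets are made up for this example:

query_embedding = model.encode(['read a JSON file and return its contents as a dict'])
code_embeddings = model.encode([
    'def load_config(path):\n    with open(path) as f:\n        return json.load(f)',
    'def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)',
])
# Similarity scores between the query and each snippet, shape [1, 2]
scores = model.similarity(query_embedding, code_embeddings)
best_match = int(scores.argmax())
print(best_match, scores)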
Transformers
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # Pool each sequence to the hidden state of its last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
# Example natural-language queries and candidate code documents to embed
queries = [
'how to handle memory efficient data streaming',
'implement binary tree traversal'
]
documents = [
    """def process_in_chunks():
    buffer = deque(maxlen=1000)
    for record in source_iterator:
        buffer.append(transform(record))
        if len(buffer) >= 1000:
            yield from buffer
            buffer.clear()""",
    """class LazyLoader:
    def __init__(self, source):
        self.generator = iter(source)
        self._cache = []

    def next_batch(self, size=100):
        while len(self._cache) < size:
            try:
                self._cache.append(next(self.generator))
            except StopIteration:
                break
        return self._cache.pop(0) if self._cache else None""",
    """def dfs_recursive(root):
    if not root:
        return []
    stack = []
    stack.extend(dfs_recursive(root.right))
    stack.append(root.val)
    stack.extend(dfs_recursive(root.left))
    return stack""",
]
input_texts = queries + documents
tokenizer = AutoTokenizer.from_pretrained('Qodo/Qodo-Embed-1-Lite', trust_remote_code=True)
model = AutoModel.from_pretrained('Qodo/Qodo-Embed-1-Lite', trust_remote_code=True)
max_length = 8192  # truncation limit for this example; the model accepts up to 32k tokens
# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T) * 100
print(scores.tolist())
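Because the embeddings are L2-normalized, the dot product is a cosine similarity, so the printed scores form a 2x3 matrix of query-document similarities scaled by 100. A purely illustrative ranking step over the variables defined above could look like this:

# Rank the documents for each query by descending similarity
for q_idx, query in enumerate(queries):
    ranking = scores[q_idx].argsort(descending=True).tolist()
    print(f'{query!r} -> document order {ranking}')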
License
QodoAI-Open-RAIL-M
Base Model
- Alibaba-NLP/gte-Qwen2-1.5B-instruct