# CPRetriever-Code
CPRetriever-Code is a code embedding model trained via contrastive learning for code-related retrieval tasks in competitive programming. It achieves strong performance on tasks such as:

- Text-to-Code retrieval (problem description → relevant code)
- Code-to-Code retrieval (find alternate solutions to the same problem)
This model is part of the CPRet suite for competitive programming retrieval research.
## Usage

You can load this model with the `sentence-transformers` library:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("coldchair16/CPRetriever-Code")

embeddings = model.encode([
    "def mex_query(arr):\n    n = len(arr)\n    seen = set()\n    for i in range(n):\n        seen.add(arr[i])\n    i = 0\n    while True:\n        if i not in seen:\n            return i\n        i += 1"
])
```
## Applications

This model is optimized for code-level semantic retrieval in competitive programming settings:

- Text-to-Code: retrieve relevant code snippets given a natural-language problem description.
- Code-to-Code: retrieve alternative implementations of the same problem.

It is particularly effective for analyzing programming contest submissions, searching for solution variants, and building educational tools for code understanding.
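Both retrieval modes reduce to a nearest-neighbor search over embeddings. The sketch below shows the ranking step with NumPy, using made-up toy vectors in place of real `model.encode(...)` outputs:

```python
import numpy as np

# Toy sketch of the retrieval step: rank candidate snippets by cosine
# similarity to a query embedding. The vectors below are made up for
# illustration; in practice they would come from model.encode(...).
def rank_by_cosine(query_vec, candidate_vecs):
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    return np.argsort(-scores), scores  # indices sorted best-first

query = np.array([0.9, 0.1, 0.0])
candidates = np.array([
    [0.8, 0.2, 0.1],  # close to the query
    [0.0, 1.0, 0.0],  # unrelated
    [0.7, 0.0, 0.7],  # partially related
])
order, scores = rank_by_cosine(query, candidates)
print(order)  # → [0 2 1]: best-matching candidate first
```

The same ranking works for Text-to-Code (query is a problem description) and Code-to-Code (query is itself a code embedding).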
## Training and Evaluation

CPRetriever-Code is trained via contrastive learning using positive and hard-negative code pairs derived from CPRet-data.

For the full training pipeline, see the project repository: CPRet on GitHub.
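To make the objective concrete, here is a minimal InfoNCE-style contrastive loss with in-batch negatives, written in NumPy. It is a sketch of the general technique only, not the actual CPRet training code, and the unit-vector inputs are made up for illustration:

```python
import numpy as np

# Illustrative InfoNCE-style contrastive loss with in-batch negatives:
# each anchor embedding is pulled toward its paired positive and pushed
# away from every other positive in the batch. This is a sketch of the
# general technique, not the CPRet training implementation.
def info_nce_loss(anchors, positives, temperature=0.05):
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # each anchor's true positive sits on the diagonal
    return -np.log(np.diag(probs)).mean()

# Perfectly matched pairs give a near-zero loss; mismatched pairs are penalized.
perfect = info_nce_loss(np.eye(4), np.eye(4))
shuffled = info_nce_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(perfect < shuffled)  # → True
```

Hard negatives (incorrect solutions that look similar to correct ones) sharpen this objective further by making the off-diagonal terms harder to push away.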
## Model Card

- Architecture: Salesforce/SFR-Embedding-Code-2B_R (encoder backbone)
- Training: contrastive objective on code/code and text/code pairs
- Format: compatible with `sentence-transformers`