CodeSearch-ModernBERT-Owl-Plus / README.md

Shuu12121

metadata

tags:
  - sentence-transformers
  - sentence-similarity
  - code
  - python
  - php
  - javascript
  - ruby
  - rust
  - go
  - java
base_model: Shuu12121/CodeModernBERT-Owl-1.0
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: apache-2.0
language:
  - en
datasets:
  - Shuu12121/python-codesearch-filtered
  - Shuu12121/java-codesearch-filtered
  - Shuu12121/javascript-codesearch-filtered
  - Shuu12121/go-codesearch-filtered
  - Shuu12121/php-codesearch-filtered
  - Shuu12121/ruby-codesearch-filtered
  - Shuu12121/rust-codesearch-filtered
  - code-search-net/code_search_net

🦉 CodeSearch-ModernBERT-Owl-Plus: High-Performance Sentence-BERT for Code Search

CodeSearch-ModernBERT-Owl-Plus is a high-performance code search model, fine-tuned as a Sentence-BERT style encoder from the pretrained CodeModernBERT-Owl-1.0.

This model is optimized for function-level code search from natural language queries, and reaches an NDCG@10 of 0.8918 on the MTEB CodeSearchNetRetrieval task.

🛠 Features

✅ Fine-tuned in Sentence-BERT format from CodeModernBERT-Owl
✅ Supports multiple programming languages (Python, Java, JavaScript, PHP, Ruby, Go, Rust)
✅ Specialized encoder for high-accuracy code search
✅ Suited to dual-encoder retrieval, for example as the first stage of a multi-stage setup
✅ Generates rich semantic embeddings for code and queries
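To make the multi-stage idea concrete, here is a minimal sketch of a two-stage search. It assumes the embeddings have already been computed and L2-normalized; the toy vectors, the function names `first_stage` and `second_stage`, and the reranker scores are all hypothetical and not part of this model's API.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))

def first_stage(query_vec, corpus_vecs, k):
    """Dual-encoder stage: rank every document id by similarity, keep the top k."""
    ranked = sorted(corpus_vecs,
                    key=lambda doc_id: dot(query_vec, corpus_vecs[doc_id]),
                    reverse=True)
    return ranked[:k]

def second_stage(candidates, rerank_score):
    """Reorder the shortlist with a slower scorer (hypothetical, e.g. a cross-encoder)."""
    return sorted(candidates, key=rerank_score, reverse=True)

# Toy, hand-written unit vectors standing in for real embeddings (assumption).
corpus = {
    "binary_search": [1.0, 0.0],
    "quick_sort": [0.6, 0.8],
    "read_file": [0.0, 1.0],
}
query = [0.8, 0.6]

shortlist = first_stage(query, corpus, k=2)

# Hypothetical reranker scores for the shortlisted ids.
rerank_table = {"binary_search": 0.9, "quick_sort": 0.2}
final = second_stage(shortlist, lambda doc_id: rerank_table[doc_id])
```

The first stage is cheap because the corpus vectors can be computed once and reused for every query; only the shortlist reaches the more expensive second stage.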

📊 Evaluation on MTEB Benchmark

🏆 Main Scores in MTEB

This model achieved the following main scores (based on NDCG@10):

CodeSearchNetRetrieval: main_score = 0.8918
COIR-CodeSearchNetRetrieval: main_score = 0.8013

🧪 CodeSearchNetRetrieval (MTEB)

| Metric | Score |
|---|---|
| MRR@10 | 0.8704 |
| NDCG@10 | 0.8918 |
| MAP@10 | 0.8704 |
| Recall@10 | 0.9563 |
| Precision@10 | 0.0956 |

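Scores are strong across all ranking metrics. Precision@10 is capped at 0.1 when each query has a single relevant function, so 0.0956 is close to that ceiling, and the identical MRR@10 and MAP@10 values are consistent with the same setup.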
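The relationship between these metrics can be sketched in a few lines. This is an illustration only, assuming each query has exactly one relevant function (which is consistent with the reported numbers but is not stated in this README); the ranks below are made up and are not evaluation data.

```python
import math

def metrics_at_k(rank, k=10):
    """Ranking metrics for one query with exactly one relevant document.

    `rank` is the 1-based position of the relevant document in the
    results, or None if it was not retrieved.
    """
    hit = rank is not None and rank <= k
    return {
        "mrr": 1.0 / rank if hit else 0.0,
        # With a single relevant document, average precision is also 1/rank.
        "map": 1.0 / rank if hit else 0.0,
        # Ideal DCG is 1 (relevant document at rank 1), so NDCG equals DCG.
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,
        "recall": 1.0 if hit else 0.0,
        # At most one of the k results can be relevant.
        "precision": 1.0 / k if hit else 0.0,
    }

def mean_metrics(ranks, k=10):
    """Average each metric over all queries."""
    per_query = [metrics_at_k(rank, k) for rank in ranks]
    return {name: sum(m[name] for m in per_query) / len(per_query)
            for name in per_query[0]}

# Hypothetical ranks for four queries (None = relevant function not retrieved).
scores = mean_metrics([1, 2, 1, None])
```

Under this assumption Precision@10 is always Recall@10 divided by 10, and MRR@10 equals MAP@10, which matches the pattern in the table above.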

🧪 COIR-CodeSearchNetRetrieval (MTEB)

| Metric | Score |
|---|---|
| MRR@10 | 0.7751 |
| NDCG@10 | 0.8013 |
| MAP@10 | 0.7751 |
| Recall@10 | 0.8826 |
| Precision@10 | 0.0883 |

Scores on the COIR variant are lower (NDCG@10 of 0.8013 versus 0.8918) but remain consistent across metrics, suggesting that the model generalizes beyond a single evaluation setup.

📥 Usage Example

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")
embeddings = model.encode(["binary search function", "def binary_search(arr, target): ..."])
```
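The embeddings are typically compared with cosine similarity to rank code against a query. The following is a minimal sketch of that step; `cosine` and `rank_snippets` are hypothetical helper names, and the only assumption about the encoder is that it exposes `encode(list_of_str)` returning one vector per input, as `SentenceTransformer` does.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_snippets(encoder, query, snippets):
    """Return (score, snippet) pairs sorted from best to worst match.

    `encoder` is any object with an `encode(list_of_str)` method,
    for example a SentenceTransformer instance.
    """
    vectors = encoder.encode([query] + list(snippets))
    query_vec, snippet_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), text)
              for vec, text in zip(snippet_vecs, snippets)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the model loaded as above, `rank_snippets(model, "binary search function", ["def binary_search(arr, target): ...", "def read_file(path): ..."])` would return the snippets ordered by similarity to the query. For large corpora, encoding the snippets once and caching the vectors is usually preferable to re-encoding them per query.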

📝 Conclusion

✅ An optimized Sentence-BERT model based on CodeModernBERT-Owl
✅ Achieves MRR@10 > 0.87 on MTEB CodeSearchNetRetrieval
✅ Ready for integration in production-level code search systems

📜 License

📄 Apache-2.0

📧 Contact

For questions or inquiries, feel free to reach out: 📧 [email protected]