metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - code
  - python
  - php
  - javascript
  - ruby
  - rust
  - go
  - java
base_model: Shuu12121/CodeModernBERT-Owl-1.0
pipeline_tag: sentence-similarity
library_name: sentence-transformers
license: apache-2.0
language:
  - en
datasets:
  - Shuu12121/python-codesearch-filtered
  - Shuu12121/java-codesearch-filtered
  - Shuu12121/javascript-codesearch-filtered
  - Shuu12121/go-codesearch-filtered
  - Shuu12121/php-codesearch-filtered
  - Shuu12121/ruby-codesearch-filtered
  - Shuu12121/rust-codesearch-filtered
  - code-search-net/code_search_net

🦉 CodeSearch-ModernBERT-Owl-Plus: High-Performance Sentence-BERT for Code Search

CodeSearch-ModernBERT-Owl-Plus is a high-performance code search model, fine-tuned as a Sentence-BERT encoder from the pretrained CodeModernBERT-Owl v1.0.

It is optimized for retrieving functions from a codebase given natural language queries, and it reaches an NDCG@10 of 0.8918 on the MTEB CodeSearchNetRetrieval task.


🛠 Features

  • ✅ Fine-tuned in Sentence-BERT format from CodeModernBERT-Owl
  • ✅ Supports seven programming languages: Python, Java, JavaScript, PHP, Ruby, Rust, and Go
  • ✅ Specialized encoder for high-accuracy code search
  • ✅ Suited to multi-stage retrieval setups as the dual-encoder stage
  • ✅ Generates semantic embeddings for both code and queries
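
To illustrate the multi-stage setup mentioned in the list above, here is a minimal sketch in which a dual-encoder stage shortlists candidates by cosine similarity and a second stage re-scores only that shortlist. It assumes the dual encoder acts as the first stage. The 3-dimensional vectors stand in for real embeddings, and `first_stage`, `two_stage_search`, and the `rerank_scores` table are hypothetical names and values chosen for illustration; they are not part of this model or of sentence-transformers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_stage(query_vec, code_vecs, top_k):
    """Stage 1: rank every precomputed code embedding against the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(code_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


def two_stage_search(query_vec, code_vecs, rerank_score, top_k=2):
    """Stage 2: re-score only the shortlisted candidates with a costlier scorer."""
    candidates = first_stage(query_vec, code_vecs, top_k)
    return sorted(candidates, key=rerank_score, reverse=True)


# Toy 3-dimensional vectors standing in for real model embeddings.
code_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query_vec = [1.0, 0.05, 0.0]

# Hypothetical second-stage scores keyed by snippet index.
rerank_scores = {0: 0.2, 1: 0.9, 2: 0.5}
result = two_stage_search(query_vec, code_vecs, rerank_scores.get, top_k=2)
print(result)
```

In a real system the code embeddings would typically be computed once and stored, so that only the query needs to be encoded at search time.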

📊 Evaluation on MTEB Benchmark

🏆 Main Scores in MTEB

This model achieved the following main scores (based on NDCG@10):

  • CodeSearchNetRetrieval: main_score = 0.8918
  • COIR-CodeSearchNetRetrieval: main_score = 0.8013

🧪 CodeSearchNetRetrieval (MTEB)

Metric        Score
MRR@10        0.8704
NDCG@10       0.8918
MAP@10        0.8704
Recall@10     0.9563
Precision@10  0.0956

A Recall@10 of 0.9563 shows that relevant functions are almost always retrieved within the top 10 results, and an MRR@10 of 0.8704 shows that they usually rank at or near the top.
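
The relationships between the metrics in the table can be reproduced with a small calculation. The sketch below assumes each query has exactly one relevant function, which the identical MRR@10 and MAP@10 values suggest but the card does not state; under that assumption Precision@10 is Recall@10 divided by 10, matching 0.9563 and 0.0956. The function name `metrics_at_k` and the example ranks are illustrative and not taken from the benchmark.

```python
import math


def metrics_at_k(ranks, k=10):
    """Compute ranking metrics when each query has exactly one relevant item.

    `ranks` holds the 1-based rank of the relevant item for each query,
    or None when the item was not retrieved at all.
    """
    hits = [r for r in ranks if r is not None and r <= k]
    n = len(ranks)
    recall = len(hits) / n                # share of queries with a hit in the top k
    precision = recall / k                # at most one of the k slots can be relevant
    mrr = sum(1.0 / r for r in hits) / n  # equals MAP@k with a single relevant item
    # With one relevant item the ideal DCG is 1, so NDCG is just the discounted gain.
    ndcg = sum(1.0 / math.log2(r + 1) for r in hits) / n
    return {"recall": recall, "precision": precision, "mrr": mrr, "ndcg": ndcg}


# Illustrative ranks for four made-up queries (not benchmark data).
scores = metrics_at_k([1, 2, 4, None])
print(scores)
```

Because NDCG discounts by the logarithm of the rank while MRR discounts by the rank itself, NDCG@10 is never lower than MRR@10 in this single-relevant setting, which is consistent with the table.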


🧪 COIR-CodeSearchNetRetrieval (MTEB)

Metric        Score
MRR@10        0.7751
NDCG@10       0.8013
MAP@10        0.7751
Recall@10     0.8826
Precision@10  0.0883

Scores on the COIR variant are lower than on CodeSearchNetRetrieval (NDCG@10 of 0.8013 versus 0.8918), but Recall@10 remains at 0.8826, which indicates that the model generalizes to this dataset.


📥 Usage Example

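from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub.
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")
# Embed a natural language query and a code snippet with the same encoder.
embeddings = model.encode(["binary search function", "def binary_search(arr, target): ..."])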
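
The embeddings only become search results once they are compared. The following sketch ranks candidate snippets against a query using normalized embeddings, so that a dot product equals cosine similarity. The `search` and `demo` helpers and the `bubble_sort` snippet are illustrative additions, not part of the model or the library; `normalize_embeddings` is an existing option of `SentenceTransformer.encode`.

```python
def search(model, query, snippets, top_k=3):
    """Return the top_k (score, snippet) pairs for a natural language query.

    `model` is any object with a SentenceTransformer-style `encode` method.
    """
    # Unit-length embeddings make the dot product equal to cosine similarity.
    vectors = model.encode([query] + list(snippets), normalize_embeddings=True)
    query_vec, code_vecs = vectors[0], vectors[1:]
    scored = [
        (float(sum(q * c for q, c in zip(query_vec, vec))), snippet)
        for vec, snippet in zip(code_vecs, snippets)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def demo():
    """Run a search with the real model (downloads weights on first use)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")
    snippets = [
        "def binary_search(arr, target): ...",
        "def bubble_sort(arr): ...",  # illustrative distractor snippet
    ]
    for score, snippet in search(model, "binary search function", snippets):
        print(f"{score:.3f}  {snippet}")
```

Calling `demo()` prints each snippet with its similarity score, highest first. For a large codebase, the snippet embeddings would usually be computed once and cached rather than re-encoded on every query.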

📝 Conclusion

  • ✅ An optimized Sentence-BERT model based on CodeModernBERT-Owl
  • ✅ Achieves MRR@10 > 0.87 on MTEB CodeSearchNetRetrieval
  • ✅ Ready for integration in production-level code search systems

📜 License

📄 Apache-2.0

📧 Contact

For questions or inquiries, feel free to reach out: 📧 [email protected]