metadata
license: mit
datasets:
  - dleemiller/wiki-sim
  - sentence-transformers/stsb
language:
  - en
metrics:
  - spearmanr
  - pearsonr
base_model:
  - NeuML/bert-hash-pico
pipeline_tag: text-ranking
library_name: sentence-transformers
tags:
  - cross-encoder
  - modernbert
  - sts
  - stsb
  - stsbenchmark-sts
model-index:
  - name: CrossEncoder based on NeuML/bert-hash-pico
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7594692671867559
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.747410618220483
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8216995594169731
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8226789104514981
            name: Spearman Cosine

BERT Hash Cross-Encoder: Semantic Similarity (STS)

Cross-encoders are high-performing encoder models that compare two texts and output a similarity score between 0 and 1. I've found the cross-encoder/stsb-roberta-large model very useful for building evaluators for LLM outputs. Cross-encoders are simple to use, fast, and very accurate.

BERT Hash models use a bucketing technique with a projection to shrink the embedding parameters, so every model in the family has fewer than 1M parameters. These models are very small and well suited to inference at the edge.
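To give a rough sense of why bucketing plus a projection saves parameters, here is a minimal arithmetic sketch. It assumes one plausible reading of the technique: token ids are mapped to a small table of bucket vectors, which are then projected up to the hidden size. The function names, the modulo stand-in for a hash, and every size below are hypothetical illustrations, not the actual BERT Hash configuration.

```python
# Illustrative sketch only: the layer sizes below are made-up values,
# not the actual BERT Hash configuration.


def full_embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a standard embedding table: one hidden-size row per token."""
    return vocab_size * hidden_size


def bucketed_embedding_params(num_buckets: int, bucket_dim: int, hidden_size: int) -> int:
    """Parameters in a small bucket table plus a bias-free projection to hidden size."""
    return num_buckets * bucket_dim + bucket_dim * hidden_size


def bucket_for(token_id: int, num_buckets: int) -> int:
    """Map a token id to a bucket; modulo is a stand-in for a real hash function."""
    return token_id % num_buckets


if __name__ == "__main__":
    vocab_size, hidden_size = 30_522, 80  # hypothetical sizes
    num_buckets, bucket_dim = 4_096, 16   # hypothetical sizes
    print(full_embedding_params(vocab_size, hidden_size))
    print(bucketed_embedding_params(num_buckets, bucket_dim, hidden_size))
    print(bucket_for(30_000, num_buckets))
```

Under these assumed sizes the bucketed layout needs far fewer embedding parameters than a full table, at the cost of several tokens sharing one bucket vector.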


Features

  • Performance: Achieves a Pearson correlation of 0.7595 and a Spearman correlation of 0.7474 on the STS-Benchmark test set.
  • Efficient architecture: Based on the BERT Hash model architecture, offering lightweight models.
  • Extended context length: Processes sequences up to 8192 tokens, great for LLM output evals.
  • Diversified training: Pretrained on dleemiller/wiki-sim and fine-tuned on sentence-transformers/stsb.

Performance

| Model | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed |
|---|---|---|---|---|---|
| dleemiller/ModernCE-large-sts | 0.9256 | 0.9215 | 8192 | 395M | Medium |
| dleemiller/CrossGemma-sts-300m | 0.9175 | 0.9135 | 2048 | 303M | Medium |
| dleemiller/ModernCE-base-sts | 0.9162 | 0.9122 | 8192 | 149M | Fast |
| cross-encoder/stsb-roberta-large | 0.9147 | - | 512 | 355M | Slow |
| dleemiller/EttinX-sts-m | 0.9143 | 0.9102 | 8192 | 149M | Fast |
| dleemiller/NeoCE-sts | 0.9124 | 0.9087 | 4096 | 250M | Fast |
| dleemiller/EttinX-sts-s | 0.9004 | 0.8926 | 8192 | 68M | Very Fast |
| cross-encoder/stsb-distilroberta-base | 0.8792 | - | 512 | 82M | Fast |
| dleemiller/EttinX-sts-xs | 0.8763 | 0.8689 | 8192 | 32M | Very Fast |
| dleemiller/EttinX-sts-xxs | 0.8414 | 0.8311 | 8192 | 17M | Very Fast |
| dleemiller/sts-bert-hash-nano | 0.7904 | 0.7743 | 8192 | 0.97M | Very Fast |
| dleemiller/sts-bert-hash-pico | 0.7595 | 0.7474 | 8192 | 0.45M | Very Fast |

Usage

To use sts-bert-hash-pico for semantic similarity tasks, load the model with the sentence-transformers library:

from sentence_transformers import CrossEncoder

# Load CrossEncoder model
model = CrossEncoder("dleemiller/sts-bert-hash-pico", trust_remote_code=True)

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)

print(scores)  # Example output format: array([0.9184, 0.0123], dtype=float32); exact values depend on the model

Output

The model returns similarity scores in the range [0, 1], where higher scores indicate stronger semantic similarity.
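Since the scores are bounded, one simple way to use them for LLM output evaluation is to compare each score against a cutoff. The sketch below is a minimal illustration, not part of the model's API: `evaluate_outputs` is a hypothetical helper, and the default threshold of 0.5 is an arbitrary assumption that should be tuned on your own data.

```python
from typing import Callable, Sequence


def evaluate_outputs(
    pairs: Sequence[tuple[str, str]],
    predict: Callable[[list[tuple[str, str]]], Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff; tune on your own data
) -> list[dict]:
    """Score (reference, candidate) pairs and flag those at or above the threshold."""
    scores = predict(list(pairs))
    return [
        {
            "reference": ref,
            "candidate": cand,
            "score": float(score),
            "passed": float(score) >= threshold,
        }
        for (ref, cand), score in zip(pairs, scores)
    ]
```

With the model loaded as in the Usage section, calling `evaluate_outputs(sentence_pairs, model.predict)` would return one record per pair. Taking `predict` as a parameter keeps the helper independent of any particular model.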


Training Details

Pretraining

The model was pretrained on the pair-score-sampled subset of the dleemiller/wiki-sim dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.

  • Classifier Dropout: A relatively large classifier dropout of 0.15, to reduce overreliance on teacher scores.
  • Objective: Match the STS-B scores predicted by the teacher model dleemiller/ModernCE-large-sts.
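The teacher-score objective above can be sketched roughly as follows. This is an assumption-heavy illustration: the source does not state which loss was used, so mean squared error is swapped in here as a common choice for score regression, and `label_with_teacher` and `mse_loss` are hypothetical helper names.

```python
from typing import Callable, Sequence


def label_with_teacher(
    pairs: Sequence[tuple[str, str]],
    teacher_predict: Callable[[list[tuple[str, str]]], Sequence[float]],
) -> list[tuple[str, str, float]]:
    """Attach a teacher similarity score to each sentence pair."""
    scores = teacher_predict(list(pairs))
    return [(a, b, float(s)) for (a, b), s in zip(pairs, scores)]


def mse_loss(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Mean squared error between student and teacher scores (assumed loss)."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared) / len(squared)
```

In this reading, the teacher's `predict` method would label the pretraining pairs once, and the student would then be trained to reduce the loss against those labels.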

Fine-Tuning

Fine-tuning was performed on the sentence-transformers/stsb dataset.

Test Results

The model achieved the following test set performance after fine-tuning:

  • Pearson Correlation: 0.7595
  • Spearman Correlation: 0.7474
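For readers unfamiliar with the two metrics, here is a small pure-Python sketch of how they are defined. It is not the evaluation code used for this model, and the helper names are illustrative. Pearson measures linear agreement between predicted and gold scores, while Spearman is Pearson computed on ranks, with tied values sharing their average rank.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice a library routine such as `scipy.stats.pearsonr` or `scipy.stats.spearmanr` is the usual choice. The sketch only makes the definitions concrete.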

Model Card

  • Architecture: bert-hash-pico
  • Tokenizer: Custom tokenizer trained with modern techniques for long-context handling.
  • Pretraining Data: dleemiller/wiki-sim (pair-score-sampled)
  • Fine-Tuning Data: sentence-transformers/stsb

Thank You

Thanks to the NeuML team for providing the BERT Hash models, and the Sentence Transformers team for their leadership in transformer encoder models.


Citation

If you use this model in your research, please cite:

@misc{stsnano2025,
  author = {Miller, D. Lee},
  title = {Bert Hash STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/sts-bert-hash-pico},
}

License

This model is licensed under the MIT License.