---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen3-0.6B-Base
---

# Sniff-0.6B by Noumenon Labs

Sniff-0.6B is an AI-generated text detection model built by Noumenon Labs, fine-tuned from Qwen3-0.6B-Base. It is trained to classify text as either AI-Generated or Human-Written.
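A hypothetical inference sketch is below. The repo id, the head type (a standard `transformers` sequence-classification head), and the label order are all assumptions, not confirmed by this card; check the model's actual config before relying on them.

```python
# Hypothetical inference sketch for Sniff-0.6B. The repo id, head type,
# and label order below are assumptions -- verify against the real config.

LABELS = ["AI-Generated", "Human-Written"]  # assumed id2label order

def label_from_logits(logits):
    """Pick the label with the highest logit for a single example."""
    scores = list(logits)
    return LABELS[scores.index(max(scores))]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "NoumenonLabs/Sniff-0.6B"  # hypothetical repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tok("Paste the text you want to screen here.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    print(label_from_logits(logits))
```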

Sniff-0.6B achieves 76.2% accuracy on our internal benchmark of 189 mixed samples. Its performance tells a specific story:

- AI Recall: 1.00 – the model catches every single AI-generated text.
- Human Precision: 1.00 – when it predicts “Human-Written,” it is always correct.
- But...
  - Human Recall is only 0.58 – 42% of human-written texts are incorrectly flagged as AI.
  - AI Precision is 0.65 – 35% of texts flagged as AI were actually written by humans.
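These headline numbers can be sanity-checked by reconstructing the confusion matrix implied by the classification report below (82 AI and 107 human samples; the 62/45 split of human samples is derived from the rounded 0.58 recall, so it is approximate):

```python
# Reconstruct the confusion matrix implied by the report.
ai_total, human_total = 82, 107

ai_caught = 82            # AI recall 1.00 -> every AI sample was flagged
human_correct = 62        # 62 / 107 ~= 0.58 human recall (approximate)
human_misflagged = human_total - human_correct  # 45 false positives

ai_precision = ai_caught / (ai_caught + human_misflagged)
human_recall = human_correct / human_total
accuracy = (ai_caught + human_correct) / (ai_total + human_total)

print(round(ai_precision, 2))   # 0.65 -> 35% of AI flags were human
print(round(human_recall, 2))   # 0.58 -> 42% of human texts misflagged
print(round(accuracy, 4))       # 0.7619
```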

## Interpretation

Sniff is highly conservative. It makes virtually no false negatives (it won’t miss AI) but many false positives (flagging human texts as AI). This behavior is useful in low-risk settings where over-flagging is better than under-flagging, such as bot filtering or content moderation.

However, Sniff is not recommended for high-stakes use cases like education or academic integrity tools, where a single false accusation can have serious consequences.


## Classification Report

```
          CLASSIFICATION REPORT
==================================================
Overall Accuracy: 0.7619

               precision    recall  f1-score   support

 AI-Generated       0.65      1.00      0.78        82
Human-Written       1.00      0.58      0.73       107

     accuracy                           0.76       189
    macro avg       0.82      0.79      0.76       189
 weighted avg       0.85      0.76      0.76       189
```

## Model Use Case Recommendation

| Goal                                        | Fit                |
|---------------------------------------------|--------------------|
| Flagging suspected AI content in forums     | ✅ Good fit        |
| Pre-filtering submissions for human review  | ✅ Good fit        |
| Detecting academic dishonesty               | ❌ Not recommended |
| Certifying authorship or originality        | ❌ Not recommended |
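Given the asymmetric error profile above (human predictions are fully trustworthy on this benchmark, AI flags are right only ~65% of the time), a pre-filtering deployment might route predictions accordingly. A minimal sketch, where `detect` is a stand-in for whatever inference call you use:

```python
# Hypothetical triage wrapper around a Sniff-style detector: accept
# Human-Written predictions (precision 1.00 on the benchmark), but send
# AI flags to human review (precision only 0.65) instead of auto-rejecting.
def triage(text, detect):
    label = detect(text)
    if label == "Human-Written":
        return "accept"            # human precision 1.00 on the benchmark
    return "queue_for_review"      # AI flags are wrong ~35% of the time

# Usage with dummy detectors:
print(triage("hello", lambda t: "Human-Written"))  # accept
print(triage("hello", lambda t: "AI-Generated"))   # queue_for_review
```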

## Next Steps for Future Versions

- Improve human-text recall by increasing diversity and complexity in the training data.
- Balance aggressive detection with higher tolerance for creative or simple human writing.
- Explore prompt tuning and deeper fine-tuning to soften the model's rigid behavior.