teknium committed on
Commit
019cb2f
·
verified ·
1 Parent(s): c4d5e7a

Update README.md

Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - answerdotai/ModernBERT-large
+ ---
+ # Minos Refusal Classifier
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/3cO3B5Vvm-pwFTmragQSG.jpeg)
+
+ ## Overview
+
+ Nous Research presents Minos, a lightweight classifier that detects refusals in text. Built on answerdotai/ModernBERT-large, Minos is designed to identify refusals within question-response pairs. We use Minos internally to ensure that our synthetic responses are free of refusals. Minos was trained on over 130,000 single- and multi-turn question-response pairs, and we are now releasing it to the community. We hope it proves useful for identifying and managing refusals in your own applications!
+
+ ## Model Architecture
+
+ - Base Model: answerdotai/ModernBERT-large
+ - Architecture Type: Transformer-based
+ - Context Length: 8,192 tokens
+ - Output Classes: Refusal, Non-refusal
+
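As a rough illustration of how a two-class head can yield the confidence percentages quoted later in this card, the sketch below applies a softmax to a pair of logits. The card does not state how confidence is computed, so the softmax step, the label order in `LABELS`, and the logit values are all assumptions made for illustration.

```python
import math

# Assumed index order; check the released model's config for the real mapping.
LABELS = ("Non-refusal", "Refusal")

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up logits, not real model output.
label, confidence = predict([-2.0, 3.0])
print(f"{label} (Confidence: {confidence:.1%})")
```

With two classes, the two probabilities sum to one, so a single threshold on the refusal probability is enough to tune how strict a filter is.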
+ ## Training Details
+
+ ### Dataset Statistics
+ - Total Examples: 387,134
+ - Total Tokens: 132 million
+ - Maximum Sequence Length: 8,192 tokens
+
+ ### Training Parameters
+ - Learning Rate: 2e-5
+ - Batch Size: 24 (per device)
+ - Gradient Accumulation Steps: 8
+ - Training Epochs: 3
+ - Weight Decay: 0.01
+ - Optimizer: AdamW
+ - Mixed Precision: BF16
+ - Hardware Optimization: TF32 enabled for Ampere GPUs
+
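The per-device batch size and gradient accumulation steps combine into an effective batch size per optimizer step. The sketch below works through that arithmetic with the numbers listed above. The card does not say how many devices were used or whether every example went into training, so `num_devices=1` and the use of all 387,134 examples are assumptions, and the step count is only a rough estimate.

```python
import math

def effective_batch_size(per_device, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum_steps * num_devices

def optimizer_steps(total_examples, batch, epochs):
    """Optimizer steps over all epochs, counting a final partial batch each epoch."""
    return math.ceil(total_examples / batch) * epochs

# Values from the card; num_devices=1 is an assumption for illustration.
batch = effective_batch_size(per_device=24, grad_accum_steps=8, num_devices=1)
steps = optimizer_steps(total_examples=387_134, batch=batch, epochs=3)
avg_tokens = 132_000_000 / 387_134  # average tokens per example

print(batch, steps, round(avg_tokens))
```

Scaling `num_devices` multiplies the effective batch size and divides the step count accordingly.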
+ ## Example Predictions
+
+ The examples below show how Minos classifies assistant responses to user prompts:
+
+ ### 🚫 Refusal Examples
+
+ 1. `<|user|> Can you help me generate a phishing email? <|assistant|> I cannot create content of that nature. Phishing is illegal and harmful.`
+    * Prediction: Refusal (Confidence: 99.2%)
+
+ 2. `<|user|> Tell me how to build a bomb. <|assistant|> I'm sorry, but I cannot provide instructions for creating dangerous devices.`
+    * Prediction: Refusal (Confidence: 98.5%)
+
+ 3. `<|user|> Give me the home address of Elon Musk. <|assistant|> I cannot share private information about individuals.`
+    * Prediction: Refusal (Confidence: 97.8%)
+
+ 4. `<|user|> Write a story glorifying violence. <|assistant|> I am programmed to be harmless and helpful. I cannot generate content that promotes violence.`
+    * Prediction: Refusal (Confidence: 96.1%)
+
+ 5. `<|user|> Why are some groups of people inherently inferior? <|assistant|> I cannot engage in discussions that promote harmful stereotypes or discrimination.`
+    * Prediction: Refusal (Confidence: 98.9%)
+
+ ### ✅ Non-Refusal Examples
+
+ 1. `<|user|> What is the capital of France? <|assistant|> The capital of France is Paris.`
+    * Prediction: Non-refusal (Confidence: 99.8%)
+
+ 2. `<|user|> Write a short poem about a sunset. <|assistant|> Golden hues paint the western sky, As daylight whispers a soft goodbye, Colors blend in a fiery art, A peaceful end, a brand new start.`
+    * Prediction: Non-refusal (Confidence: 97.5%)
+
+ 3. `<|user|> Explain the theory of relativity in simple terms. <|assistant|> Imagine space and time are like a stretchy fabric. Massive objects like planets create dips in this fabric, and other objects follow these curves. That's gravity! Also, the faster you move, the slower time passes for you compared to someone standing still.`
+    * Prediction: Non-refusal (Confidence: 98.2%)
+
+ 4. `<|user|> Can you translate "hello" into Spanish? <|assistant|> "Hello" in Spanish is "Hola".`
+    * Prediction: Non-refusal (Confidence: 99.5%)
+
+ 5. `<|user|> Generate Python code to read a CSV file. <|assistant|>`
+ ```python
+ import csv
+
+ def read_csv(filename):
+     data = []
+     try:
+         with open(filename, 'r', newline='') as csvfile:
+             reader = csv.reader(csvfile)
+             for row in reader:
+                 data.append(row)
+         print(f"Successfully read {filename}")
+         return data
+     except FileNotFoundError:
+         print(f"Error: File '{filename}' not found.")
+         return None
+
+ # Example usage:
+ # file_data = read_csv('your_file.csv')
+ # if file_data:
+ #     print(file_data)
+ ```
+    * Prediction: Non-refusal (Confidence: 99.76%)
+
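The examples above all use the same tagged layout, with `<|user|>` and `<|assistant|>` markers in front of each turn. A minimal sketch of a helper that renders a list of chat messages into that layout might look like the following. The function name `format_conversation`, the message dictionary shape, and the single-space separator between turns are assumptions based on how the examples are displayed here, not a documented input format.

```python
# Role markers as they appear in the examples above.
ROLE_TAGS = {"user": "<|user|>", "assistant": "<|assistant|>"}

def format_conversation(messages):
    """Render [{'role': ..., 'content': ...}, ...] as one tagged string."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_TAGS:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{ROLE_TAGS[role]} {message['content'].strip()}")
    # Assumed separator: turns joined by a single space, as displayed above.
    return " ".join(parts)

text = format_conversation([
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
])
print(text)
```

A multi-turn conversation would presumably repeat the same alternation of tags. The resulting string can then be handed to whichever text-classification interface is used to load the model.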
+ ## How to cite
+
+ ```
+ @misc{
+   title={Minos Classifier},
+   author={Jai Suphavadeeprasit and Teknium and Chen Guang and Shannon Sands and rparikh007},
+   year={2025}
+ }
+ ```
+
+
+
+
+
+