avanishd committed on
Commit 45a3d2d · verified · 1 Parent(s): b363ebd

Update README.md

Files changed (1)
  1. README.md +70 -44
README.md CHANGED
@@ -1,77 +1,81 @@
---
library_name: transformers
- tags: []
---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-

## Model Details

### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]

### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]

## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

## Training Details
 
@@ -79,7 +83,7 @@ Use the code below to get started with the model.

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

### Training Procedure
@@ -87,12 +91,32 @@ Use the code below to get started with the model.

#### Preprocessing [optional]

- [More Information Needed]

#### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]
@@ -196,4 +220,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

## Model Card Contact

- [More Information Needed]

---
library_name: transformers
+ license: mit
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ tags:
+ - generated_from_trainer
+ - conversational
+ - instruction-tuned
+ - smoltalk
+ datasets:
+ - HuggingFaceTB/smoltalk
+ language:
+ - en
---

+ # Model Card for DeepSeek-R1-SmolTalk

+ This model is a fine-tuned version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on the [SmolTalk dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk). It is optimized for small-scale, friendly, and engaging instruction-following dialogue.

## Model Details

### Model Description

+ This model builds on DeepSeek's distilled Qwen-1.5B architecture and is trained for conversational tasks using the SmolTalk dataset. The goal is a lightweight, instruction-following model suitable for chatbots and assistants running on limited hardware.

+ - **Model type:** Instruction-tuned causal decoder (chat)
+ - **Language(s):** English
+ - **License:** MIT
+ - **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

## Uses

### Direct Use

+ This model can be used as a lightweight assistant or chatbot in applications such as:

+ - Embedded conversational interfaces
+ - Educational or toy assistants
+ - Small devices or local applications

+ ### Downstream Use

+ The model can be further fine-tuned or integrated into larger conversational systems, especially where resource efficiency is crucial.

### Out-of-Scope Use

+ - Not suitable for tasks requiring deep factual accuracy or reasoning
+ - Should not be used for sensitive or high-stakes decision making
+ - Not designed for multilingual use

## Bias, Risks, and Limitations

+ Due to the small model size and dataset limitations, the model:

+ - May produce generic or incorrect outputs
+ - Can reflect biases present in the training dataset
+ - Is not guaranteed to be safe for all user demographics or use cases

### Recommendations

+ - Use in controlled or sandboxed environments
+ - Consider integrating content moderation or rule-based filtering (see the sketch below)
+ - Do not deploy in contexts requiring factual correctness or ethical judgment
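+
+ As a minimal illustration of the rule-based filtering recommendation above (a hypothetical keyword screen, not a production moderation system), generation can be wrapped like this:
+
+ ```python
+ import re
+
+ # Hypothetical blocklist; real deployments should use a proper moderation service
+ BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in [r"\bpassword\b", r"credit card number"]]
+
+ def moderated_reply(model, tokenizer, prompt: str) -> str:
+     inputs = tokenizer(prompt, return_tensors="pt")
+     outputs = model.generate(**inputs, max_new_tokens=100)
+     reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
+     # Refuse to return text that matches any blocked pattern
+     if any(p.search(reply) for p in BLOCKED_PATTERNS):
+         return "Sorry, I can't help with that."
+     return reply
+ ```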

## How to Get Started with the Model

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the fine-tuned model and tokenizer ("your-username/your-model-id" is the card's placeholder for this repo)
+ model = AutoModelForCausalLM.from_pretrained("your-username/your-model-id")
+ tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-id")
+
+ # Tokenize a prompt and generate a short reply
+ input_text = "Hi there! What can you do?"
+ inputs = tokenizer(input_text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
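+
+ Because the model is instruction-tuned for chat, prompts usually work better when formatted with the tokenizer's chat template rather than passed as raw text. The sketch below assumes the tokenizer inherits a chat template from the DeepSeek-R1 base model; the repo id is the same placeholder as above.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("your-username/your-model-id")
+ tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-id")
+
+ # Format a chat turn with the tokenizer's chat template (adds role tags and special tokens)
+ messages = [{"role": "user", "content": "Hi there! What can you do?"}]
+ prompt_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ )
+
+ outputs = model.generate(prompt_ids, max_new_tokens=100)
+ # Decode only the newly generated tokens, not the prompt
+ print(tokenizer.decode(outputs[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
+ ```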

## Training Details

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ The model was trained on the [SmolTalk dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), a collection of lightweight, instruction-style conversations designed to help models learn concise, friendly, and helpful interactions.

### Training Procedure

#### Preprocessing [optional]

+ Preprocessing used the DeepSeek tokenizer from the base model.
+
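+ As an illustration only (not the exact training script), tokenization along these lines can be reproduced with the base model's tokenizer and SmolTalk's `messages` column; the `"all"` config name is an assumption from the dataset card.
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ # Assumed: the "all" config and a "messages" column of {role, content} turns
+ dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")
+ tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
+
+ def to_text(example):
+     # Render each conversation into one training string via the chat template
+     return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}
+
+ dataset = dataset.map(to_text)
+ print(dataset[0]["text"][:200])
+ ```
+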
+ #### LoRA Configuration

+ - rank: 6
+ - alpha: 12
+ - dropout: 0.05
+ - bias: none
+ - target: linear
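+
+ For reference, a sketch of how these adapter settings might be expressed with the PEFT library; `target_modules="all-linear"` is an assumption for "target: linear", and `task_type` is added for completeness.
+
+ ```python
+ from peft import LoraConfig
+
+ # Illustrative reconstruction of the adapter settings listed above
+ lora_config = LoraConfig(
+     r=6,                          # rank: 6
+     lora_alpha=12,                # alpha: 12
+     lora_dropout=0.05,            # dropout: 0.05
+     bias="none",                  # bias: none
+     target_modules="all-linear",  # target: linear (assumed to mean all linear layers)
+     task_type="CAUSAL_LM",
+ )
+ ```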

#### Training Hyperparameters

+ The following hyperparameters were used during training:
+
+ - learning_rate: 2e-04
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - gradient_clipping: 0.3
+ - total_train_batch_size: 128
+ - optimizer: adamw_torch_fused
+ - lr_scheduler_type: constant
+ - lr_scheduler_warmup_ratio: 0.03
+ - num_epochs: 1
+ - mixed_precision_training: bf16
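+
+ As a rough sketch, these settings map onto `transformers.TrainingArguments` roughly as follows (illustrative only; the actual training script is not documented here, and `output_dir` is a placeholder):
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="outputs",           # placeholder path
+     per_device_train_batch_size=2,
+     per_device_eval_batch_size=2,
+     gradient_accumulation_steps=2,
+     learning_rate=2e-4,
+     max_grad_norm=0.3,              # gradient_clipping: 0.3
+     lr_scheduler_type="constant",
+     warmup_ratio=0.03,
+     num_train_epochs=1,
+     optim="adamw_torch_fused",
+     bf16=True,                      # bf16 mixed precision
+     seed=42,
+ )
+ ```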

#### Speeds, Sizes, Times [optional]

## Model Card Contact

+ [More Information Needed]