Update README.md

5bef2cd verified 3 months ago

6.63 kB

	---
	library_name: transformers
	license: mit
	base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
	tags:
	- generated_from_trainer
	- conversational
	- instruction-tuned
	- smoltalk
	datasets:
	- HuggingFaceTB/smoltalk
	metrics:
	- MMLU
	language:
	- en

	model-index:
	- name: DeepSeek-R1-Distill-Qwen-1.5B-finetuned-smoltalk-everyday-conversations
	results:
	- task:
	name: Text Generation
	type: text-generation
	dataset:
	name: HuggingFaceTB/smoltalk
	type: HuggingFaceTB/smoltalk
	metrics:
	- name: MMLU-PEM (0-shot)
	type: MMLU-PEM (0-shot)
	value: 0.2749
	---

	# Model Card for DeepSeek-R1-SmolTalk

	This model is a fine-tuned version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on the [SmolTalk dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk). It is optimized for small-scale, friendly, and engaging instruction-following dialogue.

	## Model Details

	### Model Description

	This model builds on DeepSeek's distilled Qwen-1.5B architecture and is trained for conversational tasks using the SmolTalk dataset. The goal is to create a lightweight, instruction-following model suitable for use in chatbots or lightweight assistants with limited hardware resources.

	- Model type: Instruction-tuned causal decoder (chat)
	- Language(s): English
	- License: MIT
	- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B


	## Uses

	### Direct Use

	This model can be used as a lightweight assistant or chatbot in applications such as:

	- Embedded conversational interfaces
	- Educational or toy assistants
	- Small devices or local applications

	### Downstream Use

	The model can be further fine-tuned or integrated into larger conversational systems, especially where resource efficiency is crucial.

	### Out-of-Scope Use

	- Not suitable for tasks requiring deep factual accuracy or reasoning
	- Should not be used for sensitive or high-stakes decision making
	- Not designed for multilingual use

	## Bias, Risks, and Limitations

	Due to the small model size and dataset limitations:

	- May produce generic or incorrect outputs
	- Can reflect biases present in the training dataset
	- Not guaranteed to be safe for all user demographics or use cases

	### Recommendations

	- Use in controlled or sandboxed environments
	- Consider integrating content moderation or rule-based filtering
	- Do not deploy in contexts requiring factual correctness or ethical judgment

	## How to Get Started with the Model

	```Python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("avanishd/DeepSeek-R1-Distill-Qwen-1.5B-finetuned-smoltalk-everyday-conversations")
	tokenizer = AutoTokenizer.from_pretrained("avanishd/DeepSeek-R1-Distill-Qwen-1.5B-finetuned-smoltalk-everyday-conversations")

	input_text = "Hi there! What can you do?"
	inputs = tokenizer(input_text, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	## Training Details

	### Training Data

	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

	Used [SmolTalk dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), a dataset of lightweight, instruction-style conversations. The dataset is designed to help models learn concise, friendly, and helpful interactions.

	### Training Procedure

	<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

	#### Preprocessing [optional]

	Used the DeepSeek tokenizer

	#### LoRA Configuration

	- rank: 6
	- alpha: 12
	- dropout: 0.05
	- bias: none
	- target: linear

	#### Training Hyperparameters

	The following hyperparameters were used during training:

	- learning_rate: 2e-04
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 2
	- gradient_clipping: 0.3
	- total_train_batch_size: 128
	- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_ratio: 0.03
	- num_epochs: 1
	- mixed_precision_training: bf16

	#### Speeds, Sizes, Times [optional]

	<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

	[More Information Needed]

	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->

	### Testing Data, Factors & Metrics

	#### Testing Data

	<!-- This should link to a Dataset Card if possible. -->

	[More Information Needed]

	#### Factors

	<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

	[More Information Needed]

	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	[More Information Needed]

	### Results

	[More Information Needed]

	#### Summary



	## Model Examination [optional]

	<!-- Relevant interpretability work for the model goes here -->

	[More Information Needed]

	## Environmental Impact

	<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: [More Information Needed]
	- Hours used: [More Information Needed]
	- Cloud Provider: [More Information Needed]
	- Compute Region: [More Information Needed]
	- Carbon Emitted: [More Information Needed]

	## Technical Specifications [optional]

	### Model Architecture and Objective

	[More Information Needed]

	### Compute Infrastructure

	[More Information Needed]

	#### Hardware

	[More Information Needed]

	#### Software

	[More Information Needed]

	## Citation [optional]

	<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

	BibTeX:

	[More Information Needed]

	APA:

	[More Information Needed]

	## Glossary [optional]

	<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

	[More Information Needed]

	## More Information [optional]

	[More Information Needed]

	## Model Card Authors [optional]

	[More Information Needed]

	## Model Card Contact

	[More Information Needed]

	fill this model card