
SmolLM2_135M_Grpo_Gsm8k

SmolLM2_135M_Grpo_Gsm8k is fine-tuned from SmolLM2-135M-Instruct with GRPO (Group Relative Policy Optimization) on the GSM8K grade-school math dataset. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M base model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with newly filtered datasets curated by the SmolLM team. The instruct version was developed through supervised fine-tuning (SFT) on a combination of public and curated datasets.
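A minimal sketch of what GRPO fine-tuning on GSM8K could look like with TRL's GRPOTrainer is shown below. The dataset mapping, reward heuristic, and hyperparameters here are illustrative assumptions; the exact recipe behind this checkpoint is not stated in this card.

# Illustrative GRPO sketch with TRL; not the exact recipe used for this checkpoint.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K provides "question" and "answer"; the gold answer ends with "#### <number>".
gsm8k = load_dataset("openai/gsm8k", "main", split="train")

def to_prompt(example):
    # GRPOTrainer expects a "prompt" column; keep the gold answer for the reward function.
    return {"prompt": example["question"], "answer": example["answer"]}

train_dataset = gsm8k.map(to_prompt)

def correctness_reward(completions, answer, **kwargs):
    # Simple assumed heuristic: reward 1.0 when the gold final answer (after "####")
    # appears in the generated completion, else 0.0.
    rewards = []
    for completion, gold in zip(completions, answer):
        target = gold.split("####")[-1].strip()
        rewards.append(1.0 if target in completion else 0.0)
    return rewards

training_args = GRPOConfig(output_dir="SmolLM2_135M_Grpo_Gsm8k", logging_steps=10)
trainer = GRPOTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()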


SmolLM2 135M Grpo Fine-tuning

Resource                 Link
Fine-tuning Script       SmolLM_x_Grpo.ipynb
Fine-tuned Model         SmolLM2_135M_Grpo_Gsm8k
Fine-tuned Checkpoint    SmolLM2_135M_Grpo_Checkpoint

How to use

Transformers

pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "prithivMLmods/SmolLM2_135M_Grpo_Gsm8k"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
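Since the checkpoint was tuned on GSM8K, a math word problem is a natural test prompt. The snippet below continues from the example above (model, tokenizer, and device are already defined); the question is taken from GSM8K, and the generation settings are illustrative rather than prescribed.

# Continues the example above; assumes model, tokenizer, and device are already defined.
messages = [{"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
# Allow enough new tokens for step-by-step reasoning before the final answer.
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))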

Limitations of SmolLM2_135M_Grpo_Gsm8k

  1. Model Size: The model is based on the 135M parameter size, which, while powerful, still limits its ability to handle extremely complex tasks or long-context dependencies compared to larger models. It may struggle with tasks requiring deep understanding of intricate details or long-range reasoning.

  2. Bias and Inaccuracy: Although fine-tuned on diverse datasets, the model may still generate biased, inaccurate, or factually incorrect responses, especially when asked to make inferences outside the scope of its training data or to answer questions that require specialized knowledge it was not exposed to.

  3. Context Length: Due to the model's parameter count and limited context window, it may struggle with very long conversations or inputs that exceed its processing capacity, potentially leading to truncated or incomplete answers.

  4. Fine-Tuning Specificity: While fine-tuned on curated datasets, it may not always perform as well on highly specialized domains unless additional fine-tuning is applied or domain-specific data is included.

  5. Generalization: As a smaller model, SmolLM2_135M_Grpo_Gsm8k may not generalize as well as larger models to unseen tasks or rare queries. Its responses could be overly generic or fail to grasp nuances in complex scenarios.

  6. Limited Multi-turn Conversations: While it can manage basic multi-turn conversations, its performance may degrade as the conversation grows longer, losing track of context or producing repetitive responses.

Intended Use of SmolLM2_135M_Grpo_Gsm8k

  1. General-purpose Conversational AI: The model is designed to excel at basic conversational tasks, such as answering general knowledge questions, providing explanations, and offering context-based responses. It's ideal for small to medium-sized chatbots and interactive virtual assistants.

  2. Education & Tutoring: The model can be used in educational applications where it can assist with answering questions, explaining concepts, and helping users with learning new topics across various domains.

  3. Content Generation: It can generate short-form content, including text snippets, outlines, or ideas, making it suitable for writing assistants, idea generation tools, or brainstorming applications.

  4. Code Assistance: Because its base model was pretrained on code (The Stack), it can assist with simple code-related tasks, basic debugging, and explanations of programming concepts or snippets.

  5. Instruction Following: SmolLM2_135M_Grpo_Gsm8k has been fine-tuned for better instruction-following abilities, making it suitable for applications where users provide specific commands or requests.

  6. Prototyping & Experimentation: With its smaller size and easier deployment, the model is useful for rapid prototyping and experimentation in new AI-driven applications, particularly where speed and cost efficiency are more important than state-of-the-art performance.

  7. Low-Resource Environments: Due to its small size, it can run in environments with limited computational resources (e.g., edge devices, mobile applications, or local servers) where larger models would not be feasible; see the loading sketch after this list.

  8. Research and Development: Researchers interested in exploring fine-tuned models and improving upon smaller AI systems can use SmolLM2_135M_Grpo_Gsm8k for experimentation or as a base for further fine-tuning.
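For constrained hardware, a common option is to load the checkpoint in half precision. This is a minimal sketch, not part of the official usage instructions; the dtype and device choices are assumptions to adapt to your setup.

# Minimal half-precision loading sketch for constrained hardware (illustrative, not prescribed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2_135M_Grpo_Gsm8k"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# float16 roughly halves memory versus the F32 weights; use "cpu" and float32 if no GPU is available.
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16).to("cuda")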
