Pingsz
/

pruned

+---
+license: apache-2.0
+tags:
+  - transformers
+  - smollm
+  - pruned-model
+  - instruct
+  - small-llm
+  - text-generation
+model_creator: HuggingFaceTB
+base_model: HuggingFaceTB/SmolLM-135M-Instruct
+model_name: SmolLM-90M-Instruct-Pruned
+pipeline_tag: text-generation
+language:
+  - en
+---
+# SmolLM-90M-Instruct-Pruned 🧠💡
+A **pruned** version of [`HuggingFaceTB/SmolLM-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct), reduced from **135M** parameters to approximately **90M** for faster inference and reduced memory usage, while maintaining reasonable performance for instruction-style tasks.
+## 🔧 What’s Inside
+- Base: `SmolLM-135M-Instruct`
+- Parameters: **~90M**
+- Pruning method: Structured pruning (e.g., attention heads, MLP layers) using PyTorch/NVIDIA pruning tools *(customize if needed)*.
+- Vocabulary, tokenizer, and training objectives remain **identical** to the base model.
+## 🚀 Intended Use
+This model is optimized for:
+- **Low-latency applications**
+- **Edge deployments**
+- **Instruction-following tasks** with compact models
+- Use in environments with **limited VRAM or compute**
+### Example Use
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
+model = AutoModelForCausalLM.from_pretrained("your-username/SmolLM-90M-Instruct-Pruned")
+prompt = "Explain quantum computing to a 10-year-old."
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```