---
language:
- en
license: mit
model-index:
- name: Nano-Llama
  results: []
tags:
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb
library_name: transformers
---

# Nano-Llama

A compact 67M-parameter LLaMA-2-style language model pretrained on the FineWeb dataset.

## Model Details

- **Architecture**: LLaMA-2-style transformer
- **Parameters**: 67M
- **Training Data**: FineWeb dataset (~100M tokens)
- **Context Length**: 1024 tokens
- **Layers**: 6
- **Hidden Size**: 768
- **Attention Heads**: 12

## Training

- **Dataset**: FineWeb (web-crawled high-quality text)
- **Tokens Trained**: ~110M tokens
- **Training Time**: ~6 hours on RTX 3090
- **Optimizer**: AdamW
- **Learning Rate**: 1e-4

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")
model.eval()

# Test prompt
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Limitations

- Small model size (67M parameters)
- Limited training data compared to larger models
- May generate repetitive or nonsensical text

## License

MIT License
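
## Verifying the Configuration

The parameter count and architecture values listed under Model Details can be checked against the published checkpoint. Below is a minimal sketch, assuming the repo ships a standard LLaMA-style `transformers` config; the attribute names are the usual `LlamaConfig` fields and are not specific to this card.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config and weights from the Hub
config = AutoConfig.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")

# Architecture values as stated in the model card (LLaMA-style field names assumed)
print("Layers:         ", config.num_hidden_layers)        # expected: 6
print("Hidden size:    ", config.hidden_size)              # expected: 768
print("Attention heads:", config.num_attention_heads)      # expected: 12
print("Context length: ", config.max_position_embeddings)  # expected: 1024

# Total parameter count, expected to be roughly 67M
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.1f}M")
```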