Fox-Gen2

Introduction

Fox-Gen2 is the latest series of Fox large language models. The release includes base and instruction-tuned models ranging from 0.5 to 72 billion parameters. Fox-Gen2 introduces the following enhancements:

Significantly more knowledge and improved capabilities in coding and mathematics, leveraging specialized expert models.
Superior instruction following, long-text generation (over 8K tokens), structured data understanding (e.g., tables), and structured output generation, particularly JSON.
Enhanced resilience to diverse prompts, improving role-play and chatbot functionality.
Long-context support up to 128K tokens, with the ability to generate up to 8K tokens.
Multilingual support for over 29 languages, including English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
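
As an illustration of working with the structured JSON output mentioned above, here is a minimal sketch of a parser for a model reply. The helper name parse_json_response is hypothetical and not part of Fox-Gen2 or transformers. It assumes the reply is either pure JSON or a single JSON object surrounded by extra text.

```python
import json


def parse_json_response(text: str):
    """Parse `text` as JSON, falling back to the outermost {...} span.

    Returns None when nothing parses. Hypothetical helper for illustration.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Fall back to the span between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


print(parse_json_response('Sure! {"languages": ["English", "French"]} Done.'))
```

Returning None instead of raising lets the caller decide whether to retry the request or report the failure.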

This repository contains the instruction-tuned 0.5B Fox-Gen2 model, which has the following features:

Type: Causal Language Model
Training Stage: Pretraining & Post-training
Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
Number of Parameters: 0.49B
Number of Parameters (Non-Embedding): 0.36B
Number of Layers: 24
Number of Attention Heads (GQA): 14 for Q and 2 for KV
Context Length: Full 32,768 tokens, generation up to 8,192 tokens
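
The figures in this list imply a few derived quantities. The sketch below works them out using only the numbers stated above. The variable names are illustrative, and the results are rounded because the parameter counts are themselves rounded.

```python
# Figures copied from the specification list above.
total_params_b = 0.49          # billions of parameters
non_embedding_params_b = 0.36  # billions of parameters
num_q_heads = 14
num_kv_heads = 2
context_length = 32768
max_generation = 8192

# Embedding parameters, counted once because the word embeddings are tied.
embedding_params_b = round(total_params_b - non_embedding_params_b, 2)

# Grouped-query attention: each KV head is shared by this many query heads.
q_heads_per_kv_head = num_q_heads // num_kv_heads

# Prompt budget when the full generation length is reserved.
max_prompt_tokens = context_length - max_generation

print(embedding_params_b, q_heads_per_kv_head, max_prompt_tokens)
```

Under these figures, roughly 0.13B parameters sit in the embeddings, 7 query heads share each KV head, and a prompt of up to 24,576 tokens leaves room for the full 8,192 generated tokens.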

Requirements

The code for Fox-Gen2 is integrated into the Hugging Face transformers library. Use the latest release to avoid compatibility issues.
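
To see which version is currently installed, a minimal sketch using only the Python standard library follows. The helper name installed_version is hypothetical. No minimum version is given in this document, so the snippet only reports what it finds.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("transformers:", installed_version("transformers"))
```

If the result is None or an old release, running pip install --upgrade transformers installs the latest one.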

Quickstart

The following snippet loads the tokenizer and model, applies the chat template, and generates a response:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ShikharLLM/Llm1"

# Load the model; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Fox-Gen2, a helpful assistant created by Shikhar Jadav."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the chat template and append the assistant prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate at most 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
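
The snippet above fixes max_new_tokens at 512. For longer prompts or larger requests, the value can be clamped to the limits listed for this model. This is a minimal sketch, assuming the 32,768-token context and 8,192-token generation limit stated earlier. The function name clamp_max_new_tokens is hypothetical.

```python
CONTEXT_LENGTH = 32768  # full context length listed for this model
MAX_GENERATION = 8192   # generation limit listed for this model


def clamp_max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest usable max_new_tokens for the given prompt length."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Tokens still free in the context window after the prompt.
    remaining = max(CONTEXT_LENGTH - prompt_tokens, 0)
    return min(requested, MAX_GENERATION, remaining)


print(clamp_max_new_tokens(prompt_tokens=100, requested=512))
```

In the Quickstart code, the prompt length is available as model_inputs.input_ids.shape[1], and the result can be passed to model.generate as max_new_tokens. A result of 0 means the prompt already fills the context window.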

Evaluation & Performance

Evaluation results demonstrate Fox-Gen2's significant improvements in knowledge, multilingual capabilities, and efficiency for various NLP tasks.

Citation

If you find Fox-Gen2 helpful, feel free to cite it:

@misc{fox-gen2,
    title = {Fox-Gen2: Advancing Multilingual and Instruction-Tuned Language Models},
    author = {Shikhar Jadav},
    year = {2024}
}