Gemma-200M-hindi:

Gemma-200M-hindi is a 200M-parameter model trained from scratch on Hindi text from the fineweb-edu-hindi dataset. It uses the Gemma 2 architecture and was trained on a v4-128 TPU slice provided by the TPU Research Cloud. It is a base (pretrained) model and has not been instruction-tuned (no SFT).
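
As a quick sanity check, the architecture can be inspected through the model config. This is a minimal sketch; the exact field names depend on the installed transformers version.

from transformers import AutoConfig

# Load the config and inspect the architecture (no weights downloaded)
config = AutoConfig.from_pretrained("KathirKs/Gemma-200M-hindi")
print(config.model_type)                              # expected: "gemma2"
print(config.num_hidden_layers, config.hidden_size)   # layer count and hidden width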

Tokenizer:

The tokenizer, Gemma-hindi-tokenizer, is also trained from scratch.
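
A minimal sketch of loading and trying the tokenizer on its own, assuming it is bundled with the model repository; the sample sentence and printed tokens are only illustrative.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KathirKs/Gemma-200M-hindi")
tokens = tokenizer.tokenize("भारत एक विशाल देश है")  # "India is a vast country"
print(tokens)                                        # subword pieces
print(tokenizer.convert_tokens_to_ids(tokens))       # their vocabulary ids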

Using the Model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "KathirKs/Gemma-200M-hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = '''कुछ क्षेत्रों में यह अनुमान लगाया गया है कि लगभग 30 में से एक व्यक्ति कुष्ठ रोग से संक्रमित था'''
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Sampling configuration
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=100,
    do_sample=True,         # Enables sampling
    temperature=1.5,        # High temperature for more varied output
    top_k=1000,             # Sample from the 1000 most likely tokens
    top_p=1.0               # No nucleus (top-p) filtering
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
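
Since the model is not instruction-tuned, it continues the given Hindi text rather than following instructions. The high temperature and large top_k above favour diverse completions; lower values (for example temperature around 0.7 with a smaller top_k) may give more conservative output.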

Codebase:

The Levanter codebase is used to train the model on TPUs.

Warning:

The model may produce inappropriate content. Please report any such content if found.

Contact:

For any queries, write to Kathir.
