Gemma-200M-hindi:
Gemma-200M-hindi is a 200M-parameter model trained from scratch on Hindi text using the fineweb-edu-hindi dataset. It uses the Gemma 2 architecture and was trained on a v4-128 TPU pod slice provided by the TPU Research Cloud. The model is a base model: it has not been supervised fine-tuned (SFT).
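As a rough sanity check on the 200M figure, a back-of-the-envelope parameter count for a Gemma-2-style decoder can be sketched in pure Python. The config values below are illustrative assumptions, not the model's actual hyperparameters:

```python
# Hypothetical Gemma-2-style config (illustrative values, not the real ones)
vocab_size = 64_000   # assumed tokenizer vocabulary size
d_model = 1024        # hidden size
n_layers = 8          # transformer blocks
d_ff = 4096           # feed-forward width

# Embedding table (Gemma ties input embeddings with the output head,
# so the vocabulary projection is counted once)
embed = vocab_size * d_model

# Per-layer attention: Q, K, V, O projections
attn = 4 * d_model * d_model

# Per-layer gated MLP: gate and up (d_model -> d_ff) plus down (d_ff -> d_model)
mlp = 3 * d_model * d_ff

# Norm parameters are negligible and omitted
total = embed + n_layers * (attn + mlp)
print(f"~{total / 1e6:.0f}M parameters")
```

With these assumed dimensions the count lands near 200M, showing how a small vocabulary and a shallow stack trade off against hidden size at this scale.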
Tokenizer:
The tokenizer, Gemma-hindi-tokenizer, is trained from scratch.
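The card doesn't describe the training recipe for Gemma-hindi-tokenizer, but the core of BPE-style subword training (the family Gemma-style tokenizers belong to) can be sketched in pure Python: repeatedly merge the most frequent adjacent symbol pair. This is an illustrative toy, not the actual training code:

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair."""
    # Start from character-level symbols for each whitespace-split word
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair merged into one symbol
        merged = {}
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        words = merged
    return merges

# Works on Devanagari code points just like ASCII
merges = train_bpe("कुष्ठ रोग कुष्ठ रोग से", 3)
print(merges)
```

Production tokenizers add byte fallback, normalization, and a fixed vocabulary budget on top of this merge loop.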
Using the Model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "KathirKs/Gemma-200M-hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

input_text = '''कुछ क्षेत्रों में यह अनुमान लगाया गया है कि लगभग 30 में से एक व्यक्ति कुष्ठ रोग से संक्रमित था'''
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Near-pure sampling configuration
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=100,
    do_sample=True,   # enable sampling instead of greedy decoding
    temperature=1.5,  # high temperature: flatter distribution, more random output
    top_k=1000,       # very large k, so top-k filtering has little effect
    top_p=1.0,        # 1.0 disables nucleus (top-p) filtering
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
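For readers unfamiliar with the sampling knobs passed to `generate`, the interaction of temperature, top-k, and top-p (nucleus) filtering can be sketched in pure Python over a toy logit vector. This mirrors the logic transformers applies internally but is not its actual implementation:

```python
import math
import random

def sample_next(logits, temperature=1.5, top_k=1000, top_p=1.0, seed=0):
    """Toy sampler: temperature scaling, then top-k, then top-p, then sample."""
    # Temperature: divide logits before softmax (higher T => flatter distribution)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep only the k most probable tokens
    probs.sort(key=lambda p: p[1], reverse=True)
    probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches p
    kept, cum = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize over the surviving tokens and draw one
    z = sum(p for _, p in kept)
    r, acc = random.Random(seed).random() * z, 0.0
    for i, p in kept:
        acc += p
        if acc >= r:
            return i
    return kept[-1][0]

token = sample_next([2.0, 1.0, 0.5, -1.0], temperature=1.5, top_k=3, top_p=0.9)
print(token)
```

With `top_k=1000` and `top_p=1.0`, as in the generation call above, both filters pass nearly everything through, so the temperature alone shapes the output distribution.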
Codebase:
The model is trained on TPUs using the levanter codebase.
Warning:
The model may produce inappropriate content. Please report any such content if found.
Contact:
For any queries, write to Kathir.