---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
language:
- en
- hi
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- ai4bharat/IndicQuestionGeneration
pipeline_tag: question-answering
---

# Uploaded model

- **Developed by:** Ashed00
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# Inference Code

```python
import torch
from unsloth import FastLanguageModel

# These values must match the configuration used during fine-tuning;
# the defaults below are placeholders, adjust them to your setup.
max_seq_length = 2048   # maximum sequence length used at training time
dtype = None            # None lets Unsloth auto-detect (float16 / bfloat16)
load_in_4bit = True     # load the 4-bit quantized weights

prompt = """Below is given a Question and context to solve the question. Provide the answer to the question from the context.

### Question:
{}

### Context:
{}

### Answer:
{}"""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Ashed00/Hindi_tuned_Llama-3.2-1B",  # the fine-tuned model
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference

inputs = tokenizer(
    [
        prompt.format(
            "Who stopped the revolt of Ballarat?",  # question (Hindi or English)
            # Context (Hindi). Translation: "It was crushed by British soldiers,
            # but the discontent prompted the colonial authorities to reform the
            # administration (notably lowering the hated mining licence fee) and
            # to extend the franchise."
            "इसे ब्रिटिश सैनिकों द्वारा कुचल दिया गया था, लेकिन असंतोष ने औपनिवेशिक अधिकारियों को प्रशासन में सुधार करने (विशेष रूप से घृणित खनन लाइसेंस शुल्क को कम करना) और मताधिकार का विस्तार करने के लिए प्रेरित किया।",
            "",  # leave the answer blank for generation
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 500, use_cache = True, temperature = 1.5, min_p = 0.1)
answer = tokenizer.batch_decode(outputs)[0].split("### Answer:")[-1]
print("Answer of the question is:", answer)
```

# Metrics (to be calculated)
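Once predictions are collected from the inference loop above, extractive-QA metrics such as exact match and F1 can be computed. Below is a minimal sketch using the `squad` metric from Hugging Face's `evaluate` library; the `predicted_answers` and `gold_answers` lists are hypothetical placeholders, not part of this repository.

```python
# Sketch: exact-match / F1 scoring with the `squad` metric from `evaluate`.
# Fill `predicted_answers` and `gold_answers` (placeholder names) by running
# the inference code above over an evaluation split.
import evaluate

squad_metric = evaluate.load("squad")

predicted_answers = ["ब्रिटिश सैनिकों"]  # model outputs, stripped of the prompt
gold_answers = ["ब्रिटिश सैनिकों"]       # reference answers from the eval set

predictions = [
    {"id": str(i), "prediction_text": p}
    for i, p in enumerate(predicted_answers)
]
references = [
    {"id": str(i), "answers": {"text": [g], "answer_start": [0]}}
    for i, g in enumerate(gold_answers)
]

results = squad_metric.compute(predictions=predictions, references=references)
print(results)  # e.g. {"exact_match": ..., "f1": ...}
```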