Model Card for Llama-3.2-3B-Reasoning-Vi-Medical-LoRA

This model is a LoRA adapter fine-tuned from meta-llama/Llama-3.2-3B-Instruct and set up for 4-bit quantization to reduce memory usage. It supports a maximum sequence length of 2048 tokens. Fine-tuning was done with TRL and updated only the adapter weights rather than all model parameters, which keeps training resource-efficient.
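To give a feel for what 4-bit quantization saves, the following back-of-the-envelope sketch estimates weight storage at several precisions. It assumes roughly 3.21 billion parameters for the base model (an approximate figure, not stated in this card) and counts weights only; real usage is higher because of activations, the KV cache, quantization metadata, and any layers kept in higher precision.

```python
# Rough weight-only memory estimate for the base model.
# ASSUMPTION: ~3.21e9 parameters (approximate, not taken from this card).
APPROX_PARAMS = 3.21e9

# Bits used to store one parameter at each precision.
BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "4bit": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name:>5}: {weight_memory_gb(APPROX_PARAMS, bits):.2f} GB")
```

Under these assumptions, 4-bit weights take about a quarter of the space of FP16 weights.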

Alternatively, the adapter can be applied on top of unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit, or unsloth/Llama-3.2-3B-Instruct loaded in 4-bit, to reduce memory use in constrained environments while retaining the model's language capabilities.
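A minimal sketch of both 4-bit options, assuming `bitsandbytes` is installed and a CUDA GPU is available. The function name `load_4bit_model` and the NF4 settings are illustrative choices, not settings confirmed by this card.

```python
# Sketch: two ways to get a 4-bit base model before attaching the adapter.
PREQUANTIZED_ID = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
FULL_PRECISION_ID = "unsloth/Llama-3.2-3B-Instruct"
ADAPTER_ID = "danhtran2mind/Llama-3.2-3B-Reasoning-Vi-Medical-LoRA"


def load_4bit_model(prequantized: bool = True):
    """Load a 4-bit base model and attach the LoRA adapter (hypothetical helper)."""
    # Heavy imports are local so this file can be imported without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import PeftModel

    if prequantized:
        # The checkpoint already stores 4-bit weights and its quantization config.
        base = AutoModelForCausalLM.from_pretrained(
            PREQUANTIZED_ID, device_map="auto"
        )
    else:
        # ASSUMPTION: NF4 with FP16 compute; adjust to your hardware.
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
        )
        base = AutoModelForCausalLM.from_pretrained(
            FULL_PRECISION_ID,
            quantization_config=bnb_config,
            device_map="auto",
        )
    return PeftModel.from_pretrained(base, ADAPTER_ID)
```

Calling `load_4bit_model()` returns a model that can be used in place of the FP16 model in the Inference section.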

Training procedure

This model was trained with supervised fine-tuning (SFT).
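Because only a LoRA adapter is trained, the number of trainable parameters is a small fraction of the full model. The sketch below counts them for a given rank. The layer shapes are assumed from the public Llama-3.2-3B configuration, and the rank and target modules are hypothetical, since this card does not state the adapter's actual settings.

```python
# ASSUMPTIONS: shapes taken from the Llama-3.2-3B config (hidden size 3072,
# 8 KV heads x 128 head dim = 1024, MLP size 8192, 28 decoder layers).
HIDDEN, KV_DIM, MLP, LAYERS = 3072, 1024, 8192, 28

# (in_features, out_features) of each linear layer in one decoder block.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_trainable_params(rank: int, targets=tuple(PROJECTIONS)) -> int:
    """LoRA adds two matrices per target layer: (in x r) and (r x out)."""
    per_layer = sum(
        rank * (PROJECTIONS[name][0] + PROJECTIONS[name][1]) for name in targets
    )
    return per_layer * LAYERS


if __name__ == "__main__":
    # Hypothetical rank 16 applied to all seven projections.
    print(lora_trainable_params(16))
```

With a hypothetical rank of 16 on all seven projections this gives 24,313,856 trainable parameters, under 1% of the base model.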

Usage

HuggingFace Authentication

import os
from huggingface_hub import login

# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"

# Log in to the Hugging Face Hub with the token
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
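To avoid writing the token into source code, one option is to read it from an environment variable that is set outside the script. This is a minimal sketch; the helper names `get_hf_token` and `login_from_env` are hypothetical, and `HF_TOKEN` is assumed as the variable name.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token from the environment, or None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub using a token from the environment."""
    # Imported here so get_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    token = get_hf_token(env_var)
    if token is None:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    login(token=token)
```

Set the variable in your shell (for example `export HF_TOKEN=...`) and call `login_from_env()` once before loading the model.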

Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define model and LoRA adapter paths
base_model_name = "unsloth/Llama-3.2-3B-Instruct"
lora_adapter_name = "danhtran2mind/Llama-3.2-3B-Reasoning-Vi-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set model to evaluation mode
model.eval()

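# Prompt template (Vietnamese). It asks the model, acting as a medical expert,
# to reason step by step inside <think> before answering the question.
inference_prompt_style = """Bên dưới là một hướng dẫn mô tả một tác vụ, đi kèm với một thông tin đầu vào để cung cấp thêm ngữ cảnh.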
Hãy viết một phản hồi để hoàn thành yêu cầu một cách phù hợp.
Trước khi trả lời, hãy suy nghĩ cẩn thận về câu hỏi và tạo một chuỗi suy nghĩ từng bước để đảm bảo phản hồi logic và chính xác.

### Instruction:
Bạn là một chuyên gia y tế có kiến thức chuyên sâu về lập luận lâm sàng, chẩn đoán và lập kế hoạch điều trị.
Vui lòng trả lời câu hỏi y tế sau đây.

### Question:
{}

### Response:
<think>
"""

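# Define the question. English: "When a peptic ulcer is suspected, which
# hospital department should you visit for an examination?"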
question = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")

seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

inputs = tokenizer(
    [inference_prompt_style.format(question) + tokenizer.eos_token],
    return_tensors="pt"
).to(device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # Sampling must be on for temperature/top_p/top_k to apply
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)

response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response[0].split("### Response:")[1])

Example output:

<think>
 Trên thực tế, điều đầu tiên cần làm khi phát hiện bản thân hoặc con trẻ có những dấu hiệu mất nước nhẹ là cần chuyển bệnh nhân sang nằm. Sau đó, bạn cần khẩn trương gọi cơn khẩn cấp hoặc đến ngay bệnh viện hoặc cơ sở y tế gần nhất. Nếu đang ở nhà hay ở nơi làm việc, hãy gọi 114 hoặc cấp cứu ngay. Hãy nói rõ là người bệnh nghi ngờ bị loét dạ dày tá tràng. Điều này sẽ giúp có những nhân viên y tế có thể đến khám ngay.  Khi nhận được gọi, các nhân viên y tế sẽ nhanh chóng cung cấp cho bạn liệu pháp bổ sung nước, điện giải. Đồng thờiDiễn tiến hành thăm khám và lấy mẫu mẫu dịch trong loét, nếu cần, có thể sẽ chỉ định bạn phẫu thuật.  Nhìn chung, loét dạ dày tá tràng là một bệnh lý khá nguy hiểm nếu không được điều trị đúng cách. Vì vậy, hãy xây dựng cho mình một chế độ ăn uống lành mạnh và đi khám ngay khi bạn có những triệu chứng. Đừng quá phớt lờ những cơn buồn nôn, khó tiêu, đau bụng. Hãy đến ngay cơ sở y tế gần nhất, gọi 1155 hoặc 114 nếu đang ở trong Zones 1, 2. 
</think>
Cần chuyển bệnh nhân sang nằm và khẩn trương gọi cơn cấp cứu hoặc đến bệnh viện hoặc cơ sở y tế gần nhất.
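The response contains the reasoning inside `<think>` tags followed by the final answer. A small helper can separate the two; this is a sketch, `split_reasoning` is a hypothetical name, and it assumes at most one `<think>...</think>` block per response.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split a response into (reasoning, answer).

    If the closing tag is missing, the whole text is treated as the answer.
    """
    before, sep, after = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop everything up to and including the opening tag, if present.
    reasoning = before.split(open_tag, 1)[-1].strip()
    return reasoning, after.strip()


if __name__ == "__main__":
    demo = "<think>\nstep 1\n</think>\nfinal answer"
    reasoning, answer = split_reasoning(demo)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

It can be applied to the decoded text after the `### Response:` split shown in the inference code.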

Library versions

PEFT: 0.15.2
TRL: 0.19.1
Transformers: 4.52.4
PyTorch: 2.7.0
Datasets: 3.6.0
Tokenizers: 0.21.2
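To check a local environment against the versions listed above, a short script using only the standard library can help. This is a sketch: `compare_versions` is a hypothetical helper, and the package names are the usual PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.15.2",
    "trl": "0.19.1",
    "transformers": "4.52.4",
    "torch": "2.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.2",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu126" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily break inference; the listed versions are simply the ones the card reports.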

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
