Dataset

You can download the dataset from https://github.com/triet2397/UIT-ViCoV19QA

Metrics

Loss:
- Training set: 0.306100
- Validation set: 0.322764
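
For a rough sense of scale, the two losses can be converted to perplexities. This is only a sketch: it assumes the reported values are mean per-token cross-entropy in nats, which this card does not state, and the variable names are illustrative.

```python
import math

# Loss values reported above; assumed (not confirmed) to be mean
# per-token cross-entropy measured in nats.
train_loss = 0.306100
val_loss = 0.322764

# Under that assumption, perplexity is exp(loss).
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

# Difference between validation and training loss.
loss_gap = val_loss - train_loss

print(f"train perplexity: {train_ppl:.3f}")
print(f"validation perplexity: {val_ppl:.3f}")
print(f"loss gap: {loss_gap:.6f}")
```

If the loss was computed differently (for example, summed per sequence), these perplexity figures do not apply.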

Usage

Import Libraries

import torch
from underthesea import word_tokenize
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

Define Functions

def load_model_and_tokenizer(model_path):
    # Load the trained tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Load the trained model
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

    # Move the model to the GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    return tokenizer, model, device

def generate_text(tokenizer, model, device, prompt, max_length=100,
                  num_return_sequences=1, top_p=0.95, temperature=0.7):
    # Tokenize the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

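    # Generate text by sampling; num_return_sequences sets how many samples are returned
    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2,
        top_k=50,
        top_p=top_p,
        temperature=temperature,
        do_sample=True
    )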

    # Decode each generated sequence back to a string (returns a list of strings)
    generated_text = [tokenizer.decode(ids, skip_special_tokens=True) for ids in output]

    return generated_text

# Load the trained model and tokenizer
model_path = "danhtran2mind/vi-medical-t5-finetune"
tokenizer, model, device = load_model_and_tokenizer(model_path)

Inference

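# Define the prompt for text generation ("What is the COVID-19 vaccine?")
prompt = "vaccine covid-19 là gì?"
# Word-segment the Vietnamese prompt; multi-syllable words are joined with "_"
prompt = word_tokenize(prompt, format='text')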
# Generate text
generated_text = generate_text(tokenizer, model, device, prompt,
                               max_length=768,
                               top_p=0.95, temperature=0.7)

# Print the first generated sequence, undoing the "_" word joins and
# the spaces left before commas and periods
print("Generated Text:\n")
result = generated_text[0].replace("_", " ").replace(" ,", ",").replace(" .", ".")
print(result)
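
The clean-up step can be wrapped in a small function so it is easy to apply to every returned sequence. The sketch below is not part of the original code: `clean_generated_text` is a hypothetical helper name, and the sample string is made up for illustration rather than taken from the model.

```python
def clean_generated_text(text: str) -> str:
    """Undo word-segmentation underscores and tidy punctuation spacing.

    Hypothetical helper: it applies the same three replacements as the
    inference snippet, in the same order.
    """
    return text.replace("_", " ").replace(" ,", ",").replace(" .", ".")


# Illustrative input in the segmented style, not real model output.
sample = "vắc_xin giúp cơ_thể tạo kháng_thể , giảm nguy_cơ mắc bệnh ."
print(clean_generated_text(sample))
```

Note that any literal underscore in the generated text is also turned into a space, since the replacement cannot tell it apart from a segmentation join.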

Output

# Generated Text:

# Hiện nay, vaccine Covid-19 của AstraZeneca là các loại tác dụng phụ được sử xét nghiệm lâm sàng và tạo ra kháng thể trong một vài ngày. Trong trường hợp những người có yếu tố nguy cơ mắc bệnh như hen phế quản ; viêm mũi miệng : vi rút mắt / môi trường - rối loạn đông máu ) Các loạt vắc xin khác nhau nên lấy mẫu bảo vệ ở mức độ nhanh ít nhất 1 tháng gần đây lên nguồn cung ứng RNA covid-19 nói chung vì ảnh ưởng đến việc chống lại Covid 19

Dependencies Environment

Python Version: 3.10.12

List of Libraries:

pandas==2.2.3
numpy==1.26.4
matplotlib==3.7.5
scikit-learn==1.2.2
gensim==4.3.3
underthesea==6.8.4
tensorflow==2.17.1
datasets==3.3.1
torch==2.5.1+cu121
transformers==4.47.0
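
To check whether a local environment matches these pins, something like the following can help. It is a minimal sketch using the standard library's `importlib.metadata`; the `PINNED` dictionary and the `find_mismatches` function are illustrative names, and only a few of the pins are included.

```python
from importlib.metadata import PackageNotFoundError, version

# A few of the pins listed above; extend as needed.
PINNED = {
    "underthesea": "6.8.4",
    "transformers": "4.47.0",
    "datasets": "3.3.1",
}


def find_mismatches(pins):
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A different version is not necessarily a problem; the check only reports where the environment departs from the one listed here.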
