LLM Course documentation

Pretrained Models များကို မျှဝေခြင်း

LLM Course

0. စတင်ပြင်ဆင်ခြင်း

1. Transformer models များ

2. 🤗 Transformers ကို အသုံးပြုခြင်း

3. Pretrained Model တစ်ခုကို Fine-tuning လုပ်ခြင်း

4. Models နှင့် Tokenizers များကို မျှဝေခြင်း

Hugging Face Hub Pretrained Models များကို အသုံးပြုခြင်း Pretrained Models များကို မျှဝေခြင်း Model Card တစ်ခု တည်ဆောက်ခြင်း အပိုင်း ၁ ပြီးဆုံးပါပြီ! အခန်း (၄) ဆိုင်ရာ မေးခွန်းများ

5. The 🤗 Datasets library

6. The 🤗 Tokenizers library

7. Classical NLP Tasks များ

8. အကူအညီတောင်းခံနည်း

9. Demos များ တည်ဆောက်ခြင်းနှင့် မျှဝေခြင်း

10. အရည်အသွေးမြင့် Datasets များကို စုစည်းခြင်း

11. Large Language Models များကို Fine-tune လုပ်ခြင်း

12. Reasoning Models များ တည်ဆောက်ခြင်း new

သင်တန်း ဆိုင်ရာ အခမ်းအနားများ

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

Pytorch TensorFlow

Pretrained Models များကို မျှဝေခြင်း

အောက်ပါအဆင့်တွေမှာ pretrained models တွေကို 🤗 Hub ကို အလွယ်ဆုံးနည်းလမ်းတွေနဲ့ ဘယ်လိုမျှဝေရမလဲဆိုတာ ကျွန်တော်တို့ လေ့လာသွားပါမယ်။ models တွေကို Hub မှာ တိုက်ရိုက်မျှဝေပြီး update လုပ်တာကို ရိုးရှင်းစေတဲ့ ကိရိယာတွေနဲ့ utility တွေ ရရှိနိုင်ပြီး၊ အောက်မှာ ကျွန်တော်တို့ လေ့လာသွားပါမယ်။

models တွေကို train လုပ်တဲ့ သုံးစွဲသူအားလုံးကို community နဲ့ မျှဝေခြင်းဖြင့် ပံ့ပိုးကူညီဖို့ ကျွန်တော်တို့ တိုက်တွန်းပါတယ်။ အလွန်တိကျတဲ့ datasets တွေပေါ်မှာ train လုပ်ထားတဲ့ models တွေကိုတောင် မျှဝေခြင်းက တခြားသူတွေကို အချိန်နဲ့ compute resources တွေ သက်သာစေပြီး အသုံးဝင်တဲ့ trained artifacts တွေကို ရယူအသုံးပြုနိုင်စေပါလိမ့်မယ်။ ဒါ့အပြင်၊ တခြားသူတွေ လုပ်ခဲ့တဲ့ အလုပ်တွေကနေလည်း သင်အကျိုးခံစားနိုင်ပါတယ်။

model repositories အသစ်တွေ ဖန်တီးဖို့ နည်းလမ်းသုံးသွယ် ရှိပါတယ်။

push_to_hub API ကို အသုံးပြုခြင်း
huggingface_hub Python library ကို အသုံးပြုခြင်း
web interface ကို အသုံးပြုခြင်း

repository တစ်ခု ဖန်တီးပြီးတာနဲ့ git နဲ့ git-lfs ကို အသုံးပြုပြီး files တွေ upload လုပ်နိုင်ပါတယ်။ အောက်ပါအပိုင်းတွေမှာ model repositories တွေ ဖန်တီးပုံနဲ့ files တွေ upload လုပ်ပုံကို ကျွန်တော်တို့ ပြသသွားပါမယ်။

push_to_hub API ကို အသုံးပြုခြင်း

Hub ကို files တွေ upload လုပ်ဖို့ အလွယ်ကူဆုံးနည်းလမ်းက push_to_hub API ကို အကျိုးရှိရှိ အသုံးပြုခြင်းပါပဲ။

ဆက်မသွားခင်မှာ၊ authentication token တစ်ခုကို generate လုပ်ဖို့ လိုအပ်ပါလိမ့်မယ်။ ဒါမှ huggingface_hub API က သင်ဘယ်သူလဲဆိုတာနဲ့ ဘယ် namespaces တွေမှာ write access ရှိတယ်ဆိုတာ သိမှာပါ။ သင် transformers install လုပ်ထားတဲ့ environment တစ်ခုမှာ ရှိနေကြောင်း သေချာပါစေ (Setup မှာ ကြည့်ပါ)။ သင် notebook ထဲမှာဆိုရင် login လုပ်ဖို့ အောက်ပါ function ကို အသုံးပြုနိုင်ပါတယ်။

from huggingface_hub import notebook_login

notebook_login()

terminal မှာဆိုရင်တော့ အောက်ပါအတိုင်း run နိုင်ပါတယ်။

huggingface-cli login

နည်းလမ်းနှစ်ခုလုံးမှာ သင့်ရဲ့ username နဲ့ password ကို ထည့်သွင်းဖို့ တောင်းဆိုပါလိမ့်မယ်။ ဒါတွေက Hub ကို log in လုပ်ဖို့ သင်အသုံးပြုတဲ့ အတူတူပါပဲ။ သင်မှာ Hub profile မရှိသေးဘူးဆိုရင် ဒီနေရာ မှာ တစ်ခု ဖန်တီးသင့်ပါတယ်။

ကောင်းပြီ! အခု သင့်ရဲ့ authentication token ကို cache folder ထဲမှာ သိမ်းဆည်းထားပါပြီ။ repository အချို့ကို ဖန်တီးကြရအောင်!

သင် Trainer API ကို အသုံးပြုပြီး model တစ်ခုကို train လုပ်ခဲ့တယ်ဆိုရင်၊ ဒါကို Hub ကို upload လုပ်ဖို့ အလွယ်ကူဆုံးနည်းလမ်းက သင့် TrainingArguments ကို သတ်မှတ်တဲ့အခါ push_to_hub=True လို့ သတ်မှတ်ပေးဖို့ပါပဲ။

from transformers import TrainingArguments

training_args = TrainingArguments(
    "bert-finetuned-mrpc", save_strategy="epoch", push_to_hub=True
)

သင် trainer.train() ကို ခေါ်တဲ့အခါ၊ Trainer က သင့် model ကို သိမ်းဆည်းတိုင်း (ဒီနေရာမှာတော့ epoch တိုင်း) သင့် namespace ထဲက repository တစ်ခုမှာ Hub ကို upload လုပ်ပါလိမ့်မယ်။ အဲဒီ repository ကို သင်ရွေးချယ်ခဲ့တဲ့ output directory (ဒီနေရာမှာ bert-finetuned-mrpc) အတိုင်း နာမည်ပေးထားပါလိမ့်မယ်။ ဒါပေမယ့် hub_model_id = "a_different_name" နဲ့ မတူညီတဲ့ နာမည်တစ်ခုကို ရွေးချယ်နိုင်ပါတယ်။

သင်အဖွဲ့ဝင်ဖြစ်တဲ့ organization တစ်ခုကို သင့် model ကို upload လုပ်ဖို့အတွက် hub_model_id = "my_organization/my_repo_name" နဲ့ ထည့်ပေးလိုက်ရုံပါပဲ။

သင့် training ပြီးဆုံးသွားတာနဲ့၊ သင့် model ရဲ့ နောက်ဆုံး version ကို upload လုပ်ဖို့ နောက်ဆုံး trainer.push_to_hub() ကို လုပ်သင့်ပါတယ်။ ဒါက အသုံးပြုခဲ့တဲ့ hyperparameters တွေနဲ့ evaluation results တွေကို ဖော်ပြတဲ့ model card တစ်ခုကိုလည်း relevant metadata တွေအားလုံးနဲ့အတူ generate လုပ်ပေးပါလိမ့်မယ်။ ဒီလို model card တစ်ခုမှာ သင်တွေ့ရမယ့် content ဥပမာတစ်ခုကတော့ ဒီမှာပါ…

An example of an auto-generated model card.

အနိမ့်အဆင့်မှာဆိုရင် Model Hub ကို model တွေ၊ tokenizers တွေနဲ့ configuration objects တွေမှာရှိတဲ့ ၎င်းတို့ရဲ့ push_to_hub() method ကနေ တိုက်ရိုက် ဝင်ရောက်ကြည့်ရှုနိုင်ပါတယ်။ ဒီ method က repository ဖန်တီးတာနဲ့ model နဲ့ tokenizer files တွေကို repository ကို တိုက်ရိုက် push လုပ်တာ နှစ်ခုလုံးကို လုပ်ဆောင်ပေးပါတယ်။ အောက်မှာ ကျွန်တော်တို့တွေ့ရမယ့် API နဲ့ မတူဘဲ၊ manual handling လုံးဝမလိုအပ်ပါဘူး။

ဒါက ဘယ်လိုအလုပ်လုပ်တယ်ဆိုတာ သိဖို့အတွက်၊ ပထမဆုံး model တစ်ခုနဲ့ tokenizer တစ်ခုကို initialize လုပ်ကြရအောင်…

from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

ဒီ models တွေနဲ့ သင်ဘာမဆိုလုပ်ဆောင်နိုင်ပါတယ်၊ tokenizer ကို tokens တွေထည့်တာ၊ model ကို train လုပ်တာ၊ fine-tune လုပ်တာ။ ရရှိလာတဲ့ model, weights, နဲ့ tokenizer တွေနဲ့ သင်ကျေနပ်သွားတာနဲ့၊ model object မှာ တိုက်ရိုက်ရရှိနိုင်တဲ့ push_to_hub() method ကို အကျိုးရှိရှိ အသုံးပြုနိုင်ပါတယ်။

model.push_to_hub("dummy-model")

ဒါက သင့် profile မှာ dummy-model repository အသစ်တစ်ခုကို ဖန်တီးပေးပြီး သင့် model files တွေနဲ့ ဖြည့်ပေးပါလိမ့်မယ်။ tokenizer နဲ့လည်း အတူတူလုပ်ပါ၊ ဒါမှ files အားလုံးဟာ ဒီ repository မှာ ရရှိနိုင်ပါလိမ့်မယ်။

tokenizer.push_to_hub("dummy-model")

သင် organization တစ်ခုမှာ အဖွဲ့ဝင်ဆိုရင်၊ အဲဒီ organization ရဲ့ namespace ကို upload လုပ်ဖို့ organization argument ကို သတ်မှတ်ပေးလိုက်ရုံပါပဲ။

tokenizer.push_to_hub("dummy-model", organization="huggingface")

သင် သီးခြား Hugging Face token တစ်ခုကို အသုံးပြုချင်တယ်ဆိုရင်၊ push_to_hub() method ကိုလည်း သတ်မှတ်ပေးနိုင်ပါတယ်။

tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token="<TOKEN>")

အခု သင်အခုမှ upload လုပ်ထားတဲ့ model ကို ရှာဖို့ Model Hub: https://huggingface.co/user-or-organization/dummy-model ကို သွားပါ။

“Files and versions” tab ကို နှိပ်ပြီး အောက်ပါ screenshot မှာ မြင်ရတဲ့ files တွေကို သင်တွေ့ရပါလိမ့်မယ်။

Dummy model containing both the tokenizer and model files.

✏️ စမ်းသပ်ကြည့်ပါ! bert-base-cased checkpoint နဲ့ ဆက်စပ်နေတဲ့ model နဲ့ tokenizer ကို ယူပြီး push_to_hub() method ကို အသုံးပြုပြီး သင့် namespace ထဲက repo တစ်ခုကို upload လုပ်ပါ။ repo ဟာ သင့် page မှာ မှန်ကန်စွာ ပေါ်လာခြင်းရှိမရှိ နှစ်ကြိမ်စစ်ဆေးပြီးမှ ဖျက်ပစ်ပါ။

သင်တွေ့ခဲ့ရတဲ့အတိုင်း၊ push_to_hub() method က arguments အများအပြားကို လက်ခံပါတယ်။ ဒါကြောင့် သီးခြား repository ဒါမှမဟုတ် organization namespace ကို upload လုပ်တာ၊ ဒါမှမဟုတ် မတူညီတဲ့ API token တစ်ခုကို အသုံးပြုတာတွေ လုပ်ဆောင်နိုင်ပါတယ်။ ဖြစ်နိုင်ခြေရှိတဲ့ အရာတွေအကြောင်း သိရှိဖို့ 🤗 Transformers documentation မှာ တိုက်ရိုက်ရရှိနိုင်တဲ့ method specification ကို ကြည့်ရှုဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။

push_to_hub() method က huggingface_hub Python package ကို အထောက်အပံ့ပေးထားပါတယ်။ ဒီ package က Hugging Face Hub ကို တိုက်ရိုက် API ကို ပေးပါတယ်။ ဒါကို 🤗 Transformers နဲ့ အခြား machine learning libraries များစွာ (ဥပမာ- allenlp လိုမျိုး) ထဲမှာ ပေါင်းစပ်ထားပါတယ်။ ဒီအခန်းမှာ 🤗 Transformers integration ကို အာရုံစိုက်ပေမယ့်၊ ဒါကို သင်ရဲ့ကိုယ်ပိုင် code ဒါမှမဟုတ် library ထဲကို ပေါင်းစပ်တာက ရိုးရှင်းပါတယ်။

သင်အခုမှ ဖန်တီးခဲ့တဲ့ repository ကို files တွေ ဘယ်လို upload လုပ်ရမယ်ဆိုတာ သိဖို့ နောက်ဆုံးအပိုင်းကို သွားလိုက်ပါ။

huggingface_hub Python library ကို အသုံးပြုခြင်း

huggingface_hub Python library ဟာ model တွေနဲ့ datasets hub တွေအတွက် ကိရိယာအစုံအလင်ကို ပေးတဲ့ package တစ်ခု ဖြစ်ပါတယ်။ Hub ပေါ်က repositories တွေအကြောင်း အချက်အလက်ရယူတာ၊ ၎င်းတို့ကို စီမံခန့်ခွဲတာလိုမျိုး အသုံးများတဲ့ လုပ်ငန်းတွေအတွက် ရိုးရှင်းတဲ့ methods တွေနဲ့ classes တွေကို ပေးထားပါတယ်။ repositories တွေရဲ့ content ကို စီမံခန့်ခွဲဖို့နဲ့ သင်ရဲ့ project တွေနဲ့ libraries တွေထဲမှာ Hub ကို ပေါင်းစပ်ဖို့အတွက် git ရဲ့ အပေါ်မှာ အလုပ်လုပ်တဲ့ ရိုးရှင်းတဲ့ APIs တွေကို ပေးထားပါတယ်။

push_to_hub API ကို အသုံးပြုတာနဲ့ ဆင်တူစွာ၊ ဒါက သင့် API token ကို သင့် cache မှာ သိမ်းဆည်းထားဖို့ လိုအပ်ပါလိမ့်မယ်။ ဒါကို လုပ်ဆောင်ဖို့အတွက်၊ ယခင်အပိုင်းမှာ ဖော်ပြခဲ့တဲ့အတိုင်း CLI ကနေ login command ကို အသုံးပြုဖို့ လိုအပ်ပါလိမ့်မယ် (Google Colab မှာ run နေတယ်ဆိုရင် ဒီ commands တွေကို ! character နဲ့ အရှေ့ကနေ ထည့်သွင်းဖို့ သေချာပါစေ)။

huggingface-cli login

huggingface_hub package က ကျွန်တော်တို့ရဲ့ ရည်ရွယ်ချက်အတွက် အသုံးဝင်တဲ့ methods တွေနဲ့ classes အများအပြားကို ပေးထားပါတယ်။ ပထမဆုံး၊ repository ဖန်တီးခြင်း၊ ဖျက်ပစ်ခြင်း စသည်တို့ကို စီမံခန့်ခွဲရန် methods အချို့ရှိပါတယ်။

from huggingface_hub import (
    # User management
    login,
    logout,
    whoami,

    # Repository creation and management
    create_repo,
    delete_repo,
    update_repo_visibility,

    # And some methods to retrieve/change information about the content
    list_models,
    list_datasets,
    list_metrics,
    list_repo_files,
    upload_file,
    delete_file,
)

ထို့အပြင်၊ ၎င်းသည် local repository တစ်ခုကို စီမံခန့်ခွဲရန် အလွန်အစွမ်းထက်သော Repository class ကို ပေးထားသည်။ ၎င်းတို့ကို အကျိုးရှိရှိ အသုံးပြုပုံကို နားလည်ရန် နောက်အပိုင်းအနည်းငယ်တွင် ဤ methods များနှင့် class ကို ကျွန်တော်တို့ လေ့လာသွားပါမယ်။

create_repo method ကို အသုံးပြုပြီး Hub မှာ repository အသစ်တစ်ခု ဖန်တီးနိုင်ပါတယ်။

from huggingface_hub import create_repo

create_repo("dummy-model")

ဒါက သင့် namespace မှာ dummy-model repository ကို ဖန်တီးပေးပါလိမ့်မယ်။ သင်ကြိုက်နှစ်သက်ရင် organization argument ကို အသုံးပြုပြီး repository က ဘယ် organization နဲ့ သက်ဆိုင်တယ်ဆိုတာ သတ်မှတ်နိုင်ပါတယ်။

from huggingface_hub import create_repo

create_repo("dummy-model", organization="huggingface")

ဒါက သင်အဲဒီ organization မှာ အဖွဲ့ဝင်ဖြစ်တယ်ဆိုရင် dummy-model repository ကို huggingface namespace မှာ ဖန်တီးပေးပါလိမ့်မယ်။

အသုံးဝင်နိုင်တဲ့ အခြား arguments တွေကတော့…

private၊ repository ကို တခြားသူတွေ မြင်နိုင်မမြင်နိုင် သတ်မှတ်ဖို့။
token၊ သင် cache မှာ သိမ်းထားတဲ့ token ကို သတ်မှတ်ထားတဲ့ token တစ်ခုနဲ့ override လုပ်ချင်ရင်။
repo_type၊ model အစား dataset ဒါမှမဟုတ် space တစ်ခုကို ဖန်တီးချင်ရင်။ လက်ခံနိုင်တဲ့ တန်ဖိုးတွေကတော့ "dataset" နဲ့ "space" ဖြစ်ပါတယ်။

repository ကို ဖန်တီးပြီးတာနဲ့၊ ကျွန်တော်တို့ files တွေ ထည့်သွင်းသင့်ပါတယ်။ ဒါကို ဘယ်လိုနည်းလမ်းသုံးခုနဲ့ ကိုင်တွယ်နိုင်မလဲဆိုတာ သိဖို့ နောက်အပိုင်းကို သွားလိုက်ပါ။

Web Interface ကို အသုံးပြုခြင်း

web interface က Hub မှာ repositories တွေကို တိုက်ရိုက်စီမံခန့်ခွဲဖို့ ကိရိယာတွေကို ပေးထားပါတယ်။ interface ကို အသုံးပြုပြီး၊ သင် repositories တွေ လွယ်လွယ်ကူကူ ဖန်တီးနိုင်တယ်၊ files တွေ ထည့်နိုင်တယ် (ကြီးမားတဲ့ files တွေတောင်မှ!)၊ models တွေကို လေ့လာနိုင်တယ်၊ diffs တွေကို မြင်နိုင်တယ်၊ ပြီးတော့ တခြားအများကြီး လုပ်နိုင်ပါတယ်။

repository အသစ်တစ်ခု ဖန်တီးဖို့ huggingface.co/new ကို သွားပါ။

Page showcasing the model used for the creation of a new model repository.

ပထမဆုံး၊ repository ရဲ့ ပိုင်ရှင်ကို သတ်မှတ်ပါ၊ ဒါက သင်ကိုယ်တိုင် ဒါမှမဟုတ် သင်နဲ့ ဆက်စပ်နေတဲ့ organization တစ်ခုခု ဖြစ်နိုင်ပါတယ်။ သင် organization တစ်ခုကို ရွေးချယ်ရင်၊ model ကို organization ရဲ့ page မှာ ဖော်ပြထားမှာဖြစ်ပြီး organization ရဲ့ အဖွဲ့ဝင်တိုင်းက repository ကို ပံ့ပိုးကူညီနိုင်ပါလိမ့်မယ်။

နောက်တစ်ခုကတော့ သင့် model ရဲ့ နာမည်ကို ထည့်သွင်းပါ။ ဒါက repository ရဲ့ နာမည်လည်း ဖြစ်ပါလိမ့်မယ်။ နောက်ဆုံးအနေနဲ့၊ သင်ရဲ့ model ကို public ဒါမှမဟုတ် private ဖြစ်စေချင်လားဆိုတာ သတ်မှတ်နိုင်ပါတယ်။ Private models တွေကို အများပြည်သူမြင်ကွင်းကနေ ဝှက်ထားပါတယ်။

သင့် model repository ကို ဖန်တီးပြီးတာနဲ့၊ အောက်ပါကဲ့သို့ page တစ်ခုကို သင်တွေ့ရပါလိမ့်မယ်။

An empty model page after creating a new repository.

ဒီနေရာမှာ သင့် model ကို လက်ခံထားမှာ ဖြစ်ပါတယ်။ ဒါကို စတင်ဖြည့်ဆည်းဖို့၊ web interface ကနေ README file တစ်ခုကို တိုက်ရိုက်ထည့်သွင်းနိုင်ပါတယ်။

The README file showing the Markdown capabilities.

README file က Markdown နဲ့ ရေးထားပါတယ်၊ လွတ်လပ်စွာ ဖန်တီးနိုင်ပါတယ်။ ဒီအခန်းရဲ့ တတိယအပိုင်းက model card တစ်ခုတည်ဆောက်ခြင်းအတွက် ရည်ရွယ်ပါတယ်။ ဒါတွေက သင့် model ကို တန်ဖိုးရှိစေရာမှာ အရေးပါပါတယ်၊ ဘာလို့လဲဆိုတော့ ဒါတွေက သင့် model က ဘာတွေလုပ်နိုင်လဲဆိုတာကို တခြားသူတွေကို ပြောပြတဲ့ နေရာဖြစ်လို့ပါပဲ။

“Files and versions” tab ကို သင်ကြည့်လိုက်ရင်၊ အဲဒီမှာ files အများကြီး မရှိသေးတာကို သင်တွေ့ရပါလိမ့်မယ်၊ သင်အခုမှ ဖန်တီးခဲ့တဲ့ README.md နဲ့ ကြီးမားတဲ့ files တွေကို ခြေရာခံထားတဲ့ .gitattributes file ပဲ ရှိပါသေးတယ်။

The 'Files and versions' tab only shows the .gitattributes and README.md files.

files အသစ်တွေ ဘယ်လိုထည့်ရမလဲဆိုတာကို နောက်မှာ ကျွန်တော်တို့ လေ့လာသွားပါမယ်။

Model Files များကို Upload လုပ်ခြင်း

Hugging Face Hub မှာ files တွေကို စီမံခန့်ခွဲတဲ့ system က ပုံမှန် files တွေအတွက် git ကို အခြေခံထားပြီး၊ ကြီးမားတဲ့ files တွေအတွက် git-lfs (Git Large File Storage) ကို အခြေခံထားပါတယ်။

နောက်အပိုင်းမှာ Hub ကို files တွေ upload လုပ်ဖို့ နည်းလမ်းသုံးခုကို ကျွန်တော်တို့ လေ့လာသွားပါမယ်- huggingface_hub ကို အသုံးပြုခြင်းနဲ့ git commands တွေကို အသုံးပြုခြင်းတို့ ဖြစ်ပါတယ်။

upload_file ချဉ်းကပ်မှု

upload_file ကို အသုံးပြုတာက သင့် system မှာ git နဲ့ git-lfs install လုပ်ထားဖို့ မလိုအပ်ပါဘူး။ ဒါက HTTP POST requests တွေကို အသုံးပြုပြီး files တွေကို 🤗 Hub ကို တိုက်ရိုက် push လုပ်ပါတယ်။ ဒီနည်းလမ်းရဲ့ ကန့်သတ်ချက်ကတော့ 5GB ထက်ကြီးတဲ့ files တွေကို ကိုင်တွယ်နိုင်ခြင်း မရှိပါဘူး။ သင့် files တွေက 5GB ထက်ကြီးတယ်ဆိုရင် အောက်မှာ အသေးစိတ်ဖော်ပြထားတဲ့ အခြားနည်းလမ်းနှစ်ခုကို လိုက်နာပါ။

API ကို အောက်ပါအတိုင်း အသုံးပြုနိုင်ပါတယ်။

from huggingface_hub import upload_file

upload_file(
    "<path_to_file>/config.json",
    path_in_repo="config.json",
    repo_id="<namespace>/dummy-model",
)

ဒါက <path_to_file> မှာရှိတဲ့ config.json file ကို repository ရဲ့ root ကို config.json အနေနဲ့ dummy-model repository ကို upload လုပ်ပါလိမ့်မယ်။

အသုံးဝင်နိုင်တဲ့ အခြား arguments တွေကတော့…

token၊ သင် cache မှာ သိမ်းထားတဲ့ token ကို သတ်မှတ်ထားတဲ့ token တစ်ခုနဲ့ override လုပ်ချင်ရင်။
repo_type၊ model အစား dataset ဒါမှမဟုတ် space တစ်ခုကို upload လုပ်ချင်ရင်။ လက်ခံနိုင်တဲ့ တန်ဖိုးတွေကတော့ "dataset" နဲ့ "space" ဖြစ်ပါတယ်။

Repository Class

Repository class က local repository တစ်ခုကို git-like နည်းလမ်းနဲ့ စီမံခန့်ခွဲပါတယ်။ ဒါက ကျွန်တော်တို့ လိုအပ်တဲ့ features အားလုံးကို ပေးဖို့အတွက် git နဲ့ ဖြစ်နိုင်တဲ့ ခက်ခဲတဲ့အချက်အများစုကို abstract လုပ်ထားပါတယ်။

ဒီ class ကို အသုံးပြုဖို့အတွက် git နဲ့ git-lfs install လုပ်ထားဖို့ လိုအပ်ပါတယ်။ ဒါကြောင့် မစတင်ခင် git-lfs ကို install လုပ်ပြီး set up လုပ်ထားကြောင်း သေချာပါစေ (ဒီနေရာမှာ install လုပ်နည်းလမ်းညွှန်တွေကို ကြည့်ပါ)။

ကျွန်တော်တို့ အခုမှ ဖန်တီးခဲ့တဲ့ repository နဲ့ စတင်လုပ်ဆောင်ဖို့၊ remote repository ကို clone လုပ်ခြင်းဖြင့် local folder တစ်ခုထဲကို initialize လုပ်နိုင်ပါတယ်။

from huggingface_hub import Repository

repo = Repository("<path_to_dummy_folder>", clone_from="<namespace>/dummy-model")

ဒါက ကျွန်တော်တို့ရဲ့ working directory မှာ <path_to_dummy_folder> folder ကို ဖန်တီးလိုက်ပါတယ်။ ဒီ folder မှာ .gitattributes file တစ်ခုတည်းသာ ပါဝင်ပါတယ်၊ ဘာလို့လဲဆိုတော့ repository ကို create_repo ကနေ instantiate လုပ်တဲ့အခါ ဖန်တီးခဲ့တဲ့ တစ်ခုတည်းသော file ဖြစ်လို့ပါပဲ။

ဒီအချိန်ကစပြီး၊ ကျွန်တော်တို့ဟာ ရိုးရာ git methods အများအပြားကို အကျိုးရှိရှိ အသုံးပြုနိုင်ပါတယ်။

repo.git_pull()
repo.git_add()
repo.git_commit()
repo.git_push()
repo.git_tag()

နဲ့ တခြားအရာတွေ! ရရှိနိုင်တဲ့ methods အားလုံးကို ခြုံငုံသုံးသပ်ဖို့ ဒီနေရာမှာ ရရှိနိုင်တဲ့ Repository documentation ကို ကြည့်ရှုဖို့ ကျွန်တော်တို့ အကြံပြုပါတယ်။

လက်ရှိမှာ၊ Hub ကို push လုပ်ချင်တဲ့ model တစ်ခုနဲ့ tokenizer တစ်ခု ရှိပါတယ်။ ကျွန်တော်တို့ repository ကို အောင်မြင်စွာ clone လုပ်ခဲ့ပြီးဖြစ်လို့၊ အဲဒီ repository ထဲမှာ files တွေကို သိမ်းဆည်းနိုင်ပါတယ်။

နောက်ဆုံးပြောင်းလဲမှုတွေကို pull လုပ်ခြင်းဖြင့် ကျွန်တော်တို့ရဲ့ local clone က နောက်ဆုံးအခြေအနေဖြစ်ကြောင်း အရင်ဆုံး သေချာစေပါတယ်။

repo.git_pull()

ဒါပြီးတာနဲ့ model နဲ့ tokenizer files တွေကို သိမ်းဆည်းပါတယ်။

model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

<path_to_dummy_folder> မှာ model နဲ့ tokenizer files အားလုံး အခုပါဝင်နေပါပြီ။ files တွေကို staging area ကို ထည့်တာ၊ commit လုပ်တာနဲ့ Hub ကို push လုပ်တာလိုမျိုး ပုံမှန် git workflow ကို ကျွန်တော်တို့ လိုက်နာပါတယ်။

repo.git_add()
repo.git_commit("Add model and tokenizer files")
repo.git_push()

ဂုဏ်ယူပါတယ်! သင် Hub မှာ ပထမဆုံး files တွေကို push လုပ်ခဲ့ပါပြီ။

Git-based ချဉ်းကပ်မှု

ဒါက files တွေ upload လုပ်ဖို့ အလွန်ရိုးရှင်းတဲ့ နည်းလမ်းဖြစ်ပါတယ်- ကျွန်တော်တို့ git နဲ့ git-lfs ကို တိုက်ရိုက်အသုံးပြုပါမယ်။ ခက်ခဲတဲ့အချက်အများစုကို ယခင်ချဉ်းကပ်မှုတွေက abstract လုပ်ထားပြီးဖြစ်ပေမယ့်၊ အောက်ပါ method မှာ သတိထားရမယ့် အချက်အချို့ရှိတဲ့အတွက် ပိုမိုရှုပ်ထွေးတဲ့ use-case တစ်ခုကို ကျွန်တော်တို့ လိုက်နာပါမယ်။

ပထမဆုံး git-lfs ကို initialize လုပ်ခြင်းဖြင့် စတင်ပါ။

git lfs install

Updated git hooks.
Git LFS initialized.

ဒါပြီးတာနဲ့၊ ပထမအဆင့်ကတော့ သင်ရဲ့ model repository ကို clone လုပ်ဖို့ပါပဲ။

git clone https://huggingface.co/<namespace>/<your-model-id>

ကျွန်တော့် username က lysandre ဖြစ်ပြီး model name က dummy ကို အသုံးပြုခဲ့တဲ့အတွက်၊ ကျွန်တော့်အတွက် command က အောက်ပါအတိုင်း ဖြစ်ပါလိမ့်မယ်။

git clone https://huggingface.co/lysandre/dummy

ကျွန်တော့် working directory မှာ dummy လို့ခေါ်တဲ့ folder တစ်ခု အခုရှိပါပြီ။ folder ထဲကို cd လုပ်ပြီး contents တွေကို ကြည့်နိုင်ပါတယ်။

cd dummy && ls

README.md

သင် Hugging Face Hub ရဲ့ create_repo method ကို အသုံးပြုပြီး repository ကို အခုမှ ဖန်တီးခဲ့တာဆိုရင်၊ ဒီ folder မှာ hidden .gitattributes file တစ်ခုတည်းသာ ပါဝင်သင့်ပါတယ်။ သင် web interface ကို အသုံးပြုပြီး repository တစ်ခု ဖန်တီးဖို့ ယခင်အပိုင်းက ညွှန်ကြားချက်တွေကို လိုက်နာခဲ့တယ်ဆိုရင်တော့၊ folder မှာ hidden .gitattributes file နဲ့အတူ README.md file တစ်ခု ပါဝင်သင့်ပါတယ်။

configuration file တစ်ခု၊ vocabulary file တစ်ခု ဒါမှမဟုတ် kilobytes အနည်းငယ်အောက်ရှိတဲ့ မည်သည့် file ကိုမဆို ပုံမှန်အရွယ်အစားရှိတဲ့ file တစ်ခုကို ထည့်သွင်းတာက မည်သည့် git-based system မှာမဆို လုပ်ဆောင်ရမယ့်အတိုင်း အတိအကျပါပဲ။ သို့သော်လည်း၊ ပိုကြီးမားတဲ့ files တွေကို huggingface.co ကို push လုပ်နိုင်ဖို့ git-lfs ကနေတစ်ဆင့် register လုပ်ရပါမယ်။

ကျွန်တော်တို့ရဲ့ dummy repository ကို commit လုပ်ချင်တဲ့ model တစ်ခုနဲ့ tokenizer တစ်ခုကို generate လုပ်ဖို့ Python ကို ခဏပြန်သွားကြရအောင်…

from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Do whatever with the model, train it, fine-tune it...

model.save_pretrained("<path_to_dummy_folder>")
tokenizer.save_pretrained("<path_to_dummy_folder>")

model နဲ့ tokenizer artifacts အချို့ကို သိမ်းဆည်းပြီးပြီဆိုတော့ dummy folder ကို ထပ်ကြည့်ကြရအောင်…

ls

config.json  pytorch_model.bin  README.md  sentencepiece.bpe.model  special_tokens_map.json tokenizer_config.json  tokenizer.json

သင် file sizes တွေကို (ဥပမာ- ls -lh နဲ့) ကြည့်လိုက်ရင် model state dict file (pytorch_model.bin) က 400 MB ကျော်ရှိတဲ့ တစ်ခုတည်းသော ခြားနားချက်ဖြစ်တာကို သင်တွေ့ရပါလိမ့်မယ်။

✏️ web interface ကနေ repository ကို ဖန်တီးတဲ့အခါ၊ *.gitattributes* file ကို *.bin* နဲ့ *.h5* လိုမျိုး သတ်မှတ်ထားတဲ့ extensions တွေပါတဲ့ files တွေကို ကြီးမားတဲ့ files တွေအဖြစ် သတ်မှတ်ဖို့ အလိုအလျောက် set up လုပ်ထားပါတယ်။ ဒါကြောင့် သင့်ဘက်ကနေ မည်သည့် setup မှ မလိုအပ်ဘဲ git-lfs က ၎င်းတို့ကို ခြေရာခံပါလိမ့်မယ်။

အခု ကျွန်တော်တို့ဟာ ရိုးရာ Git repositories တွေနဲ့ ပုံမှန်လုပ်ဆောင်သလိုပဲ ဆက်လက်လုပ်ဆောင်နိုင်ပါပြီ။ git add command ကို အသုံးပြုပြီး files အားလုံးကို Git ရဲ့ staging environment ထဲကို ထည့်နိုင်ပါတယ်။

git add .

ပြီးရင် လက်ရှိ staged လုပ်ထားတဲ့ files တွေကို ကြည့်နိုင်ပါတယ်။

git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
  modified:   .gitattributes
	new file:   config.json
	new file:   pytorch_model.bin
	new file:   sentencepiece.bpe.model
	new file:   special_tokens_map.json
	new file:   tokenizer.json
	new file:   tokenizer_config.json

အလားတူပဲ၊ git-lfs က မှန်ကန်တဲ့ files တွေကို ခြေရာခံနေခြင်းရှိမရှိ ၎င်းရဲ့ status command ကို အသုံးပြုပြီး သေချာအောင် လုပ်နိုင်ပါတယ်။

git lfs status

On branch main
Objects to be pushed to origin/main:


Objects to be committed:

	config.json (Git: bc20ff2)
	pytorch_model.bin (LFS: 35686c2)
	sentencepiece.bpe.model (LFS: 988bc5a)
	special_tokens_map.json (Git: cb23931)
	tokenizer.json (Git: 851ff3e)
	tokenizer_config.json (Git: f0f7783)

Objects not staged for commit:

files အားလုံးမှာ Git ကို handler အဖြစ်ရှိနေတာကို ကျွန်တော်တို့ တွေ့နိုင်ပါတယ်၊ pytorch_model.bin နဲ့ sentencepiece.bpe.model တို့မှလွဲ၍ LFS ကို handler အဖြစ်ရှိနေပါတယ်။ ကောင်းပါပြီ!

နောက်ဆုံးအဆင့်တွေဖြစ်တဲ့ commit လုပ်ခြင်းနဲ့ huggingface.co remote repository ကို push လုပ်ခြင်းဆီ ဆက်သွားကြရအောင်…

git commit -m "First model version"

[main b08aab1] First model version
 7 files changed, 29027 insertions(+)
  6 files changed, 36 insertions(+)
 create mode 100644 config.json
 create mode 100644 pytorch_model.bin
 create mode 100644 sentencepiece.bpe.model
 create mode 100644 special_tokens_map.json
 create mode 100644 tokenizer.json
 create mode 100644 tokenizer_config.json

push လုပ်တာက သင့် internet connection ရဲ့ မြန်နှုန်းနဲ့ သင့် files တွေရဲ့ အရွယ်အစားပေါ်မူတည်ပြီး အချိန်အနည်းငယ် ကြာနိုင်ပါတယ်။

git push

Uploading LFS objects: 100% (1/1), 433 MB | 1.3 MB/s, done.
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 288.27 KiB | 6.27 MiB/s, done.
Total 9 (delta 1), reused 0 (delta 0), pack-reused 0
To https://huggingface.co/lysandre/dummy
   891b41d..b08aab1  main -> main

ဒါပြီးတာနဲ့ model repository ကို ကျွန်တော်တို့ကြည့်လိုက်ရင်၊ အခုမှ ထည့်သွင်းထားတဲ့ files တွေအားလုံးကို ကျွန်တော်တို့ တွေ့မြင်နိုင်ပါပြီ။

The 'Files and versions' tab now contains all the recently uploaded files.

UI (User Interface) က သင့်ကို model files တွေနဲ့ commits တွေကို လေ့လာနိုင်စေပြီး commit တစ်ခုစီက ဖြစ်ပေါ်စေတဲ့ diffs တွေကို ကြည့်ရှုနိုင်စေပါတယ်။

The diff introduced by the recent commit.

ဝေါဟာရ ရှင်းလင်းချက် (Glossary)

Pretrained Models: အကြီးစား ဒေတာအမြောက်အမြားဖြင့် ကြိုတင်လေ့ကျင့်ထားပြီးဖြစ်သော AI (Artificial Intelligence) မော်ဒယ်များ။
🤗 Hub (Hugging Face Hub): AI မော်ဒယ်တွေ၊ datasets တွေနဲ့ demo တွေကို အခြားသူတွေနဲ့ မျှဝေဖို့၊ ရှာဖွေဖို့နဲ့ ပြန်လည်အသုံးပြုဖို့အတွက် အွန်လိုင်း platform တစ်ခု ဖြစ်ပါတယ်။
push_to_hub API (Application Programming Interface): Hugging Face Transformers library မှ ပံ့ပိုးပေးသော method တစ်ခုဖြစ်ပြီး trained models များနှင့် tokenizers များကို Hugging Face Hub သို့ အလွယ်တကူ upload လုပ်နိုင်စေသည်။
huggingface_hub Python Library: Hugging Face Hub ကို အပြန်အလှန်ဆက်သွယ်ရန် (repositories ဖန်တီးခြင်း၊ files များ upload လုပ်ခြင်း စသည်ဖြင့်) ကိရိယာများနှင့် functions များကို ပံ့ပိုးပေးသော Python library။
Web Interface: Hugging Face Hub ဝက်ဘ်ဆိုဒ်၏ ဂရပ်ဖစ်အခြေခံ အသုံးပြုသူမျက်နှာပြင်။
Git: Version control system တစ်ခုဖြစ်ပြီး code changes များကို ခြေရာခံရန်နှင့် developer အများအပြား ပူးပေါင်းလုပ်ဆောင်နိုင်ရန် ဒီဇိုင်းထုတ်ထားသည်။
Git-LFS (Git Large File Storage): Git repositories များတွင် ကြီးမားသော files များကို ထိရောက်စွာ ကိုင်တွယ်ရန်အတွက် extension တစ်ခု။
Compute Resources: AI/ML model များကို လေ့ကျင့်ရန်နှင့် အသုံးပြုရန်အတွက် လိုအပ်သော ကွန်ပျူတာ hardware (ဥပမာ- CPU, GPU, memory)။
Trained Artifacts: လေ့ကျင့်ထားသော AI model များ၊ ၎င်းတို့၏ weights များ၊ tokenizers များ စသည်ဖြင့် အသုံးပြုနိုင်သော output များ။
Model Repositories: Hugging Face Hub တွင် AI model များကို သိမ်းဆည်းထားသော Git repositories များ။
Authentication Token: အသုံးပြုသူတစ်ဦး၏ အထောက်အထားကို စစ်ဆေးရန်နှင့် ၎င်းတို့၏ permissions များကို အတည်ပြုရန် အသုံးပြုသော လုံခြုံရေး token။
huggingface-cli login: Hugging Face CLI (Command Line Interface) မှတစ်ဆင့် Hub ကို login လုပ်ရန် command။
CLI (Command Line Interface): ကွန်ပျူတာပရိုဂရမ်တစ်ခုနှင့် text-based commands များ အသုံးပြု၍ အပြန်အလှန်ဆက်သွယ်သော နည်းလမ်း။
Notebook: Jupyter Notebook သို့မဟုတ် Google Colab ကဲ့သို့သော interactive computing environment။
notebook_login() Function: Jupyter/Colab Notebooks များတွင် Hugging Face Hub ကို login လုပ်ရန် huggingface_hub library မှ function။
Cache Folder: အနာဂတ်တွင် မြန်ဆန်စွာ ဝင်ရောက်ကြည့်ရှုနိုင်ရန် ဒေတာများကို ယာယီသိမ်းဆည်းထားသော နေရာ။
Namespace: Hugging Face Hub တွင် အသုံးပြုသူများ သို့မဟုတ် organization များအတွက် ထူးခြားသော ID သို့မဟုတ် ခွဲခြားသတ်မှတ်မှု။
Trainer API: Hugging Face Transformers library မှ model များကို ထိရောက်စွာ လေ့ကျင့်ရန်အတွက် ဒီဇိုင်းထုတ်ထားသော မြင့်မားသောအဆင့် (high-level) API။
TrainingArguments: Trainer API တွင် training process အတွက် parameters များကို သတ်မှတ်ရန် အသုံးပြုသော class။
bert-finetuned-mrpc: MRPC dataset တွင် fine-tuned လုပ်ထားသော BERT model ကို ဖော်ပြသော အမည်။
save_strategy="epoch": Model ကို epoch တိုင်း သိမ်းဆည်းရန် သတ်မှတ်ခြင်း။
push_to_hub=True: Model ကို Hugging Face Hub သို့ အလိုအလျောက် push လုပ်ရန် သတ်မှတ်ခြင်း။
trainer.train(): Trainer API ကို အသုံးပြု၍ model ကို လေ့ကျင့်ရန် method။
trainer.push_to_hub(): Trainer API ကို အသုံးပြု၍ လက်ရှိ training အခြေအနေကို Hub သို့ push လုပ်ရန် method။
Model Card: Hugging Face Hub တွင် မော်ဒယ်တစ်ခုစီအတွက် ပါရှိသော အချက်အလက်များပါသည့် စာမျက်နှာ။ ၎င်းတွင် မော်ဒယ်ကို မည်သို့လေ့ကျင့်ခဲ့သည်၊ မည်သည့် datasets များကို အသုံးပြုခဲ့သည်၊ ၎င်း၏ ကန့်သတ်ချက်များ၊ ဘက်လိုက်မှုများ (biases) နှင့် အသုံးပြုနည်းများ ပါဝင်သည်။
Metadata: ဒေတာအကြောင်းအရာများကို ဖော်ပြသော ဒေတာ (ဥပမာ- hyperparameters, evaluation results)။
Hyperparameters: Model ကို လေ့ကျင့်ရာတွင် အသုံးပြုသော ပြင်ပ parameters များ (ဥပမာ- learning rate, batch size)။
Evaluation Results: Model ၏ စွမ်းဆောင်ရည်ကို တိုင်းတာသော ရလဒ်များ (ဥပမာ- accuracy, F1 score)။
Keras: TensorFlow အပေါ်တွင် တည်ဆောက်ထားသော open-source deep learning library တစ်ခု။
PushToHubCallback: Keras models များကို Hub သို့ push လုပ်ရန်အတွက် Hugging Face Transformers library မှ callback class။
model.fit(): Keras မှာ model ကို လေ့ကျင့်ရန် method။
hub_model_id: Hub ပေါ်ရှိ model repository အတွက် သတ်မှတ်ထားသော ID။
Organization Namespace: Hugging Face Hub တွင် organization တစ်ခုအတွက် သတ်မှတ်ထားသော namespace။
Configuration Objects: model ၏ ဖွဲ့စည်းပုံနှင့် parameters များကို သိမ်းဆည်းထားသော object များ။
push_to_hub() Method (Model/Tokenizer/Config): model, tokenizer, သို့မဟုတ် configuration object များပေါ်တွင် တိုက်ရိုက်ရရှိနိုင်သော method ဖြစ်ပြီး ၎င်းတို့ကို Hub သို့ push လုပ်ရန်။
AutoModelForMaskedLM / TFAutoModelForMaskedLM: Masked Language Modeling (MLM) အတွက် AutoModel classes များ (PyTorch/TensorFlow)။
AutoTokenizer: Hugging Face Transformers library မှာ ပါဝင်တဲ့ class တစ်ခုဖြစ်ပြီး မော်ဒယ်အမည်ကို အသုံးပြုပြီး သက်ဆိုင်ရာ tokenizer ကို အလိုအလျောက် load လုပ်ပေးသည်။
from_pretrained(): Pretrained model သို့မဟုတ် tokenizer ကို Hub မှ load လုပ်ရန် method။
Weights: Model ၏ လေ့ကျင့်ပြီးသား parameters များ။
dummy-model: ဥပမာပြရန်အတွက် အသုံးပြုသော model repository အမည်။
API Token: Access token ကို ရည်ညွှန်းသည်။
use_auth_token Argument: authentication token ကို push_to_hub() method သို့ တိုက်ရိုက်ပေးရန် argument။
bert-base-cased: BERT model ၏ base version, cased text (အကြီးအသေး ခွဲခြားသော) ဖြင့် လေ့ကျင့်ထားသော checkpoint identifier။
model_sharing (Transformers documentation): Transformers documentation ရှိ model မျှဝေခြင်းနှင့် ပတ်သက်သော အပိုင်း။
allenlp (Library): NLP သုတေသနနှင့် အပလီကေးရှင်းများအတွက် အသုံးပြုသော open-source deep learning library။
login, logout, whoami (User Management): huggingface_hub library မှ user account ကို စီမံခန့်ခွဲရန် functions များ။
create_repo, delete_repo, update_repo_visibility (Repository Management): huggingface_hub library မှ repository များကို ဖန်တီး၊ ဖျက်ပစ်၊ မြင်နိုင်စွမ်းကို ပြောင်းလဲရန် functions များ။
list_models, list_datasets, list_metrics, list_repo_files, upload_file, delete_file (Content Information): huggingface_hub library မှ Hub ပေါ်ရှိ content အကြောင်း အချက်အလက်များ ရယူရန်/ပြောင်းလဲရန် functions များ။
private Argument (create_repo): Repository ကို public သို့မဟုတ် private အဖြစ် သတ်မှတ်ရန် argument။
repo_type Argument (create_repo): Repository အမျိုးအစားကို သတ်မှတ်ရန် argument (ဥပမာ- "model", "dataset", "space")။
Space (Hugging Face Space): AI demo များကို host လုပ်ရန်အတွက် Hugging Face Hub တွင်ရှိသော platform။
Owner: Repository ကို ပိုင်ဆိုင်သော အသုံးပြုသူ သို့မဟုတ် organization။
Affiliated with: ဆက်စပ်မှုရှိသော။
README File (README.md): project တစ်ခု သို့မဟုတ် repository တစ်ခုအကြောင်း အချက်အလက်များပါဝင်သော စာသားဖိုင်။ Markdown format ဖြင့် ရေးသားလေ့ရှိသည်။
Markdown: စာသားကို အလှဆင်ရန်အတွက် အသုံးပြုသော lightweight markup language။
.gitattributes File: Git repositories များတွင် ကြီးမားသော files များကို git-lfs ဖြင့် ကိုင်တွယ်ရန် သတ်မှတ်ထားသော configuration file။
HTTP POST Requests: Web server သို့ data ပို့ရန်အတွက် အသုံးပြုသော HTTP method။
Local Repository: သင့်ကွန်ပျူတာပေါ်ရှိ Git repository ၏ copy တစ်ခု။
Clone: Remote Git repository တစ်ခု၏ copy ကို local folder တစ်ခုသို့ ကူးယူခြင်း။
repo.git_pull(): Local repository ကို remote repository မှ နောက်ဆုံးပြောင်းလဲမှုများဖြင့် update လုပ်ရန် method။
repo.git_add(): Files များကို Git ၏ staging area သို့ ထည့်ရန် method။
repo.git_commit(): Staged files များကို repository ၏ history သို့ သိမ်းဆည်းရန် method။
repo.git_push(): Local commits များကို remote repository သို့ ပေးပို့ရန် method။
repo.git_tag(): Git history တွင် သီးခြားမှတ်တိုင်တစ်ခုကို အမှတ်အသားပြုရန် tag တစ်ခု ဖန်တီးရန် method။
model.save_pretrained() / tokenizer.save_pretrained(): model သို့မဟုတ် tokenizer ၏ weights နှင့် configuration များကို local folder တစ်ခုသို့ သိမ်းဆည်းရန် method။
Staging Area: Git တွင် commit မလုပ်မီ changes များကို ယာယီသိမ်းဆည်းထားသော နေရာ။
Commit: Git repository ၏ history သို့ changes များကို သိမ်းဆည်းခြင်း။
Git-based System: Git version control system ကို အသုံးပြုထားသော system။
git lfs install: Git LFS ကို initialize လုပ်ရန် command။
git clone: Git repository ကို clone လုပ်ရန် command။
cd: Command line command တစ်ခုဖြစ်ပြီး directory တစ်ခုမှ အခြားတစ်ခုသို့ ပြောင်းလဲရန် အသုံးပြုသည်။
ls: Command line command တစ်ခုဖြစ်ပြီး လက်ရှိ directory ရှိ ဖိုင်များနှင့် directory များကို ပြသရန် အသုံးပြုသည်။
Configuration File: ဆော့ဖ်ဝဲလ်တစ်ခု၏ setting များနှင့် parameters များကို သိမ်းဆည်းထားသော ဖိုင်။
Vocabulary File: Tokenizer က အသုံးပြုသော tokens များ၏ စာရင်းပါဝင်သော ဖိုင်။
Model State Dict File (pytorch_model.bin / tf_model.h5): PyTorch သို့မဟုတ် TensorFlow model ၏ weights များကို သိမ်းဆည်းထားသော ဖိုင်။
ls -lh: File sizes များကို human-readable format ဖြင့် ပြသရန် ls command ၏ option။
Outlier: အများစုနှင့် ကွဲပြားနေသော အရာ။
Git’s Staging Environment: Git တွင် နောက် commit အတွက် ပြင်ဆင်ထားသော changes များရှိရာ နေရာ။
git add .: လက်ရှိ directory ရှိ changes အားလုံးကို staging area သို့ ထည့်ရန် command။
git status: လက်ရှိ repository ၏ အခြေအနေ (staged, unstaged, untracked files) ကို ပြသရန် command။
Handler: File အမျိုးအစားတစ်ခုကို စီမံခန့်ခွဲသော ကိရိယာ သို့မဟုတ် လုပ်ငန်းစဉ်။
Remote Repository: Local repository ၏ အွန်လိုင်း (cloud) ပေါ်ရှိ copy။
git commit -m "Message": Commit message ပါဝင်သော commit တစ်ခု ပြုလုပ်ရန် command။
git push: Local commits များကို remote repository သို့ ပေးပို့ရန် command။
Uploading LFS Objects: Git LFS မှ စီမံခန့်ခွဲသော ကြီးမားသော files များကို upload လုပ်ခြင်း။
Enumerating Objects: Git history တွင် object များကို ရေတွက်ခြင်း။
Counting Objects: Git commit history တွင် object များကို ရေတွက်ခြင်း။
Delta Compression: Changes များကိုသာ သိမ်းဆည်းခြင်းဖြင့် file အရွယ်အစားကို လျှော့ချသော နည်းလမ်း။
Compressing Objects: Git object များကို ချုံ့ခြင်း။
Writing Objects: Git database သို့ object များကို ရေးသားခြင်း။
Total/Reused/Pack-reused: Git push လုပ်ငန်းစဉ်၏ အကျဉ်းချုပ် အချက်အလက်များ။
UI (User Interface): အသုံးပြုသူနှင့် အပြန်အလှန်တုံ့ပြန်နိုင်သော ဂရပ်ဖစ်မျက်နှာပြင်။
Commits: Git repository ၏ history တွင် မှတ်တမ်းတင်ထားသော changes များ။
Diffs: commits နှစ်ခုကြားရှိ ကွာခြားချက်များကို ပြသခြင်း။

Update on GitHub

←Pretrained Models များကို အသုံးပြုခြင်း Model Card တစ်ခု တည်ဆောက်ခြင်း→