Yhyu13/phi-2-sft-alpaca_gpt4_en-ep1

Yhyu13 committed on Dec 20, 2023

Commit 4909fcf · 1 Parent(s): d927329

Update README.md

Files changed (1)

README.md +35 -0

@@ -3,3 +3,38 @@ license: other
 license_name: microsoft-research-license
 license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
 ---

+This is the merged model for the LoRA adapter https://huggingface.co/Yhyu13/phi-2-sft-alpaca_gpt4_en-ep1-lora
+---
+Following this discussion:
+https://huggingface.co/microsoft/phi-2/discussions/38
+Since phi-2 requires remote code, which the HF Open LLM Leaderboard does not accept at the moment,
+I ran phi-2 and my SFT model on the AlpacaEval benchmark:
+https://tatsu-lab.github.io/alpaca_eval/
+Here are the results, evaluated by ChatGPT: https://github.com/tatsu-lab/alpaca_eval/pull/183
+```
+                       win_rate  standard_error  n_total  avg_length
+gpt4                      73.79            1.54      805        1365
+claude                    70.37            1.60      805        1082
+chatgpt                   66.09            1.66      805         811
+wizardlm-13b              65.16            1.67      805         985
+vicuna-13b                64.10            1.69      805        1037
+guanaco-65b               62.36            1.71      805        1249
+oasst-rlhf-llama-33b      62.05            1.71      805        1079
+alpaca-farm-ppo-human     60.25            1.72      805         803
+falcon-40b-instruct       56.52            1.74      805         662
+phi-2-alpaca-gpt4(new)    54.23            1.75      804        1138
+text_davinci_003          50.00            0.00      805         307
+alpaca-7b                 45.22            1.74      805         396
+phi-2(new)                43.79            1.74      805         924
+text_davinci_001          28.07            1.56      805         296
+```
+This could be a milestone for small models: we finally have an open model that everyone can run and that surpasses GPT-3.5 (text_davinci_003) on this benchmark!
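
The table above can be sanity-checked with a little arithmetic. The sketch below is a rough illustration, not part of the AlpacaEval tooling: it assumes text_davinci_003 is the reference model (its row shows a win rate of 50.00 with zero standard error) and expresses how far a few rows sit from that baseline in units of their reported standard error. The helper name `margin_in_standard_errors` is hypothetical, and the numbers are copied by hand from the table.

```python
# Win rate (%) and standard error copied from the AlpacaEval table above.
results = {
    "phi-2-alpaca-gpt4(new)": (54.23, 1.75),
    "phi-2(new)": (43.79, 1.74),
    "chatgpt": (66.09, 1.66),
}

# Assumption: text_davinci_003 is the reference, fixed at a 50.00 win rate.
BASELINE = 50.0


def margin_in_standard_errors(win_rate, standard_error, baseline=BASELINE):
    """Return how many standard errors the win rate lies from the baseline."""
    return (win_rate - baseline) / standard_error


for name, (win_rate, se) in results.items():
    z = margin_in_standard_errors(win_rate, se)
    print(f"{name}: {z:+.2f} standard errors from the baseline")
```

A positive value means the model is preferred over the reference more often than not; a margin beyond roughly two standard errors suggests the gap is unlikely to be noise alone, under the usual normal approximation.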