---
datasets:
- Open-Orca/OpenOrca
library_name: transformers
tags:
- llama
---
|
# Basilisk 4B

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
|
|
|
Built on `winglian/llama-2-4b`, a 4B-parameter Llama 2 model, Basilisk 4B is fine-tuned on chain-of-thought (CoT) data from the OpenOrca dataset.
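
A minimal inference sketch with `transformers` is below. The prompt and generation settings are illustrative, not prescribed by this card; `trust_remote_code=True` mirrors the `model_args` used in the evaluation runs that follow.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "winglian/basilisk-4b"

# trust_remote_code=True matches the model_args used in the eval runs below.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Illustrative prompt; the model was fine-tuned on OpenOrca CoT data,
# so step-by-step reasoning questions are a natural fit.
prompt = "Question: If a train travels 60 miles in 1.5 hours, what is its average speed? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```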
|
|
|
|
|
```
hf-causal-experimental (pretrained=winglian/basilisk-4b,use_accelerate=True,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|agieval_aqua_rat                                |      0|acc                  |0.2362|±  |0.0267|
|                                                |       |acc_norm             |0.2283|±  |0.0264|
|agieval_logiqa_en                               |      0|acc                  |0.2688|±  |0.0174|
|                                                |       |acc_norm             |0.2811|±  |0.0176|
|agieval_lsat_ar                                 |      0|acc                  |0.2130|±  |0.0271|
|                                                |       |acc_norm             |0.1913|±  |0.0260|
|agieval_lsat_lr                                 |      0|acc                  |0.2255|±  |0.0185|
|                                                |       |acc_norm             |0.2745|±  |0.0198|
|agieval_lsat_rc                                 |      0|acc                  |0.2305|±  |0.0257|
|                                                |       |acc_norm             |0.2491|±  |0.0264|
|agieval_sat_en                                  |      0|acc                  |0.3641|±  |0.0336|
|                                                |       |acc_norm             |0.3495|±  |0.0333|
|agieval_sat_en_without_passage                  |      0|acc                  |0.2427|±  |0.0299|
|                                                |       |acc_norm             |0.2427|±  |0.0299|
|agieval_sat_math                                |      0|acc                  |0.2318|±  |0.0285|
|                                                |       |acc_norm             |0.2091|±  |0.0275|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5000|±  |0.0364|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.3930|±  |0.0255|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.2674|±  |0.0276|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1838|±  |0.0205|
|                                                |       |exact_str_match      |0.0279|±  |0.0087|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2380|±  |0.0191|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1843|±  |0.0147|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3800|±  |0.0281|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3480|±  |0.0213|
|bigbench_navigate                               |      0|multiple_choice_grade|0.5000|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.3680|±  |0.0108|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2746|±  |0.0211|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2806|±  |0.0142|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.4972|±  |0.0373|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4939|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2740|±  |0.0141|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1904|±  |0.0111|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1394|±  |0.0083|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3800|±  |0.0281|

hf-causal-experimental (pretrained=winglian/basilisk-4b,use_accelerate=True,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 12
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.3285|±  |0.0137|
|             |       |acc_norm|0.3532|±  |0.0140|
|arc_easy     |      0|acc     |0.6364|±  |0.0099|
|             |       |acc_norm|0.6035|±  |0.0100|
|boolq        |      1|acc     |0.7196|±  |0.0079|
|hellaswag    |      0|acc     |0.4239|±  |0.0049|
|             |       |acc_norm|0.5473|±  |0.0050|
|openbookqa   |      0|acc     |0.2220|±  |0.0186|
|             |       |acc_norm|0.3320|±  |0.0211|
|piqa         |      0|acc     |0.6937|±  |0.0108|
|             |       |acc_norm|0.6921|±  |0.0108|
|winogrande   |      0|acc     |0.5399|±  |0.0140|
```
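
For reproducibility, here is a sketch of re-running the second, zero-shot suite with the EleutherAI lm-evaluation-harness Python API. This assumes an older harness release that still provides the `hf-causal-experimental` model type shown in the headers above; the exact invocation is inferred from those header lines, not taken from the original run.

```python
# Sketch only: re-runs the second table's zero-shot evaluation, assuming the
# older EleutherAI lm-evaluation-harness API that exposes the
# "hf-causal-experimental" model type reported in the result headers above.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=winglian/basilisk-4b,use_accelerate=True,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,   # matches num_fewshot: 0 in the header
    batch_size=12,   # matches batch_size: 12 in the header
)
print(results["results"])
```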
|
|