TheBloke committed
Commit 517b77d
1 Parent(s): 58e182c

Upload README.md

Files changed (1): README.md (+51 -51)

README.md CHANGED
@@ -380,13 +380,15 @@ And thank you again to a16z for their generous grant.
 
 ## OpenHermes x Notus x Neural
 
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+
 This is an RL fine-tuned version of [Teknium](https://huggingface.co/teknium)'s [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), trained with Direct Preference Optimization (DPO) on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [argilla/ultrafeedback-binarized-preferences](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences) preference datasets.
 
 DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
 
-# Training Details
+Errata: Because the DPO-only version failed to generate an eos token, this model received additional SFT on 7,000 rows from the openhermes dataset to teach it to emit eos_token again at the end of a turn. This lowered the benchmark scores. The original DPO-only model is available in the `dpo-v0` branch.
 
-[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+# Training Details
 
 DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
 
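As context for the recipe the hunk above describes (DPO over preference pairs, trained as a qLoRA adapter on the OpenHermes-2.5 base), here is a minimal sketch using `trl` and `peft`. Everything in it is illustrative: the hyperparameters, LoRA target modules, and column remapping are assumptions, and the `DPOTrainer` signature shown is the 2023-era one, not the actual Axolotl configuration used for this model.

```python
# Hedged sketch of DPO + qLoRA fine-tuning (trl ~0.7-era API; not the authors' config).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Loading the base model in 4-bit is what makes the adapter "qLoRA" rather than plain LoRA.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)

# DPO consumes (prompt, chosen, rejected) triples; orca_dpo_pairs stores
# system/question/chosen/rejected, so remap and drop the extra columns.
ds = load_dataset("Intel/orca_dpo_pairs", split="train")
ds = ds.map(
    lambda r: {"prompt": r["question"], "chosen": r["chosen"], "rejected": r["rejected"]},
    remove_columns=ds.column_names,
)

peft_config = LoraConfig(  # illustrative rank and target modules, not the real training config
    r=64, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, trl uses the adapter-disabled model as the frozen reference
    beta=0.1,        # strength of the implicit KL penalty toward the reference policy
    train_dataset=ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="dpopenhermes-adapter",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-5,
        max_steps=1000,
        bf16=True,
        remove_unused_columns=False,  # DPOTrainer needs the raw text columns at collation time
    ),
)
trainer.train()
trainer.save_model("dpopenhermes-adapter")  # writes only the small LoRA adapter weights
```

Saving only the adapter is consistent with the README's note that the adapter is provided in the repo alongside the full model.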
@@ -442,82 +444,80 @@ In LM-Studio, simply select the ChatML Prefix on the settings side pane:
 ```
 | Task |Version| Metric |Value | |Stderr|
 |------------------------------|------:|--------|-----:|---|-----:|
-|agieval_aqua_rat | 0|acc |0.2480|± |0.0272|
-| | |acc_norm|0.2520|± |0.0273|
-|agieval_logiqa_en | 0|acc |0.3810|± |0.0190|
-| | |acc_norm|0.3856|± |0.0191|
-|agieval_lsat_ar | 0|acc |0.2348|± |0.0280|
-| | |acc_norm|0.2304|± |0.0278|
-|agieval_lsat_lr | 0|acc |0.5118|± |0.0222|
-| | |acc_norm|0.5196|± |0.0221|
+|agieval_aqua_rat | 0|acc |0.2559|± |0.0274|
+| | |acc_norm|0.2598|± |0.0276|
+|agieval_logiqa_en | 0|acc |0.3733|± |0.0190|
+| | |acc_norm|0.3886|± |0.0191|
+|agieval_lsat_ar | 0|acc |0.2522|± |0.0287|
+| | |acc_norm|0.2522|± |0.0287|
+|agieval_lsat_lr | 0|acc |0.5137|± |0.0222|
+| | |acc_norm|0.5294|± |0.0221|
 |agieval_lsat_rc | 0|acc |0.5948|± |0.0300|
-| | |acc_norm|0.5688|± |0.0303|
-|agieval_sat_en | 0|acc |0.7427|± |0.0305|
-| | |acc_norm|0.7427|± |0.0305|
-|agieval_sat_en_without_passage| 0|acc |0.4563|± |0.0348|
-| | |acc_norm|0.4515|± |0.0348|
-|agieval_sat_math | 0|acc |0.3818|± |0.0328|
-| | |acc_norm|0.3682|± |0.0326|
+| | |acc_norm|0.5725|± |0.0302|
+|agieval_sat_en | 0|acc |0.7379|± |0.0307|
+| | |acc_norm|0.7282|± |0.0311|
+|agieval_sat_en_without_passage| 0|acc |0.4466|± |0.0347|
+| | |acc_norm|0.4466|± |0.0347|
+|agieval_sat_math | 0|acc |0.3909|± |0.0330|
+| | |acc_norm|0.3591|± |0.0324|
 ```
 
-Average: 0.4399
+Average: 0.4364
 
 ## BigBench Hard
 
 ```
-hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
 | Task |Version| Metric |Value | |Stderr|
 |------------------------------------------------|------:|---------------------|-----:|---|-----:|
-|bigbench_causal_judgement | 0|multiple_choice_grade|0.5632|± |0.0361|
-|bigbench_date_understanding | 0|multiple_choice_grade|0.6612|± |0.0247|
+|bigbench_causal_judgement | 0|multiple_choice_grade|0.5684|± |0.0360|
+|bigbench_date_understanding | 0|multiple_choice_grade|0.6667|± |0.0246|
 |bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3566|± |0.0299|
 |bigbench_geometric_shapes | 0|multiple_choice_grade|0.2006|± |0.0212|
-| | |exact_str_match |0.0334|± |0.0095|
-|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.3020|± |0.0206|
-|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2086|± |0.0154|
-|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5033|± |0.0289|
-|bigbench_movie_recommendation | 0|multiple_choice_grade|0.4220|± |0.0221|
+| | |exact_str_match |0.0724|± |0.0137|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2980|± |0.0205|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2071|± |0.0153|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5067|± |0.0289|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|0.4140|± |0.0220|
 |bigbench_navigate | 0|multiple_choice_grade|0.5000|± |0.0158|
-|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.7035|± |0.0102|
-|bigbench_ruin_names | 0|multiple_choice_grade|0.4107|± |0.0233|
-|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2154|± |0.0130|
-|bigbench_snarks | 0|multiple_choice_grade|0.7127|± |0.0337|
-|bigbench_sports_understanding | 0|multiple_choice_grade|0.6988|± |0.0146|
-|bigbench_temporal_sequences | 0|multiple_choice_grade|0.4670|± |0.0158|
-|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2072|± |0.0115|
-|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1731|± |0.0090|
-|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5033|± |0.0289|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.6980|± |0.0103|
+|bigbench_ruin_names | 0|multiple_choice_grade|0.4174|± |0.0233|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2044|± |0.0128|
+|bigbench_snarks | 0|multiple_choice_grade|0.7238|± |0.0333|
+|bigbench_sports_understanding | 0|multiple_choice_grade|0.6876|± |0.0148|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|0.4360|± |0.0157|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2112|± |0.0115|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1754|± |0.0091|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5067|± |0.0289|
 ```
 
-Average: 0.4338
+Average: 0.4321
 
 ## GPT4All
 
 ```
 | Task |Version| Metric |Value | |Stderr|
 |-------------|------:|--------|-----:|---|-----:|
-|arc_challenge| 0|acc |0.5930|± |0.0144|
-| | |acc_norm|0.6323|± |0.0141|
-|arc_easy | 0|acc |0.8443|± |0.0074|
-| | |acc_norm|0.8295|± |0.0077|
+|arc_challenge| 0|acc |0.5862|± |0.0144|
+| | |acc_norm|0.6297|± |0.0141|
+|arc_easy | 0|acc |0.8472|± |0.0074|
+| | |acc_norm|0.8321|± |0.0077|
 |boolq | 1|acc |0.8599|± |0.0061|
-|hellaswag | 0|acc |0.6548|± |0.0047|
-| | |acc_norm|0.8365|± |0.0037|
-|openbookqa | 0|acc |0.3520|± |0.0214|
-| | |acc_norm|0.4640|± |0.0223|
-|piqa | 0|acc |0.8210|± |0.0089|
-| | |acc_norm|0.8335|± |0.0087|
-|winogrande | 0|acc |0.7466|± |0.0122|
+|hellaswag | 0|acc |0.6520|± |0.0048|
+| | |acc_norm|0.8357|± |0.0037|
+|openbookqa | 0|acc |0.3440|± |0.0213|
+| | |acc_norm|0.4580|± |0.0223|
+|piqa | 0|acc |0.8199|± |0.0090|
+| | |acc_norm|0.8319|± |0.0087|
+|winogrande | 0|acc |0.7482|± |0.0122|
 ```
 
-Average: 0.7431
+Average: 0.7422
 
 ## TruthfulQA
 
 ```
-hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
 | Task |Version|Metric|Value | |Stderr|
 |-------------|------:|------|-----:|---|-----:|
-|truthfulqa_mc| 1|mc1 |0.4186|± |0.0173|
-| | |mc2 |0.5847|± |0.0153|
+|truthfulqa_mc| 1|mc1 |0.3941|± |0.0171|
+| | |mc2 |0.5698|± |0.0154|
 ```
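The removed `hf-causal-experimental (...)` header lines above are the run headers printed by EleutherAI's lm-evaluation-harness, and the pipe tables are its standard output. For reference, here is a sketch of how such a table is produced with the harness's 2023-era Python API; the module layout and argument names may differ in newer releases of the harness.

```python
# Hedged sketch: reproduce one of the result tables above with the (2023-era)
# EleutherAI lm-evaluation-harness. The AGIEval/BigBench/GPT4All task lists
# would be substituted for the single task shown here.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,"
               "dtype=bfloat16,trust_remote_code=True,use_accelerate=True",
    tasks=["truthfulqa_mc"],
    num_fewshot=0,   # matches "num_fewshot: 0" in the removed header line
    batch_size=16,   # matches "batch_size: 16" in the removed header line
)
print(evaluator.make_table(results))  # renders the same |Task|Version|Metric|... layout
```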
 
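Finally, the hunk context mentions selecting the ChatML prefix in LM-Studio, and the errata above concerns the model ending its turn with an eos token. The sketch below shows both with plain `transformers` usage: a ChatML-formatted prompt and generation that stops at eos. The repo id is assumed from the eval headers above, the prompt text is made up, and none of this is code from the model card itself.

```python
# Hedged sketch: ChatML prompt (the format OpenHermes models use) with generation
# that stops at eos, which is the behavior the errata's extra SFT pass restores.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "openaccess-ai-collective/dpopenhermes-alpha-v1"  # assumed from the eval headers above
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", trust_remote_code=True)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSummarize Direct Preference Optimization in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, eos_token_id=tok.eos_token_id)
# Decode only the newly generated tokens; generation halts once eos is emitted.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```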