# Model Card: palmyra-mini

## Model Details

**Model Name:** palmyra-mini
**Version:** 1.0
**Type:** Generative AI Language Model

## Model Description

The palmyra-mini model demonstrates exceptional capability in complex reasoning and mathematical problem solving. Its performance is particularly strong on benchmarks that require deep understanding and multi-step reasoning.

A key strength of the model is its proficiency in grade-school math problems, as evidenced by its score of 0.818 on the gsm8k (strict-match) benchmark. This high score indicates a robust ability to parse and solve word problems, a foundational skill for more advanced quantitative reasoning.

This aptitude for mathematics is confirmed by its performance on the MATH500 benchmark, where it also achieved a score of 0.818. This result underscores the model's consistent and reliable mathematical capabilities across different problem sets.

The model also performs well on the AMC23 benchmark, with a solid score of 0.6. This benchmark, drawn from the American Mathematics Competitions, highlights the model's ability to tackle challenging, competition-level mathematics.

Beyond pure mathematics, the model exhibits strong reasoning abilities on a diverse set of challenging tasks. Its score of 0.5259 on the BBH (get-answer)(exact_match) benchmark, part of the Big-Bench Hard suite, showcases its capacity for the complex, multi-faceted reasoning problems that suite is designed to pose. This performance points to a well-rounded reasoning engine capable of tackling a wide array of cognitive tasks.

## Benchmark Performance

The following table presents the model's complete results across all evaluated benchmarks, listed in no particular order.

| Benchmark | Score |
|:-----------------------------------------------------------------|---------:|
| gsm8k (strict-match) | 0.818 |
| minerva_math (exact_match) | 0.4582 |
| mmlu_pro (exact_match) | 0.314 |
| hendrycks_math | 0.025 |
| ifeval (inst_level_loose_acc) | 0.4688 |
| mathqa (acc) | 0.4509 |
| humaneval (pass@1) | 0.5 |
| BBH (get-answer)(exact_match) | 0.5259 |
| mbpp | 0.47 |
| leaderboard_musr (acc_norm) | 0.3413 |
| gpqa lighteval gpqa diamond_pass@1:8_samples | 0.442 |
| AIME24 (pass@1)(avg-of-1) | 0.2 |
| AIME25 (pass@1)(avg-of-1) | 0.25 |
| Livecodebench-codegen (livecodebench/code_generation_lite v4_v5) | 0.1519 |
| AMC23 | 0.6 |
| MATH500 | 0.818 |
| Minerva | 0.2794 |
| Olympiadbench (extractive_match) | 0.3822 |
| Codecontests (pass_rate) | 0.1034 |
| Codeforces (pass_rate) | 0.3199 |
| Taco (pass_rate) | 0.1744 |
| APPS (all_levels) | 0.0405 |
| HMMT23 (extractive_match) | 0.0333 |
| Average | 0.355091 |
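
The Average row is the unweighted mean of the 23 individual benchmark scores. A minimal sketch to reproduce it (benchmark labels are abbreviated from the table for readability):

```python
# Recompute the "Average" row of the table above: the unweighted mean
# of the 23 individual benchmark scores.
scores = {
    "gsm8k (strict-match)": 0.818,
    "minerva_math (exact_match)": 0.4582,
    "mmlu_pro (exact_match)": 0.314,
    "hendrycks_math": 0.025,
    "ifeval (inst_level_loose_acc)": 0.4688,
    "mathqa (acc)": 0.4509,
    "humaneval (pass@1)": 0.5,
    "BBH (get-answer)(exact_match)": 0.5259,
    "mbpp": 0.47,
    "leaderboard_musr (acc_norm)": 0.3413,
    "gpqa diamond (pass@1, 8 samples)": 0.442,
    "AIME24 (pass@1)": 0.2,
    "AIME25 (pass@1)": 0.25,
    "Livecodebench-codegen": 0.1519,
    "AMC23": 0.6,
    "MATH500": 0.818,
    "Minerva": 0.2794,
    "Olympiadbench (extractive_match)": 0.3822,
    "Codecontests (pass_rate)": 0.1034,
    "Codeforces (pass_rate)": 0.3199,
    "Taco (pass_rate)": 0.1744,
    "APPS (all_levels)": 0.0405,
    "HMMT23 (extractive_match)": 0.0333,
}

# Unweighted mean across all 23 benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 6))  # 0.355091
```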

## Intended Use

This model is intended for research and development in the field of generative AI, particularly for tasks requiring mathematical and logical reasoning.

## Limitations

The model's performance has been evaluated on a specific set of benchmarks. Its performance on other tasks or in real-world applications may vary.

## Ethical Considerations

As with any language model, there is a potential for generating biased or inaccurate information. Users should be aware of these limitations and use the model responsibly.