Update README.md
README.md
CHANGED
@@ -41,7 +41,9 @@ For RL stage we setup training with:
 ## III. Evaluation Results
 
 Our II-Medical-8B model achieved a 40% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date. We provide a comparison to models available in ChatGPT below.
-
+
+
+Detailed result for HealthBench can be found [here](https://huggingface.co/datasets/Intelligent-Internet/OpenAI-HealthBench-II-Medical-8B-GPT-4.1).
 
 
 