Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ For RL stage we setup training with:

 ## III. Evaluation Results

-Our II-Medical-8B model achieved a 40% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date.
+Our II-Medical-8B model achieved a 40% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to OpenAI's o1 reasoning model and GPT-4.5, OpenAI's largest and most advanced model to date. We provide a comparison to models available in ChatGPT below.
 . Detailed result for HealthBench can be found [here](https://huggingface.co/datasets/Intelligent-Internet/OpenAI-HealthBench-II-Medical-8B-GPT-4.1).
