This repository contains the quantized version of DISC-MedLLM, which uses Baichuan-13b-base as its base model.
The weights were converted to GGML format using baichuan13b.cpp (based on llama.cpp).
| Model | GGML quantize method | HDD size |
|---|---|---|
| ggml-model-q4_0.bin | q4_0 | 7.55 GB |
| ggml-model-q4_1.bin | q4_1 | 8.36 GB |
| ggml-model-q5_0.bin | q5_0 | 9.17 GB |
| ggml-model-q5_1.bin | q5_1 | 9.97 GB |
| ggml-model-q8_0.bin | q8_0 | 14 GB |
## How to run inference

Compile baichuan13b. A main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server` will be generated. Download the weights from this repository into `baichuan13b/build/bin/`.
For the command-line interface, the following commands are useful. You can also read the doc covering the other command-line parameters.

```shell
cd baichuan13b/build/bin/
./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
```
For the API interface, the following commands are useful. You can also read the doc about the server's command-line options.

```shell
cd baichuan13b/build/bin/
./server -m ggml-model-q4_0.bin -c 2048
```
To test the API interface, you can use `curl`:

```shell
curl --request POST \
  --url http://localhost:8080/completion \
  --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
```
## Use it in Python

To use it in a Python script like cli_demo.py, all you need to do is replace the model.chat() call: `import requests`, POST the prompt to localhost:8080 as JSON, and decode the HTTP response.
```python
import requests

llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
```
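If you want an actual drop-in replacement for model.chat(), a minimal sketch could look like the following. The function names and defaults here are our own, and we assume the server's JSON reply carries the generated text in a `"content"` field, as llama.cpp's example server does; adjust the key if your build differs.

```python
import requests

# Assumed server address; change if you started ./server on another port.
SERVER_URL = "http://localhost:8080/completion"


def build_payload(prompt, n_predict=512):
    """Build the JSON body expected by the /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}


def chat(prompt, n_predict=512):
    """POST the prompt to the server and return the generated text.

    Raises requests.HTTPError if the server responds with an error status.
    """
    resp = requests.post(SERVER_URL, json=build_payload(prompt, n_predict))
    resp.raise_for_status()
    # llama.cpp's server puts the completion under "content"; fall back
    # to an empty string if the key is absent in your build.
    return resp.json().get("content", "")
```

You can then call `chat("I feel sick. Nausea and Vomiting.")` wherever cli_demo.py previously called model.chat().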