Text Generation · Russian · conversational
IlyaGusev committed
Commit f568b29
1 parent: 6609b53

Update llama.cpp version

Files changed (3)
  1. README.md +6 -6
  2. ggml-model-q4_1.bin +2 -2
  3. interact.py +0 -73
README.md CHANGED
@@ -11,17 +11,17 @@ pipeline_tag: text2text-generation
 
 Llama.cpp compatible version of an original [30B model](https://huggingface.co/IlyaGusev/saiga_30b_lora).
 
+* Download `ggml-model-q4_1.bin`.
+* Download [interact_llamacpp.py](https://raw.githubusercontent.com/IlyaGusev/rulm/master/self_instruct/src/interact_llamacpp.py)
+
 How to run:
 ```
 sudo apt-get install git-lfs
-pip install llama-cpp-python==0.1.38 fire
-
-git clone https://huggingface.co/IlyaGusev/saiga_30b_lora_llamacpp
+pip install llama-cpp-python fire
 
-cd saiga_30b_lora_llamacpp
-python3 interact.py ggml-model-q4_1.bin
+python3 interact_llamacpp.py ggml-model-q4_1.bin
 ```
 
 System requirements:
 * 32GB RAM
-* CPU with 4 cores
+* CPU with 4 cores
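In short, the updated README drops the `==0.1.38` pin on llama-cpp-python and replaces the clone-and-cd steps with two direct downloads: the quantized model from this repo and the interaction script from the rulm repository. For reference, a minimal sketch of doing both downloads from Python, assuming the `huggingface_hub` package (not mentioned in the README itself) and the raw GitHub URL above:

```python
# Sketch: fetch the two files the updated README asks for.
# Assumes `pip install huggingface_hub`; repo and file names as in this commit.
import urllib.request

from huggingface_hub import hf_hub_download

# Pull the quantized model straight from the Hub (resolves the LFS pointer
# to the actual 20 GB file).
model_path = hf_hub_download(
    repo_id="IlyaGusev/saiga_30b_lora_llamacpp",
    filename="ggml-model-q4_1.bin",
)

# Fetch the interaction script from the rulm repository.
script_url = (
    "https://raw.githubusercontent.com/IlyaGusev/rulm/"
    "master/self_instruct/src/interact_llamacpp.py"
)
urllib.request.urlretrieve(script_url, "interact_llamacpp.py")

print(model_path)
```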
ggml-model-q4_1.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f4e6bf295d3e2eee786610e147b885193a148425df46d8ac2e45b61151dd7172
-size 24399792512
+oid sha256:b2b25d918f5e2b02152a3d2469b1cc0d49de9c4c592ea9f5ae67aad0e66dd8da
+size 20333775232
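The new pointer records a noticeably smaller file (20,333,775,232 bytes, down from 24,399,792,512), presumably because the q4_1 file format changed with the llama.cpp version this commit targets. A downloaded copy can be checked against the pointer with the standard library alone; a sketch, with the oid and size copied from the diff above:

```python
# Verify a downloaded ggml-model-q4_1.bin against the new LFS pointer.
import hashlib
import os

EXPECTED_SHA256 = "b2b25d918f5e2b02152a3d2469b1cc0d49de9c4c592ea9f5ae67aad0e66dd8da"
EXPECTED_SIZE = 20333775232

def verify(path="ggml-model-q4_1.bin"):
    # Size check first: cheap, and catches truncated downloads immediately.
    if os.path.getsize(path) != EXPECTED_SIZE:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the 20 GB file never sits in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == EXPECTED_SHA256

if __name__ == "__main__":
    print("OK" if verify() else "MISMATCH")
```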
interact.py DELETED
@@ -1,73 +0,0 @@
-import fire
-from llama_cpp import Llama
-
-SYSTEM_PROMPT = "Ты — Сайга, русскоязычный автоматический ассистент. Ты разговариваешь с людьми и помогаешь им."
-SYSTEM_TOKEN = 1788
-USER_TOKEN = 1404
-BOT_TOKEN = 9225
-LINEBREAK_TOKEN = 13
-
-ROLE_TOKENS = {
-    "user": USER_TOKEN,
-    "bot": BOT_TOKEN,
-    "system": SYSTEM_TOKEN
-}
-
-
-def get_message_tokens(model, role, content):
-    message_tokens = model.tokenize(content.encode("utf-8"))
-    message_tokens.insert(1, ROLE_TOKENS[role])
-    message_tokens.insert(2, LINEBREAK_TOKEN)
-    message_tokens.append(model.token_eos())
-    return message_tokens
-
-
-def get_system_tokens(model):
-    system_message = {
-        "role": "system",
-        "content": SYSTEM_PROMPT
-    }
-    return get_message_tokens(model, **system_message)
-
-
-def interact(
-    model_path,
-    n_ctx=2000,
-    top_k=30,
-    top_p=0.9,
-    temperature=0.2,
-    repeat_penalty=1.1
-):
-    model = Llama(
-        model_path=model_path,
-        n_ctx=n_ctx,
-        n_parts=1,
-    )
-
-    system_tokens = get_system_tokens(model)
-    tokens = system_tokens
-    model.eval(tokens)
-
-    while True:
-        user_message = input("User: ")
-        message_tokens = get_message_tokens(model=model, role="user", content=user_message)
-        role_tokens = [model.token_bos(), BOT_TOKEN, LINEBREAK_TOKEN]
-        tokens += message_tokens + role_tokens
-        generator = model.generate(
-            tokens,
-            top_k=top_k,
-            top_p=top_p,
-            temp=temperature,
-            repeat_penalty=repeat_penalty
-        )
-        for token in generator:
-            token_str = model.detokenize([token]).decode("utf-8")
-            tokens.append(token)
-            if token == model.token_eos():
-                break
-            print(token_str, end="", flush=True)
-        print()
-
-
-if __name__ == "__main__":
-    fire.Fire(interact)
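For the record, the deleted script built each conversation turn in the Saiga prompt format: `model.tokenize()` returns `[BOS, ...content ids...]`, into which a role token and a linebreak token were spliced right after BOS, with EOS appended at the end; the Russian system prompt translates as "You are Saiga, a Russian-language automated assistant. You talk to people and help them." A self-contained sketch of that token layout, with a stub in place of `model.tokenize()` (the stub's ids are arbitrary):

```python
# Illustration of the token layout the deleted interact.py produced.
# Role and linebreak ids are the constants from that script; BOS/EOS are
# the LLaMA special-token ids that token_bos()/token_eos() return.
BOS, EOS = 1, 2
SYSTEM_TOKEN, USER_TOKEN, BOT_TOKEN, LINEBREAK_TOKEN = 1788, 1404, 9225, 13

def fake_tokenize(text):
    # Stand-in for model.tokenize(): real tokenization maps text to
    # vocabulary ids after a leading BOS.
    return [BOS] + [ord(c) % 1000 + 100 for c in text]

def message_tokens(role_token, text):
    # Same splicing as get_message_tokens():
    # [BOS, role, "\n", content..., EOS]
    tokens = fake_tokenize(text)
    tokens.insert(1, role_token)
    tokens.insert(2, LINEBREAK_TOKEN)
    tokens.append(EOS)
    return tokens

# A full prompt, as assembled in interact(): system turn, user turn, then
# [BOS, BOT, "\n"] to prime the model, which generates until it emits EOS.
prompt = (
    message_tokens(SYSTEM_TOKEN, "system prompt here")
    + message_tokens(USER_TOKEN, "user message here")
    + [BOS, BOT_TOKEN, LINEBREAK_TOKEN]
)
print(prompt[:6])  # [1, 1788, 13, ...] -- BOS, system role, linebreak, content
```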