---
frameworks:
  - Pytorch
license: other
license_name: glm-4
license_link: LICENSE
pipeline_tag: text-generation
tags:
  - glm
  - edge
inference: false
---

# GLM-Edge-1.5B-Chat

For the Chinese version of this README, click [here](README_zh.md).

## Inference with Transformers

### Installation

Install the transformers library from source:

```shell
pip install git+https://github.com/huggingface/transformers.git
```
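
To confirm that the source install is the one being used, you can print the installed version; this is a minimal sanity check, and the exact version string you see will depend on when you install:

```python
# Sanity check: a source install from git typically reports a ".dev0" suffix.
import transformers

print(transformers.__version__)
```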

### Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/glm-edge-1.5b-chat"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

messages = [{"role": "user", "content": "hello!"}]

# Build model inputs from the chat template and move them to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

generate_kwargs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 128,
    "do_sample": False,  # greedy decoding; enable sampling for more varied replies
}
out = model.generate(**generate_kwargs)

# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
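
For interactive use you may want tokens printed as they are generated rather than all at once. A minimal sketch using transformers' built-in `TextStreamer`, reusing the `tokenizer`, `model`, and `inputs` objects from the snippet above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated; skip_prompt avoids
# re-printing the input, and skip_special_tokens is forwarded to decode().
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
)
```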

## License

Use of this model's weights is subject to the terms outlined in the [LICENSE](LICENSE).