---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- biology
- protein
- gene-ontology
- GO-terms
---

# Qwen3-1.7B-GO

Qwen3-1.7B extended with pre-trained embeddings for Gene Ontology (GO) terms.

## Model Description

This model is based on Qwen/Qwen3-1.7B and adds:
- Pre-trained embeddings for GO terms
- Special tokens for protein sequence handling
- Fine-tuning on GO term descriptions and relationships

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("wanglab/Qwen3-1.7B-go")
tokenizer = AutoTokenizer.from_pretrained("wanglab/Qwen3-1.7B-go")

# Example with GO terms
text = "What is the function of GO:0008150?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## GO Terms

The model includes embeddings for Gene Ontology terms, so it can interpret GO identifiers in a prompt. The terms fall under three root categories:
- Biological processes (GO:0008150)
- Molecular functions (GO:0003674)
- Cellular components (GO:0005575)
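
When building prompts, it can help to pull GO identifiers out of free text and check whether any of them is one of the three root terms listed above. The snippet below is a minimal sketch of that step and is not part of the model's own tooling. The helper names `extract_go_ids` and `root_namespace` are hypothetical, and the pattern assumes the standard `GO:` prefix followed by seven digits.

```python
import re
from typing import List, Optional

# Assumption: GO identifiers are written as "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"\bGO:\d{7}\b")

# The three root terms named in this model card, mapped to their namespace.
GO_ROOTS = {
    "GO:0008150": "biological_process",
    "GO:0003674": "molecular_function",
    "GO:0005575": "cellular_component",
}


def extract_go_ids(text: str) -> List[str]:
    """Return the GO IDs found in text, in order of first appearance, without duplicates."""
    # dict.fromkeys preserves insertion order while dropping repeats.
    return list(dict.fromkeys(GO_ID_PATTERN.findall(text)))


def root_namespace(go_id: str) -> Optional[str]:
    """Return the namespace if go_id is one of the three root terms, else None."""
    return GO_ROOTS.get(go_id)


if __name__ == "__main__":
    prompt = "What is the function of GO:0008150?"
    for go_id in extract_go_ids(prompt):
        print(go_id, root_namespace(go_id))
```

Only the three root terms are mapped here. Resolving the namespace of any other term would need the full ontology, which this sketch does not load.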

## Training

GO embeddings were pre-trained using QLoRA on GO term descriptions and relationships.