---
library_name: transformers
license: apache-2.0
language:
- aeb
base_model:
- Qwen/Qwen2-7B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- trl
---

## Model Overview

Barcha-7B-Instruct is an open model instruction-tuned for Tunisian Derja (Tunisian Arabic). It is a continually pre-trained and aligned version of Qwen/Qwen2-7B-Instruct, trained on the Tunisian_Derja_Dataset.

## Uploaded model

- **Developed by:** Linagora
- **License:** apache-2.0
- **Finetuned from model:** Qwen/Qwen2-7B-Instruct
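
The tags above mention `unsloth` and `trl`, but this card does not publish the training recipe. As a purely illustrative sketch of what a continual fine-tune of Qwen/Qwen2-7B-Instruct on the Tunisian_Derja_Dataset could look like with those libraries, the snippet below uses TRL's `SFTTrainer` on an Unsloth-loaded base model; the dataset id, text column, and every hyperparameter are assumptions, and argument names vary across TRL versions.

```python
# Hypothetical fine-tuning sketch -- NOT the authors' published recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit and attach LoRA adapters (assumed setup).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2-7B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Dataset id and text column are assumptions based on the card's description.
dataset = load_dataset("linagora/Tunisian_Derja_Dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",      # assumed column name
        max_seq_length=2048,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        output_dir="barcha-sft",
    ),
)
trainer.train()
```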
## Usage
Below are some code snippets showing how to quickly get started running the model. First, install the Transformers library:

```sh
pip install transformers
```
#### Running with the `pipeline` API
```python
import torch
from transformers import pipeline

model_id = "linagora/Barcha-7B-Instruct"
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "شنو معنى برشا"},  # "What does 'barcha' mean?"
]

# Greedy decoding for a deterministic answer (temperature is ignored when sampling is off).
outputs = pipe(messages, max_new_tokens=128, do_sample=False)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
```
Response: برشّا هي كلمة تعني كتر من واحد حاجة
```
(Translation: "Barcha" is a word that means a lot of something.)
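
When the input is a list of chat messages, the `pipeline` returns the full conversation with the assistant's reply appended, so a follow-up turn can reuse it directly. A minimal continuation of the snippet above (the follow-up question is illustrative, not from the card):

```python
# `generated_text` holds the whole chat history, including the new assistant turn.
messages = outputs[0]["generated_text"]
messages.append({"role": "user", "content": "أعطيني مثال"})  # "Give me an example" (illustrative)
outputs = pipe(messages, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"][-1]["content"].strip())
```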
#### Running the model on a single / multi GPU
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model_id = "linagora/Barcha-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "شنو معنى لاباس"},  # "What does 'labes' mean?"
]

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
).to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt and special tokens.
print(tokenizer.decode(outputs[0][input_ids["input_ids"].shape[-1]:], skip_special_tokens=True))
```
```
Response: لاباس هو كلمة جاية من العربية، معناها هل أنت بخير
```
(Translation: "Labes" is a word that comes from Arabic; it means "are you well?")
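
The heading above mentions multi-GPU use, but the snippet pins the whole model to one device. To shard the model across all visible GPUs instead, Transformers supports `device_map="auto"` (an Accelerate feature, not specific to this model; it requires `pip install accelerate`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "linagora/Barcha-7B-Instruct"

# device_map="auto" lets Accelerate split the layers across all available GPUs,
# falling back to CPU offload if VRAM runs out.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Generation then works exactly as in the snippet above.
```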
## Citations
When using this model **Barcha-7B-Instruct**, please cite:

```bibtex
@misc{linagora2025LLM-tn,
  author = {Wajdi Ghezaiel and Jean-Pierre Lorré},
  title  = {Barcha-7B-Instruct: Tunisian Arabic Derja LLM based on Qwen2-7B},
  year   = {2025},
  month  = {July},
  url    = {https://huggingface.co/linagora/Barcha-7B-Instruct}
}
```