RichardErkhov committed on
Commit
1553b38
·
verified ·
1 Parent(s): cddf902

uploaded readme

Files changed (1)
  1. README.md +175 -0
README.md ADDED
@@ -0,0 +1,175 @@
+ Quantization made by Richard Erkhov.
+
+ [Github](https://github.com/RichardErkhov)
+
+ [Discord](https://discord.gg/pvy7H8DZMG)
+
+ [Request more models](https://github.com/RichardErkhov/quant_request)
+
+
+ titulm-llama-3.2-3b-v1.1 - GGUF
+ - Model creator: https://huggingface.co/hishab/
+ - Original model: https://huggingface.co/hishab/titulm-llama-3.2-3b-v1.1/
+
+
+ | Name | Quant method | Size |
+ | ---- | ---- | ---- |
+ | [titulm-llama-3.2-3b-v1.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q2_K.gguf) | Q2_K | 1.27GB |
+ | [titulm-llama-3.2-3b-v1.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q3_K_S.gguf) | Q3_K_S | 1.44GB |
+ | [titulm-llama-3.2-3b-v1.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q3_K.gguf) | Q3_K | 1.57GB |
+ | [titulm-llama-3.2-3b-v1.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q3_K_M.gguf) | Q3_K_M | 1.57GB |
+ | [titulm-llama-3.2-3b-v1.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q3_K_L.gguf) | Q3_K_L | 1.69GB |
+ | [titulm-llama-3.2-3b-v1.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.IQ4_XS.gguf) | IQ4_XS | 1.71GB |
+ | [titulm-llama-3.2-3b-v1.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q4_0.gguf) | Q4_0 | 1.79GB |
+ | [titulm-llama-3.2-3b-v1.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.IQ4_NL.gguf) | IQ4_NL | 1.79GB |
+ | [titulm-llama-3.2-3b-v1.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q4_K_S.gguf) | Q4_K_S | 1.8GB |
+ | [titulm-llama-3.2-3b-v1.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q4_K.gguf) | Q4_K | 1.88GB |
+ | [titulm-llama-3.2-3b-v1.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q4_K_M.gguf) | Q4_K_M | 1.88GB |
+ | [titulm-llama-3.2-3b-v1.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q4_1.gguf) | Q4_1 | 1.95GB |
+ | [titulm-llama-3.2-3b-v1.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q5_0.gguf) | Q5_0 | 2.11GB |
+ | [titulm-llama-3.2-3b-v1.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q5_K_S.gguf) | Q5_K_S | 2.11GB |
+ | [titulm-llama-3.2-3b-v1.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q5_K.gguf) | Q5_K | 2.16GB |
+ | [titulm-llama-3.2-3b-v1.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q5_K_M.gguf) | Q5_K_M | 2.16GB |
+ | [titulm-llama-3.2-3b-v1.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q5_1.gguf) | Q5_1 | 2.28GB |
+ | [titulm-llama-3.2-3b-v1.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q6_K.gguf) | Q6_K | 2.46GB |
+ | [titulm-llama-3.2-3b-v1.1.Q8_0.gguf](https://huggingface.co/RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf/blob/main/titulm-llama-3.2-3b-v1.1.Q8_0.gguf) | Q8_0 | 3.19GB |
+
+
+
+
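To pick and fetch one of the files listed in the table programmatically, a minimal sketch could look like the following. The sizes are copied from the table; the helper names (`gguf_filename`, `largest_quant_within`, `download_gguf`) and the size-budget heuristic are illustrative assumptions, not part of the upstream README. File size is only a rough proxy for memory needs, and the download step assumes the `huggingface_hub` package is installed.

```python
REPO_ID = "RichardErkhov/hishab_-_titulm-llama-3.2-3b-v1.1-gguf"
MODEL_NAME = "titulm-llama-3.2-3b-v1.1"

# Quant method -> file size in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.27, "Q3_K_S": 1.44, "Q3_K": 1.57, "Q3_K_M": 1.57,
    "Q3_K_L": 1.69, "IQ4_XS": 1.71, "Q4_0": 1.79, "IQ4_NL": 1.79,
    "Q4_K_S": 1.8, "Q4_K": 1.88, "Q4_K_M": 1.88, "Q4_1": 1.95,
    "Q5_0": 2.11, "Q5_K_S": 2.11, "Q5_K": 2.16, "Q5_K_M": 2.16,
    "Q5_1": 2.28, "Q6_K": 2.46, "Q8_0": 3.19,
}


def gguf_filename(quant: str) -> str:
    """Build the file name used in the table for a given quant method."""
    if quant not in QUANT_SIZES_GB:
        raise ValueError(f"unknown quant method: {quant}")
    return f"{MODEL_NAME}.{quant}.gguf"


def largest_quant_within(budget_gb: float) -> str:
    """Return the largest listed quant whose file size fits the budget (hypothetical heuristic)."""
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        raise ValueError(f"no listed file fits within {budget_gb} GB")
    return max(fitting, key=fitting.get)


def download_gguf(quant: str) -> str:
    """Download the chosen file and return its local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

For example, `download_gguf(largest_quant_within(2.0))` would fetch the Q4_1 file, the largest one listed under 2 GB.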
+ Original model description:
+ ---
+ language:
+ - bn
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - hishab
+ - titulm
+ - pytorch
+ - llama
+ - llama-3
+ - llama-factory
+ license: llama3.2
+ base_model:
+ - meta-llama/Llama-3.2-3B
+ ---
+
+ ## Model Information
+
+ This model is a continually pre-trained version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B), trained further on extensive Bangla datasets. The primary goal of the continual pretraining was to enhance the model's ability to generate high-quality Bangla text. With pretraining extended specifically on Bangla data, the model outperforms the base model on most of the Bangla benchmarks reported in the Benchmarks section.
+
+ **Model Architecture:** Llama 3.2 is an auto-regressive language model with an optimized transformer architecture.
+
+ | Model | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff |
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
+ | Llama 3.2 (text only) | Hishab curated Bangla text corpus | 3B (3.21B) | Monolingual Text (Bangla) | Monolingual Text (Bangla) | 4096 | Yes | Yes | 8.5B tokens | |
+
+ **Supported Languages:** Bengali (primary) and English (secondary)
+
+ **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
+
+ **Model Release Date:** October 24, 2024
+
+ **Status:** This is a static model trained on an offline dataset. Future versions may be released to improve model capabilities.
+
+ **License:** We use a license similar to that of Llama 3.2. Use of Llama 3.2 is governed by the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE) (a custom, commercial license agreement).
+
+
+ ## How to use
+ - Use with transformers
+
+ With transformers >= 4.43.0, you can run inference using the Transformers pipeline abstraction or by using the Auto classes with the `generate()` function.
+
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+
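+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "hishab/titulm-llama-3.2-3b-v1.1"
+
+ # Build a text-generation pipeline; device_map="auto" places the weights on the available devices
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Bangla prompt ("The name of our country"); the pipeline returns a list of dicts
+ outputs = pipe("আমাদের দেশের নাম")
+ print(outputs[0]["generated_text"])
+ ```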
+
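The Auto classes route mentioned above is not shown in the original README. A minimal sketch of it, assuming the same model ID and dtype as the pipeline example, might look like this. The function name `generate_bangla` and the `max_new_tokens` default are illustrative choices, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
def generate_bangla(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation for a prompt with the Auto classes and generate()."""
    # Imported inside the function so that defining it does not require the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "hishab/titulm-llama-3.2-3b-v1.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Tokenize the prompt and move the tensors to the device holding the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded text contains the prompt followed by the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_bangla("আমাদের দেশের নাম")` downloads the weights on first use. Loading the model once outside the function would be preferable when generating for many prompts.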
+ ## Hardware and Software
+
+ **Training Factors:** We used the [llama-factory](https://github.com/hiyouga/LLaMA-Factory) training library, a cloud GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on cloud infrastructure.
+
+
+ ## Training Data
+
+ **Overview:** We collected a large raw Bangla text dataset from a wide variety of sources. The data collected so far includes a mix of web documents, books, translated text, transliterated text, transcribed text, code-mixed text, conversations, and open-source raw data. The dataset is cleaned and filtered using several filtering criteria to ensure data quality. The collected data is roughly 268 GB in size. The total number of trained tokens is 37B.
+
+ Data sources summary:
+ - Web documents: Extracted, cleaned, and filtered Common Crawl data
+ - Books: Extracted, cleaned, and filtered book data
+ - Transcribed text: Bangla audio data transcribed with an in-house Bangla ASR model
+ - Translation data: We trained an English-Bangla translation LLM and used it to translate English data to Bangla
+ - Code-mixed data: We trained an English-Bangla code-mixed LLM and used it to generate code-mixed data
+ - Transliteration data: We trained a Bangla-English transliteration LLM and used it to generate transliterated data
+ - Synthetic data: We generated synthetic data using a Bangla LLM
+ - Others: We scraped data from selected websites, used open-source data, and used some other data sources
+
+
+ ## Benchmarks
+
+ In this section, we report the results for the __titulm-llama-3.2-3b-v1.1__ model on standard automatic benchmarks. For all these evaluations, we used the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) evaluation library.
+
+ ### Evaluation Datasets
+ We evaluated our pre-trained model on both Bangla and English benchmark datasets. Although the model is trained on Bangla data, we also evaluated its English capability on English benchmark datasets. The evaluation datasets are as follows:
+
+ #### Bangla Benchmark datasets
+ We evaluated the models on the following datasets:
+ - Bangla MMLU: A private multiple-choice question dataset developed by Hishab and curated from various sources.
+ - [CommonsenseQA Bangla](https://huggingface.co/datasets/hishab/commonsenseqa-bn): A Bangla translation of the CommonsenseQA dataset. The dataset was translated using a new method called Expressive Semantic Translation (EST), which combines Google Machine Translation with LLM-based rewriting modifications.
+ - [OpenbookQA Bangla](https://huggingface.co/datasets/hishab/openbookqa-bn): A Bangla translation of the OpenbookQA dataset. The dataset was translated using a new method called Expressive Semantic Translation (EST), which combines Google Machine Translation with LLM-based rewriting modifications.
+ - [PIQA Bangla](https://huggingface.co/datasets/hishab/piqa-bn): A Bangla translation of the PIQA dataset. The dataset was translated using a new method called Expressive Semantic Translation (EST), which combines Google Machine Translation with LLM-based rewriting modifications.
+ - [BoolQ Bangla](https://huggingface.co/datasets/hishab/boolq_bn): The dataset contains 15,942 examples, with each entry consisting of a triplet: (question, passage, answer). The questions are naturally occurring, generated in unprompted and unconstrained settings. Input passages were sourced from Bangla Wikipedia, Banglapedia, and news articles, and GPT-4 was used to generate corresponding yes/no questions with answers.
+
+ #### English Benchmark datasets
+ - [MMLU](https://huggingface.co/datasets/cais/mmlu): This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge.
+ - [CommonsenseQA](https://huggingface.co/datasets/tau/commonsense_qa): CommonsenseQA is a multiple-choice question-answering dataset that requires different types of commonsense knowledge to predict the correct answers.
+ - [OpenbookQA](https://huggingface.co/datasets/allenai/openbookqa): OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in.
+ - [PIQA](https://huggingface.co/datasets/ybisk/piqa): The PIQA dataset focuses on physical commonsense reasoning, challenging AI to handle everyday situations requiring practical knowledge and unconventional solutions. Inspired by instructables.com, it aims to enhance AI's ability to understand and reason about physical interactions.
+ - [BoolQ](https://huggingface.co/datasets/google/boolq): BoolQ is a question-answering dataset of yes/no questions containing 15,942 examples. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
+
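The README does not give the exact evaluation commands, so the following is only a sketch of how the English benchmarks could be run through the Python entry point of lm-evaluation-harness. The task names (`mmlu`, `boolq`, `commonsense_qa`, `openbookqa`, `piqa`) are assumed to match the harness's task registry, the helper names are hypothetical, and the Bangla task names are omitted because they are not stated in the source.

```python
# Assumed lm-evaluation-harness task names for the English benchmarks listed above.
ENGLISH_TASKS = ["mmlu", "boolq", "commonsense_qa", "openbookqa", "piqa"]


def build_eval_kwargs(model_id: str, num_fewshot: int, tasks=ENGLISH_TASKS) -> dict:
    """Assemble keyword arguments for lm_eval.simple_evaluate (hypothetical helper)."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        "model_args": f"pretrained={model_id},dtype=bfloat16",
        "tasks": list(tasks),
        "num_fewshot": num_fewshot,
    }


def run_eval(model_id: str, num_fewshot: int) -> dict:
    """Run the evaluation and return the per-task results dictionary."""
    # Imported lazily so the kwargs helper can be used without lm_eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(**build_eval_kwargs(model_id, num_fewshot))
    return results["results"]
```

Running `run_eval("hishab/titulm-llama-3.2-3b-v1.1", 0)` and then with `5` would correspond to the 0-shot and 5-shot settings; the exact settings behind the reported numbers are not documented here.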
+ ### Evaluation Results
+
+ #### Evaluation of Bangla Benchmark datasets
+ - **llama-3.2-3b** leads on **Bangla MMLU** only in the 0-shot setting (**0.36** vs. 0.35); in the 5-shot setting, **hishab/titulm-llama-3.2-3b-v1.1** scores higher (**0.40** vs. 0.38).
+ - **hishab/titulm-llama-3.2-3b-v1.1** outperforms on most tasks, leading on **Commonsense QA BN**, **OpenBook QA BN**, and **PIQA BN** in both 0-shot and 5-shot settings, and on **BoolQ BN** in the 0-shot setting (no 5-shot BoolQ BN score is reported).
+
+ | Model | Shots | Bangla MMLU | BoolQ BN | Commonsense QA BN | OpenBook QA BN | PIQA BN |
+ |--------------------------------------|--------|-------------|----------|-------------------|----------------|---------|
+ | llama-3.2-3b | 0-shot | **0.36** | 0.55 | 0.26 | 0.31 | 0.56 |
+ | | 5-shot | 0.38 | - | 0.29 | 0.32 | 0.58 |
+ | hishab/titulm-llama-3.2-3b-v1.1 | 0-shot | 0.35 | **0.66** | **0.31** | **0.37** | **0.62**|
+ | | 5-shot | **0.40** | - | **0.40** | **0.37** | **0.63**|
+
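As a quick sanity check of the Bangla comparison, the per-task differences and the unweighted averages of the 0-shot rows can be recomputed from the table. This is only illustrative arithmetic over the reported scores; averaging across tasks is our own summary choice, not a metric reported by the authors.

```python
TASKS = ["Bangla MMLU", "BoolQ BN", "Commonsense QA BN", "OpenBook QA BN", "PIQA BN"]

# 0-shot scores copied from the Bangla results table.
BASE_0SHOT = [0.36, 0.55, 0.26, 0.31, 0.56]    # llama-3.2-3b
TITULM_0SHOT = [0.35, 0.66, 0.31, 0.37, 0.62]  # hishab/titulm-llama-3.2-3b-v1.1


def deltas(base, tuned):
    """Per-task score difference (tuned minus base), rounded to two decimals."""
    return [round(t - b, 2) for b, t in zip(base, tuned)]


def mean(scores):
    """Unweighted mean of a list of scores, rounded to three decimals."""
    return round(sum(scores) / len(scores), 3)


for task, delta in zip(TASKS, deltas(BASE_0SHOT, TITULM_0SHOT)):
    print(f"{task}: {delta:+.2f}")
print("mean base:", mean(BASE_0SHOT), "mean titulm:", mean(TITULM_0SHOT))
```

On these numbers the only 0-shot regression is Bangla MMLU (0.01 lower), while the unweighted 0-shot average rises from 0.408 to 0.462.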
+ #### Evaluation of English Benchmark datasets
+ - **llama-3.2-3b** scores highest on every task (**MMLU**, **BoolQ**, **Commonsense QA**, **OpenBook QA**, and **PIQA**), tying only on 0-shot **PIQA** (0.77), and improves further in the 5-shot setting.
+ - **titulm-llama-3.2-3b-v1.1** shows competitive performance but generally trails behind **llama-3.2-3b**, especially in the 0-shot setting.
+
+ | Model | Shots | MMLU | BoolQ | Commonsense QA | OpenBook QA | PIQA |
+ |--------------------------------------|--------|--------------|------------|--------------------|-----------------|-----------|
+ | llama-3.2-3b | 0-shot | **0.54** | **0.72** | **0.64** | **0.43** | **0.77** |
+ | | 5-shot | **0.56** | **0.73** | **0.67** | **0.45** | **0.80** |
+ | titulm-llama-3.2-3b-v1.1 | 0-shot | 0.43 | 0.65 | 0.56 | 0.39 | 0.77 |
+ | | 5-shot | 0.51 | 0.72 | 0.61 | 0.43 | 0.78 |
+
+ ### Instruction Tuned Models
+
+
+ ### Intended Use
+ - Bangla text generation
+ - Bangla language understanding tasks
+ - Bangla instruction fine-tuning tasks
+