olsi8 committed
Commit 01dc5d5 · verified · 1 Parent(s): 76b0fb7

Update README.md

Files changed (1)
  1. README.md +115 -125

README.md CHANGED
@@ -1,199 +1,189 @@
- ---
- library_name: transformers
- tags: []
- ---

- # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->

- ## Model Details

- ### Model Description

- <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

- ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

- ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

- ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- ## Training Details

- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

- ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics

- #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]

- #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]

- #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

- ### Results

- [More Information Needed]

- #### Summary

- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

- ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]

- ### Model Architecture and Objective

- [More Information Needed]

- ### Compute Infrastructure

- [More Information Needed]

- #### Hardware

- [More Information Needed]

- #### Software

- [More Information Needed]

- ## Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

  **BibTeX:**

- [More Information Needed]

  **APA:**
 

- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

- ## Model Card Contact

- [More Information Needed]

+ # Model Card for olsi8/gemma-3-4b-it-shqip-v1

+ ### Model Details

+ **Model Description**

+ olsi8/gemma-3-4b-it-shqip-v1 is a 🤗 Transformers model hosted on the Hugging Face Hub. It is a fine-tuned version of the gemma-3-4b-it model, specifically adapted for the Albanian language (Shqip), with text generation as its primary purpose, building on the underlying capabilities of the Gemma architecture. This model card summarizes its development, intended uses, and other relevant details.

+ **Developed by:**
+ The model was developed by the Hugging Face user olsi8.

+ **Funded by (optional):**
+ Information regarding funding for this specific fine-tuning effort is not explicitly provided.

+ **Shared by (optional):**
+ The model is shared by the Hugging Face user olsi8.

+ **Model type:**
+ This is a large language model (LLM) based on the Gemma architecture. It is designed for text generation tasks and has been fine-tuned from the `gemma-3-4b-it` model. The tags on its Hugging Face page include "Image-Text-to-Text", "Transformers", "gemma3", and "text-generation-inference", indicating its capabilities and framework.

+ **Language(s) (NLP):**
+ The primary language supported by this model is Albanian (Shqip). Given its base model, it may retain some capabilities in English, though its fine-tuning focus is on Albanian.

+ **License:**
+ The license for this specific fine-tuned model is not explicitly stated on its Hugging Face page. It is likely to inherit the license of the base model, `gemma-3-4b-it`, which is typically governed by the Gemma Terms of Use. Users should verify the licensing terms before use.

+ **Finetuned from model (optional):**
+ This model was fine-tuned from `google/gemma-3-4b-it`.

+ ### Model Sources [optional]

+ **Repository:**
+ The model is hosted on the Hugging Face Model Hub at [https://huggingface.co/olsi8/gemma-3-4b-it-shqip-v1](https://huggingface.co/olsi8/gemma-3-4b-it-shqip-v1).

+ **Paper [optional]:**
+ There is no specific paper associated with this fine-tuned model. For information on the base Gemma models, users can refer to the relevant Google publications.

+ **Demo [optional]:**
+ No specific demo is provided for this model on its Hugging Face page.

+ ### Uses

+ **Direct Use:**
+ This model is intended for direct use in generating text in the Albanian language. It can be employed for content creation and translation assistance (from other languages into Albanian, with caution), and it can serve as a foundation for further fine-tuning on more specific Albanian NLP tasks. Due to its proof-of-concept nature, extensive testing is recommended before deployment in production environments.
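For quick experimentation with direct use, the high-level `pipeline` API can wrap the same workflow shown in the getting-started section below. This is a minimal sketch, assuming the checkpoint loads as a standard causal language model and that any applicable Gemma license terms have been accepted; the Albanian prompt is an arbitrary illustrative example.

```python
from transformers import pipeline

# Quick start via the text-generation pipeline (assumes a causal-LM-compatible checkpoint)
generator = pipeline("text-generation", model="olsi8/gemma-3-4b-it-shqip-v1")

# Generate a short continuation for an Albanian prompt
result = generator("Shqipëria është një vend", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```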

+ **Downstream Use [optional]:**
+ The model can serve as a base for further fine-tuning on specialized Albanian language datasets for tasks like sentiment analysis, question answering, or domain-specific text generation.
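One possible shape for such downstream fine-tuning is a parameter-efficient (LoRA) supervised fine-tuning run with TRL. This is only an illustrative sketch under assumptions not stated in the model card: that the `olsi8/albanian-lang-gemma-format` dataset (referenced in the Training Data section) has a `train` split in a format `SFTTrainer` accepts, that the installed `peft`/`trl` versions expose these APIs, and that all hyperparameters shown are placeholders needing tuning.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Dataset mentioned in the Training Data section (schema and split are assumptions)
dataset = load_dataset("olsi8/albanian-lang-gemma-format", split="train")

# LoRA keeps the fine-tune lightweight; rank/alpha/targets are placeholder values
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM", target_modules="all-linear")

trainer = SFTTrainer(
    model="olsi8/gemma-3-4b-it-shqip-v1",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma-3-4b-it-shqip-finetune",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
)
trainer.train()
```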

+ **Out-of-Scope Use:**
+ This model is not intended for generating harmful, biased, or misleading content. It should not be used for critical decision-making without human oversight, especially given its current stage as a proof of concept. Use in languages other than Albanian is not its primary design and may yield suboptimal results.

+ ### Bias, Risks, and Limitations

+ As with all large language models, olsi8/gemma-3-4b-it-shqip-v1 may inherit biases present in its training data, which includes books and the `olsi8/albanian-lang-gemma-format` dataset. These biases could manifest in the generated text. The model's performance is limited by the scope and quality of its training data; it may generate incorrect or nonsensical information (hallucinations). Because it is a proof of concept, its robustness and generalization capabilities may be limited compared to more extensively trained models. The accuracy of 0.77 on the `olsi8/albanian-lang-gemma-format` dataset indicates that while it has learned the task to a degree, there is still room for improvement and potential for errors.

  ### Recommendations

+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It is strongly recommended to thoroughly evaluate the model's outputs for any specific application, and human oversight is crucial for sensitive tasks. Note that this model is considered a proof of concept and, as indicated by its deprecated status, has been superseded by newer versions or alternative approaches. For production use, exploring more mature and extensively evaluated models is advised.

+ ### How to Get Started with the Model

+ To get started with the `olsi8/gemma-3-4b-it-shqip-v1` model, you can use the Hugging Face Transformers library. Below is an example of how to load the model and tokenizer and generate text. Ensure you have the `transformers` and `torch` libraries installed.

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "olsi8/gemma-3-4b-it-shqip-v1"
+
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+
+ # Prepare the input prompt (example in Albanian)
+ input_text = "Përshëndetje, si mund të të ndihmoj sot?"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+
+ # Generate text
+ # Note: generation parameters such as max_new_tokens, num_beams, temperature, etc.
+ # may need adjustment for your use case; max_new_tokens bounds only the newly
+ # generated tokens, so the prompt does not eat into the generation budget.
+ outputs = model.generate(input_ids, max_new_tokens=50)
+
+ # Decode and print the generated text
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```

+ It is important to note that since this model is a proof of concept and marked as deprecated, users should exercise caution and consider more recent or robust models for production applications. The example above provides a basic framework; further customization of generation parameters will likely be necessary for optimal results.
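Because the base `gemma-3-4b-it` model is instruction-tuned around a chat template, prompts may behave better when formatted as chat turns. The following is a hedged follow-up sketch that reuses the `tokenizer` and `model` objects from the block above and assumes this fine-tune preserved the base model's chat template; the Albanian prompt is an arbitrary example.

```python
# Optional: format the prompt with the chat template (assumes the fine-tune kept the template)
messages = [
    {"role": "user", "content": "Më trego shkurt diçka për qytetin e Tiranës."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```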

+ ### Training Details

+ **Training Data**

+ The model `olsi8/gemma-3-4b-it-shqip-v1` was fine-tuned on a combination of Albanian language data. This included a collection of books and the specific dataset `olsi8/albanian-lang-gemma-format`, which is available on Hugging Face at [https://huggingface.co/datasets/olsi8/albanian-lang-gemma-format](https://huggingface.co/datasets/olsi8/albanian-lang-gemma-format). The nature of the books used is not specified in detail, but they contributed to the Albanian language corpus for training.
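To inspect the Hugging Face portion of the training data, the `datasets` library can load it directly. A small sketch, assuming the dataset is public and that the split name follows the usual convention (the actual schema should be checked on the dataset page):

```python
from datasets import load_dataset

# Load the Albanian fine-tuning dataset referenced above (split name is an assumption)
ds = load_dataset("olsi8/albanian-lang-gemma-format", split="train")

print(ds)     # dataset size and column names
print(ds[0])  # first record, to inspect the Gemma-style formatting
```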

+ **Training Procedure**

+ *Preprocessing [optional]*
+ Details regarding the specific preprocessing steps applied to the training data are not extensively documented on the model card. Standard preprocessing for language models typically includes tokenization, formatting of input-output pairs, and potentially data cleaning or filtering. Given the use of the `albanian-lang-gemma-format` dataset, it is likely that the data was structured to be compatible with the Gemma model's training requirements.
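As an illustration of what "Gemma-compatible formatting" usually means in practice, instruction-response pairs are often rendered through the tokenizer's chat template before tokenization. This sketch is hypothetical: the field names `instruction` and `response` are placeholders, not the actual columns of the dataset, and the approach assumes the fine-tune uses the standard Gemma chat template.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("olsi8/gemma-3-4b-it-shqip-v1")

# Hypothetical raw record; real column names depend on the dataset schema
record = {"instruction": "Përkthe në shqip: Good morning!", "response": "Mirëmëngjes!"}

# Render the pair through the chat template as a single training string
text = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": record["instruction"]},
        {"role": "assistant", "content": record["response"]},
    ],
    tokenize=False,
)
print(text)
```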

+ *Training Hyperparameters*
+ Specific training hyperparameters such as learning rate, batch size, number of epochs, and optimization algorithms used for fine-tuning `olsi8/gemma-3-4b-it-shqip-v1` are not detailed on its Hugging Face page. The training regime would have involved fine-tuning the pre-trained `gemma-3-4b-it` model on the aforementioned Albanian language data.

+ *Speeds, Sizes, Times [optional]*
+ Information about the training speed, computational resources utilized, and total training time for this specific fine-tuning effort is not provided.

+ ### Evaluation

+ **Testing Data, Factors & Metrics**

+ *Testing Data*
+ The model was evaluated on the `olsi8/albanian-lang-gemma-format` dataset. Information about other specific testing datasets or benchmarks used for a broader evaluation is not readily available.

+ *Factors*
+ Key factors influencing the model's performance include the size and quality of the Albanian language data it was fine-tuned on, the architecture of the base `gemma-3-4b-it` model, and the specific fine-tuning process. The model's capabilities are primarily focused on the Albanian language.

+ *Metrics*
+ The primary metric reported for this model is an accuracy of 0.77 on the `olsi8/albanian-lang-gemma-format` dataset. Other standard NLP evaluation metrics (e.g., perplexity, BLEU for translation-like tasks, or ROUGE for summarization) are not explicitly provided for this model on its Hugging Face page.

+ **Results**
+ The reported accuracy of 0.77 on the specified dataset indicates a degree of success in learning the target task for the Albanian language. However, as a proof-of-concept model, these results should be interpreted with caution, and further evaluation on diverse benchmarks would be necessary to fully understand its performance characteristics.

+ **Summary**
+ `olsi8/gemma-3-4b-it-shqip-v1` demonstrates foundational capabilities in Albanian language processing, achieving a notable accuracy on its fine-tuning dataset. Nevertheless, its status as a proof-of-concept and deprecated model suggests that it serves more as an experimental iteration than a production-ready solution.

+ ### Model Examination [optional]
+ Further examination of the model's outputs, error analysis, and performance on specific linguistic phenomena in Albanian would be beneficial for a deeper understanding of its strengths and weaknesses. Such detailed examination is not provided in the current model card.

+ ### Environmental Impact

+ Information regarding the environmental impact of fine-tuning this specific model is not provided. However, general considerations for large language models apply.

+ * **Hardware Type:** [More Information Needed - Typically GPUs like NVIDIA A100s or H100s are used for training models of this scale, but specifics for this fine-tune are unknown]
+ * **Hours used:** [More Information Needed - Training time for fine-tuning can vary based on dataset size and hardware]
+ * **Cloud Provider:** [More Information Needed - Common providers include GCP, AWS, Azure, or private clusters]
+ * **Compute Region:** [More Information Needed]
+ * **Carbon Emitted:** [More Information Needed - Can be estimated using tools like the Machine Learning Impact calculator if hardware and usage details were available; see the sketch after this list]
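To make the estimation route concrete, emissions are usually approximated as GPU power × runtime × data-centre overhead (PUE) × grid carbon intensity, which is essentially what the Machine Learning Impact calculator computes. The numbers below are purely hypothetical placeholders, not measurements of this fine-tune.

```python
# Hypothetical back-of-the-envelope CO2e estimate (all inputs are made-up placeholders)
gpu_power_kw = 0.4      # e.g. one GPU drawing ~400 W under load
hours = 10              # assumed fine-tuning duration
pue = 1.2               # data-centre power usage effectiveness
carbon_intensity = 0.4  # kg CO2e per kWh for the local grid

energy_kwh = gpu_power_kw * hours * pue
co2e_kg = energy_kwh * carbon_intensity
print(f"~{energy_kwh:.1f} kWh, ~{co2e_kg:.1f} kg CO2e")
```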

+ ### Technical Specifications [optional]

+ **Model Architecture and Objective**
+ The model utilizes the Gemma architecture, specifically the `gemma-3-4b-it` version, which has approximately 4 billion parameters. The objective of this fine-tuned version is causal language modeling, tailored for generating coherent and contextually relevant text in the Albanian language.
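If an exact figure is needed, the parameter count can be read off the loaded checkpoint rather than inferred from the model name. A small sketch, reusing the `model` object from the getting-started example above:

```python
# Count parameters of the loaded model (reuses `model` from the getting-started example)
num_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {num_params / 1e9:.2f}B")
```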
 
 
 

+ **Compute Infrastructure**

+ *Hardware*
+ Specific hardware used for the fine-tuning process is not detailed. Training models of this size typically requires access to high-performance GPUs.

+ *Software*
+ The model was developed using the Hugging Face Transformers library. Other common software includes PyTorch or TensorFlow, CUDA for GPU acceleration, and various Python libraries for data processing.

+ ### Citation [optional]

  **BibTeX:**
+ As this is a fine-tuned model by a community user, a specific BibTeX entry for this exact model version may not exist. For the base Gemma models, refer to Google's official publications. A general citation for the model repository could be:
+
+ ```bibtex
+ @misc{olsi8_gemma_3_4b_it_shqip_v1,
+   author       = {olsi8},
+   title        = {gemma-3-4b-it-shqip-v1: A fine-tuned Gemma model for Albanian},
+   year         = {2025},
+   publisher    = {Hugging Face},
+   journal      = {Hugging Face repository},
+   howpublished = {\url{https://huggingface.co/olsi8/gemma-3-4b-it-shqip-v1}}
+ }
+ ```

  **APA:**
+ olsi8. (2025). *gemma-3-4b-it-shqip-v1: A fine-tuned Gemma model for Albanian*. Hugging Face. Retrieved from https://huggingface.co/olsi8/gemma-3-4b-it-shqip-v1

+ ### Glossary [optional]
+ [More Information Needed - A glossary could define terms like "fine-tuning", "causal language modeling", "Gemma architecture", etc., if deemed necessary for the audience.]

+ ### More Information [optional]

+ This model, `olsi8/gemma-3-4b-it-shqip-v1`, should be considered a proof of concept for fine-tuning Gemma models for the Albanian language. It has been marked as **deprecated**. Users are advised that newer, potentially more robust models or alternative approaches may be available and should be preferred for ongoing development or production use. The model was trained on a combination of books and the `olsi8/albanian-lang-gemma-format` dataset.

+ ### Model Card Authors [optional]
+ This model card was generated by an AI assistant based on publicly available information and user-provided details. The original model was developed and shared by the Hugging Face user 'olsi8'.

+ ### Model Card Contact
+ For questions or issues regarding the model itself, contacting the model owner 'olsi8' through their Hugging Face profile ([https://huggingface.co/olsi8](https://huggingface.co/olsi8)) would be the most direct approach. For issues with this model card, please refer to the generating entity.