Update README.md
README.md CHANGED
@@ -5,20 +5,22 @@ language:
 license: apache-2.0
 base_model: BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024
 tags:
--
-
--
-
+- flan
+- t5
+- gqa
+- instruct
 datasets:
 - pszemraj/flan-subsets-deduped
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-#
+# tFINE-680m-e32-d16-gqa-flan
 
-
+FLAN-tuned variant of a tFINE (t5) model with GQA.
+
+- 32 encoder layers
+- 16 decoder layers
+- 1024 hidden size
 
 ## testing
 
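The description added above (a t5-style encoder-decoder with grouped-query attention, 32 encoder layers, 16 decoder layers, 1024 hidden size) implies ordinary seq2seq usage. A minimal loading sketch, assuming the checkpoint is published under the hub id `BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan` (inferred from the base model name and the new title; this commit does not state the repo id) and that the GQA attention variant may ship custom modeling code:

```python
# Hypothetical usage sketch -- the hub id below is inferred, not confirmed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True as a precaution, in case the GQA attention
# is implemented in custom modeling code rather than stock T5 classes
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# FLAN-style instruction prompt, matching the flan/instruct tags
prompt = "Summarize: grouped-query attention shares key/value heads across groups of query heads to reduce memory use."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=48)
print(tok.decode(out[0], skip_special_tokens=True))
```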
@@ -95,13 +97,3 @@ No additional optimizer arguments
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 1.0
 
-### Training results
-
-
-
-### Framework versions
-
-- Transformers 4.46.0.dev0
-- Pytorch 2.4.1+cu124
-- Datasets 3.0.1
-- Tokenizers 0.20.1