pszemraj committed
Commit 57ce7bd
1 Parent(s): 88e4e66

Update README.md

Files changed (1): README.md (+10 -18)
README.md CHANGED
```diff
@@ -5,20 +5,22 @@ language:
 license: apache-2.0
 base_model: BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024
 tags:
-- generated_from_trainer
-model-index:
-- name: tFINE-680m-e32-d16-gqa-1024-flan-subsets-deduped-1024
-  results: []
+- flan
+- t5
+- gqa
+- instruct
 datasets:
 - pszemraj/flan-subsets-deduped
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-# BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
+# tFINE-680m-e32-d16-gqa-flan
 
-This model is a fine-tuned version of [BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024](https://huggingface.co/BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024) on the pszemraj/flan-subsets-deduped dataset.
+FLAN-tuned variant of a tFINE (t5) model with GQA.
+
+- 32 encoder layers
+- 16 decoder layers
+- 1024 hidden size
 
 ## testing
 
@@ -95,13 +97,3 @@ No additional optimizer arguments
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 1.0
 
-### Training results
-
-
-
-### Framework versions
-
-- Transformers 4.46.0.dev0
-- Pytorch 2.4.1+cu124
-- Datasets 3.0.1
-- Tokenizers 0.20.1
```
99