Update README.md
README.md CHANGED
@@ -5,20 +5,22 @@ language:
 license: apache-2.0
 base_model: BEE-spoke-data/tFINE-680m-e32-d16-gqa-1024
 tags:
--
-
--
-
+- flan
+- t5
+- gqa
+- instruct
 datasets:
 - pszemraj/flan-subsets-deduped
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-#
+# tFINE-680m-e32-d16-gqa-flan
 
-
+FLAN-tuned variant of a tFINE (t5) model with GQA.
+
+- 32 encoder layers
+- 16 decoder layers
+- 1024 hidden size
 
 ## testing
 
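The description added above (a t5-style encoder-decoder with grouped-query attention, 32 encoder layers, 16 decoder layers, 1024 hidden size) implies ordinary seq2seq usage. A minimal loading sketch, assuming the checkpoint is published under the hub id `BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan` (inferred from the base model name and the new title; this commit does not state the repo id) and that the GQA attention variant may ship custom modeling code:

```python
# Hypothetical usage sketch -- the hub id below is inferred, not confirmed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True as a precaution, in case the GQA attention
# is implemented in custom modeling code rather than stock T5 classes
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# FLAN-style instruction prompt, matching the flan/instruct tags
prompt = "Summarize: grouped-query attention shares key/value heads across groups of query heads to reduce memory use."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=48)
print(tok.decode(out[0], skip_special_tokens=True))
```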
@@ -95,13 +97,3 @@ No additional optimizer arguments
 - lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 1.0
 
-### Training results
-
-
-
-### Framework versions
-
-- Transformers 4.46.0.dev0
-- Pytorch 2.4.1+cu124
-- Datasets 3.0.1
-- Tokenizers 0.20.1