# llama-1.6T-dense-50-50-attn
This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).
## Merge Details
### Merge Method
This model was merged using the passthrough merge method, which copies the selected layer slices from the source checkpoints verbatim and stacks them into a deeper model, rather than averaging or interpolating weights.
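As a toy illustration of the mechanism (placeholder data and a hypothetical helper, not mergekit's actual internals), a passthrough slice merge just concatenates layer ranges:

```python
# Conceptual sketch of a passthrough layer-slice ("frankenmerge") merge.
# Strings stand in for transformer layers; real mergekit operates on
# safetensors shards, not Python lists.

def passthrough_merge(slices):
    """Concatenate layer ranges from source models, weights untouched."""
    merged = []
    for layers, (start, end) in slices:
        merged.extend(layers[start:end])  # copied through verbatim
    return merged

# Two 4-layer "models" sliced with overlapping ranges, mirroring the
# offset-window pattern in the configuration below.
model_a = [f"A{i}" for i in range(4)]
model_b = [f"B{i}" for i in range(4)]
print(passthrough_merge([(model_a, (0, 3)), (model_b, (1, 4))]))
# -> ['A0', 'A1', 'A2', 'B1', 'B2', 'B3']
```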
### Models Merged
The following models were included in the merge:
- /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
- /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
### Configuration
The following YAML configuration was used to produce this model:
```yaml
dtype: bfloat16
merge_method: passthrough
tokenizer_source: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
slices:
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
    layer_range: [0, 32]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
    layer_range: [16, 48]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
    layer_range: [32, 64]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
    layer_range: [48, 80]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
    layer_range: [64, 96]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
    layer_range: [80, 112]
- sources:
  - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
    layer_range: [96, 126]
```
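The seven slices alternate between the two checkpoints in 32-layer blocks: H3xBase contributes layers 0–125 and H4xBase contributes layers 16–111, each exactly once, for a merged stack of 126 + 96 = 222 decoder layers.

To reproduce the merge, the YAML above can be fed to mergekit. Below is a minimal sketch using mergekit's Python API; it assumes the configuration is saved as `merge-config.yml`, and exact option names may vary across mergekit versions:

```python
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the YAML configuration shown above into a validated merge plan.
with open("merge-config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./llama-1.6T-dense-50-50-attn",     # output directory (assumed name)
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for tensor copies if present
        copy_tokenizer=True,             # honors tokenizer_source above
        lazy_unpickle=True,              # lowers peak memory on 405B-scale shards
    ),
)
```

The same merge can be run from the command line with `mergekit-yaml merge-config.yml ./llama-1.6T-dense-50-50-attn`.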