llama-1.6T-dense-50-50-attn

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Passthrough merge method.
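
Passthrough merging copies the selected layer ranges from the source models unchanged and stacks them in the order given by the slices in the configuration below; no weights are averaged or interpolated. As a minimal sketch of what that implies here (the slice list is copied from the configuration below, with the path prefix omitted for brevity), the merged model ends up with 222 transformer layers:

slices = [
    ("H3xBase-405B-parallel-50-50-attn", 0, 32),
    ("H4xBase-405B-parallel-50-50-attn", 16, 48),
    ("H3xBase-405B-parallel-50-50-attn", 32, 64),
    ("H4xBase-405B-parallel-50-50-attn", 48, 80),
    ("H3xBase-405B-parallel-50-50-attn", 64, 96),
    ("H4xBase-405B-parallel-50-50-attn", 80, 112),
    ("H3xBase-405B-parallel-50-50-attn", 96, 126),
]

# Passthrough simply concatenates these ranges, so the merged depth is the
# sum of the slice lengths: six 32-layer blocks plus a final 30-layer block.
total_layers = sum(end - start for _, start, end in slices)
print(total_layers)  # 222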

Models Merged

The following models were included in the merge:

  • /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
  • /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn

Configuration

The following YAML configuration was used to produce this model:

dtype: bfloat16
merge_method: passthrough

tokenizer_source: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn

slices:
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
      layer_range: [0, 32]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
      layer_range: [16, 48]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
      layer_range: [32, 64]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
      layer_range: [48, 80]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
      layer_range: [64, 96]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H4xBase-405B-parallel-50-50-attn
      layer_range: [80, 112]
  - sources:
    - model: /mnt/weka/home/ggb/llama-dense-merge-test/H3xBase-405B-parallel-50-50-attn
      layer_range: [96, 126]
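
This configuration can be handed to mergekit's mergekit-yaml command-line tool to reproduce the merge. Once the output directory exists, the result loads like any other Llama-architecture checkpoint. The snippet below is a minimal sketch, not part of this repository: the local path is an assumption, and at roughly 1.4T bfloat16 parameters the model needs a large multi-GPU (in practice multi-node) setup to load at all.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed local path to the merge output directory.
model_path = "./llama-1.6T-dense-50-50-attn"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches the dtype in the merge config
    device_map="auto",           # requires accelerate; shards layers across available devices
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))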
  
Model size: 1.4T parameters (BF16, safetensors)