Upload folder using huggingface_hub

Files changed (6)

.gitattributes +1 -0
README.md +129 -0
config.json +125 -0
easydel-model.parameters +3 -0
easydel-training-arguments.json +106 -0
generation_config.json +11 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+easydel-model.parameters filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+tags:
+- EasyDeL
+- llama
+- CausalLM
+- splash
+- safetensors
+- Flax
+- JAX
+- TPU
+---
+<p align="center">
+  <a href="https://github.com/erfanzar/EasyDeL">
+    <img src="https://raw.githubusercontent.com/erfanzar/easydel/main/images/easydel-logo-with-text.png" height="80">
+  </a>
+</p>
+<p align="center">
+  <a href="https://github.com/erfanzar/EasyDeL">
+    <img src="https://img.shields.io/badge/🤗_EasyDeL-v0.1.5-blue.svg" />
+  </a>
+  <a href="https://github.com/erfanzar/EasyDeL">
+    <img src="https://img.shields.io/badge/Model_Arch-llama-green.svg" />
+  </a>
+</p>
+# Training Run: marin-8b-instruct-orpo
+This document outlines the configuration and parameters used for training the model `marin-8b-instruct-orpo` using the [EasyDeL](https://github.com/erfanzar/EasyDeL) library.
+EasyDeL is an open-source framework designed to enhance and streamline the training process of machine learning models, with a primary focus on JAX/Flax for TPU/GPU environments.
+## How to Load This Checkpoint
+You can load the checkpoint generated from this training run using EasyDeL as follows:
+```python
+import easydel as ed
+from jax import numpy as jnp, lax
+# Path to the directory where this README.md is located
+repo_id = "user/model-id" # <-- TODO: Update this path with the actual save directory or model repo
+model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
+    repo_id,
+    config_kwargs=ed.EasyDeLBaseConfigDict(
+        # use_scan_mlp=False, # Set to True to potentially reduce memory usage
+        attn_dtype=jnp.float16, # Or jnp.bfloat16
+        # freq_max_position_embeddings=max_length, # Set if using RoPE and need truncation
+        # mask_max_position_embeddings=max_length, # Set if max length is defined
+        attn_mechanism=ed.AttentionMechanisms.SPLASH # Matches the mechanism used by this model
+    ),
+    dtype=jnp.float16, # Or jnp.bfloat16 - Computation data type
+    param_dtype=jnp.float16, # Or jnp.bfloat16 - Parameter data type
+    precision=lax.Precision("fastest"), # Like "default", "fastest", "high", "highest"
+    auto_shard_model=True, # Auto-shard across available devices
+)
+```
+*Note: Replace `repo_id` with the actual path to the saved checkpoint directory or with the model repository ID.*
+*The returned `model` already holds the loaded parameters and is ready to use.*
+## Training Configuration Summary
+### Model & Hardware
+- **Model Name (Run Name)**: `marin-8b-instruct-orpo`
+- **Base Model Architecture**: `llama`
+- **Platform**: `TPU`
+- **Number of Devices Used**: `4` (total), `4` (local)
+- **EasyDeL Version**: `v0.1.5`
+### Key Training Parameters
+- **Learning Rate (Start → End)**: `8e-07` (no end value set)
+- **Optimizer**: `EasyDeLOptimizers.ADAMW`
+- **Scheduler**: `EasyDeLSchedulers.COSINE`
+- **Warmup Steps**: `0`
+- **Weight Decay**: `0.01`
+- **Loss Configuration**: `LossConfig(
+  ignore_index : -100
+  label_smoothing : 0.0
+  z_loss : 0.0
+  loss_normalizing_factor : SpecialLossNormalizingFactor.NO_WEIGHT_NUM_REAL_TARGET_TOKENS
+  num_labels : None
+  problem_type : None
+  divide_weight_sum : False
+  shift_tokens : True
+  break_on_nan : True
+  reduction : None
+  num_classification_labels : None
+  classification_problem_type : None
+)`
+### Data & Batching
+- **Number of Training Epochs**: `8`
+- **Total Batch Size (per step)**: `4`
+- **Maximum Sequence Length**: `4096`
+- **Gradient Accumulation Steps**: `1`
+### Datatypes & Precision
+- **Computation `dtype`**: `<class 'jax.numpy.bfloat16'>`
+- **Parameter `param_dtype`**: `<class 'jax.numpy.bfloat16'>`
+- **Gradient Checkpointing Method**: `EasyDeLGradientCheckPointers.NOTHING_SAVEABLE`
+- **Attention Mechanism Used in Training**: `splash` (can be loaded as `AttentionMechanisms.SPLASH` if using `EasyDeLConfig`)
+### Run Control
+- **Max Training Steps**: `Not Set`
+- **Max Evaluation Steps**: `Not Set`
+- **Training Time Limit**: `Not Set`
+## Citation
+If you use EasyDeL in your research or work, please cite it:
+```bibtex
+@misc{Zare Chavoshi_2023,
+    title={EasyDeL: An open-source library for enhancing and streamlining the training process of machine learning models},
+    url={https://github.com/erfanzar/EasyDeL},
+    author={Zare Chavoshi, Erfan},
+    year={2023}
+}
+```
+---
+*This document was automatically generated by EasyDeL v0.1.5 during the training run.*

config.json ADDED

	@@ -0,0 +1,125 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "attn_mechanism": "splash",
+  "backend": null,
+  "begin_suppress_tokens": [
+    128000,
+    128001
+  ],
+  "bits": null,
+  "blocksize_b": 1,
+  "blocksize_k": 128,
+  "blocksize_q": 128,
+  "bos_token_id": 128000,
+  "decode_attn_mechanism": null,
+  "decoder_start_token_id": 128000,
+  "easy_method": "train",
+  "embd_pdrop": 0.0,
+  "eos_token_id": 128009,
+  "fcm_max_ratio": -1,
+  "fcm_min_ratio": -1,
+  "flash_attention_backward_pass_impl": "triton",
+  "freq_max_position_embeddings": 2048,
+  "gradient_checkpointing": "nothing_saveable",
+  "hardware_abstraction": false,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "kv_cache_quantization_blocksize": 64,
+  "kv_cache_quantization_method": "None",
+  "kv_cache_sharding_sequence_axis_name": "sp",
+  "mask_max_position_embeddings": 2048,
+  "max_position_embeddings": 4096,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "number_rep_kv": 1,
+  "pallas_k_block_size": 128,
+  "pallas_m_block_size": 128,
+  "pallas_n_block_size": 128,
+  "partition_axis": {
+    "attention_dim_axis": null,
+    "attention_kv_dim_axis": null,
+    "batch_axis": [
+      "fsdp",
+      "dp"
+    ],
+    "bias_head_sequence_axis": null,
+    "bias_key_sequence_axis": null,
+    "data_parallel_axis": "dp",
+    "decode_attention_dim_axis": null,
+    "decode_attention_kv_dim_axis": null,
+    "decode_batch_axis": [
+      "fsdp",
+      "dp"
+    ],
+    "decode_head_axis": "tp",
+    "decode_key_sequence_axis": "sp",
+    "decode_kv_head_axis": "tp",
+    "decode_query_sequence_axis": null,
+    "expert_axis": "ep",
+    "expert_gate_axis": null,
+    "expert_parallel_axis": "ep",
+    "fully_sharded_data_parallel_axis": "fsdp",
+    "head_axis": "tp",
+    "hidden_state_axis": "tp",
+    "key_sequence_axis": "sp",
+    "kv_head_axis": "tp",
+    "mlp_intermediate_axis": "tp",
+    "query_sequence_axis": "sp",
+    "sequence_axis": "sp",
+    "sequence_parallel_axis": "sp",
+    "tensor_parallel_axis": "tp",
+    "vocab_axis": "tp"
+  },
+  "platform": null,
+  "precompute_masks": true,
+  "pretraining_tp": 1,
+  "quantization_blocksize": 64,
+  "quantization_method": "None",
+  "quantization_pattern": ".*",
+  "resid_pdrop": 0.0,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000,
+  "scan_attention_layers": false,
+  "scan_layers": false,
+  "scan_mlp_chunk_size": 1024,
+  "scan_ring_attention": true,
+  "sequence_axis_name": "sp",
+  "shard_attention_computation": true,
+  "sharding_axis_dims": [
+    1,
+    -1,
+    1,
+    1
+  ],
+  "sharding_axis_names": [
+    "dp",
+    "fsdp",
+    "tp",
+    "sp"
+  ],
+  "sharding_dcn_axis_dims": null,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "use_scan_mlp": false,
+  "use_sharded_kv_caching": false,
+  "use_sharding_constraint": false,
+  "vocab_size": 128256
+}
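
A few quantities implied by the config.json above can be checked with plain arithmetic. The sketch below is illustrative only: it copies a handful of values from the diff, and it assumes that the `-1` in `sharding_axis_dims` follows the usual reshape convention (that axis absorbs whatever devices remain). The helper `resolve_axis_dims` is hypothetical and is not an EasyDeL function.

```python
import math

# Values copied from the config.json diff above.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "sharding_axis_dims": [1, -1, 1, 1],
    "sharding_axis_names": ["dp", "fsdp", "tp", "sp"],
}


def resolve_axis_dims(dims, num_devices):
    """Hypothetical helper: replace a single -1 so the product equals num_devices.

    Assumes -1 means "use all remaining devices", as in array reshaping.
    """
    if dims.count(-1) > 1:
        raise ValueError("at most one axis may be -1")
    known = math.prod(d for d in dims if d != -1)
    if num_devices % known != 0:
        raise ValueError("device count is not divisible by the fixed axes")
    return [num_devices // known if d == -1 else d for d in dims]


num_devices = 4  # "Number of Devices Used" in the README summary

mesh_shape = dict(
    zip(
        config["sharding_axis_names"],
        resolve_axis_dims(config["sharding_axis_dims"], num_devices),
    )
)

# Grouped-query attention: how many query heads share one key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Per-head width implied by the hidden size; should match the stored head_dim.
derived_head_dim = config["hidden_size"] // config["num_attention_heads"]

print(mesh_shape, queries_per_kv_head, derived_head_dim)
```

Under that assumption, all four devices land on the `fsdp` axis, which is consistent with a fully sharded data-parallel run on a single 4-device host.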

easydel-model.parameters ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7ac2e62e6e6a803c321ee79678a0b2df8ea4bfd635d1ed4c44a2410866d99c3d
+size 16060556584
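
The three lines above are a Git LFS pointer, not the weights themselves. As a rough sanity check, the recorded size can be turned into a parameter-count estimate. This is a sketch that assumes the parameters are stored in bfloat16 (2 bytes each, matching the `param_dtype` reported in the README) and ignores any file-format overhead; `parse_lfs_pointer` is a hypothetical helper written for this example.

```python
# Pointer contents copied from the diff above (without the leading "+").
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:7ac2e62e6e6a803c321ee79678a0b2df8ea4bfd635d1ed4c44a2410866d99c3d
size 16060556584
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(pointer_text)

BYTES_PER_BF16 = 2  # assumption: every parameter is stored as bfloat16
approx_params = pointer["size"] / BYTES_PER_BF16

print(f"{pointer['size'] / 1e9:.2f} GB, ~{approx_params / 1e9:.2f}B parameters")
# prints "16.06 GB, ~8.03B parameters"
```

The estimate of roughly 8 billion parameters agrees with the `8b` in the run name.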

easydel-training-arguments.json ADDED

	@@ -0,0 +1,106 @@

+{
+  "_can_log_metrics": null,
+  "auto_shard_states": true,
+  "aux_loss_enabled": false,
+  "backend": null,
+  "beta": 0.1,
+  "clip_grad": 1.0,
+  "custom_scheduler": null,
+  "dataloader_num_workers": 0,
+  "dataloader_pin_memory": false,
+  "dataset_num_proc": null,
+  "disable_dropout": true,
+  "do_eval": true,
+  "do_last_save": true,
+  "do_train": true,
+  "eval_batch_size": 4,
+  "evaluation_steps": null,
+  "extra_optimizer_kwargs": {},
+  "frozen_parameters": null,
+  "generate_during_eval": false,
+  "gradient_accumulation_steps": 1,
+  "ids_to_pop_from_dataset": [],
+  "init_tx": true,
+  "is_encoder_decoder": null,
+  "is_fine_tuning": true,
+  "jax_distributed_config": null,
+  "label_pad_token_id": -100,
+  "learning_rate": 8e-07,
+  "learning_rate_end": null,
+  "log_all_workers": false,
+  "log_grad_norms": true,
+  "log_steps": 5,
+  "loss_config": {
+    "break_on_nan": true,
+    "classification_problem_type": null,
+    "divide_weight_sum": false,
+    "ignore_index": -100,
+    "label_smoothing": 0.0,
+    "loss_normalizing_factor": "SpecialLossNormalizingFactor.NO_WEIGHT_NUM_REAL_TARGET_TOKENS",
+    "num_classification_labels": null,
+    "num_labels": null,
+    "problem_type": null,
+    "reduction": null,
+    "shift_tokens": true,
+    "z_loss": 0.0
+  },
+  "low_mem_usage": true,
+  "max_completion_length": 2048,
+  "max_evaluation_steps": null,
+  "max_length": 2048,
+  "max_prompt_length": 1024,
+  "max_sequence_length": 4096,
+  "max_training_steps": null,
+  "metrics_to_show_in_rich_pbar": null,
+  "model_name": "marin-8b-instruct-orpo",
+  "model_parameters": null,
+  "num_train_epochs": 8,
+  "offload_dataset": false,
+  "offload_device_index": 0,
+  "offload_device_type": "cpu",
+  "optimizer": "adamw",
+  "padding_value": 128009,
+  "per_epoch_evaluation_steps": null,
+  "per_epoch_training_steps": null,
+  "performance_mode": false,
+  "process_zero_is_admin": true,
+  "progress_bar_type": "json",
+  "pruning_module": null,
+  "remove_ckpt_after_load": false,
+  "remove_unused_columns": true,
+  "report_metrics": true,
+  "report_steps": 10,
+  "save_directory": "EasyDeL-Checkpoints",
+  "save_optimizer_state": false,
+  "save_steps": 1000,
+  "save_total_limit": 1,
+  "scheduler": "cosine",
+  "shuffle_train_dataset": true,
+  "sparse_module_type": "bcoo",
+  "sparsify_module": false,
+  "state_apply_fn_kwarguments_to_model": null,
+  "step_partition_spec": [
+    [
+      "dp",
+      "fsdp"
+    ],
+    "sp"
+  ],
+  "step_start_point": 0,
+  "total_batch_size": 4,
+  "track_memory": false,
+  "train_on_inputs": true,
+  "trainer_config_class": "ORPOConfig",
+  "training_time_limit": null,
+  "truncation_mode": "keep_end",
+  "tx_mu_dtype": null,
+  "use_data_collactor": true,
+  "use_wandb": true,
+  "verbose": true,
+  "wandb_entity": "erfanzar",
+  "wandb_name": null,
+  "warmup_steps": 0,
+  "weight_decay": 0.01,
+  "weight_distribution_log_steps": 100,
+  "weight_distribution_pattern": ".*"
+}
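
The arguments above name `ORPOConfig` as the trainer config and set `beta` to `0.1`. The trainer's own loss code is not part of this diff, so the following is only a minimal scalar sketch of the ORPO objective as it is usually written: a supervised term on the chosen response plus a weighted odds-ratio penalty. It assumes that `beta` is the weight on the odds-ratio term and that the inputs are length-normalized (mean per-token) log-probabilities; the function names are illustrative.

```python
import math


def log_odds(logp):
    """Return log(p / (1 - p)) given log p, valid for 0 < p < 1."""
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(chosen_logp, rejected_logp, beta=0.1):
    """Scalar ORPO-style loss for one preference pair.

    chosen_logp / rejected_logp: mean per-token log-probabilities that the
    policy assigns to the chosen and rejected responses (both negative).
    """
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    sft_term = -chosen_logp  # negative log-likelihood of the chosen response
    return sft_term + beta * odds_ratio_term


# Same likelihood for both responses versus a clearly preferred chosen response.
even = orpo_loss(-0.5, -0.5)
better = orpo_loss(-0.5, -2.0)
print(even, better)
```

With the supervised term held fixed, the loss falls as the chosen response becomes more likely relative to the rejected one, which is the behavior the odds-ratio penalty is meant to encourage. A batched implementation would also need a numerically safer log-sigmoid than the direct form used here.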

generation_config.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "_from_model_config": true,
+  "begin_suppress_tokens": [
+    128000,
+    128001
+  ],
+  "bos_token_id": 128000,
+  "decoder_start_token_id": 128000,
+  "eos_token_id": 128009,
+  "transformers_version": "4.51.3"
+}