phazei committed
Commit 95b91f6 · 1 Parent(s): fc2a75c

Add readme and scripts
README.md CHANGED
---
license: apache-2.0
tags:
- skywork
- skyreels
- text-to-video
- video-generation
- fp8
- e5m2
- quantized
- 14b
- 540p
- comfyui
base_model:
- Skywork/SkyReels-V2-DF-14B-540P
- Skywork/SkyReels-V2-T2V-14B-540P
---

# SkyReels-V2-14B-540P FP8-E5M2 Quantized Models

This repository contains FP8-E5M2 quantized versions of the Skywork SkyReels-V2 14B 540P models, suitable for use with hardware supporting this precision (e.g., NVIDIA RTX 3090/40-series with `torch.compile`) and popular workflows like those in ComfyUI.

These models were quantized by [phazei](https://huggingface.co/phazei).

## Original Models

These quantized models are based on the following original FP32 models from Skywork:

* **DF Variant:** [Skywork/SkyReels-V2-DF-14B-540P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
* **T2V Variant:** [Skywork/SkyReels-V2-T2V-14B-540P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-540P)

Please refer to the original model cards for details on their architecture, training, and intended use cases.

## Quantization Details & Acknowledgements

The models were converted from their original FP32 sharded format to a mixed-precision format. The specific layers quantized to `FP8-E5M2` (primarily weight layers within attention and FFN blocks, while biases and normalization layers were kept in FP32) were identified by analyzing the FP8 quantized models provided by **[Kijai](https://huggingface.co/Kijai)** in his repository **[Kijai/WanVideo_comfy](https://huggingface.co/Kijai/WanVideo_comfy)**.

This conversion process replicates the quantization pattern observed in Kijai's converted files to produce these `FP8-E5M2` variants. Many thanks to Kijai for sharing his quantized models, which served as a clear reference for this work and benefit the ComfyUI community.

The conversion was performed using PyTorch and `safetensors`. The scripts used for downloading the original models and performing this conversion are included in the `scripts/` directory of this repository.
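
In essence, the conversion is a name-based filter plus a dtype cast. Below is a condensed, single-shard sketch of that pattern (the actual script, `scripts/convert_to_fp8e5m2.py`, walks every shard listed in `model.safetensors.index.json`, supports resuming, and frees memory between shards); the shard filenames here are placeholders.

```python
import torch
from safetensors import safe_open
from safetensors.torch import save_file

TARGET_FP8_DTYPE = torch.float8_e5m2

def should_convert_to_fp8(name: str) -> bool:
    # Cast only transformer-block weights inside attention / FFN modules;
    # biases and the norm_q / norm_k / norm weights stay in FP32.
    if not name.endswith(".weight") or "blocks." not in name:
        return False
    if "cross_attn" in name or "self_attn" in name or "ffn" in name:
        if ".norm_q.weight" in name or ".norm_k.weight" in name or ".norm.weight" in name:
            return False
        return True
    return False

converted = {}
with safe_open("model-00001-of-00012.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        converted[name] = tensor.to(TARGET_FP8_DTYPE) if should_convert_to_fp8(name) else tensor

save_file(converted, "fp8_converted_model-00001-of-00012.safetensors")
```

No per-tensor scaling is applied; the cast simply rounds each FP32 value to the nearest representable E5M2 value, matching the pattern used throughout the conversion script.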

**Key characteristics of the quantized models:**

* **Precision:** Mixed (FP32, FP8-E5M2, U8 for metadata)
* **Target FP8 type:** `torch.float8_e5m2`
* **Compatibility:** Intended for use with PyTorch versions supporting `torch.float8_e5m2` and `torch.compile`. Well-suited for ComfyUI workflows that can leverage these models (a minimal standalone loading sketch follows below).
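
ComfyUI loaders handle these files directly. For a quick standalone check outside ComfyUI, a minimal sketch (not part of this repository's scripts) that loads the checkpoint and upcasts the FP8 tensors for hardware without native FP8 kernels might look like this:

```python
import torch
from safetensors.torch import load_file

# Load the single-file checkpoint on CPU; tensor names and dtypes are preserved.
state_dict = load_file("SkyReels-V2-T2V-14B-540P-fp8e5m2.safetensors", device="cpu")

# Optional: upcast FP8-E5M2 weights for runtimes without native FP8 support.
# Every E5M2 value is exactly representable in bfloat16, so this cast is exact.
for name, tensor in state_dict.items():
    if tensor.dtype == torch.float8_e5m2:
        state_dict[name] = tensor.to(torch.bfloat16)

print(f"Loaded {len(state_dict)} tensors")
```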

## Files in this Repository

* `SkyReels-V2-DF-14B-540P-fp8e5m2.safetensors`: The quantized DF variant (single file).
* `SkyReels-V2-T2V-14B-540P-fp8e5m2.safetensors`: The quantized T2V variant (single file).
* `scripts/`: Python scripts for downloading the original models and performing the quantization (run `model_download.py`, then `convert_to_fp8e5m2.py`, then `merge_fp8_shards.py`; `safetensors_info.py` inspects the per-tensor dtypes and sizes of the result):
  * `model_download.py`
  * `convert_to_fp8e5m2.py`
  * `merge_fp8_shards.py`
  * `safetensors_info.py`
* `README.md`: This model card.

## Disclaimer

This is a community-contributed quantization. While efforts were made to maintain model quality by following an established quantization pattern, performance may differ from the original FP32 models or other quantized versions. Use at your own discretion.

## Acknowledgements

* **Skywork AI** for releasing the original SkyReels models.
* **[Kijai](https://huggingface.co/Kijai)** for providing the quantized model versions that served as a reference for the quantization pattern applied in this repository.
scripts/convert_to_fp8e5m2.py ADDED
import torch
import os
import json
from safetensors.torch import save_file
from safetensors import safe_open
from collections import OrderedDict
from tqdm import tqdm
import gc  # For garbage collection

# --- Configuration ---
# INPUT_MODEL_DIR = "F:/Models/SkyReels-V2-DF-14B-540P"
INPUT_MODEL_DIR = "F:/Models/SkyReels-V2-T2V-14B-540P"
OUTPUT_SHARD_DIR = os.path.join(INPUT_MODEL_DIR, "converted_fp8_shards")  # Subdirectory for new shards
# Example output shard filename: fp8_converted_model-00001-of-00012.safetensors

TARGET_FP8_DTYPE = torch.float8_e5m2
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

print(f"--- SCRIPT START (Shard-by-Shard Conversion) ---")
print(f"Using device for conversion: {DEVICE}")
print(f"Target FP8 dtype: {TARGET_FP8_DTYPE}")
print(f"Input model directory: {INPUT_MODEL_DIR}")
print(f"Output shard directory: {OUTPUT_SHARD_DIR}")

def should_convert_to_fp8(tensor_name: str) -> bool:
    if not tensor_name.endswith(".weight"):
        return False
    if "blocks." not in tensor_name:
        return False
    if "cross_attn" in tensor_name or \
       "ffn" in tensor_name or \
       "self_attn" in tensor_name:
        if ".norm_k.weight" in tensor_name or \
           ".norm_q.weight" in tensor_name or \
           ".norm.weight" in tensor_name:
            return False
        return True
    return False

def convert_and_save_shards():
    print(f"--- ENTERING convert_and_save_shards() ---")
    index_json_path = os.path.join(INPUT_MODEL_DIR, "model.safetensors.index.json")
    print(f"Index JSON path: {index_json_path}")

    if not os.path.exists(index_json_path):
        print(f"Error: model.safetensors.index.json not found in {INPUT_MODEL_DIR}")
        return

    os.makedirs(OUTPUT_SHARD_DIR, exist_ok=True)
    print(f"Output directory for converted shards created/exists: {OUTPUT_SHARD_DIR}")

    print(f"Loading index JSON...")
    try:
        with open(index_json_path, 'r') as f:
            index_data = json.load(f)
        print(f"Index JSON loaded successfully.")
    except Exception as e:
        print(f"Error loading or parsing index.json: {e}")
        return

    weight_map = index_data.get("weight_map")
    if not weight_map:
        print(f"Error: 'weight_map' not found in {index_json_path} or it is empty.")
        return

    print(f"Weight map found with {len(weight_map)} entries.")

    # Group tensors by their original shard filename
    tensors_by_shard = {}
    for tensor_name, original_shard_filename in weight_map.items():
        if original_shard_filename not in tensors_by_shard:
            tensors_by_shard[original_shard_filename] = []
        tensors_by_shard[original_shard_filename].append(tensor_name)

    total_original_shards = len(tensors_by_shard)
    print(f"Found {total_original_shards} unique input shards to process.")

    # Process each original shard
    for shard_idx, (original_shard_filename, tensor_names_in_shard) in enumerate(
        tqdm(tensors_by_shard.items(), desc="Processing input shards", total=total_original_shards)
    ):
        current_input_shard_path = os.path.join(INPUT_MODEL_DIR, original_shard_filename)
        # Construct the output shard name by prefixing the original filename,
        # e.g. "model-00001-of-00012.safetensors" -> "fp8_converted_model-00001-of-00012.safetensors".
        # merge_fp8_shards.py globs for this prefix when merging.
        output_shard_filename = f"fp8_converted_{original_shard_filename}"

        current_output_shard_path = os.path.join(OUTPUT_SHARD_DIR, output_shard_filename)

        print(f"\n--- Processing Shard {shard_idx + 1}/{total_original_shards} ---")
        print(f"Input shard: {current_input_shard_path}")
        print(f"Output shard: {current_output_shard_path}")

        # Skip if output shard already exists (for resumability)
        if os.path.exists(current_output_shard_path):
            print(f"Output shard {current_output_shard_path} already exists. Skipping.")
            # Basic check: try to open it to see if it's valid (optional, adds time)
            try:
                with safe_open(current_output_shard_path, framework="pt", device="cpu") as f_test:
                    _ = f_test.keys()  # Just try to get keys
                print(f"Existing output shard {current_output_shard_path} seems valid.")
            except Exception as e_test:
                print(f"Warning: Existing output shard {current_output_shard_path} might be corrupted: {e_test}. Consider deleting it and rerunning for this shard.")
            continue

        if not os.path.exists(current_input_shard_path):
            print(f"Error: Input shard file {current_input_shard_path} not found. Skipping this shard.")
            continue

        shard_state_dict = OrderedDict()

        try:
            with safe_open(current_input_shard_path, framework="pt", device="cpu") as f_in:
                for tensor_name in tqdm(tensor_names_in_shard, desc=f"Tensors in {original_shard_filename}", leave=False):
                    print(f"  Loading tensor: {tensor_name}")  # Debug if needed
                    original_tensor = f_in.get_tensor(tensor_name)
                    print(f"  Tensor '{tensor_name}' loaded. Dtype: {original_tensor.dtype}, Shape: {original_tensor.shape}")

                    if should_convert_to_fp8(tensor_name):
                        print(f"    Converting '{tensor_name}' to {TARGET_FP8_DTYPE} on {DEVICE}...")
                        converted_tensor = original_tensor.to(DEVICE).to(TARGET_FP8_DTYPE).to("cpu")
                        shard_state_dict[tensor_name] = converted_tensor
                    else:
                        print(f"    Keeping '{tensor_name}' as {original_tensor.dtype}.")
                        shard_state_dict[tensor_name] = original_tensor.to("cpu")  # Ensure on CPU

            if shard_state_dict:
                print(f"Saving {len(shard_state_dict)} tensors to new shard: {current_output_shard_path}")
                save_file(shard_state_dict, current_output_shard_path)
                print(f"Successfully saved new shard: {current_output_shard_path}")
            else:
                print(f"No tensors processed for output shard: {current_output_shard_path}")

        except Exception as e:
            print(f"CRITICAL ERROR processing input shard {current_input_shard_path}: {e}")
            import traceback
            traceback.print_exc()
            print(f"Skipping rest of shard {original_shard_filename} due to error.")
            # If the error happened before save_file was called there is no partial output
            # file to clean up; if save_file itself was interrupted, delete the partially
            # written output shard and rerun this shard.

        # Explicitly clear and collect garbage to free memory
        del shard_state_dict
        if 'original_tensor' in locals(): del original_tensor
        if 'converted_tensor' in locals(): del converted_tensor
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print(f"Memory cleanup after processing shard {original_shard_filename}")

    print(f"\n--- All input shards processed. Converted shards are in {OUTPUT_SHARD_DIR} ---")

if __name__ == "__main__":
    print(f"--- __main__ block start ---")
    if not os.path.exists(INPUT_MODEL_DIR):
        print(f"Error: Input model directory not found: {INPUT_MODEL_DIR}")
    else:
        print(f"Input model directory exists. Calling convert_and_save_shards().")
        convert_and_save_shards()
    print(f"--- __main__ block end (Shard-by-Shard Conversion) ---")
scripts/merge_fp8_shards.py ADDED
import torch
import os
import json
from safetensors.torch import load_file, save_file
from safetensors import safe_open
from collections import OrderedDict
from tqdm import tqdm
import glob  # For finding shard files

# --- Configuration ---
# Should match OUTPUT_SHARD_DIR from the previous script
# CONVERTED_SHARDS_DIR = "F:/Models/SkyReels-V2-DF-14B-540P/converted_fp8_shards"  # DF path
CONVERTED_SHARDS_DIR = "F:/Models/SkyReels-V2-T2V-14B-540P/converted_fp8_shards"  # T2V path
# Define the final single output file
FINAL_OUTPUT_MODEL_NAME = "SkyReels-V2-T2V-14B-540P-fp8_e5m2.safetensors"  # Example final name
FINAL_OUTPUT_MODEL_PATH = os.path.join(os.path.dirname(CONVERTED_SHARDS_DIR), FINAL_OUTPUT_MODEL_NAME)  # Saves in parent of shards dir

# An index file is only needed if tensor ordering matters or tensors must be mapped
# back to specific shards. For a simple merge we just load every tensor from every
# converted shard and save them all into one file; if the original ordering were
# critical, the FP32 model's model.safetensors.index.json could guide it.
# ORIGINAL_FP32_INDEX_JSON = "F:/Models/SkyReels-V2-DF-14B-540P/model.safetensors.index.json"


print(f"--- SCRIPT START (Merge Converted Shards) ---")
print(f"Converted shards directory: {CONVERTED_SHARDS_DIR}")
print(f"Final output model path: {FINAL_OUTPUT_MODEL_PATH}")

def merge_converted_shards():
    if not os.path.exists(CONVERTED_SHARDS_DIR):
        print(f"Error: Directory with converted shards not found: {CONVERTED_SHARDS_DIR}")
        return

    # Find all converted shard files in CONVERTED_SHARDS_DIR.
    # Sort them so they are processed in a consistent order (00001, 00002, ...).
    shard_files = sorted(glob.glob(os.path.join(CONVERTED_SHARDS_DIR, "fp8_converted_model-*-of-*.safetensors")))
    # Or a more generic pattern if your naming was different:
    # shard_files = sorted(glob.glob(os.path.join(CONVERTED_SHARDS_DIR, "*.safetensors")))

    if not shard_files:
        print(f"Error: No converted shard files found in {CONVERTED_SHARDS_DIR}")
        return

    print(f"Found {len(shard_files)} converted shards to merge.")

    merged_state_dict = OrderedDict()

    for shard_path in tqdm(shard_files, desc="Merging shards"):
        print(f"Loading tensors from: {shard_path}")
        try:
            # Load all tensors from the current converted shard.
            # load_file is fine here (no need for safe_open with per-tensor reads)
            # since these shards are comparatively small.
            current_shard_state_dict = load_file(shard_path, device="cpu")
            merged_state_dict.update(current_shard_state_dict)
            print(f"  Added {len(current_shard_state_dict)} tensors from {os.path.basename(shard_path)}")
        except Exception as e:
            print(f"Error loading shard {shard_path}: {e}")
            return  # Stop if a shard can't be loaded for the merge

    if not merged_state_dict:
        print("No tensors were loaded from shards. Final model file will not be created.")
        return

    print(f"\nMerge complete. Total tensors in merged model: {len(merged_state_dict)}")
    print(f"Saving merged model to {FINAL_OUTPUT_MODEL_PATH}...")
    try:
        os.makedirs(os.path.dirname(FINAL_OUTPUT_MODEL_PATH), exist_ok=True)
        save_file(merged_state_dict, FINAL_OUTPUT_MODEL_PATH)
        print(f"Successfully saved final merged model to {FINAL_OUTPUT_MODEL_PATH}")
    except Exception as e:
        print(f"Error saving the final merged model: {e}")

if __name__ == "__main__":
    print(f"--- __main__ block start ---")
    if not os.path.exists(CONVERTED_SHARDS_DIR):
        print(f"Error: Converted shards directory not found: {CONVERTED_SHARDS_DIR}")
    else:
        merge_converted_shards()
    print(f"--- __main__ block end (Merge Converted Shards) ---")
scripts/model_download.py ADDED
import os
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import HfHubHTTPError  # More specific import path
from tqdm import tqdm  # For progress bars

# --- Configuration ---
MODELS_TO_DOWNLOAD = [
    {
        "repo_id": "Skywork/SkyReels-V2-DF-14B-540P",
        "local_base_path": "F:/Models/SkyReels-V2-DF-14B-540P",  # Base path for this model
        "num_shards": 12,
    },
    {
        "repo_id": "Skywork/SkyReels-V2-T2V-14B-540P",
        "local_base_path": "F:/Models/SkyReels-V2-T2V-14B-540P",  # Base path for this model
        "num_shards": 12,
    },
]

# Common files to download in addition to shards
COMMON_FILES = [
    "model.safetensors.index.json"
    # Add other essential files like config.json, tokenizer_config.json, etc., if needed for loading later.
    # For now, we'll stick to the index file, which is what the sharded conversion needs.
    # "config.json",
    # "generation_config.json",
    # "special_tokens_map.json",
    # "tokenizer.json",
    # "tokenizer_config.json",
    # "vocab.json"
]

def download_model_files(repo_id, local_base_path, num_shards):
    """
    Downloads sharded .safetensors model files and common configuration files
    from a Hugging Face repository.
    """
    print(f"\nDownloading files for repository: {repo_id}")
    print(f"Target local directory: {local_base_path}")

    # Create the local directory if it doesn't exist
    os.makedirs(local_base_path, exist_ok=True)

    # --- Download common files ---
    for common_file in COMMON_FILES:
        print(f"Attempting to download: {common_file}...")
        try:
            hf_hub_download(
                repo_id=repo_id,
                filename=common_file,
                local_dir=local_base_path,
                local_dir_use_symlinks=False,  # Download actual file
                resume_download=True,
            )
            print(f"Successfully downloaded {common_file}")
        except HfHubHTTPError as e:
            if e.response.status_code == 404:
                print(f"Warning: {common_file} not found in repository {repo_id}. Skipping.")
            else:
                print(f"Error downloading {common_file}: {e}")
        except Exception as e:
            print(f"An unexpected error occurred while downloading {common_file}: {e}")


    # --- Download sharded model files ---
    shard_filenames = []
    for i in range(1, num_shards + 1):
        # Filename format: model-00001-of-00012.safetensors
        shard_filename = f"model-{i:05d}-of-{num_shards:05d}.safetensors"
        shard_filenames.append(shard_filename)

    print(f"\nAttempting to download {num_shards} model shards...")
    for shard_filename in tqdm(shard_filenames, desc=f"Downloading shards for {repo_id}"):
        try:
            # print(f"Downloading {shard_filename} to {local_base_path}...")  # tqdm provides progress
            hf_hub_download(
                repo_id=repo_id,
                filename=shard_filename,
                local_dir=local_base_path,
                local_dir_use_symlinks=False,  # Important: download the actual file
                resume_download=True,  # Good for large files
            )
            # print(f"Successfully downloaded {shard_filename}")  # tqdm indicates completion
        except HfHubHTTPError as e:
            print(f"Error downloading {shard_filename}: {e}")
            if e.response.status_code == 404:
                print(f"  {shard_filename} not found. Please check repository and shard count.")
            return False  # Stop if a shard download fails
        except Exception as e:
            print(f"An unexpected error occurred while downloading {shard_filename}: {e}")
            return False
    print(f"All {num_shards} shards for {repo_id} downloaded successfully (or skipped if not found).")
    return True

if __name__ == "__main__":
    print("Starting model download process...")
    all_successful = True
    for model_config in MODELS_TO_DOWNLOAD:
        success = download_model_files(
            repo_id=model_config["repo_id"],
            local_base_path=model_config["local_base_path"],
            num_shards=model_config["num_shards"]
        )
        if not success:
            all_successful = False
            print(f"Failed to download all files for {model_config['repo_id']}.")

    if all_successful:
        print("\nAll specified model files downloaded successfully.")
    else:
        print("\nSome model files failed to download. Please check the logs.")
@@ -0,0 +1,165 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
import argparse
from safetensors import safe_open
from collections import Counter
import os
import math  # math.prod is Python 3.8+

# --- Dtype to Bytes Mapping ---
# Safetensors Dtype strings:
# BOOL, F8_E5M2, F8_E4M3FN, F16, BF16, F32, F64,
# I8, I16, I32, I64, U8, U16, U32, U64,
# F8_E5M2FNUZ, F8_E4M3FNUZ
DTYPE_TO_BYTES = {
    "BOOL": 1,
    # Float8 variants
    "F8_E5M2": 1,
    "F8E5M2": 1,  # Common alternative naming
    "F8_E4M3FN": 1,
    "F8E4M3FN": 1,  # Common alternative naming
    "F8_E4M3": 1,  # As seen in user example, likely E4M3FN
    "F8_E5M2FNUZ": 1,
    "F8E5M2FNUZ": 1,  # Common alternative naming
    "F8_E4M3FNUZ": 1,
    "F8E4M3FNUZ": 1,  # Common alternative naming
    # Standard floats
    "F16": 2,
    "BF16": 2,
    "F32": 4,
    "F64": 8,
    # Integers
    "I8": 1,
    "I16": 2,
    "I32": 4,
    "I64": 8,
    # Unsigned Integers
    "U8": 1,
    "U16": 2,
    "U32": 4,
    "U64": 8,
}

def get_bytes_per_element(dtype_str):
    """Returns the number of bytes for a given safetensors dtype string."""
    return DTYPE_TO_BYTES.get(dtype_str.upper(), None)

def calculate_num_elements(shape):
    """Calculates the total number of elements from a tensor shape tuple."""
    if not shape:  # Scalar tensor (shape is ())
        return 1
    if 0 in shape:  # If any dimension is 0, total elements is 0
        return 0
    # math.prod would be more concise (Python 3.8+);
    # a loop is used for broader compatibility:
    num_elements = 1
    for dim_size in shape:
        num_elements *= dim_size
    return num_elements

def inspect_safetensors_precision_and_size(filepath):
    """
    Reads a .safetensors file, iterates through its tensors,
    and reports the precision (dtype), actual size, and theoretical FP32 size.
    """
    if not os.path.exists(filepath):
        print(f"Error: File not found at '{filepath}'")
        return

    if not filepath.lower().endswith(".safetensors"):
        print(f"Warning: File '{filepath}' does not have a .safetensors extension. Attempting to read anyway.")

    dtype_counts = Counter()
    total_actual_mb = 0.0
    total_fp32_equiv_mb = 0.0

    try:
        print(f"Inspecting tensors in: {filepath}\n")
        with safe_open(filepath, framework="pt", device="cpu") as f:
            tensor_keys = list(f.keys())
            if not tensor_keys:
                print("No tensors found in the file.")
                return

            max_key_len = len("Tensor Name")  # Default/minimum
            if tensor_keys:
                max_key_len = max(max_key_len, max(len(k) for k in tensor_keys))

            header = (
                f"{'Tensor Name':<{max_key_len}} | "
                f"{'Precision (dtype)':<17} | "
                f"{'Actual Size (MB)':>16} | "
                f"{'FP32 Equiv. (MB)':>18}"
            )
            print(header)
            print(
                f"{'-' * max_key_len}-|-------------------|------------------|-------------------"
            )

            for key in tensor_keys:
                tensor_slice = f.get_slice(key)
                dtype_str = tensor_slice.get_dtype()
                shape = tensor_slice.get_shape()

                num_elements = calculate_num_elements(shape)
                bytes_per_el_actual = get_bytes_per_element(dtype_str)

                actual_size_mb_str = "N/A"
                fp32_equiv_size_mb_str = "N/A"
                actual_size_mb_val = 0.0

                if bytes_per_el_actual is not None:
                    actual_bytes = num_elements * bytes_per_el_actual
                    actual_size_mb_val = actual_bytes / (1024 * 1024)
                    total_actual_mb += actual_size_mb_val
                    actual_size_mb_str = f"{actual_size_mb_val:.3f}"

                    # Theoretical FP32 size (FP32 is 4 bytes per element)
                    fp32_equiv_bytes = num_elements * 4
                    fp32_equiv_size_mb_val = fp32_equiv_bytes / (1024 * 1024)
                    total_fp32_equiv_mb += fp32_equiv_size_mb_val
                    fp32_equiv_size_mb_str = f"{fp32_equiv_size_mb_val:.3f}"
                else:
                    print(f"Warning: Unknown dtype '{dtype_str}' for tensor '{key}'. Cannot calculate size.")

                print(
                    f"{key:<{max_key_len}} | "
                    f"{dtype_str:<17} | "
                    f"{actual_size_mb_str:>16} | "
                    f"{fp32_equiv_size_mb_str:>18}"
                )
                dtype_counts[dtype_str] += 1

        print("\n--- Summary ---")
        print(f"Total tensors found: {len(tensor_keys)}")
        if dtype_counts:
            print("Precision distribution:")
            for dtype, count in dtype_counts.most_common():
                print(f"  - {dtype:<12}: {count} tensor(s)")
        else:
            print("No dtypes to summarize.")

        print(f"\nTotal actual size of all tensors: {total_actual_mb:.3f} MB")
        print(f"Total theoretical FP32 size of all tensors: {total_fp32_equiv_mb:.3f} MB")

        if total_fp32_equiv_mb > 0.00001:  # Avoid division by zero or near-zero
            savings_percentage = (1 - (total_actual_mb / total_fp32_equiv_mb)) * 100
            print(f"Overall size reduction compared to full FP32: {savings_percentage:.2f}%")
        else:
            print("Overall size reduction cannot be calculated (no FP32 equivalent data or zero size).")

    except Exception as e:
        print(f"An error occurred while processing '{filepath}':")
        print(f"  {e}")
        print("Please ensure it's a valid .safetensors file and the 'safetensors' (and 'torch') libraries are installed correctly.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Inspect tensor precision (dtype) and size in a .safetensors file."
    )
    parser.add_argument(
        "filepath",
        help="Path to the .safetensors file to inspect."
    )
    args = parser.parse_args()

    inspect_safetensors_precision_and_size(args.filepath)