πŸš€ Falcon-7b-sharded-bf16-finetuned-sft

Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft is a sharded version of the Falcon-7B model, optimized with BF16 (Brain Floating Point 16-bit precision) for efficient inference and training on limited-memory GPUs.

πŸ”₯ Key Features:

  • πŸ¦… Based on Falcon-7B, a powerful transformer model
  • πŸ— Sharded for multi-GPU loading
  • 🎯 BF16 precision for lower memory usage
  • ⚑ Optimized for inference & fine-tuning

πŸ“‚ Model Details

Feature πŸ† Details πŸ“œ
Architecture πŸ— Falcon-7B (Transformer-based)
Parameters πŸ”’ 3.84B
Precision 🎯 Brain Floating Point 16 (BF16)
Tokenizer πŸ”€ Hugging Face AutoTokenizer
Use Cases 🎯 Chatbots πŸ€–, Summarization πŸ“š, Q&A ❓, Text Generation ✍️
License πŸ“„ Apache 2.0
Developer 🏒 Phase Technologies

πŸš€ Installation, Setup & Model Loading

πŸ”Ή Install Dependencies

pip install transformers accelerate torch
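Before loading the model, it can help to confirm that the installed libraries and hardware are ready for BF16. The check below is a suggested sanity test, not a required step:

import torch
import transformers

# Confirm library versions and GPU availability
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# Native BF16 math requires Ampere (compute capability 8.0) or newer
if torch.cuda.is_available():
    print("BF16 supported:", torch.cuda.is_bf16_supported())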

πŸ”Ή Load the Model in Python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Phase-Technologies/falcon-7b-sharded-bf16-finetuned-sft"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model (optimized for BF16 & sharded loading)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

print("βœ… Model Loaded Successfully!")

🎯 Usage

πŸ”Ή Text Generation

prompt = "Once upon a time, in a futuristic world..."

# Place inputs on the model's device (also works with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate output
output = model.generate(**inputs, max_new_tokens=100)

# Decode and print result
print(tokenizer.decode(output[0], skip_special_tokens=True))
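generate() defaults to greedy decoding, which can be repetitive for open-ended prompts. The sampling settings below are illustrative starting points rather than tuned recommendations:

# Sampled generation for more varied output (values are illustrative)
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id  # Falcon defines no pad token by default
)
print(tokenizer.decode(output[0], skip_special_tokens=True))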

πŸ”Ή Running on Multiple GPUs

# device_map="auto" shards the checkpoint across all visible GPUs;
# weights that do not fit are offloaded to disk via offload_folder
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    offload_folder="offload"  # Directory for CPU/disk offloading
)
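device_map="auto" spreads the shards across every visible GPU. To cap how much memory each device may hold (spilling the remainder to CPU RAM or the offload folder), the max_memory argument can be passed through from_pretrained. The limits below are hypothetical values for two 16 GB cards; adjust them to your hardware:

# Hypothetical per-device memory caps for a 2-GPU machine
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    max_memory={0: "14GiB", 1: "14GiB", "cpu": "30GiB"},
    offload_folder="offload"  # Weights beyond the caps are offloaded here
)

# Inspect where each layer ended up
print(model.hf_device_map)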

πŸ”— References

πŸ› Falcon Model Paper

πŸš€ Hugging Face Documentation

πŸ”₯ Phase Technologies

πŸ“’ Contributions & Issues: If you find a bug or have a feature request, feel free to open an issue! 😊


πŸš€ Happy Coding! πŸ’»πŸŽ‰
