🧬 STELLA-VLM-JoVE-7B: Laboratory Protocol Vision-Language Model

🎯 Model Description

STELLA-VLM-JoVE-7B is a specialized vision-language model fine-tuned from NVIDIA's Cosmos-Reason1-7B on laboratory protocol videos from JoVE (Journal of Visualized Experiments). This model bridges the gap between visual laboratory demonstrations and written experimental protocols, enabling automated protocol extraction, safety assessment, and error detection from laboratory media.

Key Features

  • 🔬 Protocol Extraction: Automatically generate step-by-step laboratory protocols from videos
  • 📸 Image Analysis: Comprehensive analysis of laboratory images
  • ⚠️ Error Detection: Identify experimental errors and safety violations
  • 🛡️ Safety Assessment: Generate detailed safety reports
  • 🧪 Equipment Identification: Catalog laboratory equipment and reagents
  • 📊 Batch Processing: Efficiently process multiple videos

🚀 Quick Start

Installation

# Install dependencies
pip install torch transformers opencv-python pillow numpy

# Clone this model
git clone https://huggingface.co/Zaixi/STELLA-VLM-JoVE-7B
cd STELLA-VLM-JoVE-7B

Basic Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

# Load model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Zaixi/STELLA-VLM-JoVE-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Zaixi/STELLA-VLM-JoVE-7B",
    trust_remote_code=True
)

# Analyze laboratory image
image = Image.open("lab_image.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Extract the laboratory protocol from this image:"},
        {"type": "image", "image": image}
    ]
}]

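text_input = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# Move the tensors to the same device as the model
inputs = processor(text=[text_input], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    # temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated, skip_special_tokens=True)[0]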
print(response)

🔧 Using STELLA_VLM Tool

This repository includes STELLA_VLM (Scientific Tool for Experiment Lab Learning and Analysis), a comprehensive toolkit for laboratory media analysis.

Tool Installation

from stella_vlm_tool import (
    extract_protocol_from_video,
    analyze_lab_image,
    detect_experimental_errors,
    generate_safety_assessment,
    identify_equipment_and_reagents
)

Extract Protocol from Video

# Extract protocol from laboratory video
result = extract_protocol_from_video(
    video_path="experiment.mp4",
    max_frames=8,
    output_format="markdown"
)
print(result)
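
The `max_frames` argument suggests that a fixed number of frames is sampled from each video. How `extract_protocol_from_video` selects them is not documented here, so the following is only a minimal sketch of one common approach, uniform sampling with OpenCV. `evenly_spaced_indices` and `sample_frames` are hypothetical helpers, not part of `stella_vlm_tool`.

```python
from typing import List


def evenly_spaced_indices(total_frames: int, max_frames: int) -> List[int]:
    """Pick up to max_frames indices, one from the middle of each equal segment."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    segment = total_frames / max_frames
    return [int((i + 0.5) * segment) for i in range(max_frames)]


def sample_frames(video_path: str, max_frames: int = 8):
    """Read the selected frames as RGB PIL images (assumes opencv-python and pillow)."""
    import cv2  # imported lazily so the index helper works without OpenCV
    from PIL import Image

    cap = cv2.VideoCapture(video_path)
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in evenly_spaced_indices(total, max_frames):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                # OpenCV decodes to BGR; convert to RGB before building the image
                frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        return frames
    finally:
        cap.release()
```

Sampling from the middle of each segment avoids always picking the first frame, which in many recordings is a title card or a black frame.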

Analyze Laboratory Image

# Comprehensive image analysis
analysis = analyze_lab_image(
    image_path="lab_setup.jpg",
    analysis_type="comprehensive"  # or "equipment", "procedure", "safety"
)
print(analysis)

Detect Experimental Errors

# Detect errors and safety violations
errors = detect_experimental_errors(
    media_path="experiment.mp4",
    error_categories="all"  # or "technique", "safety", "contamination"
)
print(errors)

Command Line Interface

# Extract protocol
python stella_vlm_tool.py video.mp4 protocol

# Analyze image
python stella_vlm_tool.py image.jpg image

# Detect errors
python stella_vlm_tool.py video.mp4 errors

# Safety assessment
python stella_vlm_tool.py video.mp4 safety
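
For several files, the same CLI can be driven from a short script. The sketch below only assumes the `python stella_vlm_tool.py <media> <mode>` form and the four modes shown above; `build_commands` and `run_batch` are hypothetical helpers, and the tool may well offer its own batch entry point.

```python
import subprocess
from typing import Iterable, List

# Modes taken from the CLI examples above
MODES = {"protocol", "image", "errors", "safety"}


def build_commands(media_paths: Iterable[str], mode: str,
                   script: str = "stella_vlm_tool.py") -> List[List[str]]:
    """Build one argument list per media file for the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return [["python", script, str(path), mode] for path in media_paths]


def run_batch(media_paths: Iterable[str], mode: str) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_commands(media_paths, mode):
        subprocess.run(cmd, check=True)
```

Calling `run_batch(["a.mp4", "b.mp4"], "protocol")` from the repository directory would run the protocol extraction on each file in turn. Note that each invocation starts a new process and therefore reloads the model.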

📊 Model Performance

Capabilities by Domain

Domain | Capability | Performance
Cell Biology | Protocol extraction, sterility assessment | Excellent
Chemistry | Safety hazard detection, equipment ID | Very Good
Molecular Biology | Technique validation, contamination detection | Excellent
General Lab | Equipment identification, PPE compliance | Very Good

Recommended Settings

  • Frames per video: 8-12 for optimal detail
  • Max tokens: 1024-2048 for complete protocols
  • Temperature: 0.7 for balanced creativity/accuracy
  • GPU Memory: ~16GB VRAM recommended
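
These settings map onto keyword arguments of `model.generate` in transformers. One detail worth making explicit: `temperature` is only applied when `do_sample=True`. The helper below is a minimal sketch; `build_generation_kwargs` is a hypothetical name, and the defaults simply mirror the recommendations above.

```python
def build_generation_kwargs(max_new_tokens: int = 1024,
                            temperature: float = 0.7) -> dict:
    """Return keyword arguments for model.generate based on the settings above."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if temperature <= 0:
        # Treat a non-positive temperature as a request for greedy decoding
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # required for temperature to have any effect
        "temperature": temperature,
    }
```

Usage would look like `model.generate(**inputs, **build_generation_kwargs(2048, 0.7))`.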

🔬 Example Outputs

Protocol Extraction

Step 1: Prepare sterile PBS buffer at room temperature
Step 2: Add 5 mL of cell culture medium to 15 mL conical tube
Step 3: Centrifuge at 300g for 5 minutes at 4°C
Step 4: Carefully aspirate supernatant without disturbing pellet
...

Safety Assessment

PPE Status: ✅ Lab coat, gloves observed
Hazards Identified: Chemical (ethanol), Biological (cell culture)
Safety Violations: None detected
Recommendations: Ensure eye protection when handling chemicals
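
If the report needs to be consumed by other code, a `Key: value` layout like the one above can be turned into a dictionary. This is a sketch under the assumption that real outputs follow the example format, which is not guaranteed for generated text; `parse_safety_report` is a hypothetical helper.

```python
def parse_safety_report(text: str) -> dict:
    """Split each 'Key: value' line at the first colon; skip lines without one."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            report[key.strip()] = value.strip()
    return report


example = """PPE Status: Lab coat, gloves observed
Hazards Identified: Chemical (ethanol), Biological (cell culture)
Safety Violations: None detected"""

report = parse_safety_report(example)
print(report["Safety Violations"])
```

Because model output can drift from this layout, any downstream check should handle missing keys rather than index into the dictionary directly.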

📚 Training Details

  • Base Model: nvidia/Cosmos-Reason1-7B
  • Training Data: JoVE laboratory protocol videos
  • Fine-tuning Method: LoRA (merged into base model)
  • Training Duration: ~50 hours on 8xA100 GPUs
  • Dataset Size: 10,000+ laboratory videos

⚡ System Requirements

  • GPU: NVIDIA GPU with 16GB+ VRAM (A100, A6000, RTX 4090)
  • RAM: 32GB+ system memory
  • Storage: 30GB for model weights
  • Python: 3.8 or higher
  • CUDA: 11.7 or higher (for GPU acceleration)
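
As a rough sanity check on the VRAM figure: BF16 stores each parameter in two bytes, so the weights alone take about two bytes times the parameter count. The sketch below assumes a nominal 7 billion parameters, taken from the model name; the exact count may be higher, and activations and the KV cache add to the total. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GiB (BF16 = 2 bytes each)."""
    return num_params * bytes_per_param / 2**30


# Nominal 7B parameters is an assumption based on the model name
print(round(weight_memory_gib(7e9), 1))
```

The result leaves only a few gigabytes of headroom on a 16GB card, which is why long videos or large `max_new_tokens` values may need more memory.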

📝 Limitations

  • Optimized for laboratory/scientific content
  • Best performance with clear, well-lit videos
  • May require domain expertise to validate outputs
  • Limited to English-language protocols

🀝 Contributing

We welcome contributions! Please see our contributing guidelines for details.

📄 License

This model is released under the MIT License. See LICENSE for details.

🙏 Acknowledgments

  • NVIDIA for the Cosmos-Reason base model
  • JoVE (Journal of Visualized Experiments) for laboratory protocol data
  • Open-source community for transformers and vision libraries

📖 Citation

If you use this model in your research, please cite:

@software{cosmos_reason_jove_2024,
  title = {STELLA-VLM-JoVE-7B: Laboratory Protocol Vision-Language Model},
  author = {Zaixi Zhang},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Zaixi/STELLA-VLM-JoVE-7B}
}

📧 Contact

For questions or support, please open an issue on the Hugging Face repository.


Built with ❤️ for the scientific community
