Fine-tune Vision AI Model for Volume Recognition

This project demonstrates how to fine-tune a vision AI model for recognizing fluid volumes in test tubes, with applications across medical, laboratory, and industrial settings.

Prerequisites

1. HuggingFace Setup (Required)

Create an account at huggingface.co
Go to Settings → Access Tokens
Create a new token with write access (needed to upload your model in Step 3)
Copy and save your token - you'll need it later
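
If you prefer not to paste the token by hand each time, one option is to keep it in an environment variable and check it before you start. The sketch below assumes the token is stored in `HF_TOKEN` and that the `huggingface_hub` package is installed; the helper names `load_hf_token` and `check_token` are illustrative and not part of this project.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Read the token from an environment variable (assumed name: HF_TOKEN)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your HuggingFace access token first.")
    return token


def check_token(token):
    """Ask the Hub who the token belongs to. Requires network access."""
    # Imported lazily so the rest of the file works without huggingface_hub.
    from huggingface_hub import HfApi

    return HfApi(token=token).whoami()["name"]


if __name__ == "__main__":
    print("Token belongs to:", check_token(load_hf_token()))
```

If `check_token` raises an error, the token is most likely mistyped, expired, or revoked.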

Quick Start

Open terminal in your JarvisLabs workspace:
```
File > New Launcher > Terminal
```

Clone the repository and navigate to the project directory:
```
git clone https://github.com/ictBioRtc/finetune_florence2_vision_language_model.git
cd finetune_florence2_vision_language_model
```
Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```
Copy the public URL printed in the terminal (e.g., https://ff20bc33e416f3319f.gradio.live)
Open it in a new browser tab

Using the Application

Step 1: Test Initial Model (Inference Tab)

Unzip the provided test_images.zip
Go to "Inference" tab
Upload a test image
Leave other settings at default
Click "Run Inference"
Observe how the base model performs before fine-tuning

Step 2: Train the Model (Training Tab)

Dataset: ictbiortc/beaker-volume-recognition-dataset
Change epochs to 15 (for workshop purposes)
Click "Start Training"
Note: At 15 epochs, training could take ~5 hours

Step 3: Upload Model to HuggingFace

After training completes, click "Upload to Hub"
Enter your model name (e.g., your-username/beaker-volume-recognition-model)
Paste your HuggingFace token
Click "Upload"
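
If the upload button fails, the same upload can be attempted from a Python shell. This is a minimal sketch, not a description of what the app does internally: it assumes the fine-tuned model was saved to a local folder (the path is yours to supply) and uses `HfApi.create_repo` and `HfApi.upload_folder` from `huggingface_hub`. The helpers `is_valid_repo_id` and `upload_model` are hypothetical names introduced here.

```python
from pathlib import Path


def is_valid_repo_id(repo_id):
    """Simplified check that the name looks like 'username/model-name'."""
    parts = repo_id.split("/")
    return len(parts) == 2 and all(parts)


def upload_model(folder, repo_id, token):
    """Create the model repo if needed and upload every file in `folder`."""
    # Imported lazily so the validation helper works without huggingface_hub.
    from huggingface_hub import HfApi

    if not is_valid_repo_id(repo_id):
        raise ValueError(f"Expected 'username/model-name', got {repo_id!r}")
    if not Path(folder).is_dir():
        raise FileNotFoundError(f"No such folder: {folder}")

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, exist_ok=True)  # no error if it already exists
    api.upload_folder(folder_path=str(folder), repo_id=repo_id)
    return f"https://huggingface.co/{repo_id}"
```

The token passed to `upload_model` needs write permission, otherwise the Hub rejects the upload.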

Step 4: Important Configuration Update

Go to your model on HuggingFace
Navigate to "Files and versions"
Find config.json

Edit line 165 from:

"model_type": "",

to:

"model_type": "davit",
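
The same edit can be scripted on a local copy of config.json instead of using the web editor. The sketch below is one way to do it, assuming the only empty `"model_type"` entry in the file is the one described above; rather than relying on the line number, it searches the whole JSON structure. The function names are illustrative.

```python
import json
from pathlib import Path


def fix_empty_model_type(node, value="davit"):
    """Set every empty "model_type" string to `value`; return how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "model_type" and child == "":
                node[key] = value
                changed += 1
            else:
                changed += fix_empty_model_type(child, value)
    elif isinstance(node, list):
        for child in node:
            changed += fix_empty_model_type(child, value)
    return changed


def patch_config(path):
    """Patch a local config.json in place; return the number of fields fixed."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = fix_empty_model_type(config)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

A return value of 0 from `patch_config("config.json")` means no empty `model_type` was found, so the file was left untouched. The patched file still has to be uploaded back to your model repository.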

Step 5: Evaluate Your Model

Return to the app
Go to "Evaluate" tab
Upload a test image
Use your trained model
Compare results with the initial inference

Applications

This volume recognition model has potential applications in:

IV Fluid Monitoring
Laboratory Automation
Medication Dosing
Urine Monitoring
Manufacturing Quality Control
Chemical Processing
Beverage Industry
Petroleum Industry

Training Notes

Full training typically takes days
Workshop version uses 15 epochs (~5 hours)
More epochs generally yield better results
GPU acceleration is recommended
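
To budget time for a different epoch count, the workshop figure (15 epochs in about 5 hours) can be used as a rough reference, which works out to about 20 minutes per epoch. The sketch below assumes training time scales linearly with the number of epochs on the same hardware and dataset; that is an approximation, not a measurement.

```python
REFERENCE_EPOCHS = 15    # workshop setting
REFERENCE_HOURS = 5.0    # approximate duration quoted for the workshop run


def estimate_hours(epochs):
    """Rough training-time estimate, assuming time grows linearly with epochs."""
    if epochs < 0:
        raise ValueError("epochs must be non-negative")
    return epochs * REFERENCE_HOURS / REFERENCE_EPOCHS


if __name__ == "__main__":
    for n in (5, 15, 30):
        print(f"{n} epochs: about {estimate_hours(n):.1f} hours")
```

Actual times depend on the GPU, batch size, and dataset size, so treat the result as an order-of-magnitude guide.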

Troubleshooting

Common issues:

"Model not loading": Check your internet connection
"Training too slow": Verify GPU availability
"Upload failed": Verify your HuggingFace token
"Config error": Double-check the davit model_type update
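
For the "Training too slow" case, a quick check from a Python shell can confirm whether a GPU is visible. This sketch assumes the project runs on PyTorch (not stated above, but typical for HuggingFace vision models); `gpu_available` is a hypothetical helper name.

```python
def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so training cannot use a GPU through it.
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if gpu_available():
        import torch

        print("GPU detected:", torch.cuda.get_device_name(0))
    else:
        print("No GPU detected; training will fall back to CPU and be much slower.")
```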

Next Steps

After successful training:

Experiment with different epochs
Try different image types
Test various fluid volumes
Integrate with your specific use case

Congratulations! You've successfully:

Tested a base vision model
Fine-tuned it for volume recognition
Uploaded it to HuggingFace
Created a practical AI solution for real-world applications

This workshop demonstrates how vision language models can be adapted for specific industrial and medical applications.