profitmonk
/

FASTVLM-0.5B-vlm-notebook

vision-language-model

Apple FastVLM-0.5B

image-captioning

Model card Files Files and versions Community

profitmonk commited on 10 days ago

Commit

bf6a997

·

verified ·

1 Parent(s): 1703c24

Create README.md

Files changed (1) hide show

README.md +55 -0

README.md ADDED Viewed

	@@ -0,0 +1,55 @@

+---
+tags:
+- colab-notebook
+- vision-language-model
+- fastvlm
+- Apple FastVLM-0.5B
+- video-analysis
+- image-captioning
+license: mit
+language:
+- en
+base_model:
+- apple/FastVLM-0.5B
+---
+# FastVLM-0.5B Video Analysis and Captioning
+This Colab notebook demonstrates how to use the Apple FastVLM-0.5B model from Hugging Face (`apple/FastVLM-0.5B`) to perform video analysis and generate captions for video frames.
+The notebook covers the following steps:
+1.  **Model Loading**: Loading the FastVLM-0.5B model and its processor using the Hugging Face `transformers` library.
+2.  **Image Captioning**: Testing the model on sample images.
+3.  **Video Processing**: Reading a video file (specifically `/content/drive/MyDrive/VLMs/vlm_warehouse.mp4` in this case) and extracting frames.
+4.  **Inference on Video Frames**: Running the FastVLM model on selected video frames to generate descriptions.
+5.  **Caption Overlay and Video Generation**: Creating a new video file where the original video frames are displayed with the generated captions overlaid or stacked below. The captions update based on the inference performed on key frames.
+## Usage
+You can open this notebook directly in Google Colab by clicking the "Open in Colab" badge on the repository page.
+To run the video analysis section, make sure you have a video file available in your Google Drive at the path specified in the notebook (currently set to `/content/drive/MyDrive/VLMs/vlm_warehouse.mp4`).
+## Model Details
+-   **Model ID**: `apple/FastVLM-0.5B`
+-   **Model Type**: Vision-Language Model
+-   **Library**: Hugging Face `transformers`
+## Datasets Used
+-   Conceptual Captions (used for initial model testing)
+-   Custom video file (`vlm_warehouse.mp4` from Google Drive)
+## Example Output
+Stacked video with original frames that are available with generated captions at the bottom
+## Acknowledgements
+-   The developers of the FastVLM-0.5B model.
+-   The Hugging Face team for the `transformers` and `huggingface_hub` libraries.
+-   Google Colab for providing the environment.
+Feel free to explore and adapt this notebook for your own video analysis tasks!