---
tags:
- colab-notebook
- vision-language-model
- fastvlm
- Apple FastVLM-0.5B
- video-analysis
- image-captioning
license: mit
language:
- en
base_model:
- apple/FastVLM-0.5B
---

# FastVLM-0.5B Video Analysis and Captioning

This Colab notebook demonstrates how to use the Apple FastVLM-0.5B model from Hugging Face (`apple/FastVLM-0.5B`) to perform video analysis and generate captions for video frames.

The notebook covers the following steps:

1. **Model Loading**: Load the FastVLM-0.5B model and its processor using the Hugging Face `transformers` library.
2. **Image Captioning**: Test the model on sample images.
3. **Video Processing**: Read a video file (here, `/content/drive/MyDrive/VLMs/vlm_warehouse.mp4`) and extract frames.
4. **Inference on Video Frames**: Run FastVLM on selected key frames to generate descriptions.
5. **Caption Overlay and Video Generation**: Write a new video in which each original frame is shown with the most recent caption overlaid or stacked below it; captions update at each key frame where inference was run.

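Steps 3 and 4 above boil down to sampling key frames at a fixed time interval and running inference only on those. A minimal sketch of the sampling logic is below; the two-second interval, the function names, and the use of OpenCV's `cv2.VideoCapture` are illustrative assumptions, not the notebook's exact code:

```python
def keyframe_indices(total_frames, fps, every_sec=2.0):
    """Indices of the frames to run inference on: one every `every_sec` seconds."""
    step = max(1, int(round(fps * every_sec)))
    return list(range(0, total_frames, step))

def extract_keyframes(path, every_sec=2.0):
    """Read a video and return [(frame_index, frame)] for the sampled key frames."""
    import cv2  # lazy import; OpenCV is preinstalled on Colab

    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(keyframe_indices(total, fps, every_sec))

    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            frames.append((idx, frame))
        idx += 1
    cap.release()
    return frames
```

Sampling by wall-clock interval rather than frame count keeps the caption update rate consistent across videos with different frame rates.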
## Usage

You can open this notebook directly in Google Colab by clicking the "Open in Colab" badge on the repository page.

To run the video analysis section, make sure a video file exists in your Google Drive at the path set in the notebook (currently `/content/drive/MyDrive/VLMs/vlm_warehouse.mp4`).

## Model Details

- **Model ID**: `apple/FastVLM-0.5B`
- **Model Type**: Vision-Language Model
- **Library**: Hugging Face `transformers`

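A minimal loading sketch consistent with these details is shown below. The exact entry points (`AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True`) are an assumption based on common practice for Hub checkpoints that ship custom modeling code; check the `apple/FastVLM-0.5B` model card for the authoritative snippet.

```python
# Sketch only: the loading API below is assumed, not taken from the notebook.
MODEL_ID = "apple/FastVLM-0.5B"

def load_fastvlm(device: str = "cuda"):
    """Download and return (tokenizer, model).

    Requires network access and a few GB of memory; intended for a Colab GPU runtime.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",      # use fp16/bf16 weights if the checkpoint provides them
        trust_remote_code=True,  # assumed: the repo ships custom modeling code
    )
    return tokenizer, model.to(device).eval()
```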
## Datasets Used

- Conceptual Captions (used for initial model testing)
- A custom video file (`vlm_warehouse.mp4` from Google Drive)

## Example Output

A stacked video in which each original frame is shown with its generated caption rendered below it.

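The stacking amounts to appending a caption band under each frame before writing the output video. A rough sketch follows; the band height and font settings are illustrative, and `cv2.putText` is assumed for text rendering:

```python
import numpy as np

def stack_caption(frame, caption, band_h=60):
    """Return the frame with a black caption band stacked below it (uint8 image).

    The stacking itself is plain NumPy; text drawing uses OpenCV when available.
    """
    h, w = frame.shape[:2]
    band = np.zeros((band_h, w, 3), dtype=np.uint8)
    try:
        import cv2
        cv2.putText(band, caption, (10, band_h - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    except ImportError:
        pass  # without OpenCV the band stays blank
    return np.vstack([frame, band])
```

Stacking below (rather than overlaying) keeps the original pixels untouched, which makes the captions readable regardless of the frame's content.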
## Acknowledgements

- The developers of the FastVLM-0.5B model.
- The Hugging Face team for the `transformers` and `huggingface_hub` libraries.
- Google Colab for providing the environment.

Feel free to explore and adapt this notebook for your own video analysis tasks!