sebastianhariman committed · Commit 07961af · verified · 1 Parent(s): 01b52e7

Create README.md

Files changed (1)
  1. README.md +36 -0
README.md ADDED
@@ -0,0 +1,36 @@
 
# **Image Captioning Models with ResNet50+LSTM, ViT+BERT, and ViT+GPT2**

This repository contains implementations of three image captioning models:
1. **ResNet50 + LSTM**: A classic approach that uses a convolutional neural network (CNN) to encode the image and an LSTM to generate the caption token by token.
2. **Vision Transformer (ViT) + BERT**: A transformer-based approach that uses ViT to encode the image and BERT to generate the text.
3. **Vision Transformer (ViT) + GPT2**: A transformer-based approach that pairs a ViT image encoder with GPT2's autoregressive text generation.

Each model pairs a visual encoder with a language decoder to generate descriptive captions for input images.

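
The encoder-decoder split described above can be illustrated with a minimal greedy decoding loop. This is a sketch, not the repository's code: the README does not say which decoding strategy was used, so greedy search is an assumption, and `greedy_caption`, `toy_next_token_scores`, and the five-word vocabulary are hypothetical stand-ins for a real encoder output and decoder. The loop asks the decoder for next-token scores given the image features and the tokens generated so far, appends the best token, and stops at the end-of-sequence token.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical toy vocabulary; a real model would use its tokenizer's vocabulary.
VOCAB = ["<bos>", "<eos>", "a", "dog", "runs"]


def greedy_caption(
    image_features: Any,
    next_token_scores: Callable[[Any, Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Generate token ids one at a time, always taking the highest-scoring token."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(image_features, tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break  # the decoder signalled the end of the caption
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


def toy_next_token_scores(image_features: Any, tokens: Sequence[int]) -> List[float]:
    """Stand-in decoder (assumption): always spells out 'a dog runs' then <eos>."""
    target = [2, 3, 4, 1]
    position = len(tokens) - 1
    next_id = target[position] if position < len(target) else 1
    scores = [0.0] * len(VOCAB)
    scores[next_id] = 1.0
    return scores


caption_ids = greedy_caption(None, toy_next_token_scores, bos_id=0, eos_id=1)
caption = " ".join(VOCAB[i] for i in caption_ids)
```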
---

## **Hyperparameters**

The following table summarizes the key training configurations used for each model:

| **Parameter**     | **ResNet50 + LSTM** | **ViT + BERT** | **ViT + GPT2** |
|-------------------|---------------------|----------------|----------------|
| **Epochs**        | 10                  | 10             | 10             |
| **Batch Size**    | 128                 | 32             | 32             |
| **Learning Rate** | 0.0001              | 0.00001        | 0.00001        |
| **Optimizer**     | Adam                | Adam           | Adam           |
| **Scheduler**     | N/A                 | OneCycleLR     | OneCycleLR     |

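
For reference, the table can be written down as a small configuration object, together with a sketch of the learning-rate curve a one-cycle schedule produces. Several things here are assumptions rather than facts from the repository: the names `TrainConfig`, `CONFIGS`, and `one_cycle_lr` are hypothetical, the table's learning rate is treated as the peak of the cycle, and the warm-up fraction and divisors mirror the documented defaults of PyTorch's `OneCycleLR` (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`, cosine annealing). The function only illustrates the shape of the schedule; it is not a replacement for the library scheduler.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    batch_size: int
    learning_rate: float
    optimizer: str
    scheduler: Optional[str]  # None where the table says N/A


# Values copied from the hyperparameter table; the dictionary keys are hypothetical.
CONFIGS = {
    "resnet50_lstm": TrainConfig(10, 128, 1e-4, "Adam", None),
    "vit_bert": TrainConfig(10, 32, 1e-5, "Adam", "OneCycleLR"),
    "vit_gpt2": TrainConfig(10, 32, 1e-5, "Adam", "OneCycleLR"),
}


def one_cycle_lr(
    step: int,
    total_steps: int,
    max_lr: float,
    pct_start: float = 0.3,
    div_factor: float = 25.0,
    final_div_factor: float = 1e4,
) -> float:
    """Cosine warm-up to max_lr, then cosine decay to a very small rate.

    Assumes total_steps is large enough that the warm-up spans at least one step.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # index of the last warm-up step
    if step <= warmup_end:
        progress = step / warmup_end
        start, end = initial_lr, max_lr
    else:
        progress = (step - warmup_end) / (total_steps - 1 - warmup_end)
        start, end = max_lr, min_lr
    # Cosine interpolation from start (progress = 0) to end (progress = 1).
    return end + (start - end) / 2.0 * (math.cos(math.pi * progress) + 1.0)


cfg = CONFIGS["vit_gpt2"]
schedule = [one_cycle_lr(s, 100, cfg.learning_rate) for s in range(100)]
```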
---

## **Evaluation Results**

The models were evaluated with standard image captioning metrics: **BLEU (1-4)**, **METEOR**, and **ROUGE-L**. The table below lists the scores for each model; the best score in each column is shown in bold.

| **Model**           | **BLEU-1** | **BLEU-2** | **BLEU-3** | **BLEU-4** | **METEOR** | **ROUGE-L** |
|---------------------|------------|------------|------------|------------|------------|-------------|
| **ResNet50 + LSTM** | 0.648      | 0.451      | 0.300      | 0.202      | 0.421      | 0.506       |
| **ViT + BERT**      | 0.725      | **0.551**  | **0.395**  | **0.278**  | 0.501      | **0.546**   |
| **ViT + GPT2**      | **0.728**  | 0.545      | 0.385      | 0.265      | **0.502**  | 0.532       |

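
To make the BLEU columns concrete, here is a minimal sentence-level BLEU sketch: clipped n-gram precision for n = 1 to N, combined by a geometric mean and scaled by a brevity penalty. The README does not state which evaluation toolkit produced the scores, so this is an illustration of the metric rather than the code behind the table; reported scores are usually computed at corpus level, often with smoothing, and the function names and example sentences below are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed BLEU-max_n with uniform weights for a single caption."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        # For each n-gram, the highest count seen in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "a dog runs on the grass".split()
candidate = "a dog runs on grass".split()
bleu1 = sentence_bleu(candidate, [reference], max_n=1)
bleu4 = sentence_bleu(candidate, [reference], max_n=4)
```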
36
+ ---