---
title: Qwen2.5-VL | π Storyteller
emoji: π
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: true
tags:
- vision-language-model
- visual-storytelling
- chain-of-thought
- grounded-text-generation
- cross-frame-consistency
- storytelling
- image-to-text
license: apache-2.0
datasets:
- daniel3303/StoryReasoning
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-to-text
model-index:
- name: QwenStoryteller
  results:
  - task:
      type: visual-storytelling
      name: Visual Storytelling
    dataset:
      name: StoryReasoning
      type: daniel3303/StoryReasoning
      split: test
language:
- en
- zh
---

# QwenStoryteller

This HF Space is a simple implementation of [2505.10292](https://arxiv.org/abs/2505.10292) by Daniel A. P. Oliveira and David Martins de Matos; a BibTeX citation is provided below. The Space was created as a proof of concept, and all credit for the model and method goes to Daniel and David.

QwenStoryteller is a fine-tuned version of Qwen2.5-VL 7B specialized for grounded visual storytelling with cross-frame consistency. It generates coherent narratives from multiple images while maintaining character and object identity throughout the story.

## Model Description

- **Base Model:** Qwen2.5-VL 7B
- **Training Method:** LoRA fine-tuning (rank 2048, alpha 4096)
- **Training Dataset:** [StoryReasoning](https://huggingface.co/datasets/daniel3303/StoryReasoning)

QwenStoryteller processes sequences of images to perform:

- End-to-end object detection
- Cross-frame object re-identification
- Landmark detection
- Chain-of-thought reasoning for scene understanding
- Grounded story generation with explicit visual references

The model was fine-tuned on the StoryReasoning dataset using LoRA with a rank of 2048 and an alpha scaling factor of 4096, targeting the self-attention layers of the language components. Training used a peak learning rate of 1×10⁻⁴, a batch size of 32, warmup over the first 3% of steps, 4 epochs, the AdamW optimizer with weight decay 0.01, and bfloat16 precision.
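
The LoRA hyperparameters above map onto a PEFT configuration roughly like the sketch below. The target module list and dropout are illustrative assumptions, not values taken from the released training script.

```python
from peft import LoraConfig

# Illustrative LoRA configuration matching the hyperparameters described above.
# target_modules and lora_dropout are assumptions, not the released training config.
lora_config = LoraConfig(
    r=2048,               # LoRA rank
    lora_alpha=4096,      # alpha scaling factor
    lora_dropout=0.0,     # assumed
    bias="none",
    task_type="CAUSAL_LM",
    # Self-attention projections of the language-model layers (assumed target set)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```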

## System Prompt

The model was trained with the following system prompt, and we recommend using it as-is for inference:

```
You are an AI storyteller that can analyze sequences of images and create creative narratives.
First think step-by-step to analyze characters, objects, settings, and narrative structure.
Then create a grounded story that maintains consistent character identity and object references across frames.
Use <think></think> tags to show your reasoning process before writing the final story.
```
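
For inference outside this Space, a minimal sketch with Hugging Face Transformers and `qwen-vl-utils` could look like the following. The `daniel3303/QwenStoryteller` model id, the frame file names, the user prompt, and the generation settings are illustrative assumptions; adapt them to your setup.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

SYSTEM_PROMPT = (
    "You are an AI storyteller that can analyze sequences of images and create creative narratives. "
    "First think step-by-step to analyze characters, objects, settings, and narrative structure. "
    "Then create a grounded story that maintains consistent character identity and object references across frames. "
    "Use <think></think> tags to show your reasoning process before writing the final story."
)

model_id = "daniel3303/QwenStoryteller"  # assumed repository id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn containing the frame sequence to narrate.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "frame1.jpg"},
            {"type": "image", "image": "frame2.jpg"},
            {"type": "image", "image": "frame3.jpg"},
            {"type": "text", "text": "Generate a story based on these images."},
        ],
    },
]

# Build the chat prompt and preprocess the images.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=2048)
story = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(story)
```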

## Key Features

- **Cross-Frame Consistency:** Maintains consistent character and object identity across multiple frames through visual similarity and face recognition techniques
- **Structured Reasoning:** Employs chain-of-thought reasoning to analyze scenes, with explicit modeling of characters, objects, settings, and narrative structure
- **Grounded Storytelling:** Uses specialized XML tags to link narrative elements directly to visual entities
- **Reduced Hallucinations:** Produces 12.3% fewer hallucinations than the non-fine-tuned base model
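
The generated text places the model's reasoning inside `<think></think>` tags ahead of the story, so downstream code typically needs to separate the two. A small helper along these lines (a hypothetical sketch, not part of this Space's `app.py`) does the job:

```python
import re

def split_reasoning_and_story(output: str) -> tuple[str, str]:
    """Separate the <think></think> reasoning block from the final story text."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    story = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    return reasoning, story

example = "<think>Frame 1 introduces a lighthouse keeper...</think>\nThe keeper climbed the spiral stairs at dusk."
reasoning, story = split_reasoning_and_story(example)
print(reasoning)  # chain-of-thought analysis
print(story)      # grounded story
```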

## Citation

```
@misc{oliveira2025storyreasoningdatasetusingchainofthought,
      title={StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation},
      author={Daniel A. P. Oliveira and David Martins de Matos},
      year={2025},
      eprint={2505.10292},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.10292},
}
```