Emotionally-Aware AI Companion

Fine-tuned VideoLLaMA3 for Digital Arts Analysis

Created by Institution Art

A specialized multimodal AI model for understanding and analyzing digital artwork with emotional intelligence.

🎨 About This Model

Emotionally-Aware AI Companion is a fine-tuned version of VideoLLaMA3-7B, specifically optimized for digital arts analysis and emotional understanding. This model has been trained to recognize artistic styles, interpret visual emotions, identify artists, and provide insightful commentary on digital artwork.

🌟 Key Features

🎭 Emotional Intelligence: Understands and analyzes emotional content in artwork
🖼️ Artwork Recognition: Identifies artists, styles, and artistic movements
🎨 Digital Arts Expertise: Specialized knowledge of digital art techniques and mediums
💬 Conversational Interface: Natural language interaction about artwork
🔍 Detailed Analysis: Provides comprehensive analysis of visual elements, composition, and artistic intent

🎯 Fine-tuning Details

Base Model: VideoLLaMA3-7B (DAMO-NLP-SG)
Training Epochs: 20
Specialized Dataset: Custom artwork dataset with artist annotations and emotional labels
Components Trained:
- ✅ Vision Encoder (fine-tuned for artwork understanding)
- ✅ Multimodal Projector (enhanced visual-language alignment)
- ✅ Language Model (specialized for art terminology and analysis)

🚀 Quick Start

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_name = "OneEyeDJ/videollama3-artwork-institution"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package to be installed
)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Artwork analysis example
image_path = "path/to/your/artwork.jpg"
question = "Can you analyze this artwork and identify the artist and emotional themes?"

conversation = [
    {"role": "system", "content": "You are an emotionally-aware AI art companion specialized in analyzing digital artwork and understanding artistic emotions."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": question},
        ]
    },
]

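# Tokenize the conversation and load the image into model-ready tensors
inputs = processor(conversation=conversation, return_tensors="pt")
# Move tensors to the model's device instead of hard-coding CUDA
inputs = {k: v.to(model.device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}
# Match the vision input dtype to the bfloat16 model weights
if "pixel_values" in inputs:
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Generate the answer and decode it to plain text
output_ids = model.generate(**inputs, max_new_tokens=256)
response = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(response)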

🎨 Use Cases

Artwork Analysis

Artist Identification: Recognize artistic styles and identify potential artists
Style Analysis: Analyze artistic movements, techniques, and influences
Composition Analysis: Understand visual elements, color theory, and composition

Emotional Understanding

Mood Detection: Identify emotional themes and feelings conveyed in artwork
Sentiment Analysis: Analyze the emotional impact and viewer response
Symbolic Interpretation: Understand symbolic elements and their emotional significance

Educational Applications

Art History: Learn about different artistic periods and movements
Technique Explanation: Understand digital art techniques and tools
Creative Inspiration: Generate ideas and artistic direction
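
The use cases above differ mainly in the question put to the model. As a minimal sketch, the helper below builds the same conversation structure used in the Quick Start for a chosen use case. The function name `build_conversation`, the `USE_CASE_PROMPTS` mapping, and the prompt wordings are illustrative assumptions, not part of the released model.

```python
from typing import Any

# System prompt copied from the Quick Start example.
SYSTEM_PROMPT = (
    "You are an emotionally-aware AI art companion specialized in "
    "analyzing digital artwork and understanding artistic emotions."
)

# Hypothetical prompt wordings, one per example use case.
USE_CASE_PROMPTS = {
    "artist_identification": "Which artist or artistic style does this artwork most resemble, and why?",
    "mood_detection": "What emotional themes and feelings does this artwork convey?",
    "technique_explanation": "Which digital art techniques appear to have been used in this artwork?",
}


def build_conversation(image_path: str, use_case: str) -> list[dict[str, Any]]:
    """Return a conversation in the format the Quick Start passes to the processor."""
    if use_case not in USE_CASE_PROMPTS:
        raise ValueError(f"Unknown use case: {use_case!r}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": USE_CASE_PROMPTS[use_case]},
            ],
        },
    ]
```

The returned list can be passed as `conversation=` to the processor call shown in the Quick Start.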

🏛️ Institution Art

This model was developed by Institution Art, an organization dedicated to advancing the intersection of artificial intelligence and creative arts. Our mission is to create AI tools that enhance artistic understanding and creative expression.

Our Vision

To democratize art education and appreciation through AI-powered tools that make artistic knowledge accessible to everyone.

📊 Training Information

Training Duration: 20 epochs
Dataset: Custom artwork dataset with professional annotations
Model Size: ~16GB
Training Focus: Digital arts, emotional recognition, artist identification
Special Features: Enhanced vision encoder for artistic detail recognition
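
The ~16GB figure is consistent with storing roughly 8 billion parameters in bfloat16 (2 bytes each). The sketch below works through that arithmetic. The parameter count is a rough assumption (a 7B-class language model plus a vision encoder and projector), not a number reported for this checkpoint.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def checkpoint_size_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumed total parameter count; not taken from the model card.
ASSUMED_PARAMS = 8e9

print(f"{checkpoint_size_gb(ASSUMED_PARAMS):.1f} GB")  # prints "16.0 GB"
```

The same estimate also gives a lower bound on the memory needed just to hold the weights at inference time, before activations and the KV cache.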

🔧 Technical Details

Architecture

Vision Encoder: Fine-tuned SigLIP for artwork understanding
Multimodal Projector: Enhanced for visual-language alignment in art context
Language Model: Qwen2.5-7B with specialized art vocabulary

Performance Optimizations

Flash Attention 2 support for efficient inference
Optimized for artwork analysis tasks
Balanced training for both technical and emotional understanding
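
Flash Attention 2 depends on the separate `flash-attn` package, which is not installable on every system. As a sketch, the helper below picks a value for the `attn_implementation` argument based on whether that package can be found. The helper name `pick_attn_implementation` is hypothetical, and falling back to `"sdpa"` assumes the model's remote code supports that backend, which this card does not confirm.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Prefer Flash Attention 2 when flash-attn is installed, else fall back."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # Assumption: the model also runs with PyTorch's scaled-dot-product attention.
    return "sdpa"
```

The result can be passed as `attn_implementation=pick_attn_implementation()` in the `from_pretrained` call from the Quick Start.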

📝 License & Usage

This model is released under the Apache 2.0 license. It builds upon the original VideoLLaMA3 work by DAMO-NLP-SG.

🙏 Acknowledgments

This work builds upon the excellent foundation provided by:

VideoLLaMA3 by DAMO-NLP-SG
Qwen2.5 by Alibaba Group
The broader open-source AI and computer vision community

Citation

If you use this model in your research or applications, please cite:

@misc{emotionally-aware-ai-companion-2025,
  title={Emotionally-Aware AI Companion: Fine-tuned VideoLLaMA3 for Digital Arts Analysis},
  author={Institution Art},
  year={2025},
  howpublished={\url{https://huggingface.co/OneEyeDJ/videollama3-artwork-institution}},
}

Original VideoLLaMA3 Citation

@article{damonlpsg2025videollama3,
  title={VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding},
  author={Boqiang Zhang and Kehan Li and Zesen Cheng and Zhiqiang Hu and Yuqian Yuan and Guanzheng Chen and Sicong Leng and Yuming Jiang and Hang Zhang and Xin Li and Peng Jin and Wenqi Zhang and Fan Wang and Lidong Bing and Deli Zhao},
  journal={arXiv preprint arXiv:2501.13106},
  year={2025},
  url = {https://arxiv.org/abs/2501.13106}
}

Emotionally-Aware AI Companion - Bridging the gap between artificial intelligence and artistic understanding 🎨✨
