Spaces: walker11/RawiKids

walker11 committed 10 days ago

Commit 1d96975 (verified) · 1 parent: 9469c91

Upload 8 files

Files changed (8)

README.md +115 -13
app.py +80 -0
evaluate_model.py +151 -0
huggingface-metadata.json +12 -0
requirements.txt +8 -0
run_server.py +51 -0
story_generator.py +142 -0
test_server.py +101 -0

README.md CHANGED

@@ -1,13 +1,115 @@
----
-title: RawiKids
-emoji: 💻
-colorFrom: pink
-colorTo: green
-sdk: gradio
-sdk_version: 5.34.1
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Rawi Kids Vision-Language Model
+A vision-language model that generates engaging short stories for children (ages 6-12) based on images. This project is designed to be integrated with the Rawi Kids Flutter application and uses the DeepSeek Vision API.
+## Features
+- Generate age-appropriate stories from images
+- Support for different age groups (6-8 and 9-12 years)
+- Optional themes to influence story generation (adventure, fantasy, animals, etc.)
+- Gradio web interface for easy testing
+- Integration with Flutter app
+- Uses DeepSeek Vision API (no local model needed)
+## Demo
+This model can be tested using the Gradio web interface included in the project.
+## Setup and Installation
+### Prerequisites
+- Python 3.8 or higher
+- pip (Python package manager)
+- Virtual environment (recommended)
+- DeepSeek API Key
+### Getting a DeepSeek API Key
+1. Visit the [DeepSeek website](https://www.deepseek.com/) and sign up for an account
+2. Navigate to your API settings page to obtain an API key
+3. Copy the API key for use in the next steps
+### Installation
+1. Clone this repository
+   ```
+   git clone <repository-url>
+   cd rawi-kids-vlm
+   ```
+2. Create and activate a virtual environment
+   ```
+   python -m venv venv
+   # On Windows
+   venv\Scripts\activate
+   # On macOS/Linux
+   source venv/bin/activate
+   ```
+3. Install the required packages
+   ```
+   pip install -r requirements.txt
+   ```
+4. Create a `.env` file and add your DeepSeek API key
+   ```
+   echo "DEEPSEEK_API_KEY=your_api_key_here" > .env
+   ```
+5. Run the Gradio app
+   ```
+   python app.py
+   ```
+The interface will be available at http://localhost:7860
+## Using the Interface
+1. Upload an image using the file uploader
+2. Select the target age group (6-8 or 9-12 years)
+3. Choose a story theme (optional)
+4. Click "Generate Story"
+5. The model will analyze the image and generate an age-appropriate story
+## Flutter Integration
+See the `test_server.py` file for examples of how to integrate with your Flutter app. You'll need to implement an API client in your Flutter app that sends images to this service and receives the generated stories.
+## Testing
+You can test the model using the provided test script:
+```
+python test_server.py --url http://localhost:7860 --image path/to/test_image.jpg
+```
+## Evaluation
+For more detailed evaluation of the model's performance, use the evaluation script. By default it processes at most two images to limit API usage; pass `--limit` to change this:
+```
+python evaluate_model.py --images test_images --output evaluation_results.json
+```
+## Deploying to Hugging Face Spaces
+This project is designed to work with Hugging Face Spaces, which provides free hosting for machine learning demos.
+1. Create a new Space on Hugging Face
+2. Select "Gradio" as the SDK
+3. Push this repository to the Space
+4. Add your DeepSeek API key as a secret in the Space configuration
+5. The app will automatically deploy and be available at your Space URL
+## Important Note on API Usage
+The DeepSeek API is a commercial service and may have usage limits or costs associated with it. Make sure to check their pricing and terms of service to understand any potential costs for your usage level.
+## License
+MIT
+## Contact
+[Add your contact information here]
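
As a minimal sketch of the `.env` step in the README above, the check below fails fast when the key is missing or still set to the README's placeholder value. The helper name `require_api_key` is hypothetical and not part of this commit, and it reads from a plain mapping using only the standard library, whereas the project itself loads the file with python-dotenv.

```python
import os


def require_api_key(env=None, name="DEEPSEEK_API_KEY"):
    """Return the API key from `env`, or raise if it is missing or a placeholder.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # "your_api_key_here" is the placeholder used in the README's echo command
    if not key or key == "your_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return key
```

Calling `require_api_key()` once at startup would surface a configuration problem before any image is sent to the API.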

app.py ADDED

	@@ -0,0 +1,80 @@

+import gradio as gr
+from story_generator import StoryGenerator
+import tempfile
+import os
+# Initialize the story generator
+story_generator = StoryGenerator()
+# Define the available themes
+THEMES = ["None", "Adventure", "Fantasy", "Animals", "Friendship", "Science"]
+def generate_story(image, age_group, theme):
+    """
+    Generate a story from an image using the story generator
+    Args:
+        image: The uploaded image
+        age_group: The target age group
+        theme: The story theme
+    Returns:
+        str: The generated story
+    """
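+    # Gradio passes None when no image has been uploaded
+    if image is None:
+        return "Please upload an image first."
+    # Save the image to a temporary file (JPEG has no alpha channel, so convert to RGB)
+    with tempfile.NamedTemporaryFile(delete=False, suffix='.jpg') as temp:
+        temp_filename = temp.name
+    image.convert("RGB").save(temp_filename)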
+    try:
+        # Process the theme (convert "None" to None)
+        processed_theme = None if theme == "None" else theme.lower()
+        # Open the image file and generate the story
+        with open(temp_filename, 'rb') as img_file:
+            story = story_generator.generate(img_file, age_group, processed_theme)
+        return story
+    except Exception as e:
+        return f"Error generating story: {str(e)}"
+    finally:
+        # Clean up the temporary file
+        if os.path.exists(temp_filename):
+            os.unlink(temp_filename)
+# Create the Gradio interface
+with gr.Blocks(title="Rawi Kids Story Generator") as demo:
+    gr.Markdown("# Rawi Kids Story Generator")
+    gr.Markdown("Upload an image and get a story for kids!")
+    with gr.Row():
+        with gr.Column(scale=1):
+            # Input components
+            image_input = gr.Image(type="pil", label="Upload Image")
+            age_group = gr.Radio(choices=["6-8", "9-12"], value="6-8", label="Age Group (years)")
+            theme = gr.Dropdown(choices=THEMES, value="None", label="Story Theme")
+            submit_btn = gr.Button("Generate Story", variant="primary")
+        with gr.Column(scale=1):
+            # Output component
+            story_output = gr.Textbox(label="Generated Story", lines=10)
+    # Set up the button click event
+    submit_btn.click(
+        fn=generate_story,
+        inputs=[image_input, age_group, theme],
+        outputs=story_output
+    )
+    gr.Markdown("""
+    ### How it works
+    1. Upload a picture or take a photo
+    2. Select the age group (6-8 or 9-12 years)
+    3. Choose a theme for the story (optional)
+    4. Click "Generate Story"
+    5. The AI will analyze the image and create a story for kids!
+    """)
+# For Hugging Face Spaces (guarded so that importing this module does not start the server)
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860)
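
The mapping from UI values to generator arguments in `generate_story` can be sketched as a small pure function. This is an illustrative sketch only: `normalize_inputs` and `VALID_AGE_GROUPS` are hypothetical names that do not appear in this commit, and the error messages are assumptions.

```python
VALID_AGE_GROUPS = ("6-8", "9-12")
THEMES = ["None", "Adventure", "Fantasy", "Animals", "Friendship", "Science"]


def normalize_inputs(image, age_group, theme):
    """Validate the UI inputs and return (age_group, theme_or_None).

    Hypothetical helper: mirrors the "None" -> None and lower-casing
    logic of generate_story, plus explicit validation.
    """
    if image is None:
        raise ValueError("Please upload an image first.")
    if age_group not in VALID_AGE_GROUPS:
        raise ValueError(f"Unknown age group: {age_group!r}")
    if theme is None or theme == "None":
        return age_group, None
    if theme not in THEMES:
        raise ValueError(f"Unknown theme: {theme!r}")
    # The generator looks themes up by lower-case key, e.g. "adventure"
    return age_group, theme.lower()
```

Keeping this logic separate from the Gradio callback makes it testable without starting the interface.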

evaluate_model.py ADDED

	@@ -0,0 +1,151 @@

+#!/usr/bin/env python
+import os
+import argparse
+from PIL import Image
+import glob
+import json
+import time
+from story_generator import StoryGenerator
+from dotenv import load_dotenv
+import sys
+# Load environment variables for API key
+load_dotenv()
+class ModelEvaluator:
+    def __init__(self, images_dir, output_file):
+        """Initialize the evaluator with paths to images and output file"""
+        self.images_dir = images_dir
+        self.output_file = output_file
+        # Check for API key
+        if not os.getenv("DEEPSEEK_API_KEY"):
+            print("ERROR: DEEPSEEK_API_KEY environment variable not found.")
+            print("Please set your DeepSeek API key using:")
+            print("  - Create a .env file with DEEPSEEK_API_KEY=your_key_here")
+            print("  - Or set the environment variable directly")
+            sys.exit(1)
+        # Initialize story generator
+        self.generator = StoryGenerator()
+        # Create output directory if it doesn't exist
+        os.makedirs(os.path.dirname(os.path.abspath(output_file)), exist_ok=True)
+    def evaluate_all(self, limit=None):
+        """Evaluate the model on all images in the directory"""
+        # Sort so that --limit selects the same images on every run
+        image_files = sorted(
+            glob.glob(os.path.join(self.images_dir, "*.jpg")) +
+            glob.glob(os.path.join(self.images_dir, "*.jpeg")) +
+            glob.glob(os.path.join(self.images_dir, "*.png"))
+        )
+        # Limit the number of images if specified (to control API usage)
+        if limit and limit > 0:
+            image_files = image_files[:limit]
+        print(f"Found {len(image_files)} images for evaluation")
+        print(f"NOTE: Using DeepSeek API - API call charges may apply")
+        results = []
+        for idx, img_path in enumerate(image_files):
+            print(f"\nProcessing image {idx + 1}/{len(image_files)}: {os.path.basename(img_path)}")
+            # Test with different age groups and themes (use fewer combinations to limit API calls)
+            for age_group in ["6-8", "9-12"]:
+                # Limit theme testing to save on API calls
+                for theme in [None, "adventure"]:
+                    theme_str = theme if theme else "none"
+                    print(f"  Generating story for age group: {age_group}, theme: {theme_str}")
+                    try:
+                        start_time = time.time()
+                        with open(img_path, 'rb') as img_file:
+                            story = self.generator.generate(img_file, age_group, theme)
+                        generation_time = time.time() - start_time
+                        # Record the result
+                        result = {
+                            "image_path": img_path,
+                            "age_group": age_group,
+                            "theme": theme_str,
+                            "generation_time_seconds": round(generation_time, 2),
+                            "story_length_chars": len(story),
+                            "story_words": len(story.split()),
+                            "story": story
+                        }
+                        results.append(result)
+                        # Print summary
+                        print(f"    Time: {result['generation_time_seconds']:.2f}s, "
+                              f"Words: {result['story_words']}")
+                    except Exception as e:
+                        print(f"    Error generating story: {str(e)}")
+                        results.append({
+                            "image_path": img_path,
+                            "age_group": age_group,
+                            "theme": theme_str,
+                            "error": str(e)
+                        })
+        # Save all results to file
+        with open(self.output_file, 'w') as f:
+            json.dump(results, f, indent=2)
+        print(f"\nEvaluation complete. Results saved to {self.output_file}")
+        return results
+    def print_summary(self, results):
+        """Print a summary of the evaluation results"""
+        if not results:
+            print("No results to summarize")
+            return
+        successful_generations = [r for r in results if "error" not in r]
+        error_generations = [r for r in results if "error" in r]
+        print(f"\nEvaluation Summary:")
+        print(f"  Total images processed: {len(set([r['image_path'] for r in results]))}")
+        print(f"  Total story generations: {len(results)}")
+        print(f"  Successful generations: {len(successful_generations)}")
+        print(f"  Failed generations: {len(error_generations)}")
+        if successful_generations:
+            avg_time = sum([r["generation_time_seconds"] for r in successful_generations]) / len(successful_generations)
+            avg_words = sum([r["story_words"] for r in successful_generations]) / len(successful_generations)
+            print(f"  Average generation time: {avg_time:.2f} seconds")
+            print(f"  Average story length: {avg_words:.1f} words")
+        # Analysis by age group
+        age_groups = ["6-8", "9-12"]
+        for age_group in age_groups:
+            age_results = [r for r in successful_generations if r["age_group"] == age_group]
+            if age_results:
+                avg_words = sum([r["story_words"] for r in age_results]) / len(age_results)
+                print(f"  Age group {age_group}: {len(age_results)} stories, avg {avg_words:.1f} words")
+        # Analysis by theme
+        themes = ["none", "adventure", "fantasy", "animals", "friendship", "science"]
+        for theme in themes:
+            theme_results = [r for r in successful_generations if r["theme"] == theme]
+            if theme_results:
+                avg_words = sum([r["story_words"] for r in theme_results]) / len(theme_results)
+                print(f"  Theme {theme}: {len(theme_results)} stories, avg {avg_words:.1f} words")
+def main():
+    parser = argparse.ArgumentParser(description='Evaluate the story generation model')
+    parser.add_argument('--images', default='test_images', help='Directory containing test images')
+    parser.add_argument('--output', default='evaluation_results.json', help='Output file for results')
+    parser.add_argument('--limit', type=int, default=2, help='Limit the number of images to process (to control API usage)')
+    args = parser.parse_args()
+    evaluator = ModelEvaluator(args.images, args.output)
+    results = evaluator.evaluate_all(limit=args.limit)
+    evaluator.print_summary(results)
+if __name__ == '__main__':
+    main()
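
The per-age-group and per-theme averages in `print_summary` follow the same pattern, which can be sketched as one grouping function over the results list. The function name `average_words_by` is hypothetical; the record fields (`story_words`, `age_group`, `theme`, `error`) are the ones written by `evaluate_all`.

```python
from collections import defaultdict


def average_words_by(results, field):
    """Average `story_words` per distinct value of `field`, skipping failed runs.

    Hypothetical helper for illustration; groups are discovered from the
    data instead of being listed by hand.
    """
    totals = defaultdict(lambda: [0, 0])  # value -> [word_sum, count]
    for record in results:
        if "error" in record:
            continue
        bucket = totals[record[field]]
        bucket[0] += record["story_words"]
        bucket[1] += 1
    return {value: word_sum / count for value, (word_sum, count) in totals.items()}
```

For example, `average_words_by(results, "theme")` would return one average per theme that actually occurs in `evaluation_results.json`.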

huggingface-metadata.json ADDED

	@@ -0,0 +1,12 @@

+{
+    "title": "Rawi Kids Story Generator",
+    "emoji": "📚",
+    "colorFrom": "blue",
+    "colorTo": "purple",
+    "sdk": "gradio",
+    "sdk_version": "3.50.2",
+    "python_version": "3.10",
+    "app_file": "app.py",
+    "pinned": false,
+    "license": "mit"
+}

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+flask==2.0.1
+pillow==9.5.0
+python-dotenv==1.0.0
+flask-cors==3.0.10
+gunicorn==20.1.0
+numpy==1.24.3
+requests==2.31.0
+gradio==3.50.2

run_server.py ADDED

	@@ -0,0 +1,51 @@

+#!/usr/bin/env python
+import os
+import sys
+import argparse
+import subprocess
+from dotenv import load_dotenv
+def main():
+    parser = argparse.ArgumentParser(description='Run the Vision Language Model server')
+    parser.add_argument('--port', type=int, help='Port to run the server on')
+    parser.add_argument('--debug', action='store_true', help='Run in debug mode')
+    parser.add_argument('--host', default='0.0.0.0', help='Host to run the server on')
+    parser.add_argument('--workers', type=int, default=1, help='Number of Gunicorn workers')
+    parser.add_argument('--use-gunicorn', action='store_true', help='Use Gunicorn for production')
+    args = parser.parse_args()
+    # Load environment variables
+    load_dotenv()
+    # Set environment variables from command line arguments
+    if args.port:
+        os.environ['PORT'] = str(args.port)
+    if args.debug:
+        os.environ['DEBUG'] = 'True'
+    port = int(os.environ.get('PORT', 5000))
+    debug = os.environ.get('DEBUG', 'False').lower() == 'true'
+    print(f"Starting server on {args.host}:{port}")
+    print(f"Debug mode: {debug}")
+    if args.use_gunicorn:
+        # Use Gunicorn for production
+        cmd = [
+            'gunicorn',
+            '--bind', f"{args.host}:{port}",
+            '--workers', str(args.workers),
+            'app:app'
+        ]
+        print(f"Running with gunicorn: {' '.join(cmd)}")
+        subprocess.call(cmd)
+    else:
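+        # Use Flask's built-in server
+        # NOTE: this expects app.py to expose a Flask object named `app`; the app.py
+        # in this commit defines a Gradio Blocks object named `demo` instead.
+        from app import app
+        app.run(host=args.host, port=port, debug=debug)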
+if __name__ == '__main__':
+    main()
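
The precedence that `main()` applies to the port and debug settings (command-line argument first, then environment variable, then default) can be sketched as a pure function. `resolve_settings` is a hypothetical name; unlike the script above, this sketch does not write back into `os.environ`.

```python
import os


def resolve_settings(cli_port=None, cli_debug=False, env=None):
    """Return (port, debug) using CLI values first, then env, then defaults.

    Hypothetical helper for illustration; defaults match run_server.py
    (port 5000, debug off).
    """
    env = os.environ if env is None else env
    port = cli_port if cli_port else int(env.get("PORT", 5000))
    debug = cli_debug or env.get("DEBUG", "False").lower() == "true"
    return port, debug
```

Passing an explicit `env` mapping makes the behavior easy to check without touching the real environment.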

story_generator.py ADDED

	@@ -0,0 +1,142 @@

+import os
+import requests
+import base64
+from PIL import Image
+from io import BytesIO
+import json
+from dotenv import load_dotenv
+# Load environment variables for API keys
+load_dotenv()
+class StoryGenerator:
+    def __init__(self):
+        """Initialize the story generator with DeepSeek API configuration"""
+        self.api_key = os.getenv("DEEPSEEK_API_KEY")
+        if not self.api_key:
+            print("Warning: DEEPSEEK_API_KEY not found in environment variables. Please set it.")
+        self.api_url = "https://api.deepseek.com/v1/chat/completions"
+        # Story templates for different age groups
+        self.templates = {
+            "6-8": "Write a simple and fun short story for a 6-8 year old child about this image: ",
+            "9-12": "Write an engaging short story with a simple moral for a 9-12 year old about this image: "
+        }
+        # Themes and associated vocabulary to enhance stories
+        self.themes = {
+            "adventure": ["journey", "discover", "explore", "treasure", "map"],
+            "fantasy": ["magic", "dragon", "wizard", "fairy", "kingdom"],
+            "animals": ["forest", "pets", "wildlife", "jungle", "farm"],
+            "friendship": ["friends", "sharing", "helping", "together", "team"],
+            "science": ["experiment", "invention", "discovery", "robot", "space"]
+        }
+    def generate(self, image_file, age_group="6-8", theme=None):
+        """
+        Generate a story based on the input image using DeepSeek API
+        Args:
+            image_file: The uploaded image file
+            age_group: Age group target ("6-8" or "9-12")
+            theme: Optional theme to influence the story
+        Returns:
+            str: A generated story suitable for the specified age group
+        """
+        try:
+            # Process the image
+            image = Image.open(image_file).convert('RGB')
+            # Resize image if too large
+            max_size = 1024
+            if max(image.size) > max_size:
+                ratio = max_size / max(image.size)
+                new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
+                image = image.resize(new_size, Image.LANCZOS)
+            # Convert image to base64
+            buffered = BytesIO()
+            image.save(buffered, format="JPEG", quality=85)
+            img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
+            # Determine the template based on age group
+            template = self.templates.get(age_group, self.templates["6-8"])
+            # Enhance the prompt with theme if provided
+            if theme and theme in self.themes:
+                theme_words = ", ".join(self.themes[theme][:3])  # Use first 3 theme words
+                prompt = f"{template} Please include elements of {theme} like {theme_words}."
+            else:
+                prompt = template
+            # Create the API payload
+            payload = {
+                "model": "deepseek-vision",
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": [
+                            {
+                                "type": "text",
+                                "text": prompt
+                            },
+                            {
+                                "type": "image_url",
+                                "image_url": {
+                                    "url": f"data:image/jpeg;base64,{img_base64}"
+                                }
+                            }
+                        ]
+                    }
+                ],
+                "max_tokens": 1000,
+                "temperature": 0.7
+            }
+            # Set the headers with authorization
+            headers = {
+                "Content-Type": "application/json",
+                "Authorization": f"Bearer {self.api_key}"
+            }
+            # Make the API request
+            response = requests.post(self.api_url, headers=headers, json=payload, timeout=60)
+            response.raise_for_status()
+            # Parse the API response
+            result = response.json()
+            story = result.get("choices", [{}])[0].get("message", {}).get("content", "")
+            if not story:
+                raise ValueError("No story was generated from the API")
+            # Format the story
+            story = self._format_story(story, age_group)
+            return story
+        except Exception as e:
+            print(f"Error generating story: {str(e)}")
+            raise
+    def _format_story(self, story, age_group):
+        """Format the story based on age group"""
+        # Split on periods and drop empty fragments ("!" and "?" are not treated as sentence ends)
+        sentences = [s.strip() for s in story.split('.') if s.strip()]
+        # Group into paragraphs of three sentences, separated by single spaces
+        paragraphs = [
+            " ".join(s + "." for s in sentences[i:i + 3])
+            for i in range(0, len(sentences), 3)
+        ]
+        # For younger kids, keep it shorter by taking just the first three paragraphs
+        if age_group == "6-8":
+            paragraphs = paragraphs[:3]
+        return "\n\n".join(paragraphs)
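
`_format_story` above splits only on periods, so sentences ending in "!" or "?" are merged into their neighbors. A minimal sketch of a variant that splits after any of the three terminators is shown below; the standalone function `format_story` and its parameters are hypothetical, and the regular expression is a simple heuristic that will still mis-split abbreviations such as "Dr. Smith".

```python
import re


def format_story(story, sentences_per_paragraph=3, max_paragraphs=None):
    """Group sentences into paragraphs, keeping their original punctuation.

    Hypothetical alternative to _format_story, for illustration only.
    """
    # Split on whitespace that follows ".", "!" or "?"
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    n = sentences_per_paragraph
    paragraphs = [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]
    if max_paragraphs is not None:
        paragraphs = paragraphs[:max_paragraphs]
    return "\n\n".join(paragraphs)
```

For the 6-8 age group, passing `max_paragraphs=3` would reproduce the truncation rule used in the commit.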

test_server.py ADDED

	@@ -0,0 +1,101 @@

+import requests
+import os
+import argparse
+from PIL import Image
+import json
+import base64
+from io import BytesIO
+def test_health_endpoint(base_url):
+    """Test if the Gradio server is running"""
+    try:
+        response = requests.get(base_url, timeout=10)
+        print(f"Server health check: {response.status_code}")
+        return response.status_code == 200
+    except Exception as e:
+        print(f"Error connecting to server: {str(e)}")
+        return False
+def test_story_generation(base_url, image_path, age_group="6-8", theme="adventure"):
+    """Test story generation with an image using the Gradio API"""
+    if not os.path.exists(image_path):
+        print(f"Error: Image file not found at {image_path}")
+        return False
+    # Ensure the image can be opened
+    try:
+        img = Image.open(image_path)
+        img_format = img.format if img.format else "JPEG"
+        img_buffer = BytesIO()
+        img.save(img_buffer, format=img_format)
+        img_bytes = img_buffer.getvalue()
+        img_base64 = base64.b64encode(img_bytes).decode('utf-8')
+    except Exception as e:
+        print(f"Error processing image: {str(e)}")
+        return False
+    # Prepare the API request for Gradio
+    url = f"{base_url}/api/predict"
+    # Convert theme to the correct format for Gradio
+    if theme.lower() == "none":
+        theme = "None"
+    else:
+        theme = theme.capitalize()
+    # Build the payload
+    payload = {
+        "data": [
+            f"data:image/{img_format.lower()};base64,{img_base64}",
+            age_group,
+            theme
+        ]
+    }
+    print(f"Sending request to {url}...")
+    print(f"Age group: {age_group}")
+    print(f"Theme: {theme}")
+    try:
+        response = requests.post(url, json=payload)
+        print(f"Status code: {response.status_code}")
+        if response.status_code == 200:
+            result = response.json()
+            story = result.get('data', [''])[0]
+            print("\nGenerated Story:")
+            print("=" * 50)
+            print(story)
+            print("=" * 50)
+            return True
+        else:
+            print(f"Error: {response.text}")
+            return False
+    except Exception as e:
+        print(f"Error during request: {str(e)}")
+        return False
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Test the story generation server')
+    parser.add_argument('--url', default='http://localhost:7860', help='Base URL of the Gradio server')
+    parser.add_argument('--image', required=True, help='Path to the test image')
+    parser.add_argument('--age', default='6-8', choices=['6-8', '9-12'], help='Age group target')
+    parser.add_argument('--theme', default='adventure',
+                        choices=['none', 'adventure', 'fantasy', 'animals', 'friendship', 'science'],
+                        help='Story theme')
+    args = parser.parse_args()
+    print(f"Testing Gradio server at {args.url}")
+    if test_health_endpoint(args.url):
+        print("\nServer is running!")
+        print("\nTesting story generation...")
+        if test_story_generation(args.url, args.image, args.age, args.theme):
+            print("\nStory generation test passed!")
+        else:
+            print("\nStory generation test failed!")
+    else:
+        print("\nServer health check failed, server may not be running correctly.")
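
The request body that `test_story_generation` sends can be sketched as a pure function, which is also the shape a Flutter client would need to reproduce. `build_predict_payload` is a hypothetical name; the payload layout (data URI, age group, capitalized theme) is taken from the test script above, and whether the server accepts it depends on the Gradio version in use.

```python
import base64


def build_predict_payload(image_bytes, image_format, age_group, theme):
    """Build the JSON body used by test_server.py for the predict endpoint.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    # The dropdown in app.py uses capitalized labels, with "None" for no theme
    theme_label = "None" if theme.lower() == "none" else theme.capitalize()
    data_uri = f"data:image/{image_format.lower()};base64,{encoded}"
    return {"data": [data_uri, age_group, theme_label]}
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.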