Wirayudhia committed
Commit 3d3ee76 · 1 Parent(s): 77d2135
Files changed (4)
  1. README.md +130 -11
  2. app.py +317 -0
  3. app_config.yaml +22 -0
  4. requirements.txt +8 -0
README.md CHANGED
@@ -1,13 +1,132 @@
- ---
- title: Gemma
- emoji: 😻
- colorFrom: purple
- colorTo: green
- sdk: gradio
- sdk_version: 5.34.2
- app_file: app.py
- pinned: false
- short_description: gemma
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🚀 Gemma-3 Multimodal Chat Application
+
+ A Gradio-based chat application with multimodal capabilities, currently backed by Google's Gemma-2-2B-IT as a stand-in for Gemma-3.
+
+ ## ✨ Features
+
+ - 💬 **Interactive Chat Interface**: Persistent conversation history with context awareness
+ - 🖼️ **Image Uploads**: Attach images to your messages (full vision support is planned for Gemma-3)
+ - 📄 **File Processing**: Support for PDF and TXT file uploads with text extraction
+ - 🧠 **Contextual Responses**: Maintains conversation context for follow-up questions
+ - 🎨 **Modern UI**: Clean, responsive interface built with Gradio
+ - 🔄 **State Management**: Persistent chat history and file context across interactions
+
+ ## 🛠️ Technologies Used
+
+ - **Frontend**: Gradio 4.0+
+ - **AI Model**: Gemma-2-2B-IT (stand-in for Gemma-3)
+ - **File Processing**: PyPDF2 for PDFs, Pillow (PIL) for images
+ - **Backend**: Python with Hugging Face Transformers
+ - **Deployment**: Hugging Face Spaces
+
+ ## 🚀 Quick Start
+
+ ### Local Development
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone <repository-url>
+ cd gemma
+ ```
+
+ 2. **Install dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. **Run the application**:
+ ```bash
+ python app.py
+ ```
+
+ 4. **Open your browser** and navigate to `http://localhost:7860`
+
+ ### Hugging Face Spaces Deployment
+
+ 1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Choose "Gradio" as the SDK
+ 3. Upload the files from this repository
+ 4. The app will automatically deploy and be accessible via your Space URL (see the upload sketch below for a programmatic alternative)
+
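+ As an alternative to uploading through the web UI, the files can be pushed from a script. Below is a minimal sketch using the `huggingface_hub` client; the repo id `your-username/gemma` is a placeholder for your own Space:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ api = HfApi()  # assumes you are logged in, e.g. via `huggingface-cli login`
+
+ # Create the Space if it doesn't exist yet (repo id is a placeholder)
+ api.create_repo(
+     repo_id="your-username/gemma",
+     repo_type="space",
+     space_sdk="gradio",
+     exist_ok=True,
+ )
+
+ # Upload the app files from the current directory
+ api.upload_folder(
+     folder_path=".",
+     repo_id="your-username/gemma",
+     repo_type="space",
+ )
+ ```
+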
+ ## 📖 How to Use
+
+ ### Basic Chat
+ 1. Type your message in the text input box
+ 2. Click "Submit" or press Enter
+ 3. View the AI response in the chat history
+
+ ### Image Analysis
+ 1. Upload an image using the image upload component
+ 2. Type a question about the image (e.g., "What do you see in this image?")
+ 3. Submit your question. Note that the current stand-in model has no native vision, so the image is only flagged in the prompt
+
+ ### File Processing
+ 1. Upload a PDF or TXT file using the file upload component
+ 2. Ask questions about the file content
+ 3. The extracted text will be used as context for responses
+
+ ### Advanced Features
+ - **Persistent Context**: Previous conversations are remembered
+ - **File Context**: Uploaded file content persists for follow-up questions
+ - **Clear Chat**: Reset conversation history and uploaded files
+
+ ## 🔧 Configuration
+
+ ### Model Configuration
+ The application currently loads `google/gemma-2-2b-it` as a stand-in for Gemma-3. To integrate the actual model (a sketch follows this list):
+
+ 1. Update the model name in `load_gemma_model` in `app.py`
+ 2. Adjust the inference logic in `gemma_3_inference` if the new model's chat template differs
+ 3. Update dependencies if needed
+
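+ As a rough sketch, the swap in `load_gemma_model` could look like this, assuming the future Gemma-3 checkpoint follows the same `transformers` causal-LM API (the checkpoint name below is a placeholder, not a real model id):
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Hypothetical checkpoint id - replace with the real Gemma-3 id once released
+ MODEL_NAME = "google/gemma-3-it-placeholder"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME,
+     torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+     device_map="auto" if torch.cuda.is_available() else None,  # "auto" requires accelerate
+ )
+ ```
+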
+ ### Customization
+ - Modify the UI theme in the `gr.Blocks` configuration (see the sketch below)
+ - Adjust file size limits and supported formats
+ - Customize the chat history display format
+ - Add additional file processing capabilities
+
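+ For example, a minimal sketch of a re-themed `gr.Blocks`, using standard Gradio 4 theme parameters (`primary_hue` and `font` are the assumed knobs here):
+
+ ```python
+ import gradio as gr
+
+ # Customize the built-in Soft theme instead of using the default
+ theme = gr.themes.Soft(
+     primary_hue="purple",
+     font=gr.themes.GoogleFont("Inter"),
+ )
+
+ with gr.Blocks(title="Gemma-3 Multimodal Chat", theme=theme) as demo:
+     gr.Markdown("# Custom-themed chat UI")
+
+ demo.launch()
+ ```
+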
+ ## 📁 Project Structure
+
+ ```
+ gemma/
+ ├── app.py              # Main Gradio application
+ ├── app_config.yaml     # Space configuration
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Project documentation
+ └── .gitattributes      # Git configuration
+ ```
+
+ ## 🔮 Future Enhancements
+
+ - [ ] Integration with the actual Gemma-3 model
+ - [ ] Support for additional file formats (DOCX, XLSX)
+ - [ ] Advanced image processing capabilities
+ - [ ] User authentication and personalized chat history
+ - [ ] Export chat conversations
+ - [ ] Multi-language support
+ - [ ] Voice input/output capabilities
+
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
+ 5. Open a Pull Request
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
+
+ ## 🙏 Acknowledgments
+
+ - Google for the Gemma model family
+ - Hugging Face for the amazing ecosystem and Spaces platform
+ - The Gradio team for the intuitive UI framework
+
+ ## 📞 Support
+
+ If you encounter any issues or have questions, please open an issue on the repository or contact the maintainers.
+
  ---

+ **Note**: This application currently uses Google's Gemma-2-2B-IT as a stand-in for Gemma-3. Swap in the actual Gemma-3 checkpoint in `load_gemma_model` once it becomes available.
app.py ADDED
@@ -0,0 +1,317 @@
+ import gradio as gr
+ import os
+ from PIL import Image
+ import PyPDF2
+ from typing import List, Tuple, Optional
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Global variables for the lazily loaded model and tokenizer
+ model = None
+ tokenizer = None
+
+ def load_gemma_model():
+     """
+     Load the Gemma model and tokenizer from Hugging Face.
+     """
+     global model, tokenizer
+     try:
+         model_name = "google/gemma-2-2b-it"  # Gemma-2 stands in until Gemma-3 is available
+         print(f"Loading {model_name}...")
+
+         tokenizer = AutoTokenizer.from_pretrained(model_name)
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+             device_map="auto" if torch.cuda.is_available() else None
+         )
+
+         # Add a padding token if the tokenizer doesn't define one
+         if tokenizer.pad_token is None:
+             tokenizer.pad_token = tokenizer.eos_token
+
+         print("Model loaded successfully!")
+         return True
+     except Exception as e:
+         print(f"Error loading model: {e}")
+         return False
+
+ def gemma_3_inference(prompt_text: str, pil_image: Optional[Image.Image] = None, chat_history: Optional[List[Tuple[str, str]]] = None) -> str:
+     """
+     Run inference with the Gemma model, loading it on first use.
+     """
+     global model, tokenizer
+
+     # Load the model lazily on the first request
+     if model is None or tokenizer is None:
+         if not load_gemma_model():
+             return "❌ Error: Could not load Gemma model. Please check your internet connection and try again."
+
+     try:
+         # Build conversation context from the last 3 exchanges
+         conversation = []
+         if chat_history:
+             for user_msg, bot_msg in chat_history[-3:]:
+                 conversation.append({"role": "user", "content": user_msg})
+                 conversation.append({"role": "assistant", "content": bot_msg})
+
+         # Gemma-2 has no native vision support, so flag uploaded images in the prompt
+         if pil_image:
+             prompt_text = f"[Image uploaded - Note: This model doesn't have vision capabilities, but I can help with text-based questions about images] {prompt_text}"
+
+         # Add the current user message
+         conversation.append({"role": "user", "content": prompt_text})
+
+         # Format the conversation with Gemma's chat template
+         formatted_prompt = tokenizer.apply_chat_template(
+             conversation,
+             tokenize=False,
+             add_generation_prompt=True
+         )
+
+         # Tokenize the input, truncating overly long contexts
+         inputs = tokenizer(
+             formatted_prompt,
+             return_tensors="pt",
+             truncation=True,
+             max_length=2048
+         )
+
+         # Move inputs to the same device as the model
+         inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+         # Generate a response
+         with torch.no_grad():
+             outputs = model.generate(
+                 **inputs,
+                 max_new_tokens=512,
+                 temperature=0.7,
+                 do_sample=True,
+                 pad_token_id=tokenizer.eos_token_id,
+                 eos_token_id=tokenizer.eos_token_id
+             )
+
+         # Decode only the newly generated tokens (slicing the decoded string by
+         # prompt length is unreliable once special tokens are stripped)
+         input_length = inputs["input_ids"].shape[1]
+         response = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True).strip()
+
+         return f"🤖 Gemma Response: {response}"
+
+     except Exception as e:
+         return f"❌ Error generating response: {str(e)}. Please try again."
+
+ def extract_text_from_pdf(file_path: str) -> str:
+     """
+     Extract text from a PDF file using PyPDF2.
+     """
+     try:
+         with open(file_path, 'rb') as file:
+             pdf_reader = PyPDF2.PdfReader(file)
+             text = ""
+             for page in pdf_reader.pages:
+                 # extract_text() can return None for image-only pages
+                 text += (page.extract_text() or "") + "\n"
+             return text.strip()
+     except Exception as e:
+         return f"Error reading PDF: {str(e)}"
+
+ def extract_text_from_txt(file_path: str) -> str:
+     """
+     Extract text from a TXT file, falling back to latin-1 if UTF-8 fails.
+     """
+     try:
+         with open(file_path, 'r', encoding='utf-8') as file:
+             return file.read().strip()
+     except UnicodeDecodeError:
+         # Retry with a more permissive encoding
+         try:
+             with open(file_path, 'r', encoding='latin-1') as file:
+                 return file.read().strip()
+         except Exception as e:
+             return f"Error reading text file: {str(e)}"
+     except Exception as e:
+         return f"Error reading text file: {str(e)}"
+
+ def process_file_input(file_input) -> str:
+     """
+     Process an uploaded file and extract its text content.
+     """
+     if file_input is None:
+         return ""
+
+     # Gradio may pass either a file path string or a tempfile-like object
+     file_path = file_input if isinstance(file_input, str) else file_input.name
+     file_extension = os.path.splitext(file_path)[1].lower()
+
+     if file_extension == '.pdf':
+         extracted_text = extract_text_from_pdf(file_path)
+         return f"📄 Content from PDF ({os.path.basename(file_path)}):\n{extracted_text[:1000]}{'...' if len(extracted_text) > 1000 else ''}"
+     elif file_extension == '.txt':
+         extracted_text = extract_text_from_txt(file_path)
+         return f"📝 Content from text file ({os.path.basename(file_path)}):\n{extracted_text[:1000]}{'...' if len(extracted_text) > 1000 else ''}"
+     else:
+         return f"❌ Unsupported file type: {file_extension}. Please upload PDF or TXT files only."
+
+ def process_input(user_text: str, image_input: Optional[Image.Image], file_input, chat_history: List[Tuple[str, str]], file_context: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]], str, None, None, str]:
+     """
+     Main function to process user input and generate a response.
+     Returns: (updated_chat_history for the chatbot, updated_chat_history for
+     the state, cleared_text, cleared_image, cleared_file, updated_file_context)
+     """
+     if not user_text.strip() and image_input is None and file_input is None:
+         return chat_history, chat_history, "", None, None, file_context
+
+     # Process the file input if provided
+     current_file_context = ""
+     if file_input is not None:
+         current_file_context = process_file_input(file_input)
+
+     # Combine file context with the user text
+     if current_file_context:
+         combined_prompt = f"{current_file_context}\n\nUser Query: {user_text}"
+         # Update the persistent file context
+         file_context = current_file_context
+     elif file_context and user_text.strip():
+         # Reuse the previous file context if available
+         combined_prompt = f"{file_context}\n\nUser Query: {user_text}"
+     else:
+         combined_prompt = user_text
+
+     # Generate a response using the Gemma model
+     if image_input is not None:
+         # Handle image + text input
+         bot_response = gemma_3_inference(combined_prompt, pil_image=image_input, chat_history=chat_history)
+         user_display = f"{user_text} [Image uploaded]"
+     else:
+         # Handle text-only input (potentially with file context)
+         bot_response = gemma_3_inference(combined_prompt, chat_history=chat_history)
+         if current_file_context and file_input is not None:
+             file_name = os.path.basename(file_input if isinstance(file_input, str) else file_input.name)
+             user_display = f"{user_text} [File: {file_name}]"
+         else:
+             user_display = user_text
+
+     # Update the chat history
+     chat_history.append((user_display, bot_response))
+
+     # Return the updated history (to both the chatbot and the state) and clear the inputs
+     return chat_history, chat_history, "", None, None, file_context
+
+ def clear_chat() -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]], str, None, None, str]:
+     """
+     Clear the chat history and reset all inputs.
+     """
+     return [], [], "", None, None, ""
+
+ # Create the Gradio interface
+ with gr.Blocks(title="Gemma-3 Multimodal Chat", theme=gr.themes.Soft()) as demo:
+     gr.Markdown(
+         """
+         # 🚀 Gemma-2 Multimodal Chat Application
+
+         Welcome to the Gemma-2 chat interface powered by Google's Gemma-2-2B-IT model! This application supports:
+         - 💬 **Text conversations** with persistent chat history
+         - 📎 **File processing** - upload PDF or TXT files for context
+         - 📄 **Document analysis** - extract and analyze text from uploaded files
+         - 🧠 **Contextual responses** - the model remembers your conversation
+
+         **How to use:**
+         1. Type your message in the text box
+         2. Optionally upload a file (PDF/TXT) for document analysis
+         3. Click Submit or press Enter
+         4. Use Clear to reset the conversation
+
+         *Note: This application uses the real Gemma-2-2B-IT model from Hugging Face. The first message may take longer while the model loads.*
+         """
+     )
+
+     # Session state: chat history and persistent file context
+     chat_history_state = gr.State([])
+     file_context_state = gr.State("")
+
+     with gr.Row():
+         with gr.Column(scale=2):
+             # Chat interface
+             chatbot = gr.Chatbot(
+                 label="Chat History",
+                 height=400,
+                 show_label=True,
+                 container=True,
+                 bubble_full_width=False
+             )
+
+             # Input area
+             with gr.Row():
+                 user_input = gr.Textbox(
+                     label="Your message",
+                     placeholder="Type your message here...",
+                     lines=2,
+                     scale=4
+                 )
+                 submit_btn = gr.Button("Submit", variant="primary", scale=1)
+
+             # Clear button
+             clear_btn = gr.Button("🗑️ Clear Chat", variant="secondary")
+
+         with gr.Column(scale=1):
+             # File upload area
+             gr.Markdown("### 📎 Upload Content")
+
+             image_input = gr.Image(
+                 label="Upload Image (for vision tasks)",
+                 type="pil",
+                 height=200
+             )
+
+             file_input = gr.File(
+                 label="Upload File (PDF or TXT)",
+                 file_types=[".pdf", ".txt"],
+                 height=100
+             )
+
+             gr.Markdown(
+                 """
+                 **Tips:**
+                 - Upload either an image OR a file per message
+                 - PDF files will have their text extracted
+                 - File content persists as context for follow-up questions
+                 - Uploaded images are only flagged in the prompt; the current model has no native vision
+                 """
+             )
+
+     # Event handlers. process_input writes the updated history to both the
+     # chatbot and the state in one step, so no extra .then() chain is needed.
+     submit_btn.click(
+         fn=process_input,
+         inputs=[user_input, image_input, file_input, chat_history_state, file_context_state],
+         outputs=[chatbot, chat_history_state, user_input, image_input, file_input, file_context_state]
+     )
+
+     user_input.submit(
+         fn=process_input,
+         inputs=[user_input, image_input, file_input, chat_history_state, file_context_state],
+         outputs=[chatbot, chat_history_state, user_input, image_input, file_input, file_context_state]
+     )
+
+     clear_btn.click(
+         fn=clear_chat,
+         outputs=[chatbot, chat_history_state, user_input, image_input, file_input, file_context_state]
+     )
+
+ if __name__ == "__main__":
+     # share=True is unnecessary on Spaces; local runs serve on port 7860
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         show_error=True
+     )
app_config.yaml ADDED
@@ -0,0 +1,22 @@
+ title: Gemma-3 Multimodal Chat
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: A multimodal chat app with vision and file processing
+ tags:
+   - chatbot
+   - multimodal
+   - vision
+   - file-processing
+   - gemma
+   - gradio
+ python_version: "3.9"
+ models:
+   - google/gemma-2-2b-it
+ hardware: cpu-basic
+ suggested_storage: small
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.0.0
+ Pillow>=9.0.0
+ PyPDF2>=3.0.0
+ transformers>=4.30.0
+ huggingface-hub>=0.16.0
+ accelerate>=0.20.0  # required for device_map="auto" on GPU
+ torch>=2.0.0
+ numpy>=1.21.0
+ requests>=2.28.0