walker11 committed on
Commit d061b7b · verified · 1 Parent(s): b8a9be4

Upload 4 files

Files changed (4)
  1. Dockerfile +18 -0
  2. README.md +142 -11
  3. app.py +347 -0
  4. requirements.txt +4 -0
Dockerfile ADDED
@@ -0,0 +1,18 @@
+ FROM python:3.9-slim
+
+ WORKDIR /app
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run the application
+ CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--workers", "2", "--timeout", "120", "app:app"]
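Review note on the `--timeout 120` setting: `moderate_batch` in app.py (further down in this commit) moderates stories one after another, and each upstream call passes `timeout=30` to `requests.post`. Treating that 30 seconds as a rough per-call ceiling (in `requests` it bounds each wait on the socket, not the whole exchange), the arithmetic below estimates how many slow calls fit inside one worker timeout. The helper name `max_batch_size` is hypothetical and is not part of the commit.

```python
import math


def max_batch_size(worker_timeout_s: float, per_call_timeout_s: float) -> int:
    """Estimate how many worst-case sequential calls fit in one worker timeout.

    Hypothetical helper: assumes every upstream call takes the full
    per-call timeout, which is a pessimistic bound rather than a measurement.
    """
    if worker_timeout_s <= 0 or per_call_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    return math.floor(worker_timeout_s / per_call_timeout_s)


# Values taken from the Dockerfile CMD (120) and app.py's requests.post call (30)
print(max_batch_size(120, 30))  # 4
```

Under these assumptions, a batch of more than four stories could outlive the worker if every upstream call is slow, so a cap on batch size or a longer worker timeout may be worth considering.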
README.md CHANGED
@@ -1,11 +1,142 @@
- ---
- title: RawiPostReview
- emoji: 🚀
- colorFrom: green
- colorTo: yellow
- sdk: docker
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Arabic Story Content Moderator
+
+ An AI-powered content moderation service for Arabic short stories that checks for cultural violations and inappropriate content using the DeepSeek API.
+
+ ## 🌟 Features
+
+ - **Cultural Sensitivity**: Checks stories against Arab and Islamic cultural norms
+ - **Content Safety**: Detects inappropriate sexual content, excessive violence, and profanity
+ - **Real-time Moderation**: Fast API response for instant content validation
+ - **Batch Processing**: Support for moderating multiple stories at once
+ - **Arabic Language Support**: Specialized for Arabic text processing
+
+ ## 🚀 API Endpoints
+
+ ### POST `/moderate`
+ Moderate a single Arabic story.
+
+ **Request:**
+ ```json
+ {
+   "story_content": "نص القصة العربية هنا"
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "approved": true,
+   "response": "true",
+   "timestamp": "2024-01-15T10:30:00"
+ }
+ ```
+
+ ### POST `/moderate/batch`
+ Moderate multiple stories at once.
+
+ **Request:**
+ ```json
+ {
+   "stories": ["قصة أولى", "قصة ثانية", "قصة ثالثة"]
+ }
+ ```
+
+ ### GET `/health`
+ Check service health status.
+
+ ## 📋 Response Format
+
+ The API returns consistent responses:
+
+ - `approved`: Boolean indicating whether the content is approved
+ - `response`: String value "true" (approved) or "no" (rejected)
+ - `timestamp`: ISO 8601 timestamp of the moderation
+ - `reason`: Description of the rejection (present only when content is rejected)
+
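Review note: a caller could wrap the fields listed above in a small typed container so that a missing or malformed `approved` value is treated as a rejection. This is a minimal sketch; the `ModerationResult` class is hypothetical and does not exist in this commit. app.py also returns `ai_decision` and, on failures, `error`, which the sketch ignores.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ModerationResult:
    """Hypothetical client-side view of one /moderate response body."""

    approved: bool
    response: str
    timestamp: Optional[str] = None
    reason: Optional[str] = None

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "ModerationResult":
        # Fail closed: only a JSON boolean true counts as approval
        return cls(
            approved=payload.get("approved") is True,
            response=str(payload.get("response", "no")),
            timestamp=payload.get("timestamp"),
            reason=payload.get("reason"),
        )


ok = ModerationResult.from_dict(
    {"approved": True, "response": "true", "timestamp": "2024-01-15T10:30:00"}
)
rejected = ModerationResult.from_dict(
    {"approved": False, "response": "no", "reason": "Invalid story format or missing Arabic content"}
)
print(ok.approved, rejected.approved, rejected.reason)
```

Treating anything other than a literal `true` as a rejection mirrors how app.py itself falls back to `"approved": False` on every error path.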
+ ## 🔧 Moderation Criteria
+
+ ### 1. Cultural and Religious Content
+ - No mockery of Islam or Arab traditions
+ - Respect for religious and social symbols
+ - Adherence to Islamic values
+
+ ### 2. Sexual Content and Violence
+ - No explicit sexual content or suggestive material
+ - No excessive or graphic violence
+ - No profanity or inappropriate language
+
+ ### 3. Sensitive Political Content
+ - No sectarian or ethnic incitement
+ - Avoidance of controversial political topics
+
+ ### 4. Social Values
+ - Respect for family and community values
+ - No promotion of socially destructive behaviors
+
+ ## 🛠️ Integration Example
+
+ ### cURL Example
+ ```bash
+ curl -X POST "https://your-huggingface-space-url/moderate" \
+   -H "Content-Type: application/json" \
+   -d '{"story_content": "قصة قصيرة عن الصداقة والوفاء"}'
+ ```
+
+ ### Python Example
+ ```python
+ import requests
+
+ url = "https://your-huggingface-space-url/moderate"
+ data = {
+     "story_content": "نص القصة العربية"
+ }
+
+ response = requests.post(url, json=data)
+ result = response.json()
+
+ if result["approved"]:
+     print("Story approved for posting")
+ else:
+     print("Story violates community guidelines")
+ ```
+
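Review note: the README shows no response example for `/moderate/batch`. Going by `moderate_batch` in app.py, the body carries a `results` list (one entry per story, in request order), plus `total_processed` and `timestamp`. The sketch below picks out the stories that were not approved; the function name `rejected_indices` is hypothetical, and the sample payload is hand-written to match that shape rather than captured from a live call.

```python
from typing import Any, Dict, List


def rejected_indices(batch_payload: Dict[str, Any]) -> List[int]:
    """Return the positions of stories that were not approved.

    Hypothetical helper: any entry without a literal boolean true in
    "approved" is counted as rejected.
    """
    results = batch_payload.get("results", [])
    return [i for i, item in enumerate(results) if item.get("approved") is not True]


# Hand-written sample in the shape produced by moderate_batch in app.py
sample = {
    "results": [
        {"approved": True, "response": "true"},
        {"approved": False, "response": "no", "reason": "Moderation service error"},
        {"approved": True, "response": "true"},
    ],
    "total_processed": 3,
}
print(rejected_indices(sample))  # [1]
```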
+ ## 📝 Setup Instructions
+
+ ### Environment Variables
+ Set your DeepSeek API key as an environment variable:
+ ```
+ DEEPSEEK_API_KEY=your_deepseek_api_key_here
+ ```
+
+ ### Local Testing
+ ```bash
+ pip install -r requirements.txt
+ export DEEPSEEK_API_KEY=your_api_key
+ python app.py
+ ```
+
+ ## 🔒 Privacy and Security
+
+ - Stories are processed in real time and are not stored by this service; story text is sent to the DeepSeek API for review
+ - API communications are encrypted
+ - No personal data is retained
+ - Compliant with data protection standards
+
+ ## 📈 Performance
+
+ - Average response time: < 2 seconds
+ - Supports concurrent requests
+ - Optimized for Arabic text processing
+ - Scalable architecture
+
+ ## 🤝 Integration with .NET Backend
+
+ This service is designed to integrate with your .NET story posting API. When a user attempts to post a story, send it to this moderation service first.
+
+ ## 📞 Support
+
+ For technical support or questions about the moderation criteria, please refer to the documentation or contact the development team.
+
+ ---
+
+ **Made with ❤️ for the Arabic literary community**
app.py ADDED
@@ -0,0 +1,347 @@
+ import os
+ import json
+ import logging
+ from typing import Dict, Any, List
+ import requests
+ from datetime import datetime
+ import re
+ from flask import Flask, request, jsonify
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ class ArabicContentModerator:
+     """
+     Arabic Story Content Moderation Model using Deepseek API
+     Checks for cultural violations and inappropriate content
+     """
+
+     def __init__(self, deepseek_api_key: str = None):
+         """
+         Initialize the content moderator
+
+         Args:
+             deepseek_api_key: Deepseek API key
+         """
+         self.api_key = deepseek_api_key or os.getenv('DEEPSEEK_API_KEY')
+         if not self.api_key:
+             raise ValueError("Deepseek API key is required")
+
+         self.api_url = "https://api.deepseek.com/chat/completions"
+         self.headers = {
+             "Authorization": f"Bearer {self.api_key}",
+             "Content-Type": "application/json"
+         }
+
+         # Cultural and content guidelines for Arabic stories
+         self.moderation_prompt = """
+ You are a content reviewer specialized in Arabic literature and culture. Your task is to review short Arabic stories to ensure they comply with Arab and Islamic cultural values and do not contain inappropriate content.
+
+ Review criteria:
+
+ 1. Cultural and religious content:
+ - No mockery of Islam or Arab traditions
+ - No disrespectful approach to topics that contradict Islamic values
+ - Respect for social and religious symbols
+
+ 2. Sexual content and violence:
+ - No explicit sexual content or overt sexual innuendos
+ - No excessive or graphic violence
+ - No profanity or obscene language
+
+ 3. Sensitive political content:
+ - Avoid sectarian or ethnic incitement
+ - No approach to controversial political topics in an offensive manner
+
+ 4. Social values:
+ - Respect for family values and Arab society
+ - No promotion of socially destructive behaviors
+
+ Response instructions:
+ - If the story complies with all criteria, answer with "true"
+ - If the story violates any of the criteria, answer with "no"
+ - Your answer must only be "true" or "no" without any additional text
+
+ Story to review:
+         """
+
+     def _call_deepseek_api(self, story_content: str) -> Dict[str, Any]:
+         """
+         Call Deepseek API for content moderation
+
+         Args:
+             story_content: The Arabic story content to moderate
+
+         Returns:
+             API response dictionary
+         """
+         try:
+             payload = {
+                 "model": "deepseek-chat",
+                 "messages": [
+                     {
+                         "role": "system",
+                         "content": "You are a content reviewer specialized in Arabic literature. Your task is to review stories to ensure they comply with Arab cultural values."
+                     },
+                     {
+                         "role": "user",
+                         "content": f"{self.moderation_prompt}\n\n{story_content}"
+                     }
+                 ],
+                 "max_tokens": 10,
+                 "temperature": 0.1,
+                 "stream": False
+             }
+
+             response = requests.post(
+                 self.api_url,
+                 headers=self.headers,
+                 json=payload,
+                 timeout=30
+             )
+
+             if response.status_code == 200:
+                 return response.json()
+             else:
+                 logger.error(f"API Error: {response.status_code} - {response.text}")
+                 return {"error": f"API Error: {response.status_code}"}
+
+         except Exception as e:
+             logger.error(f"Exception calling Deepseek API: {str(e)}")
+             return {"error": str(e)}
+
+     def _validate_story_format(self, story_content: str) -> bool:
+         """
+         Basic validation of story format and content
+
+         Args:
+             story_content: Story content to validate
+
+         Returns:
+             Boolean indicating if format is valid
+         """
+         if not story_content or not isinstance(story_content, str):
+             return False
+
+         # Check minimum length (at least 10 characters)
+         if len(story_content.strip()) < 10:
+             return False
+
+         # Check for Arabic characters
+         arabic_pattern = re.compile(r'[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]')
+         if not arabic_pattern.search(story_content):
+             return False
+
+         return True
+
+     def moderate_story(self, story_content: str) -> Dict[str, Any]:
+         """
+         Main method to moderate Arabic story content
+
+         Args:
+             story_content: The Arabic story to moderate
+
+         Returns:
+             Dictionary with moderation result
+         """
+         # Validate input
+         if not self._validate_story_format(story_content):
+             return {
+                 "approved": False,
+                 "response": "no",
+                 "reason": "Invalid story format or missing Arabic content",
+                 "timestamp": datetime.now().isoformat()
+             }
+
+         # Clean and prepare content
+         cleaned_content = story_content.strip()
+
+         # Call Deepseek API
+         api_response = self._call_deepseek_api(cleaned_content)
+
+         if "error" in api_response:
+             logger.error(f"Moderation failed: {api_response['error']}")
+             return {
+                 "approved": False,
+                 "response": "no",
+                 "reason": "Moderation service error",
+                 "error": api_response["error"],
+                 "timestamp": datetime.now().isoformat()
+             }
+
+         try:
+             # Extract the moderation decision
+             ai_response = api_response.get("choices", [{}])[0].get("message", {}).get("content", "").strip().lower()
+
+             # Determine if content is approved
+             approved = ai_response == "true"
+             response_value = "true" if approved else "no"
+
+             result = {
+                 "approved": approved,
+                 "response": response_value,
+                 "ai_decision": ai_response,
+                 "timestamp": datetime.now().isoformat()
+             }
+
+             if not approved:
+                 result["reason"] = "Content violates community guidelines or cultural norms"
+
+             logger.info(f"Moderation completed: {response_value}")
+             return result
+
+         except Exception as e:
+             logger.error(f"Error processing API response: {str(e)}")
+             return {
+                 "approved": False,
+                 "response": "no",
+                 "reason": "Error processing moderation result",
+                 "error": str(e),
+                 "timestamp": datetime.now().isoformat()
+             }
+
+
+ # Flask application
+ app = Flask(__name__)
+
+ # Initialize the moderator (API key will be set via environment variable)
+ try:
+     moderator = ArabicContentModerator()
+     logger.info("Arabic Content Moderator initialized successfully")
+ except ValueError as e:
+     logger.error(f"Failed to initialize moderator: {e}")
+     moderator = None
+
+ @app.route('/', methods=['GET'])
+ def home():
+     """Home endpoint with API documentation"""
+     return jsonify({
+         "service": "Arabic Story Content Moderator",
+         "version": "1.0.0",
+         "description": "AI-powered moderation for Arabic short stories",
+         "endpoints": {
+             "/health": "Health check",
+             "/moderate": "POST - Moderate single story",
+             "/moderate/batch": "POST - Moderate multiple stories"
+         },
+         "usage": {
+             "moderate": {
+                 "method": "POST",
+                 "payload": {"story_content": "Arabic story text"},
+                 "response": {"approved": "boolean", "response": "true/no"}
+             }
+         },
+         "status": "healthy" if moderator else "service unavailable"
+     })
+
+ @app.route('/health', methods=['GET'])
+ def health_check():
+     """Health check endpoint"""
+     return jsonify({
+         "status": "healthy" if moderator else "unhealthy",
+         "service": "Arabic Content Moderator",
+         "timestamp": datetime.now().isoformat(),
+         "api_available": moderator is not None
+     })
+
+ @app.route('/moderate', methods=['POST'])
+ def moderate_content():
+     """
+     Main moderation endpoint
+
+     Expected JSON payload:
+     {
+         "story_content": "Arabic story text here"
+     }
+
+     Returns:
+     {
+         "approved": true/false,
+         "response": "true"/"no",
+         "timestamp": "ISO timestamp"
+     }
+     """
+     if not moderator:
+         return jsonify({
+             "error": "Moderation service not available - API key not configured",
+             "approved": False,
+             "response": "no"
+         }), 500
+
+     try:
+         data = request.get_json(silent=True)  # None on malformed JSON, so the check below returns 400 instead of 500
+
+         if not data or 'story_content' not in data:
+             return jsonify({
+                 "error": "Missing story_content in request",
+                 "approved": False,
+                 "response": "no"
+             }), 400
+
+         story_content = data['story_content']
+         result = moderator.moderate_story(story_content)
+
+         return jsonify(result)
+
+     except Exception as e:
+         logger.error(f"Error in moderate_content: {str(e)}")
+         return jsonify({
+             "error": "Internal server error",
+             "approved": False,
+             "response": "no",
+             "details": str(e)
+         }), 500
+
+ @app.route('/moderate/batch', methods=['POST'])
+ def moderate_batch():
+     """
+     Batch moderation endpoint
+
+     Expected JSON payload:
+     {
+         "stories": ["story1", "story2", "story3"]
+     }
+     """
+     if not moderator:
+         return jsonify({
+             "error": "Moderation service not available - API key not configured"
+         }), 500
+
+     try:
+         data = request.get_json(silent=True)  # None on malformed JSON, so the check below returns 400 instead of 500
+
+         if not data or 'stories' not in data:
+             return jsonify({
+                 "error": "Missing stories array in request"
+             }), 400
+
+         stories = data['stories']
+         if not isinstance(stories, list):
+             return jsonify({
+                 "error": "Stories must be an array"
+             }), 400
+
+         results = []
+         for i, story in enumerate(stories):
+             logger.info(f"Moderating story {i+1}/{len(stories)}")
+             result = moderator.moderate_story(story)
+             results.append(result)
+
+         return jsonify({
+             "results": results,
+             "total_processed": len(results),
+             "timestamp": datetime.now().isoformat()
+         })
+
+     except Exception as e:
+         logger.error(f"Error in moderate_batch: {str(e)}")
+         return jsonify({
+             "error": "Internal server error",
+             "details": str(e)
+         }), 500
+
+ if __name__ == '__main__':
+     # For local testing
+     port = int(os.environ.get('PORT', 7860))
+     app.run(host='0.0.0.0', port=port, debug=False)
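Review note on `moderate_story`: the approval check compares the lowercased, whitespace-stripped model output to the exact string `"true"`. A reply such as `True.` or `"true"` would therefore be rejected, which is safe but may produce false rejections. The sketch below shows one way the comparison could be made tolerant of surrounding quotes and a trailing period while still failing closed. `parse_verdict` is a hypothetical helper and is not part of this commit.

```python
def parse_verdict(raw: object) -> bool:
    """Return True only when the model reply normalizes to exactly 'true'.

    Hypothetical helper: strips whitespace, surrounding quotes and periods,
    then compares case-insensitively. Anything else is a rejection.
    """
    if not isinstance(raw, str):
        return False
    normalized = raw.strip().strip("\"'.").strip().lower()
    return normalized == "true"


for reply in ["true", " True.\n", '"true"', "no", "true, mostly", ""]:
    print(repr(reply), parse_verdict(reply))
```

Replies that carry extra words, such as `true, mostly`, still count as rejections, so the tolerance covers formatting noise only.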
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ flask==2.3.3
+ requests==2.31.0
+ python-dotenv==1.0.0
+ gunicorn==21.2.0