Add ContentGuardian - Content Audit Agent with Hanyu Xinxie Style
- README.md +99 -7
- app.py +194 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,11 +1,103 @@
 ---
-title: ContentGuardian
-emoji:
-colorFrom:
-colorTo:
-sdk:
+title: ContentGuardian - Content Audit Agent
+emoji: 🛡️
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
 pinned: false
-license:
+license: mit
+short_description: China · Simplified Chinese · Hanyu Xinxie Style Content Audit Agent
 ---

-
+# 🛡️ ContentGuardian - Content Audit Agent
+
+**China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version**
+
+An intelligent content audit system that integrates Chinese culture with modern design concepts, embodying the restrained elegance of Chinese aesthetics.
+
+## ✨ Key Features
+
+- 🚫 **Inappropriate Advertising Language Detection**: Detects 32 common violation words
+- **Precise Keyword Marking**: Marks user-specified keyword positions
+- **Structured Audit Reports**: Standardized text output format
+- 🎨 **Hanyu Xinxie Aesthetic Style**: Embodies the restrained elegance of Chinese culture
+- ⚡ **Stable and Reliable**: Avoids segmentation faults, ensures system stability
+
+## Detection Scope
+
+### Inappropriate Advertising Language (32 types)
+
+- **Absolute terms**: 绝对, 完全, 100%, 永远, 从不, 必须, 一定
+- **Superlative terms**: 第一, 最好, 最佳, 最优, 最强, 最大, 最小
+- **Exaggerated terms**: 极度, 超级, 顶级, 王牌, 冠军, 霸主
+- **Time-sensitive terms**: 立即, 马上, 瞬间, 秒杀, 急速, 飞速
+- **Mystical terms**: 神奇, 奇迹, 万能, 无敌, 完美, 终极
+
+### Keyword Marking
+
+- Precisely locates keywords in text
+- Counts keyword occurrences
+- Supports multiple keyword marking simultaneously
+
+## Report Format
+
+The system generates audit reports containing:
+
+- **Audit Overview**: Total issues, severity level, text length, audit time
+- **Inappropriate Advertising Language Detection**: Detailed words, occurrence counts, position information
+- **Keyword Marking**: Keyword occurrence counts and positions
+- **Detailed Analysis**: Compliance recommendations, risk level, handling suggestions
+- **Design Philosophy**: Reflects the cultural connotations of Hanyu Xinxie
+
+## 🎨 Design Philosophy
+
+This system adopts the Hanyu Xinxie style, integrating Chinese culture with modern design:
+
+- **Cultural Connotation**: Embodies the restrained elegance of Chinese culture
+- **Visual Aesthetics**: Uses ASCII art borders and dividers
+- **Text Expression**: Focuses on accuracy and standardization of expression
+- **Structured Output**: Clear information organization and presentation
+
+## Example
+
+**Input Text:**
+```
+This product is absolutely effective, completely side-effect free, the first brand! Buy immediately, instant results!
+```
+
+**Keywords:**
+```
+product,effect
+```
+
+**Output Report:**
+```
+╔══════════════════════════════════════════════════════════════╗
+║                         内容审核报告                         ║
+║                  中国·简体中文·汉语新解风格                  ║
+╚══════════════════════════════════════════════════════════════╝
+
+审核概览
+├─ 总问题数：5 处
+├─ 严重程度：高
+├─ 文本长度：95 字符
+└─ 审核时间：2024-01-01 12:00:00
+...
+```
+
+## Usage
+
+1. **Input Text**: Enter the text content to be audited
+2. **Keyword Marking**: Enter keywords to mark (separated by commas)
+3. **Start Audit**: Click the "Submit" button
+4. **View Report**: The system generates a detailed structured audit report
+
+## License
+
+MIT License
+
+---
+
+**ContentGuardian** - Guarding content quality, embodying cultural beauty
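The README's "Keyword Marking" section describes locating each keyword, counting its occurrences, and handling several keywords at once. A minimal standalone sketch of that lookup follows, assuming case-insensitive matching that also reports overlapping hits. The helper names `find_positions` and `mark_keywords` are hypothetical and do not appear in this commit.

```python
def find_positions(text: str, term: str) -> list[int]:
    """Return the start index of every case-insensitive occurrence of term."""
    if not term:
        return []
    haystack, needle = text.lower(), term.lower()
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also reported.
        start = haystack.find(needle, start + 1)
    return positions


def mark_keywords(text: str, keywords: str) -> dict[str, list[int]]:
    """Map each comma-separated keyword that occurs in text to its positions."""
    terms = [k.strip() for k in keywords.split(",") if k.strip()]
    hits = {term: find_positions(text, term) for term in terms}
    # Keep only the keywords that were actually found.
    return {term: pos for term, pos in hits.items() if pos}


if __name__ == "__main__":
    sample = "This product is absolutely effective. Great product!"
    for term, pos in mark_keywords(sample, "product, effect, missing").items():
        print(f"{term}: {len(pos)} occurrence(s) at {pos}")
```

The occurrence count is simply the length of the position list, so one pass per keyword yields both values the report needs.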
app.py
ADDED
@@ -0,0 +1,194 @@
+"""
+ContentGuardian - Content Audit Agent
+China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version
+Optimized for Hugging Face Spaces
+"""
+import gradio as gr
+import datetime
+
+def comprehensive_text_audit(text, keywords=""):
+    """
+    Comprehensive text audit - structured text output
+    """
+    if not text.strip():
+        return "❌ Please enter text content"
+
+    # Parse keywords
+    keyword_list = [k.strip() for k in keywords.split(",") if k.strip()] if keywords else []
+
+    # 1. Inappropriate advertising language detection
+    inappropriate_words = [
+        "绝对", "完全", "100%", "永远", "从不", "必须", "一定",
+        "第一", "最好", "最佳", "最优", "最强", "最大", "最小",
+        "极度", "超级", "顶级", "王牌", "冠军", "霸主",
+        "立即", "马上", "瞬间", "秒杀", "急速", "飞速",
+        "神奇", "奇迹", "万能", "无敌", "完美", "终极",
+        "absolutely", "completely", "perfect", "ultimate", "best", "first",
+        "immediately", "instantly", "magical", "miracle", "supreme"
+    ]
+
+    found_inappropriate = []
+    for word in inappropriate_words:
+        if word.lower() in text.lower():
+            positions = []
+            start = 0
+            text_lower = text.lower()
+            word_lower = word.lower()
+            while True:
+                pos = text_lower.find(word_lower, start)
+                if pos == -1:
+                    break
+                positions.append(pos)
+                start = pos + 1
+            if positions:
+                found_inappropriate.append({
+                    "word": word,
+                    "count": len(positions),
+                    "positions": positions
+                })
+
+    # 2. Keyword marking
+    found_keywords = []
+    for keyword in keyword_list:
+        if keyword.lower() in text.lower():
+            positions = []
+            start = 0
+            text_lower = text.lower()
+            keyword_lower = keyword.lower()
+            while True:
+                pos = text_lower.find(keyword_lower, start)
+                if pos == -1:
+                    break
+                positions.append(pos)
+                start = pos + 1
+            if positions:
+                found_keywords.append({
+                    "keyword": keyword,
+                    "count": len(positions),
+                    "positions": positions
+                })
+
+    # 3. Generate structured report
+    total_issues = len(found_inappropriate)
+    severity = "低" if total_issues < 3 else "中" if total_issues < 6 else "高"
+
+    # Build standardized text report
+    report = f"""
+╔══════════════════════════════════════════════════════════════╗
+║                         内容审核报告                         ║
+║                  中国·简体中文·汉语新解风格                  ║
+╚══════════════════════════════════════════════════════════════╝
+
+审核概览
+├─ 总问题数：{total_issues} 处
+├─ 严重程度：{severity}
+├─ 文本长度：{len(text)} 字符
+└─ 审核时间：{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+
+────────────────────────────────────────────────────────────────
+
+🚫 不当广告用语检测 ({len(found_inappropriate)} 处)
+"""
+
+    if found_inappropriate:
+        for i, item in enumerate(found_inappropriate, 1):
+            report += f"""
+{i}. 词汇："{item['word']}"
+├─ 出现次数：{item['count']} 次
+└─ 位置：{', '.join(map(str, item['positions']))}
+"""
+    else:
+        report += "\n ✅ 未发现不当广告用语\n"
+
+    report += f"""
+────────────────────────────────────────────────────────────────
+
+关键词标记 ({len(found_keywords)} 处)
+"""
+
+    if found_keywords:
+        for i, item in enumerate(found_keywords, 1):
+            report += f"""
+{i}. 关键词："{item['keyword']}"
+├─ 出现次数：{item['count']} 次
+└─ 位置：{', '.join(map(str, item['positions']))}
+"""
+    else:
+        report += "\n ℹ️ 未指定关键词或关键词未出现\n"
+
+    report += f"""
+────────────────────────────────────────────────────────────────
+
+详细分析
+├─ 合规建议：
+"""
+
+    if found_inappropriate:
+        report += f"""│ • 建议修改或删除 {len(found_inappropriate)} 个不当用词
+│ • 使用相对性表述替代绝对化用词
+│ • 避免夸大宣传和误导性表述
+"""
+    else:
+        report += "│ • 内容表述规范，符合广告法要求\n"
+
+    report += f"""├─ 风险等级：{severity}
+└─ 处理建议：{"需要重点关注和修改" if severity == "高" else "建议适当调整" if severity == "中" else "可以正常使用"}
+
+────────────────────────────────────────────────────────────────
+
+🎨 设计理念
+本审核系统融合中华文化与现代设计，体现内敛典雅之美。
+采用汉语新解风格，注重文字的准确性和表达的规范性。
+
+╔══════════════════════════════════════════════════════════════╗
+║      审核完成 - 感谢使用 ContentGuardian 内容审核智能体      ║
+╚══════════════════════════════════════════════════════════════╝
+"""
+
+    return report
+
+# Create Gradio interface optimized for HF Spaces
+demo = gr.Interface(
+    fn=comprehensive_text_audit,
+    inputs=[
+        gr.Textbox(
+            label="Text to Audit",
+            placeholder="Please enter the text content to be audited...",
+            lines=8
+        ),
+        gr.Textbox(
+            label="Keywords (Optional)",
+            placeholder="Please enter keywords to mark, separated by commas",
+            lines=2
+        )
+    ],
+    outputs=gr.Textbox(
+        label="Audit Report",
+        lines=25,
+        max_lines=30
+    ),
+    title="🛡️ ContentGuardian - Content Audit Agent",
+    description="""
+**China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version**
+
+This system uses standardized structured text output, integrating Chinese culture with modern design concepts.
+Detects inappropriate advertising language, marks specified keywords, and generates detailed audit reports.
+
+**Key Features:**
+- 🚫 Inappropriate Advertising Language Detection (32+ common violation words)
+- Precise Keyword Marking
+- Structured Audit Reports
+- 🎨 Hanyu Xinxie Aesthetic Style
+""",
+    examples=[
+        ["This product is absolutely effective, completely side-effect free, the first brand! Buy immediately, instant results!", "product,effect"],
+        ["Our product quality is very good, trustworthy, welcome to purchase.", "product,quality"],
+        ["Buy now, instant effect, 100% effective, absolutely satisfying!", "buy,effect"],
+        ["这款产品效果绝对好，完全无副作用，第一品牌！立即购买，马上见效！", "产品,效果"]
+    ],
+    theme=gr.themes.Soft(),
+    flagging_mode="never"
+)
+
+if __name__ == "__main__":
+    demo.launch()
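In app.py the severity label is derived from `total_issues = len(found_inappropriate)`, which is the number of distinct flagged terms and not the total number of occurrences. A minimal sketch of that mapping as standalone functions follows, assuming the same thresholds (fewer than 3, fewer than 6, otherwise high) and the same handling strings as the report. The names `severity_for`, `handling_for`, and `HANDLING` are hypothetical and not part of this commit.

```python
# Handling suggestion shown in the report for each severity label.
HANDLING = {
    "高": "需要重点关注和修改",  # high: needs close attention and revision
    "中": "建议适当调整",        # medium: moderate adjustment suggested
    "低": "可以正常使用",        # low: fine to use as-is
}


def severity_for(total_issues: int) -> str:
    """Map a count of distinct flagged terms to 低 (low), 中 (medium) or 高 (high)."""
    if total_issues < 3:
        return "低"
    if total_issues < 6:
        return "中"
    return "高"


def handling_for(total_issues: int) -> str:
    """Return the handling suggestion that matches the severity of the count."""
    return HANDLING[severity_for(total_issues)]


if __name__ == "__main__":
    for count in (0, 3, 6):
        print(count, severity_for(count), handling_for(count))
```

Under these thresholds a text that repeats one flagged term many times still rates as low, because only distinct terms are counted.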
requirements.txt
ADDED
@@ -0,0 +1 @@
+gradio>=4.0.0