Committed by Nanny7
Commit dac7f95 · 1 Parent(s): 7cd341b

Add ContentGuardian - Content Audit Agent with Hanyu Xinxie Style

Files changed (3)
  1. README.md +99 -7
  2. app.py +194 -0
  3. requirements.txt +1 -0
README.md CHANGED
@@ -1,11 +1,103 @@
  ---
- title: ContentGuardian
- emoji: ๐Ÿƒ
- colorFrom: purple
- colorTo: gray
- sdk: docker
  pinned: false
- license: apache-2.0
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: ContentGuardian - Content Audit Agent
+ emoji: 🛡️
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
  pinned: false
+ license: mit
+ short_description: China · Simplified Chinese · Hanyu Xinxie Style Content Audit Agent
  ---

+ # 🛡️ ContentGuardian - Content Audit Agent
+
+ **China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version**
+
+ An intelligent content audit system that integrates Chinese culture with modern design concepts, embodying the restrained elegance of Chinese aesthetics.
+
+ ## ✨ Key Features
+
+ - 🚫 **Inappropriate Advertising Language Detection**: Detects 32 common violation words
+ - 🔍 **Precise Keyword Marking**: Marks user-specified keyword positions
+ - 📊 **Structured Audit Reports**: Standardized text output format
+ - 🎨 **Hanyu Xinxie Aesthetic Style**: Embodies the restrained elegance of Chinese culture
+ - ⚡ **Stable and Reliable**: Avoids segmentation faults, ensures system stability
+
+ ## 🔍 Detection Scope
+
+ ### Inappropriate Advertising Language (32 types)
+
+ - **Absolute terms**: 绝对, 完全, 100%, 永远, 从不, 必须, 一定
+ - **Superlative terms**: 第一, 最好, 最佳, 最优, 最强, 最大, 最小
+ - **Exaggerated terms**: 极度, 超级, 顶级, 王牌, 冠军, 霸主
+ - **Time-sensitive terms**: 立即, 马上, 瞬间, 秒杀, 急速, 飞速
+ - **Mystical terms**: 神奇, 奇迹, 万能, 无敌, 完美, 终极
+
+ ### Keyword Marking
+
+ - Precisely locates keywords in text
+ - Counts keyword occurrences
+ - Supports multiple keyword marking simultaneously
+
+ ## 📊 Report Format
+
+ The system generates audit reports containing:
+
+ - **Audit Overview**: Total issues, severity level, text length, audit time
+ - **Inappropriate Advertising Language Detection**: Detailed words, occurrence counts, position information
+ - **Keyword Marking**: Keyword occurrence counts and positions
+ - **Detailed Analysis**: Compliance recommendations, risk level, handling suggestions
+ - **Design Philosophy**: Reflects the cultural connotations of Hanyu Xinxie
+
+ ## 🎨 Design Philosophy
+
+ This system adopts the Hanyu Xinxie style, integrating Chinese culture with modern design:
+
+ - **Cultural Connotation**: Embodies the restrained elegance of Chinese culture
+ - **Visual Aesthetics**: Uses ASCII art borders and dividers
+ - **Text Expression**: Focuses on accuracy and standardization of expression
+ - **Structured Output**: Clear information organization and presentation
+
+ ## 🌟 Example
+
+ **Input Text:**
+ ```
+ This product is absolutely effective, completely side-effect free, the first brand! Buy immediately, instant results!
+ ```
+
+ **Keywords:**
+ ```
+ product,effect
+ ```
+
+ **Output Report:**
+ ```
+ ╔══════════════════════════════════════════════════════════════╗
+ ║ 📋 内容审核报告 ║
+ ║ 中国·简体中文·汉语新解风格 ║
+ ╚══════════════════════════════════════════════════════════════╝
+
+ 📊 审核概览
+ ├─ 总问题数：4 处
+ ├─ 严重程度：中
+ ├─ 文本长度：117 字符
+ └─ 审核时间：2024-01-01 12:00:00
+ ...
+ ```
+
+ ## 🚀 Usage
+
+ 1. **Input Text**: Enter the text content to be audited
+ 2. **Keyword Marking**: Enter keywords to mark (separated by commas)
+ 3. **Start Audit**: Click the "Submit" button
+ 4. **View Report**: The system generates a detailed structured audit report
+
+ ## 📄 License
+
+ MIT License
+
+ ---
+
+ **ContentGuardian** - Guarding content quality, embodying cultural beauty
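
The README's Keyword Marking and Usage sections describe comma-separated keywords that are located and counted in the audited text. The following is a minimal sketch of that behavior, assuming case-insensitive substring matching. The helper names `parse_keywords` and `find_positions` are hypothetical and are not part of this commit, and accepting the full-width comma `，` is an added assumption rather than something the commit does.

```python
import re


def parse_keywords(raw):
    """Split a keyword string on ASCII or full-width commas, dropping blanks."""
    # Assumption: the full-width comma is accepted for convenience.
    return [k.strip() for k in re.split(r"[,，]", raw or "") if k.strip()]


def find_positions(text, term):
    """Return every 0-based offset where term occurs in text, ignoring case."""
    haystack, needle = text.lower(), term.lower()
    positions = []
    start = 0
    while needle:  # an empty term yields no matches
        pos = haystack.find(needle, start)
        if pos == -1:
            break
        positions.append(pos)
        start = pos + 1  # advance by one so overlapping matches are kept
    return positions


sample = "This product is absolutely effective, the first brand! Buy the product."
for kw in parse_keywords("product, effect，brand"):
    hits = find_positions(sample, kw)
    print(f"{kw}: {len(hits)} occurrence(s) at {hits}")
```

Advancing the search by one character instead of by the term length means overlapping matches are reported, so a term such as "aa" is found twice in "aaa".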
app.py ADDED
@@ -0,0 +1,194 @@
+ """
+ ContentGuardian - Content Audit Agent
+ China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version
+ Optimized for Hugging Face Spaces
+ """
+ import gradio as gr
+ import datetime
+
+ def comprehensive_text_audit(text, keywords=""):
+     """
+     Comprehensive text audit - structured text output
+     """
+     if not text.strip():
+         return "❌ Please enter text content"
+
+     # Parse keywords
+     keyword_list = [k.strip() for k in keywords.split(",") if k.strip()] if keywords else []
+
+     # 1. Inappropriate advertising language detection
+     inappropriate_words = [
+         "绝对", "完全", "100%", "永远", "从不", "必须", "一定",
+         "第一", "最好", "最佳", "最优", "最强", "最大", "最小",
+         "极度", "超级", "顶级", "王牌", "冠军", "霸主",
+         "立即", "马上", "瞬间", "秒杀", "急速", "飞速",
+         "神奇", "奇迹", "万能", "无敌", "完美", "终极",
+         "absolutely", "completely", "perfect", "ultimate", "best", "first",
+         "immediately", "instantly", "magical", "miracle", "supreme"
+     ]
+
+     found_inappropriate = []
+     for word in inappropriate_words:
+         if word.lower() in text.lower():
+             positions = []
+             start = 0
+             text_lower = text.lower()
+             word_lower = word.lower()
+             while True:
+                 pos = text_lower.find(word_lower, start)
+                 if pos == -1:
+                     break
+                 positions.append(pos)
+                 start = pos + 1
+             if positions:
+                 found_inappropriate.append({
+                     "word": word,
+                     "count": len(positions),
+                     "positions": positions
+                 })
+
+     # 2. Keyword marking
+     found_keywords = []
+     for keyword in keyword_list:
+         if keyword.lower() in text.lower():
+             positions = []
+             start = 0
+             text_lower = text.lower()
+             keyword_lower = keyword.lower()
+             while True:
+                 pos = text_lower.find(keyword_lower, start)
+                 if pos == -1:
+                     break
+                 positions.append(pos)
+                 start = pos + 1
+             if positions:
+                 found_keywords.append({
+                     "keyword": keyword,
+                     "count": len(positions),
+                     "positions": positions
+                 })
+
+     # 3. Generate structured report
+     total_issues = len(found_inappropriate)
+     severity = "低" if total_issues < 3 else "中" if total_issues < 6 else "高"
+
+     # Build standardized text report
+     report = f"""
+ ╔══════════════════════════════════════════════════════════════╗
+ ║ 📋 内容审核报告 ║
+ ║ 中国·简体中文·汉语新解风格 ║
+ ╚══════════════════════════════════════════════════════════════╝
+
+ 📊 审核概览
+ ├─ 总问题数：{total_issues} 处
+ ├─ 严重程度：{severity}
+ ├─ 文本长度：{len(text)} 字符
+ └─ 审核时间：{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 🚫 不当广告用语检测 ({len(found_inappropriate)} 处)
+ """
+
+     if found_inappropriate:
+         for i, item in enumerate(found_inappropriate, 1):
+             report += f"""
+ {i}. 词汇："{item['word']}"
+ └─ 出现次数：{item['count']} 次
+ └─ 位置：{', '.join(map(str, item['positions']))}
+ """
+     else:
+         report += "\n ✅ 未发现不当广告用语\n"
+
+     report += f"""
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 🔍 关键词标记 ({len(found_keywords)} 处)
+ """
+
+     if found_keywords:
+         for i, item in enumerate(found_keywords, 1):
+             report += f"""
+ {i}. 关键词："{item['keyword']}"
+ └─ 出现次数：{item['count']} 次
+ └─ 位置：{', '.join(map(str, item['positions']))}
+ """
+     else:
+         report += "\n ℹ️ 未指定关键词或关键词未出现\n"
+
+     report += f"""
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 📋 详细分析
+ ├─ 合规建议：
+ """
+
+     if found_inappropriate:
+         report += f"""│ • 建议修改或删除 {len(found_inappropriate)} 个不当用词
+ │ • 使用相对性表述替代绝对化用词
+ │ • 避免夸大宣传和误导性表述
+ """
+     else:
+         report += "│ • 内容表述规范，符合广告法要求\n"
+
+     report += f"""├─ 风险等级：{severity}
+ └─ 处理建议：{"需要重点关注和修改" if severity == "高" else "建议适当调整" if severity == "中" else "可以正常使用"}
+
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ 🎨 设计理念
+ 本审核系统融合中华文化与现代设计，体现内敛典雅之美。
+ 采用汉语新解风格，注重文字的准确性和表达的规范性。
+
+ ╔══════════════════════════════════════════════════════════════╗
+ ║ 审核完成 - 感谢使用 ContentGuardian 内容审核智能体 ║
+ ╚══════════════════════════════════════════════════════════════╝
+ """
+
+     return report
+
+ # Create Gradio interface optimized for HF Spaces
+ demo = gr.Interface(
+     fn=comprehensive_text_audit,
+     inputs=[
+         gr.Textbox(
+             label="Text to Audit",
+             placeholder="Please enter the text content to be audited...",
+             lines=8
+         ),
+         gr.Textbox(
+             label="Keywords (Optional)",
+             placeholder="Please enter keywords to mark, separated by commas",
+             lines=2
+         )
+     ],
+     outputs=gr.Textbox(
+         label="Audit Report",
+         lines=25,
+         max_lines=30
+     ),
+     title="🛡️ ContentGuardian - Content Audit Agent",
+     description="""
+     **China · Simplified Chinese · Hanyu Xinxie Style · Text-based Version**
+
+     This system uses standardized structured text output, integrating Chinese culture with modern design concepts.
+     Detects inappropriate advertising language, marks specified keywords, and generates detailed audit reports.
+
+     **Key Features:**
+     - 🚫 Inappropriate Advertising Language Detection (32+ common violation words)
+     - 🔍 Precise Keyword Marking
+     - 📊 Structured Audit Reports
+     - 🎨 Hanyu Xinxie Aesthetic Style
+     """,
+     examples=[
+         ["This product is absolutely effective, completely side-effect free, the first brand! Buy immediately, instant results!", "product,effect"],
+         ["Our product quality is very good, trustworthy, welcome to purchase.", "product,quality"],
+         ["Buy now, instant effect, 100% effective, absolutely satisfying!", "buy,effect"],
+         ["这款产品效果绝对好，完全无副作用，第一品牌！立即购买，马上见效！", "产品,效果"]
+     ],
+     theme=gr.themes.Soft(),
+     flagging_mode="never"
+ )
+
+ if __name__ == "__main__":
+     demo.launch()
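
In `comprehensive_text_audit`, the severity is derived from the number of distinct flagged terms, not from the total number of occurrences: fewer than 3 gives 低 (low), fewer than 6 gives 中 (medium), and 6 or more gives 高 (high). The sketch below isolates that rule so it can be checked without launching Gradio. The names `count_distinct_hits` and `severity_for` are hypothetical, and `TERMS` is only an illustrative subset of the English entries in `inappropriate_words`.

```python
# Illustrative subset of the English terms listed in app.py (assumption:
# the full list would be passed in a real check).
TERMS = ["absolutely", "completely", "first", "immediately", "instantly"]


def count_distinct_hits(text, terms=TERMS):
    """Count how many distinct terms occur in text, ignoring case."""
    lowered = text.lower()
    return sum(1 for term in terms if term.lower() in lowered)


def severity_for(total_issues):
    """Map a distinct-term count to the severity labels used in the report."""
    if total_issues < 3:
        return "低"  # low
    if total_issues < 6:
        return "中"  # medium
    return "高"      # high


example = (
    "This product is absolutely effective, completely side-effect free, "
    "the first brand! Buy immediately, instant results!"
)
hits = count_distinct_hits(example)
print(hits, severity_for(hits))
```

For the first bundled example, four terms match ("absolutely", "completely", "first", "immediately"), so the rule yields 中. The word "instant" does not match the listed term "instantly", because matching is by substring of the text, not by word stem.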
requirements.txt ADDED
@@ -0,0 +1 @@
+ gradio>=4.0.0