johnlockejrr committed on
Commit b8d47bd · verified · 1 Parent(s): 93cdf79

Upload 2 files

Files changed (2)
  1. README.md +56 -13
  2. app.py +335 -0
README.md CHANGED
@@ -1,20 +1,63 @@
  ---
- title: Samaritan Hebrew Aramaic
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
+ title: Hebrew-Aramaic Translator
+ emoji: 📚
+ colorFrom: purple
+ colorTo: blue
+ sdk: streamlit
+ sdk_version: 1.28.0
+ app_file: app.py
  pinned: false
- short_description: Translate from Samaritan Hebrew to Samaritan Aramaic
  license: mit
  ---
 
- # Welcome to Streamlit!
+ # Hebrew-Aramaic Translator
 
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
+ A modern, interactive web application for translating from Samaritan Hebrew to Samaritan Aramaic using the Hugging Face model `johnlockejrr/opus-mt-arc-heb`.
 
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ ## 🌟 Features
+
+ - **Samaritan Hebrew to Samaritan Aramaic Translation**: Specialized translation for Samaritan texts
+ - **Modern UI**: Clean, responsive design with gradient styling
+ - **Real-time Translation**: Instant translation with progress indicators
+ - **Batch Processing**: Upload text files for bulk translation
+ - **Copy to Clipboard**: Easy copying of translation results
+ - **Customizable Settings**: Adjustable output length
+
+ ## 🚀 Try it out!
+
+ 1. **Enter your Samaritan Hebrew text** in the input area
+ 2. **Click "🔄 Translate"** to get instant results
+ 3. **Copy or download** your translations
+
+ ## 📁 Batch Translation
+
+ Upload a text file with multiple lines to translate them all at once. Results can be downloaded as a CSV file.
+
+ ## 🔧 Model Information
+
+ - **Model**: `johnlockejrr/opus-mt-arc-heb`
+ - **Type**: Sequence-to-Sequence Translation Model
+ - **Framework**: Hugging Face Transformers + PyTorch
+ - **Languages**: Samaritan Hebrew → Samaritan Aramaic
+
+ ## 🛠️ Technical Details
+
+ - **Backend**: Streamlit + PyTorch + Transformers
+ - **Caching**: Model is cached for faster subsequent loads
+ - **Device**: Automatically uses GPU if available, falls back to CPU
+ - **Styling**: Custom CSS with gradient backgrounds and modern design
+
+ ## 📖 Usage Examples
+
+ ### Samaritan Hebrew to Samaritan Aramaic
+ Input: `ויאמר אלהים יהי רקיג בתוך המים`
+ Output: `ואמר אלהים יהי רקיג בגו מיה`
+
+ ## 🔗 Related Links
+
+ - [Model on Hugging Face](https://huggingface.co/johnlockejrr/opus-mt-arc-heb)
+ - [Source Code](https://github.com/your-repo/hebrew-aramaic-translator)
+
+ ## 📄 License
+
+ This application uses the `johnlockejrr/opus-mt-arc-heb` model from Hugging Face. Please refer to the model's license for usage terms.
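The README's Batch Translation section says results can be downloaded as a CSV file. A minimal sketch of one way to build such a file with Python's standard `csv` module, which quotes any field containing commas, double quotes, or newlines. The helper name `results_to_csv` is hypothetical and not part of this commit; the column headers and the `original`/`translation` keys follow the ones used in `app.py`.

```python
import csv
import io


def results_to_csv(results):
    """Serialize a list of {'original', 'translation'} dicts to CSV text.

    Hypothetical helper for illustration; csv.writer handles quoting
    so that commas and double quotes inside the text stay intact.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Samaritan Hebrew", "Samaritan Aramaic"])
    for result in results:
        writer.writerow([result["original"], result["translation"]])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"original": 'a "quoted", line', "translation": "x"}]
    print(results_to_csv(sample))
```

The returned string could be passed as the `data` argument of a download widget; reading it back with `csv.reader` should recover the original fields unchanged.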
app.py ADDED
@@ -0,0 +1,335 @@
+ import streamlit as st
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ import time
+ from typing import Optional
+ import json
+
+ # Page configuration
+ st.set_page_config(
+     page_title="Hebrew-Aramaic Translator",
+     page_icon="📚",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # Custom CSS for modern styling
+ st.markdown("""
+ <style>
+     .main-header {
+         font-size: 3rem;
+         font-weight: 700;
+         background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
+         -webkit-background-clip: text;
+         -webkit-text-fill-color: transparent;
+         text-align: center;
+         margin-bottom: 2rem;
+     }
+
+     .sub-header {
+         font-size: 1.2rem;
+         color: #666;
+         text-align: center;
+         margin-bottom: 3rem;
+     }
+
+     .translation-box {
+         background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
+         padding: 2rem;
+         border-radius: 15px;
+         box-shadow: 0 8px 32px rgba(0,0,0,0.1);
+         margin: 1rem 0;
+     }
+
+     .input-area {
+         background: white;
+         border-radius: 10px;
+         padding: 1.5rem;
+         box-shadow: 0 4px 16px rgba(0,0,0,0.05);
+     }
+
+     .output-area {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         border-radius: 10px;
+         padding: 1.5rem;
+         box-shadow: 0 4px 16px rgba(0,0,0,0.1);
+     }
+
+     .direction-selector {
+         background: white;
+         border-radius: 10px;
+         padding: 1rem;
+         box-shadow: 0 4px 16px rgba(0,0,0,0.05);
+         margin-bottom: 1rem;
+     }
+
+     .stButton > button {
+         background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         border: none;
+         border-radius: 25px;
+         padding: 0.75rem 2rem;
+         font-weight: 600;
+         transition: all 0.3s ease;
+     }
+
+     .stButton > button:hover {
+         transform: translateY(-2px);
+         box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4);
+     }
+
+     .model-info {
+         background: #f8f9fa;
+         border-radius: 10px;
+         padding: 1rem;
+         margin: 1rem 0;
+         border-left: 4px solid #667eea;
+     }
+ </style>
+ """, unsafe_allow_html=True)
+
+ @st.cache_resource
+ def load_model():
+     """Load the Hugging Face model and tokenizer with caching."""
+     model_name = "johnlockejrr/opus-mt-arc-heb"
+
+     with st.spinner("Loading translation model..."):
+         try:
+             tokenizer = AutoTokenizer.from_pretrained(model_name)
+             model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+             # Move to GPU if available
+             device = "cuda" if torch.cuda.is_available() else "cpu"
+             model.to(device)
+             model.eval()
+
+             return tokenizer, model, device
+         except Exception as e:
+             st.error(f"Error loading model: {str(e)}")
+             return None, None, None
+
+ def translate_text(text: str, direction: str, tokenizer, model, device: str, max_length: int = 512) -> Optional[str]:
+     """Translate text using the loaded model."""
+     if not text.strip():
+         return None
+
+     try:
+         # Add language prefix based on direction (using the working back/inference.py logic)
+         if direction == "Hebrew to Aramaic":
+             input_text = f"<he> {text}"
+         else:  # Aramaic to Hebrew
+             input_text = f"<ar> {text}"
+
+         # Tokenize input
+         inputs = tokenizer(
+             input_text,
+             return_tensors="pt",
+             max_length=max_length,
+             truncation=True,
+             padding=True
+         ).to(device)
+
+         # Generate translation
+         with torch.no_grad():
+             outputs = model.generate(
+                 **inputs,
+                 max_length=max_length,
+                 num_beams=4,
+                 length_penalty=0.6,
+                 early_stopping=True,
+                 do_sample=False
+             )
+
+         # Decode output
+         translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
+         return translation
+
+     except Exception as e:
+         st.error(f"Translation error: {str(e)}")
+         return None
+
+ def main():
+     # Header
+     st.markdown('<h1 class="main-header">📚 Samaritan Hebrew-Aramaic Translator</h1>', unsafe_allow_html=True)
+     st.markdown('<p class="sub-header">Powered by the johnlockejrr/opus-mt-arc-heb model</p>', unsafe_allow_html=True)
+
+     # Load model
+     tokenizer, model, device = load_model()
+
+     if tokenizer is None or model is None:
+         st.error("Failed to load the translation model. Please check your internet connection and try again.")
+         return
+
+     # Sidebar for settings
+     with st.sidebar:
+         st.markdown("### ⚙️ Settings")
+
+         # Max length setting
+         max_length = st.slider(
+             "Maximum Output Length",
+             min_value=64,
+             max_value=512,
+             value=256,
+             step=32,
+             help="Maximum length of the generated translation"
+         )
+
+         # Model info
+         st.markdown("### 📊 Model Information")
+         st.markdown(f"**Model:** johnlockejrr/opus-mt-arc-heb")
+         st.markdown(f"**Device:** {device.upper()}")
+         st.markdown(f"**Tokenizer:** {tokenizer.__class__.__name__}")
+         st.markdown(f"**Model Type:** {model.__class__.__name__}")
+         st.markdown(f"**Direction:** Samaritan Hebrew → Samaritan Aramaic")
+
+         # Clear button
+         if st.button("🗑️ Clear All"):
+             st.rerun()
+
+     # Main content area
+     col1, col2 = st.columns([1, 1])
+
+     with col1:
+         st.markdown('<div class="input-area">', unsafe_allow_html=True)
+         st.markdown("### 📝 Input Text")
+
+         # Text input
+         input_text = st.text_area(
+             "Enter Samaritan Hebrew text to translate",
+             height=200,
+             placeholder="Enter your Samaritan Hebrew text here...",
+             help="Type or paste the Samaritan Hebrew text you want to translate to Samaritan Aramaic"
+         )
+
+         # Translate button
+         translate_button = st.button(
+             "🔄 Translate to Samaritan Aramaic",
+             type="primary",
+             use_container_width=True
+         )
+         st.markdown('</div>', unsafe_allow_html=True)
+
+     with col2:
+         st.markdown('<div class="output-area">', unsafe_allow_html=True)
+         st.markdown("### 🎯 Samaritan Aramaic Translation")
+
+         if translate_button and input_text.strip():
+             with st.spinner("Translating to Samaritan Aramaic..."):
+                 # Add a small delay for better UX
+                 time.sleep(0.5)
+
+                 translation = translate_text(
+                     input_text,
+                     "Hebrew to Aramaic",
+                     tokenizer,
+                     model,
+                     device,
+                     max_length
+                 )
+
+                 if translation:
+                     st.markdown(f"**Samaritan Aramaic:**")
+                     st.markdown(f"```\n{translation}\n```")
+
+                     # Copy button
+                     st.markdown(f"""
+                     <div style="margin-top: 1rem;">
+                         <button onclick="navigator.clipboard.writeText('{translation.replace("'", "\\'")}')"
+                         style="background: rgba(255,255,255,0.2); border: 1px solid rgba(255,255,255,0.3);
+                         color: white; padding: 0.5rem 1rem; border-radius: 5px; cursor: pointer;">
+                         📋 Copy Translation
+                         </button>
+                     </div>
+                     """, unsafe_allow_html=True)
+                 else:
+                     st.error("Translation failed. Please try again.")
+         else:
+             st.markdown("*Samaritan Aramaic translation will appear here*")
+         st.markdown('</div>', unsafe_allow_html=True)
+
+     # Additional features
+     st.markdown("---")
+
+     # Batch translation section
+     st.markdown("### 📚 Batch Translation")
+     st.markdown("Upload a text file with multiple Samaritan Hebrew lines to translate them all to Samaritan Aramaic.")
+
+     uploaded_file = st.file_uploader(
+         "Choose a text file",
+         type=['txt'],
+         help="Upload a .txt file with one Samaritan Hebrew text per line"
+     )
+
+     if uploaded_file is not None:
+         try:
+             # Read file content
+             content = uploaded_file.read().decode('utf-8')
+             lines = [line.strip() for line in content.split('\n') if line.strip()]
+
+             if lines:
+                 st.success(f"📄 Loaded {len(lines)} lines from {uploaded_file.name}")
+
+                 if st.button("🔄 Translate All to Samaritan Aramaic", type="primary"):
+                     st.markdown("### 📋 Batch Translation Results")
+
+                     # Create a progress bar
+                     progress_bar = st.progress(0)
+                     status_text = st.empty()
+
+                     results = []
+                     for i, line in enumerate(lines):
+                         status_text.text(f"Translating line {i+1}/{len(lines)}: {line[:50]}...")
+
+                         translation = translate_text(
+                             line,
+                             "Hebrew to Aramaic",
+                             tokenizer,
+                             model,
+                             device,
+                             max_length
+                         )
+
+                         results.append({
+                             'original': line,
+                             'translation': translation or "Translation failed"
+                         })
+
+                         # Update progress
+                         progress_bar.progress((i + 1) / len(lines))
+
+                     status_text.text("✅ Translation complete!")
+
+                     # Display results
+                     for i, result in enumerate(results):
+                         with st.expander(f"Line {i+1}: {result['original'][:50]}..."):
+                             st.markdown(f"**Samaritan Hebrew:** {result['original']}")
+                             st.markdown(f"**Samaritan Aramaic:** {result['translation']}")
+
+                     # Download results
+                     csv_content = "Samaritan Hebrew,Samaritan Aramaic\n"
+                     for result in results:
+                         csv_content += f'"{result["original"]}","{result["translation"]}"\n'
+
+                     st.download_button(
+                         label="📥 Download Results as CSV",
+                         data=csv_content,
+                         file_name="samaritan_translations.csv",
+                         mime="text/csv"
+                     )
+
+         except Exception as e:
+             st.error(f"Error reading file: {str(e)}")
+
+     # Footer
+     st.markdown("---")
+     st.markdown("""
+     <div style="text-align: center; color: #666; padding: 2rem;">
+         <p>Built with ❤️ using Streamlit and Hugging Face Transformers</p>
+         <p>Samaritan Hebrew to Samaritan Aramaic Translation</p>
+         <p>Model: johnlockejrr/opus-mt-arc-heb</p>
+     </div>
+     """, unsafe_allow_html=True)
+
+ if __name__ == "__main__":
+     main()
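The copy button in `main()` interpolates the translation into an inline `onclick` handler and escapes only single quotes. The backslash inside that f-string expression is also a syntax error on Python versions before 3.12. One possible alternative, sketched here with a hypothetical helper `copy_button_html` that is not part of this commit: serialize the text as a JavaScript string literal with `json.dumps`, then HTML-escape the whole handler before placing it in the attribute. Whether Streamlit actually runs inline handlers rendered through `st.markdown` is not verified here.

```python
import html
import json


def copy_button_html(text: str, label: str = "Copy Translation") -> str:
    """Build a button whose onclick copies `text` to the clipboard.

    Hypothetical helper for illustration only. json.dumps yields a valid
    JavaScript string literal (quotes, backslashes, and newlines escaped);
    html.escape then makes the handler safe inside a double-quoted attribute.
    """
    js_literal = json.dumps(text)
    handler = f"navigator.clipboard.writeText({js_literal})"
    escaped_handler = html.escape(handler, quote=True)
    return f'<button onclick="{escaped_handler}">{html.escape(label)}</button>'


if __name__ == "__main__":
    print(copy_button_html("it's a \"test\""))
```

Doing the escaping outside the f-string keeps the expression free of backslashes, so the same code parses on older Python versions as well.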