Upload 2 files
README.md
CHANGED
@@ -1,20 +1,63 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
-sdk:
-
-
-- streamlit
+title: Hebrew-Aramaic Translator
+emoji: π
+colorFrom: purple
+colorTo: blue
+sdk: streamlit
+sdk_version: 1.28.0
+app_file: app.py
 pinned: false
-short_description: Translate from Samaritan Hebrew to Samaritan Aramaic
 license: mit
 ---
 
-#
+# Hebrew-Aramaic Translator
 
-
+A modern, interactive web application for translating from Samaritan Hebrew to Samaritan Aramaic using the Hugging Face model `johnlockejrr/opus-mt-arc-heb`.
 
-
-
+## Features
+
+- **Samaritan Hebrew to Samaritan Aramaic Translation**: Specialized translation for Samaritan texts
+- **Modern UI**: Clean, responsive design with gradient styling
+- **Real-time Translation**: Instant translation with progress indicators
+- **Batch Processing**: Upload text files for bulk translation
+- **Copy to Clipboard**: Easy copying of translation results
+- **Customizable Settings**: Adjustable output length
+
+## Try it out!
+
+1. **Enter your Samaritan Hebrew text** in the input area
+2. **Click "Translate"** to get instant results
+3. **Copy or download** your translations
+
+## Batch Translation
+
+Upload a text file with multiple lines to translate them all at once. Results can be downloaded as a CSV file.
+
+## 🧠 Model Information
+
+- **Model**: `johnlockejrr/opus-mt-arc-heb`
+- **Type**: Sequence-to-Sequence Translation Model
+- **Framework**: Hugging Face Transformers + PyTorch
+- **Languages**: Samaritan Hebrew → Samaritan Aramaic
+
+## 🛠️ Technical Details
+
+- **Backend**: Streamlit + PyTorch + Transformers
+- **Caching**: Model is cached for faster subsequent loads
+- **Device**: Automatically uses GPU if available, falls back to CPU
+- **Styling**: Custom CSS with gradient backgrounds and modern design
+
+## Usage Examples
+
+### Samaritan Hebrew to Samaritan Aramaic
+Input: `ויאמר אלהים יהי רקיע בתוך המים`
+Output: `ΧΧΧΧ¨ ΧΧΧΧΧ ΧΧΧ Χ¨Χ§ΧΧ’ ΧΧΧ ΧΧΧ`
+
+## Related Links
+
+- [Model on Hugging Face](https://huggingface.co/johnlockejrr/opus-mt-arc-heb)
+- [Source Code](https://github.com/your-repo/hebrew-aramaic-translator)
+
+## License
+
+This application uses the `johnlockejrr/opus-mt-arc-heb` model from Hugging Face. Please refer to the model's license for usage terms.
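The README's batch section says results can be downloaded as a CSV file. A minimal sketch of how such rows could be serialized with Python's standard `csv` module, which takes care of embedded commas and double quotes. The helper name `results_to_csv` and the sample row are illustrative assumptions, not code from this commit.

```python
import csv
import io


def results_to_csv(results):
    """Serialize translation results to CSV text.

    `results` is assumed to be a list of dicts with "original" and
    "translation" keys (hypothetical shape chosen for this sketch).
    """
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in quotes; embedded quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Samaritan Hebrew", "Samaritan Aramaic"])
    for row in results:
        writer.writerow([row["original"], row["translation"]])
    return buffer.getvalue()


# Sample input containing both a comma and double quotes.
sample = [{"original": 'a "quoted", line', "translation": "Translation failed"}]
csv_text = results_to_csv(sample)
```

Building the text through `csv.writer` rather than string concatenation means a source line that contains a quote character still round-trips through a CSV reader.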
app.py
ADDED
@@ -0,0 +1,335 @@
+import streamlit as st
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import time
+from typing import Optional
+import json
+
+# Page configuration
+st.set_page_config(
+    page_title="Hebrew-Aramaic Translator",
+    page_icon="π",
+    layout="wide",
+    initial_sidebar_state="expanded"
+)
+
+# Custom CSS for modern styling
+st.markdown("""
+<style>
+    .main-header {
+        font-size: 3rem;
+        font-weight: 700;
+        background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
+        -webkit-background-clip: text;
+        -webkit-text-fill-color: transparent;
+        text-align: center;
+        margin-bottom: 2rem;
+    }
+
+    .sub-header {
+        font-size: 1.2rem;
+        color: #666;
+        text-align: center;
+        margin-bottom: 3rem;
+    }
+
+    .translation-box {
+        background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
+        padding: 2rem;
+        border-radius: 15px;
+        box-shadow: 0 8px 32px rgba(0,0,0,0.1);
+        margin: 1rem 0;
+    }
+
+    .input-area {
+        background: white;
+        border-radius: 10px;
+        padding: 1.5rem;
+        box-shadow: 0 4px 16px rgba(0,0,0,0.05);
+    }
+
+    .output-area {
+        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+        color: white;
+        border-radius: 10px;
+        padding: 1.5rem;
+        box-shadow: 0 4px 16px rgba(0,0,0,0.1);
+    }
+
+    .direction-selector {
+        background: white;
+        border-radius: 10px;
+        padding: 1rem;
+        box-shadow: 0 4px 16px rgba(0,0,0,0.05);
+        margin-bottom: 1rem;
+    }
+
+    .stButton > button {
+        background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
+        color: white;
+        border: none;
+        border-radius: 25px;
+        padding: 0.75rem 2rem;
+        font-weight: 600;
+        transition: all 0.3s ease;
+    }
+
+    .stButton > button:hover {
+        transform: translateY(-2px);
+        box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4);
+    }
+
+    .model-info {
+        background: #f8f9fa;
+        border-radius: 10px;
+        padding: 1rem;
+        margin: 1rem 0;
+        border-left: 4px solid #667eea;
+    }
+</style>
+""", unsafe_allow_html=True)
+
+@st.cache_resource
+def load_model():
+    """Load the Hugging Face model and tokenizer with caching."""
+    model_name = "johnlockejrr/opus-mt-arc-heb"
+
+    with st.spinner("Loading translation model..."):
+        try:
+            tokenizer = AutoTokenizer.from_pretrained(model_name)
+            model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+            # Move to GPU if available
+            device = "cuda" if torch.cuda.is_available() else "cpu"
+            model.to(device)
+            model.eval()
+
+            return tokenizer, model, device
+        except Exception as e:
+            st.error(f"Error loading model: {str(e)}")
+            return None, None, None
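`@st.cache_resource` keeps the returned objects alive across reruns, so the loader body runs once per process. The idea can be sketched without Streamlit by using `functools.lru_cache` as a stand-in. `fake_load_model` and the `load_calls` counter are hypothetical names used only to show that repeated calls reuse the first result.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive loader body actually runs


@lru_cache(maxsize=None)
def fake_load_model(model_name: str) -> dict:
    """Stand-in for an expensive model load; returns a placeholder object."""
    global load_calls
    load_calls += 1
    return {"name": model_name}


# The second call is served from the cache and returns the same object.
first = fake_load_model("johnlockejrr/opus-mt-arc-heb")
second = fake_load_model("johnlockejrr/opus-mt-arc-heb")
```

One consequence of this pattern is that whatever the function returns is cached, including a failure value. A loader that returns `None` placeholders on error may therefore keep returning them until the cache is cleared.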
+
+def translate_text(text: str, direction: str, tokenizer, model, device: str, max_length: int = 512) -> Optional[str]:
+    """Translate text using the loaded model."""
+    if not text.strip():
+        return None
+
+    try:
+        # Add language prefix based on direction (using the working back/inference.py logic)
+        if direction == "Hebrew to Aramaic":
+            input_text = f"<he> {text}"
+        else:  # Aramaic to Hebrew
+            input_text = f"<ar> {text}"
+
+        # Tokenize input
+        inputs = tokenizer(
+            input_text,
+            return_tensors="pt",
+            max_length=max_length,
+            truncation=True,
+            padding=True
+        ).to(device)
+
+        # Generate translation
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_length=max_length,
+                num_beams=4,
+                length_penalty=0.6,
+                early_stopping=True,
+                do_sample=False
+            )
+
+        # Decode output
+        translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        return translation
+
+    except Exception as e:
+        st.error(f"Translation error: {str(e)}")
+        return None
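The direction handling in `translate_text` prepends a language tag (`<he>` or `<ar>`) to the input. A minimal sketch of that step as a standalone function follows. The names `PREFIXES` and `add_direction_prefix` are assumptions for illustration, and unlike the `else` branch in the diff, this variant rejects unknown directions instead of defaulting to `<ar>`.

```python
# Mapping from UI direction label to the tag prepended to the input text.
PREFIXES = {
    "Hebrew to Aramaic": "<he>",
    "Aramaic to Hebrew": "<ar>",
}


def add_direction_prefix(text: str, direction: str) -> str:
    """Prepend the language tag for the requested direction."""
    try:
        prefix = PREFIXES[direction]
    except KeyError:
        # Fail loudly on a label that is not in the mapping.
        raise ValueError(f"Unknown direction: {direction!r}") from None
    return f"{prefix} {text}"


tagged = add_direction_prefix("abc", "Hebrew to Aramaic")
```

Keeping the mapping in one dictionary makes a typo in a direction label surface as an error rather than silently selecting the other tag.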
+
+def main():
+    # Header
+    st.markdown('<h1 class="main-header">Samaritan Hebrew-Aramaic Translator</h1>', unsafe_allow_html=True)
+    st.markdown('<p class="sub-header">Powered by the johnlockejrr/opus-mt-arc-heb model</p>', unsafe_allow_html=True)
+
+    # Load model
+    tokenizer, model, device = load_model()
+
+    if tokenizer is None or model is None:
+        st.error("Failed to load the translation model. Please check your internet connection and try again.")
+        return
+
+    # Sidebar for settings
+    with st.sidebar:
+        st.markdown("### Settings")
+
+        # Max length setting
+        max_length = st.slider(
+            "Maximum Output Length",
+            min_value=64,
+            max_value=512,
+            value=256,
+            step=32,
+            help="Maximum length of the generated translation"
+        )
+
+        # Model info
+        st.markdown("### Model Information")
+        st.markdown(f"**Model:** johnlockejrr/opus-mt-arc-heb")
+        st.markdown(f"**Device:** {device.upper()}")
+        st.markdown(f"**Tokenizer:** {tokenizer.__class__.__name__}")
+        st.markdown(f"**Model Type:** {model.__class__.__name__}")
+        st.markdown(f"**Direction:** Samaritan Hebrew → Samaritan Aramaic")
+
+        # Clear button
+        if st.button("Clear All"):
+            st.rerun()
+
+    # Main content area
+    col1, col2 = st.columns([1, 1])
+
+    with col1:
+        st.markdown('<div class="input-area">', unsafe_allow_html=True)
+        st.markdown("### Input Text")
+
+        # Text input
+        input_text = st.text_area(
+            "Enter Samaritan Hebrew text to translate",
+            height=200,
+            placeholder="Enter your Samaritan Hebrew text here...",
+            help="Type or paste the Samaritan Hebrew text you want to translate to Samaritan Aramaic"
+        )
+
+        # Translate button
+        translate_button = st.button(
+            "Translate to Samaritan Aramaic",
+            type="primary",
+            use_container_width=True
+        )
+        st.markdown('</div>', unsafe_allow_html=True)
+
+    with col2:
+        st.markdown('<div class="output-area">', unsafe_allow_html=True)
+        st.markdown("### Samaritan Aramaic Translation")
+
+        if translate_button and input_text.strip():
+            with st.spinner("Translating to Samaritan Aramaic..."):
+                # Add a small delay for better UX
+                time.sleep(0.5)
+
+                translation = translate_text(
+                    input_text,
+                    "Hebrew to Aramaic",
+                    tokenizer,
+                    model,
+                    device,
+                    max_length
+                )
+
+                if translation:
+                    st.markdown(f"**Samaritan Aramaic:**")
+                    st.markdown(f"```\n{translation}\n```")
+
+                    # Copy button. Escape quotes outside the f-string: a backslash
+                    # inside an f-string expression is a SyntaxError before Python 3.12.
+                    escaped = translation.replace("'", "\\'")
+                    st.markdown(f"""
+                    <div style="margin-top: 1rem;">
+                        <button onclick="navigator.clipboard.writeText('{escaped}')"
+                                style="background: rgba(255,255,255,0.2); border: 1px solid rgba(255,255,255,0.3);
+                                       color: white; padding: 0.5rem 1rem; border-radius: 5px; cursor: pointer;">
+                            Copy Translation
+                        </button>
+                    </div>
+                    """, unsafe_allow_html=True)
+                else:
+                    st.error("Translation failed. Please try again.")
+        else:
+            st.markdown("*Samaritan Aramaic translation will appear here*")
+        st.markdown('</div>', unsafe_allow_html=True)
+
+    # Additional features
+    st.markdown("---")
+
+    # Batch translation section
+    st.markdown("### Batch Translation")
+    st.markdown("Upload a text file with multiple Samaritan Hebrew lines to translate them all to Samaritan Aramaic.")
+
+    uploaded_file = st.file_uploader(
+        "Choose a text file",
+        type=['txt'],
+        help="Upload a .txt file with one Samaritan Hebrew text per line"
+    )
+
+    if uploaded_file is not None:
+        try:
+            # Read file content
+            content = uploaded_file.read().decode('utf-8')
+            lines = [line.strip() for line in content.split('\n') if line.strip()]
+
+            if lines:
+                st.success(f"Loaded {len(lines)} lines from {uploaded_file.name}")
+
+                if st.button("Translate All to Samaritan Aramaic", type="primary"):
+                    st.markdown("### Batch Translation Results")
+
+                    # Create a progress bar
+                    progress_bar = st.progress(0)
+                    status_text = st.empty()
+
+                    results = []
+                    for i, line in enumerate(lines):
+                        status_text.text(f"Translating line {i+1}/{len(lines)}: {line[:50]}...")
+
+                        translation = translate_text(
+                            line,
+                            "Hebrew to Aramaic",
+                            tokenizer,
+                            model,
+                            device,
+                            max_length
+                        )
+
+                        results.append({
+                            'original': line,
+                            'translation': translation or "Translation failed"
+                        })
+
+                        # Update progress
+                        progress_bar.progress((i + 1) / len(lines))
+
+                    status_text.text("Translation complete!")
+
+                    # Display results
+                    for i, result in enumerate(results):
+                        with st.expander(f"Line {i+1}: {result['original'][:50]}..."):
+                            st.markdown(f"**Samaritan Hebrew:** {result['original']}")
+                            st.markdown(f"**Samaritan Aramaic:** {result['translation']}")
+
+                    # Download results
+                    csv_content = "Samaritan Hebrew,Samaritan Aramaic\n"
+                    for result in results:
+                        csv_content += f'"{result["original"]}","{result["translation"]}"\n'
+
+                    st.download_button(
+                        label="📥 Download Results as CSV",
+                        data=csv_content,
+                        file_name="samaritan_translations.csv",
+                        mime="text/csv"
+                    )
+
+        except Exception as e:
+            st.error(f"Error reading file: {str(e)}")
+
+    # Footer
+    st.markdown("---")
+    st.markdown("""
+    <div style="text-align: center; color: #666; padding: 2rem;">
+        <p>Built with ❤️ using Streamlit and Hugging Face Transformers</p>
+        <p>Samaritan Hebrew to Samaritan Aramaic Translation</p>
+        <p>Model: johnlockejrr/opus-mt-arc-heb</p>
+    </div>
+    """, unsafe_allow_html=True)
+
+if __name__ == "__main__":
+    main()
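The batch path decodes the uploaded bytes as UTF-8 and keeps the non-empty lines. A minimal sketch of that parsing step as a pure function, under two assumptions that go beyond the diff: files may start with a byte-order mark, and files may use Windows line endings. `parse_uploaded_lines` is a hypothetical helper name.

```python
from typing import List


def parse_uploaded_lines(raw: bytes) -> List[str]:
    """Decode uploaded bytes and return the non-empty, stripped lines."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    text = raw.decode("utf-8-sig")
    # splitlines() splits on \n, \r\n and \r (among other line boundaries).
    return [line.strip() for line in text.splitlines() if line.strip()]


# A BOM, a Windows line ending, a blank line, and padded whitespace.
parsed = parse_uploaded_lines(b"\xef\xbb\xbfone\r\n\r\n  two \nthree")
```

Because the function takes bytes and returns a list, it can be exercised without a running Streamlit session.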