---
title: Tilmash Translator
sdk: docker
pinned: false
---
Tilmash Translator
Tilmash Translator is an offline‑first, privacy‑preserving translation and readability toolkit for Russian, English and Kazakh.
It ships as a Streamlit web‑app and offers two core capabilities:
- Neural Machine Translation
• Primary model — ISSAI/tilmash (Seq2Seq) for RU ↔ EN ↔ KK
  • Long-text fallback — Gemma-3 12B (GGUF) running locally with llama-cpp-python (+ optional GPU layers)
  • Smart chunking & streaming make multi-page documents feel snappy
- Readability Analysis
• Calculates Flesch Reading Ease, Flesch‑Kincaid, Gunning Fog and SMOG
• Highlights complex words and supports RU/EN/KK
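As a rough illustration of the kind of scoring involved (a sketch, not the app's actual implementation in `utils/readability_indices.py`), Flesch Reading Ease for English combines sentence length and syllable density:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Syllables are estimated by counting vowel groups, a crude heuristic."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
```

Higher scores mean easier text; short, common words push the score up, long polysyllabic words pull it down.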
Quick Start
# 1. Clone & create a virtual environment
$ git clone https://github.com/medetshatayev/Tilmash_Translator.git
$ cd Tilmash_Translator
$ python3 -m venv .venv && source .venv/bin/activate
# 2. Install dependencies
$ pip install -r requirements.txt
# 3. (optional) authenticate once to download the Tilmash weights
$ echo "HF_TOKEN=your_huggingface_token" > .env
# 4. Launch the Streamlit app
$ streamlit run main.py
💡 The helper script start.sh automates the above and sets safe memory limits for llama‑cpp-python.
GPU Off‑loading (Gemma‑3)
Set GEMMA_GPU_LAYERS=<num_layers> in your environment (defaults to 48) to off‑load those layers to Metal/CUDA.
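A minimal sketch of how such an override might be read (the variable name and default come from this README; how the value reaches llama-cpp-python is an assumption, shown only in a comment):

```python
import os

def gemma_gpu_layers(default: int = 48) -> int:
    """Read GEMMA_GPU_LAYERS from the environment, falling back to the documented default."""
    try:
        return int(os.environ.get("GEMMA_GPU_LAYERS", default))
    except ValueError:
        return default

# Hypothetical use with llama-cpp-python's Llama class:
# llm = Llama(model_path="gemma-3-12b.gguf", n_gpu_layers=gemma_gpu_layers())
```

Setting the value to 0 keeps inference entirely on the CPU.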
Project Layout
.
├── main.py              # Streamlit UI
├── utils/               # Translation & analysis helpers
│   ├── tilmash_translation.py
│   ├── gemma_translation.py
│   ├── readability_indices.py
│   └── ...
├── models/              # Extra resources (NLTK, etc.)
├── config.py            # Default env-vars
├── start.sh             # Convenience launcher
└── requirements.txt     # Python deps
Configuration Keys
| Variable | Default | Purpose |
|---|---|---|
| GEMMA_GPU_LAYERS | 48 | Layers to move to GPU (0 = CPU-only) |
| GEMMA_CONTEXT_SIZE | 8192 | Context window for Gemma-3 |
| MAX_PARALLEL_MODELS | 4 | Concurrency guard |
| MAX_TOKENS | 4096 | Generation cap per request |
| CHUNK_SIZE | 3000 | Token threshold before auto-chunking |
Override any of these via the environment or edit config.py.
How It Works
- File ingestion — `.txt`, `.docx`, `.pdf` loaded via `utils/file_readers.py`
- Language detection — `langdetect` (auto-detect option in the UI)
- Translation pipeline — texts under 3000 tokens are translated directly; longer texts are chunked (`utils/chunking.py`) and streamed through Tilmash or Gemma-3
- Readability analysis — scores computed in `utils/readability_indices.py` and color-coded in the app.
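The chunking step above can be sketched with a whitespace-token approximation (the real `utils/chunking.py` presumably counts model tokens and respects sentence boundaries, which this toy version does not):

```python
def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    """Split text into pieces of at most max_tokens whitespace-delimited tokens.
    A rough stand-in for model-aware chunking."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]
```

Each chunk is then translated independently and the results concatenated, which keeps memory bounded for multi-page documents.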
License
Distributed under the MIT License — see LICENSE for details.