
---
title: Tilmash Translator
sdk: docker
pinned: false
---


Tilmash Translator

Tilmash Translator is an offline-first, privacy-preserving translation and readability toolkit for Russian, English and Kazakh.

It ships as a Streamlit web app and offers two core capabilities:

Neural Machine Translation
• Primary model: ISSAI/tilmash (Seq2Seq) for RU ↔ EN ↔ KK
• Long-text fallback: Gemma-3 12B (GGUF) running locally via llama-cpp-python, with optional GPU layer offloading
• Chunking and streaming keep multi-page documents responsive
Readability Analysis
• Calculates Flesch Reading Ease, Flesch-Kincaid, Gunning Fog and SMOG
• Highlights complex words and supports RU, EN and KK
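The four readability scores are standard formulas over sentence, word and syllable counts; words with three or more syllables count as "complex" for Gunning Fog and as polysyllables for SMOG. The sketch below is a minimal, English-only illustration. It assumes a naive vowel-group syllable counter and the classic English coefficients, and it is not the code in utils/readability_indices.py: RU and KK scoring likely uses adapted coefficients and syllable rules. The names `count_syllables` and `readability` are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough English syllable count: one per vowel group, minimum 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable, but "-le" does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Compute four classic English readability scores for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)  # English-only word pattern (assumption)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)
    polysyllables = sum(1 for s in syllables if s >= 3)  # "complex" words
    words_per_sentence = n_words / n_sent
    syllables_per_word = sum(syllables) / n_words

    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * polysyllables / n_words),
        "smog": 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
    }
```

Higher Flesch Reading Ease means easier text. The other three scores approximate a school grade level, so lower means easier.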

Quick Start

# 1. Clone & create a virtual environment
$ git clone https://github.com/medetshatayev/Tilmash_Translator.git
$ cd Tilmash_Translator
$ python3 -m venv .venv && source .venv/bin/activate

# 2. Install dependencies
$ pip install -r requirements.txt

# 3. (optional) authenticate once to download the Tilmash weights
$ echo "HF_TOKEN=your_huggingface_token" > .env

# 4. Launch the Streamlit app
$ streamlit run main.py

💡 The helper script start.sh automates the steps above and sets safe memory limits for llama-cpp-python.
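Step 3 only works if something loads .env into the process environment. Once HF_TOKEN is there, huggingface_hub reads it on its own. Projects often use python-dotenv for this. The stdlib sketch below is a hypothetical stand-in, not code from this repository (`load_env_file` is an illustrative name). It shows the expected KEY=VALUE format and keeps variables that are already exported unless told otherwise.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Copy simple KEY=VALUE lines from a .env file into os.environ.

    Blank lines, '#' comments and lines without '=' are skipped, and one layer of
    matching quotes is stripped from values. Variables that are already set win
    unless override=True. Returns only the variables that were actually set.
    """
    env_path = Path(path)
    loaded = {}
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Calling it once at startup, before any model download, is enough for the token to be picked up.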

GPU Offloading (Gemma-3)

Set GEMMA_GPU_LAYERS=<num_layers> in your environment (default: 48) to offload that many Gemma-3 layers to the GPU via Metal or CUDA; 0 keeps Gemma-3 entirely on the CPU.
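Here is a minimal sketch of how the variable could map onto llama-cpp-python's `Llama` constructor, which accepts `model_path` and `n_gpu_layers`. In that constructor, -1 means "offload every layer". The helper names `gemma_llama_kwargs` and `load_gemma` are hypothetical and not taken from utils/gemma_translation.py, and the model path is a placeholder.

```python
import os

DEFAULT_GPU_LAYERS = "48"  # default stated in this README


def gemma_llama_kwargs(model_path: str, env=None) -> dict:
    """Build keyword arguments for llama_cpp.Llama from GEMMA_GPU_LAYERS.

    0 keeps all layers on the CPU; llama-cpp-python treats -1 as "offload every layer".
    """
    env = os.environ if env is None else env
    n_gpu_layers = int(env.get("GEMMA_GPU_LAYERS") or DEFAULT_GPU_LAYERS)
    if n_gpu_layers < -1:
        raise ValueError(f"GEMMA_GPU_LAYERS must be >= -1, got {n_gpu_layers}")
    return {"model_path": model_path, "n_gpu_layers": n_gpu_layers, "verbose": False}


def load_gemma(model_path: str):
    """Load a GGUF model; offloading only takes effect if llama-cpp-python was built with Metal or CUDA."""
    from llama_cpp import Llama  # lazy import so gemma_llama_kwargs works without the package

    return Llama(**gemma_llama_kwargs(model_path))
```

If you offload more layers, inference gets faster but needs more GPU memory. Lower the value if the model fails to load on a smaller card.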

Project Layout

.
├── main.py               # Streamlit UI
├── utils/                # Translation & analysis helpers
│   ├── tilmash_translation.py
│   ├── gemma_translation.py
│   ├── readability_indices.py
│   └── ...
├── models/               # Extra resources (NLTK data, etc.)
├── config.py             # Default environment variables
├── start.sh              # Convenience launcher
└── requirements.txt      # Python dependencies

Configuration Keys

Variable	Default	Purpose
`GEMMA_GPU_LAYERS`	48	Number of Gemma-3 layers offloaded to the GPU (0 = CPU-only)
`GEMMA_CONTEXT_SIZE`	8192	Context window for Gemma-3, in tokens
`MAX_PARALLEL_MODELS`	4	Concurrency guard (maximum number of models run in parallel)
`MAX_TOKENS`	4096	Maximum tokens generated per request
`CHUNK_SIZE`	3000	Token threshold above which input is auto-chunked

Override any of these through environment variables or by editing config.py.
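One way to implement this override rule is a frozen dataclass whose defaults mirror the table above. Each field is replaced by the upper-cased environment variable of the same name when that variable is set. This is a sketch under that assumption: the actual config.py may read and validate values differently, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table; field name maps to env var name.upper()."""

    gemma_gpu_layers: int = 48
    gemma_context_size: int = 8192
    max_parallel_models: int = 4
    max_tokens: int = 4096
    chunk_size: int = 3000


def load_settings(env=None) -> Settings:
    """Return Settings, overriding each default with its environment variable when set."""
    env = os.environ if env is None else env
    overrides = {}
    for spec in fields(Settings):
        name = spec.name.upper()
        raw = env.get(name, "").strip()
        if raw:
            try:
                overrides[spec.name] = int(raw)
            except ValueError:
                raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    return Settings(**overrides)
```

Freezing the dataclass means each request sees one consistent snapshot of settings rather than values that change mid-run.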

How It Works

File ingestion: .txt, .docx and .pdf files are loaded via utils/file_readers.py.
Language detection: langdetect, exposed as an auto-detect option in the UI.
Translation pipeline: texts shorter than CHUNK_SIZE tokens (3000 by default) are translated directly; longer texts are chunked (utils/chunking.py) and streamed through Tilmash or Gemma-3.
Readability analysis: scores are computed in utils/readability_indices.py and color-coded in the app.
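The sketch below shows one way to decide between direct and chunked translation. It assumes that whitespace-separated words stand in for real tokenizer counts and that sentence boundaries are the preferred split points. utils/chunking.py may count tokens with the model's tokenizer and split differently. `chunk_text`, `translate_streaming` and the `translate_fn` callback are hypothetical names; `translate_fn` stands in for a Tilmash or Gemma-3 call.

```python
import re
from typing import Callable, Iterator

CHUNK_SIZE = 3000  # default token threshold from the configuration table


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (a real tokenizer counts differently)."""
    return len(text.split())


def _split_long(sentence: str, chunk_size: int) -> list:
    """Break a single over-long sentence into word windows of at most chunk_size."""
    words = sentence.split()
    if len(words) <= chunk_size:
        return [sentence]
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def chunk_text(text: str, chunk_size: int = CHUNK_SIZE) -> list:
    """Return [text] if it fits in chunk_size tokens, else sentence-aligned chunks that each fit."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if count_tokens(text) <= chunk_size:
        return [text]
    chunks, current, current_len = [], [], 0
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for piece in _split_long(sentence, chunk_size):
            n = count_tokens(piece)
            if current and current_len + n > chunk_size:
                chunks.append(" ".join(current))  # flush the full chunk
                current, current_len = [], 0
            current.append(piece)
            current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def translate_streaming(
    text: str, translate_fn: Callable[[str], str], chunk_size: int = CHUNK_SIZE
) -> Iterator[str]:
    """Yield translations chunk by chunk so a UI can show partial output early."""
    for chunk in chunk_text(text, chunk_size):
        yield translate_fn(chunk)
```

Splitting at sentence boundaries keeps each chunk grammatically self-contained, which tends to matter more for translation quality than filling every chunk to the limit.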

License

Distributed under the MIT License; see LICENSE for details.