Spaces:

asasasaasasa
/

tilmash-gemma3-translator

Build error

asasasaasasa committed on Aug 10

Commit

1359d8e

verified ·

1 Parent(s): d27b48d

Upload README.md with huggingface_hub

Files changed (1)

README.md +104 -17

@@ -1,19 +1,106 @@
----
-title: Tilmash Gemma3 Translator
-emoji: 🚀
-colorFrom: red
-colorTo: red
 sdk: docker
-app_port: 8501
-tags:
-- streamlit
 pinned: false
-short_description: Streamlit template space
----
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+---
+title: Tilmash Translator
 sdk: docker
 pinned: false
+---
+$yaml = @"
+---
+title: Tilmash Translator
+sdk: streamlit
+app_file: main.py
+python_version: 3.11
+pinned: false
+---
+"@
+$orig = Get-Content -Raw README.md
+Set-Content README.md $yaml -Encoding UTF8
+Add-Content README.md $orig
+# Tilmash Translator
+**Tilmash Translator** is an offline-first, privacy-preserving translation and readability toolkit for Russian, English and Kazakh.
+It ships as a Streamlit web app and offers two core capabilities:
+1. **Neural Machine Translation**
+   • Primary model: [ISSAI/tilmash](https://huggingface.co/issai/tilmash) (Seq2Seq) for RU ↔ EN ↔ KK
+   • Long-text fallback: *Gemma-3* 12B (GGUF) running locally with `llama-cpp-python` (+ optional GPU layers)
+   • Smart chunking & streaming make multi-page documents feel snappy
+2. **Readability Analysis**
+   • Calculates Flesch Reading Ease, Flesch-Kincaid, Gunning Fog and SMOG
+   • Highlights complex words and supports RU/EN/KK
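
The four readability scores listed above can be sketched from raw text counts. This is a minimal illustration using the standard English-language coefficients; whether `utils/readability_indices.py` uses the same constants for RU and KK is not stated in the README, and the function names here are hypothetical.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores mean easier text (standard English coefficients)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """complex_words: words with three or more syllables."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def smog_index(polysyllables: int, sentences: int) -> float:
    """polysyllables: count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30.0 / sentences) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for a 100-word, 5-sentence passage.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
    print(gunning_fog(100, 5, 10))
    print(smog_index(10, 5))
```

The functions take pre-computed counts so that the language-specific parts (sentence splitting, syllable counting) stay outside the formulas.
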
+---
+## Quick Start
+```bash
+# 1. Clone & create a virtual environment
+$ git clone https://github.com/medetshatayev/Tilmash_Translator.git
+$ cd Tilmash_Translator
+$ python3 -m venv .venv && source .venv/bin/activate
+# 2. Install dependencies
+$ pip install -r requirements.txt
+# 3. (optional) authenticate once to download the Tilmash weights
+$ echo "HF_TOKEN=your_huggingface_token" > .env
+# 4. Launch the Streamlit app
+$ streamlit run main.py
+```
+💡 The helper script `start.sh` automates the above and sets safe memory limits for `llama-cpp-python`.
+### GPU Off-loading (Gemma-3)
+Set `GEMMA_GPU_LAYERS=<num_layers>` in your environment (defaults to **48**) to off-load those layers to Metal/CUDA.
+---
+## Project Layout
+```
+.
+├── main.py               # Streamlit UI
+├── utils/                # Translation & analysis helpers
+│   ├── tilmash_translation.py
+│   ├── gemma_translation.py
+│   ├── readability_indices.py
+│   └── ...
+├── models/               # Extra resources (NLTK, etc.)
+├── config.py             # Default env-vars
+├── start.sh              # Convenience launcher
+└── requirements.txt      # Python deps
+```
+## Configuration Keys
+| Variable               | Default | Purpose                                   |
+|------------------------|---------|-------------------------------------------|
+| `GEMMA_GPU_LAYERS`     | 48      | Layers to move to GPU (0 = CPU-only)      |
+| `GEMMA_CONTEXT_SIZE`   | 8192    | Context window for Gemma-3                |
+| `MAX_PARALLEL_MODELS`  | 4       | Concurrency guard                         |
+| `MAX_TOKENS`           | 4096    | Generation cap per request                |
+| `CHUNK_SIZE`           | 3000    | Token threshold before auto-chunking      |
+Override any of these via the environment or edit **config.py**.
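
A minimal sketch of how environment overrides for the keys in the table could be resolved. The variable names and defaults come from the table; the `load_settings` helper and its error handling are assumptions, not the actual contents of `config.py`.

```python
import os

# Defaults copied from the Configuration Keys table.
DEFAULTS = {
    "GEMMA_GPU_LAYERS": 48,
    "GEMMA_CONTEXT_SIZE": 8192,
    "MAX_PARALLEL_MODELS": 4,
    "MAX_TOKENS": 4096,
    "CHUNK_SIZE": 3000,
}


def load_settings(env=None):
    """Return integer settings, preferring values found in the environment.

    Hypothetical helper: `env` defaults to os.environ but can be any mapping,
    which keeps the function easy to test.
    """
    env = os.environ if env is None else env
    settings = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        if raw is None or raw.strip() == "":
            settings[key] = default  # unset or blank: keep the default
            continue
        try:
            settings[key] = int(raw)
        except ValueError:
            raise ValueError(f"{key} must be an integer, got {raw!r}") from None
    return settings


if __name__ == "__main__":
    # Example: force CPU-only inference while keeping every other default.
    print(load_settings({"GEMMA_GPU_LAYERS": "0"}))
```
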
+---
+## How It Works
+1. **File ingestion**: `.txt`, `.docx`, `.pdf` loaded via `utils/file_readers.py`
+2. **Language detection**: `langdetect` (auto-detect option in UI)
+3. **Translation pipeline**: texts under 3000 tokens are translated directly; longer texts are chunked (`utils/chunking.py`) and streamed through Tilmash or Gemma-3
+4. **Readability analysis**: scores computed in `utils/readability_indices.py` and color-coded in the app.
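
The threshold-then-chunk behaviour in step 3 can be sketched as below. This is an illustration only: it approximates tokens by whitespace-separated words and splits on sentence punctuation, whereas the real `utils/chunking.py` presumably counts model tokens. `count_tokens` and `chunk_text` are hypothetical names.

```python
import re


def count_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 3000) -> list:
    """Return the text unchanged if it is under the threshold; otherwise
    pack whole sentences greedily into chunks of at most max_tokens."""
    if count_tokens(text) < max_tokens:
        return [text]

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the cap.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for piece in chunk_text(sample, max_tokens=6):
        print(piece)
```

A single sentence longer than `max_tokens` is kept whole in its own chunk here; a production splitter would need a fallback for that case.
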
+---
+## License
+Distributed under the MIT License; see `LICENSE` for details.