asasasaasasa committed on
Commit 1359d8e · verified · 1 Parent(s): d27b48d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +104 -17
README.md CHANGED
@@ -1,19 +1,106 @@
- ---
- title: Tilmash Gemma3 Translator
- emoji: 🚀
- colorFrom: red
- colorTo: red
  sdk: docker
- app_port: 8501
- tags:
- - streamlit
  pinned: false
- short_description: Streamlit template space
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ ---
+ title: Tilmash Translator
  sdk: docker
  pinned: false
+ ---
+ $yaml = @"
+ ---
+ title: Tilmash Translator
+ sdk: streamlit
+ app_file: main.py
+ python_version: 3.11
+ pinned: false
+ ---
+ "@
+ $orig = Get-Content -Raw README.md
+ Set-Content README.md $yaml -Encoding UTF8
+ Add-Content README.md $orig
+
+
+
+ # Tilmash Translator
+
+ **Tilmash Translator** is an offline-first, privacy-preserving translation and readability toolkit for Russian, English and Kazakh.
+
+ It ships as a Streamlit web app and offers two core capabilities:
+
+ 1. **Neural Machine Translation**
+ • Primary model: [ISSAI/tilmash](https://huggingface.co/issai/tilmash) (Seq2Seq) for RU ↔ EN ↔ KK
+ • Long-text fallback: *Gemma-3* 12B (GGUF) running locally with `llama-cpp-python` (with optional GPU layer off-loading)
+ • Chunking and streaming keep multi-page documents responsive
+ 2. **Readability Analysis**
+ • Calculates Flesch Reading Ease, Flesch-Kincaid, Gunning Fog and SMOG
+ • Highlights complex words and supports RU/EN/KK
+
+
+ ---
+
+ ## Quick Start
+
+ ```bash
+ # 1. Clone & create a virtual environment
+ $ git clone https://github.com/medetshatayev/Tilmash_Translator.git
+ $ cd Tilmash_Translator
+ $ python3 -m venv .venv && source .venv/bin/activate
+
+ # 2. Install dependencies
+ $ pip install -r requirements.txt
+
+ # 3. (optional) authenticate once to download the Tilmash weights
+ $ echo "HF_TOKEN=your_huggingface_token" > .env
+
+ # 4. Launch the Streamlit app
+ $ streamlit run main.py
+ ```
+
+ 💡 The helper script `start.sh` automates the steps above and sets safe memory limits for `llama-cpp-python`.
+
+ ### GPU Off-loading (Gemma-3)
+
+ Set `GEMMA_GPU_LAYERS=<num_layers>` in your environment (defaults to **48**) to off-load that many layers to Metal/CUDA.
+
+ ---
+
+ ## Project Layout
+
+ ```
+ .
+ ├── main.py # Streamlit UI
+ ├── utils/ # Translation & analysis helpers
+ │   ├── tilmash_translation.py
+ │   ├── gemma_translation.py
+ │   ├── readability_indices.py
+ │   └── ...
+ ├── models/ # Extra resources (NLTK, etc.)
+ ├── config.py # Default env-vars
+ ├── start.sh # Convenience launcher
+ └── requirements.txt # Python deps
+ ```
+
+ ## Configuration Keys
+
+ | Variable | Default | Purpose |
+ |------------------------|---------|-------------------------------------------|
+ | `GEMMA_GPU_LAYERS` | 48 | Layers to move to GPU (0 = CPU-only) |
+ | `GEMMA_CONTEXT_SIZE` | 8192 | Context window for Gemma-3 |
+ | `MAX_PARALLEL_MODELS` | 4 | Concurrency guard |
+ | `MAX_TOKENS` | 4096 | Generation cap per request |
+ | `CHUNK_SIZE` | 3000 | Token threshold before auto-chunking |
+
+ Override any of these via the environment or edit **config.py**.
+
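As a rough illustration of the override behaviour described above, the sketch below reads the five documented variables from the environment and falls back to the defaults in the table. The function name `load_settings` and the integer parsing are assumptions made for this example; the project's actual `config.py` is not shown in this commit and may be organised differently.

```python
import os

# Defaults copied from the Configuration Keys table.
DEFAULTS = {
    "GEMMA_GPU_LAYERS": 48,
    "GEMMA_CONTEXT_SIZE": 8192,
    "MAX_PARALLEL_MODELS": 4,
    "MAX_TOKENS": 4096,
    "CHUNK_SIZE": 3000,
}


def load_settings(env=None):
    """Return the settings as ints, preferring values found in `env`.

    `env` is any mapping of strings; it defaults to os.environ.
    (Hypothetical helper, not part of the repository.)
    """
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            # Variable is unset or empty: keep the documented default.
            settings[name] = default
            continue
        try:
            settings[name] = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    return settings
```

For example, `load_settings({"GEMMA_GPU_LAYERS": "0"})` would select CPU-only inference while leaving the other four values at their defaults.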
+ ---
+
+ ## How It Works
+
+ 1. **File ingestion**: `.txt`, `.docx` and `.pdf` files are loaded via `utils/file_readers.py`
+ 2. **Language detection**: `langdetect` (auto-detect option in UI)
+ 3. **Translation pipeline**: texts under 3000 tokens are translated directly; longer texts are chunked (`utils/chunking.py`) and streamed through Tilmash or Gemma-3
+ 4. **Readability analysis**: scores are computed in `utils/readability_indices.py` and color-coded in the app
+
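The chunking step in the translation pipeline can be pictured with the minimal sketch below. It is not the code in `utils/chunking.py`, which this commit does not show. It assumes tokens can be approximated by whitespace-separated words and that sentences end in `.`, `!` or `?`; the names `count_tokens` and `chunk_text` are hypothetical.

```python
import re


def count_tokens(text):
    """Approximate the token count by whitespace-separated words (an assumption)."""
    return len(text.split())


def chunk_text(text, max_tokens=3000):
    """Split `text` into chunks of at most `max_tokens` approximate tokens.

    Short texts are returned unchanged as a single chunk. Longer texts are
    packed sentence by sentence; chunk text is re-joined with single spaces,
    so the original line breaks are not preserved.
    """
    if not text.strip():
        return []
    if count_tokens(text) < max_tokens:
        return [text]

    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the limit is cut at word boundaries.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk when this sentence would overflow the current one.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk could then be passed to the translation model in turn and the outputs joined in order. A real implementation would more likely count tokens with the model's own tokenizer rather than by words.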
+ ---
+
+ ## License
+
+ Distributed under the MIT License; see `LICENSE` for details.
+