Suraponn commited on
Commit
c2ef0ab
·
verified ·
1 Parent(s): 4c9bbb3

update_readme2

Browse files
Files changed (1) hide show
  1. README.md +50 -8
README.md CHANGED
@@ -5,23 +5,65 @@ language:
5
  - th
6
  base_model:
7
  - Qwen/Qwen2.5-VL-7B-Instruct
 
 
8
  ---
9
 
10
- Typhoon OCR is an open-source, bilingual document parsing model built specifically for real-world documents in Thai and English. Inspired by models like olmOCR, Typhoon OCR introduces a redesigned architecture that is:
11
- Robust to noisy inputs and complex, irregular layouts
12
 
13
 
14
- Multilingual, with dedicated support for both Thai and English
15
 
16
 
17
- Layout-aware, preserving the document’s structural integrity in its output
18
 
 
 
 
 
19
 
20
- Unlike conventional OCR tools, Typhoon OCR doesn't just extract raw text—it produces semantic, structured, and layout-preserving outputs that are optimized for downstream tasks such as:
21
- Retrieval-Augmented Generation (RAG)
22
 
 
23
 
24
- Comprehensive document parsing and understanding
25
 
 
 
 
 
26
 
27
- Accurate interpretation of tables, charts, and forms
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
  - th
6
  base_model:
7
  - Qwen/Qwen2.5-VL-7B-Instruct
8
+ tags:
9
+ - OCR
10
  ---
11
 
 
 
12
 
13
 
14
+ **Typhoon-OCR-7B**: A bilingual document parsing model built specifically for real-world documents in Thai and English inspired by models like olmOCR.
15
 
16
 
17
+ ## **Model Description**
18
 
19
+ - **Model type**: A 7B Vision-Language Models (VLMs) model based on Qwen2.5-VL-Instruction.
20
+ - **Requirement**: transformers 4.50.0 or newer.
21
+ - **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
22
+ - **License**:
23
 
 
 
24
 
25
+ ## **Real-World Document Support**
26
 
27
+ **1. Structured Documents**: Financial reports, Academic papers, Books, Government forms
28
 
29
+ **Output format**:
30
+ - Markdown for general text
31
+ - HTML for tables (including merged cells and complex layouts)
32
+ - Figures, charts, and diagrams are represented using figure tags for structured visual understanding
33
 
34
+ **Each figure undergoes multi-layered interpretation**:
35
+ - **Observation**: Detects elements like landscapes, buildings, people, logos, and embedded text
36
+ - **Context Analysis**: Infers context such as location, event, or document section
37
+ - **Text Recognition**: Extracts and interprets embedded text (e.g., chart labels, captions) in Thai or English
38
+ - **Artistic & Structural Analysis**: Captures layout style, diagram type, or design choices contributing to document tone
39
+ - **Final Summary**: Combines all insights into a structured figure description for tasks like summarization and retrieval
40
+
41
+
42
+ **2. Layout-Heavy & Informal Documents**: Receipts, Menus papers, Tickets, Infographics
43
+
44
+ **Output format**:
45
+ - Markdown with embedded tables and layout-aware structures
46
+
47
+
48
+ ## Summary of Findings
49
+ Typhoon OCR outperforms both GPT-4o and Gemini 2.5 Flash in Thai document understanding, particularly on documents with complex layouts and mixed-language content.
50
+ However, in the Thai books benchmark, performance slightly declined due to the high frequency and diversity of embedded figures. These images vary significantly in type and structure, which poses challenges for our current figure tag parsing. This highlights a potential area for future improvement—specifically, in enhancing the model's image understanding capabilities.
51
+ For this version, our primary focus has been on achieving high-quality OCR for both English and Thai text. Future releases may extend support to more advanced image analysis and figure interpretation.
52
+
53
+ ## Usage Example
54
+
55
+
56
+ ## **Citation**
57
+
58
+ - If you find Typhoon2 useful for your work, please cite it using:
59
+ ```
60
+ @misc{typhoon2,
61
+ title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
62
+ author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
63
+ year={2024},
64
+ eprint={2412.13702},
65
+ archivePrefix={arXiv},
66
+ primaryClass={cs.CL},
67
+ url={https://arxiv.org/abs/2412.13702},
68
+ }
69
+ ```