KevinHuSh committed on
Commit 6943c52 · 1 parent: 41c7a59

refine README (#72)

* refine README

* Update README.md

Files changed (1)
  1. deepdoc/README.md +3 -11
deepdoc/README.md CHANGED
@@ -1,8 +1,6 @@
 English | [简体中文](./README_zh.md)
 
-#*Deep*Doc
-
----
+# *Deep*Doc
 
 - [1. Introduction](#1)
 - [2. Vision](#2)
@@ -11,7 +9,6 @@ English | [简体中文](./README_zh.md)
 <a name="1"></a>
 ## 1. Introduction
 
----
 With a bunch of documents from various domains with various formats and along with diverse retrieval requirements,
 an accurate analysis becomes a very challenge task. *Deep*Doc is born for that purpose.
 There 2 parts in *Deep*Doc so far: vision and parser.
@@ -19,8 +16,6 @@ There 2 parts in *Deep*Doc so far: vision and parser.
 <a name="2"></a>
 ## 2. Vision
 
----
-
 We use vision information to resolve problems as human being.
 - OCR. Since a lot of documents presented as images or at least be able to transform to image,
 OCR is a very essential and fundamental or even universal solution for text extraction.
@@ -64,19 +59,16 @@ We use vision information to resolve problems as human being.
 <a name="3"></a>
 ## 3. Parser
 
----
-
 Four kinds of document formats as PDF, DOCX, EXCEL and PPT have their corresponding parser.
 The most complex one is PDF parser since PDF's flexibility. The output of PDF parser includes:
 - Text chunks with their own positions in PDF(page number and rectangular positions).
 - Tables with cropped image from the PDF, and contents which has already translated into natural language sentences.
 - Figures with caption and text in the figures.
 
-###Résumé
+### Résumé
 
----
 The résumé is a very complicated kind of document. A résumé which is composed of unstructured text
 with various layouts could be resolved into structured data composed of nearly a hundred of fields.
 We haven't opened the parser yet, as we open the processing method after parsing procedure.
 
-
+
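Most of this commit deletes the `---` horizontal rules placed under the headings, but two lines change for a rendering reason. Under CommonMark (and GitHub Flavored Markdown), an ATX heading's run of one to six `#` characters must be followed by a space, a tab, or the end of the line, so `#*Deep*Doc` and `###Résumé` were displayed as ordinary paragraphs rather than headings. The Python sketch below is one way such lines might be caught automatically. The function name `fix_atx_headings` and its simple fence tracking are illustrative assumptions, not part of the repository.

```python
import re

# Up to 3 spaces of indentation, a run of 1-6 '#' characters, then a
# character that is neither '#' nor whitespace. CommonMark requires a
# space, tab, or end of line after the '#' run, so such lines render
# as paragraphs instead of headings.
_MISSING_SPACE = re.compile(r"^( {0,3})(#{1,6})(?=[^#\s])")


def fix_atx_headings(markdown: str) -> str:
    """Insert the missing space after an ATX heading's '#' run.

    Lines inside backtick-fenced code blocks are left untouched, since a
    leading '#' there is usually a comment, not a heading.
    """
    fixed = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on an opening or closing fence
        elif not in_fence:
            line = _MISSING_SPACE.sub(r"\1\2 ", line)
        fixed.append(line)
    return "\n".join(fixed)


before = "#*Deep*Doc\n\n```bash\n#comment\n```\n\n###Résumé"
print(fix_atx_headings(before))
```

Because the check cannot tell a heading that lost its space from intentional text such as `#hashtag` at the start of a line, and it only recognizes backtick fences (not `~~~` fences or indented code blocks), its changes are best reviewed rather than applied blindly.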