kalle07/pdf2txt_parser_converter

kalle07 committed on May 16

Commit c0373dd · verified · 1 Parent(s): 474ebe6

Update README.md

Files changed (1)

README.md +2 -2

@@ -21,8 +21,8 @@ tags:
 better input = better output<br>
 <b>&#x21e8;</b> give me a ❤️, if you like  ;)<br><br>
----
+...
 Most LLM applications only convert your PDF simple to txt, nothing more, its like you save your PDF as txt file. Blocks of text that are close together are often mixed up and tables cannot be read logically.
 Therefore its better to convert it with some help of a <b>"Parser"</b>. The embedder can now find a better context.<br>
@@ -42,7 +42,7 @@ This I have created with my brain and the help of chatGPT, Iam not a coder... so
 It is really hard for me with GUI and the Function and in addition to compile it.<br>
 For the python-file oc you need to import missing libraries.<br>
 <br>
----
+...
 <br>
 I also have a "<b>docling</b>" parser with OCR (GPU is need for fast processing), its only be a python-file, not compiled.<br>
 You have to download all libs, and if you start (first time) internal also OCR models are downloaded. At the moment i have prepared a kind of multi docling,