---
language:
- en
- de
tags:
- parser
- parsing
- PDF
- pdfplumber
- txt
- tables
- python
- windows
- RAG
---

---

# <b>PDF to TXT converter, ready to chunk for your RAG</b>

<b>WINDOWS ONLY</b><br>
<b>EXE and PY available (English and German)</b><br>

<b>⇨</b> Give me a ❤️ if you like it ;)<br><br>

Most LLM applications simply convert your PDF to txt and nothing more, as if you had saved the PDF as a txt file. Text blocks often get mixed up and tables become unreadable, so it works a bit better to convert with the help of a <b>parser</b>.<br>
I work with "pdfplumber/pdfminer" without OCR, so it is very fast! I also have a "docling" parser with OCR in progress, but I think it will only be released as a Python file, not compiled.<br>

<ul style="line-height: 1.05;">
<li>Works with a single PDF, a list of multiple PDFs, or a folder</li>
<li>Intelligent multiprocessing</li>
<li>Error tolerant: if a PDF cannot be converted, it is skipped</li>
<li>Instant view of the result</li>
<li>Converts some common tables to JSON inside the txt files</li>
<li>Adds the absolute PAGE number to each page</li>
<li>All txt files are created in the original folder of the PDF</li>
</ul>
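
For anyone curious how features like these can be put together with pdfplumber, here is a minimal sketch. It is not the code of the EXE/PY in this repo: the function names (`format_page`, `convert_pdf`, `convert_folder`), the `--- PAGE n ---` marker and the one-JSON-line-per-table layout are all assumptions made up for illustration. Only `pdfplumber.open`, `page.extract_text()` and `page.extract_tables()` are real pdfplumber calls.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def format_page(page_number, text, tables):
    """Build the text block for one page: page marker, body text, tables as JSON.

    The marker format is an assumption, not the format used by the real tool.
    """
    parts = [f"--- PAGE {page_number} ---", text or ""]
    for table in tables:
        # A table is a list of rows; each row is a list of cell strings (or None).
        parts.append(json.dumps(table, ensure_ascii=False))
    return "\n".join(parts)


def convert_pdf(pdf_path):
    """Convert one PDF to a .txt file next to it; return the new path, or None if skipped."""
    import pdfplumber  # imported here so format_page can be used without pdfplumber

    pdf_path = Path(pdf_path)
    try:
        with pdfplumber.open(pdf_path) as pdf:
            pages = [
                format_page(number, page.extract_text(), page.extract_tables())
                for number, page in enumerate(pdf.pages, start=1)
            ]
    except Exception as exc:  # error tolerant: a broken PDF is skipped, not fatal
        print(f"Skipped {pdf_path.name}: {exc}")
        return None
    txt_path = pdf_path.with_suffix(".txt")  # same folder as the original PDF
    txt_path.write_text("\n\n".join(pages), encoding="utf-8")
    return txt_path


def convert_folder(folder, workers=4):
    """Convert every PDF in a folder in parallel; return the paths that were written."""
    pdf_files = sorted(Path(folder).glob("*.pdf"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [path for path in pool.map(convert_pdf, pdf_files) if path is not None]
```

On Windows, call `convert_folder(...)` from inside an `if __name__ == "__main__":` block, otherwise the worker processes cannot start cleanly.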

<br>
I created this with my own brain and the help of ChatGPT, so sorry, I will not fulfill any feature requests unless there are real errors.<br>
Building the GUI and the functionality, and compiling it on top of that, is really hard for me.<br>
<br>
Now have fun and leave a comment if you like ;)