---
language:
- en
- de
tags:
- parser
- parsing
- PDF
- pdfplumber
- txt
- tables
- python
- windows
- RAG
---
---
# <b>PDF to TXT converter, ready to chunk for your RAG</b>
<b>WINDOWS ONLY</b><br>
<b>EXE and PY available (English and German)</b><br>
<b>&#x21e8;</b> give me a ❤️ if you like it ;)<br><br>
Most LLM applications only convert your PDF to plain text and nothing more; it is as if you saved the PDF as a TXT file. Text blocks often get mixed up and tables become unreadable, so it works a bit better to convert the PDF with the help of a <b>parser</b>.<br>
I work with "pdfplumber/pdfminer" without OCR, so it is very fast! I also have a "docling" parser with OCR in progress, but I think it will only be released as a Python file, not compiled.<br>
<ul style="line-height: 1.05;">
<li>Works with a single PDF, a list of multiple PDFs, or a folder</li>
<li>Intelligent multiprocessing</li>
<li>Error tolerant: if a PDF cannot be converted, it is skipped</li>
<li>Instant view of the result</li>
<li>Writes some common table layouts as JSON into the TXT files</li>
<li>Adds the absolute page number to each page</li>
<li>All TXT files are created in the original folder of the PDF</li>
</ul>
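
If you want to try the idea yourself in Python, here is a minimal sketch of a few points from the list (page number per page, tables as JSON, skipping broken PDFs, TXT next to the PDF). It is not the code inside the EXE. The function names `render_page`, `convert_pdf` and `convert_many` and the `--- Page N ---` marker are made up for illustration; only `pdfplumber.open`, `page.extract_text()`, `page.extract_tables()` and `page.page_number` come from pdfplumber itself.

```python
import json
from pathlib import Path


def render_page(page_number, text, tables):
    """Build the TXT block for one page: page marker, text, then tables as JSON."""
    # The marker format is an assumption, not the tool's real output format.
    parts = [f"--- Page {page_number} ---", (text or "").strip()]
    for table in tables:
        # Each table is a list of rows; each row is a list of cell strings (or None).
        parts.append(json.dumps(table, ensure_ascii=False))
    return "\n".join(part for part in parts if part)


def convert_pdf(pdf_path):
    """Convert one PDF to a TXT file in the same folder and return the TXT path."""
    import pdfplumber  # third-party package: pip install pdfplumber

    pdf_path = Path(pdf_path)
    with pdfplumber.open(pdf_path) as pdf:
        blocks = [
            render_page(page.page_number, page.extract_text(), page.extract_tables())
            for page in pdf.pages
        ]
    txt_path = pdf_path.with_suffix(".txt")
    txt_path.write_text("\n\n".join(blocks), encoding="utf-8")
    return txt_path


def convert_many(pdf_paths):
    """Convert every PDF that can be read and collect the ones that cannot."""
    converted, skipped = [], []
    for path in pdf_paths:
        try:
            converted.append(convert_pdf(path))
        except Exception as error:  # unreadable or damaged PDF: skip it, keep going
            skipped.append((str(path), repr(error)))
    return converted, skipped
```

Call it like `convert_many(["a.pdf", "b.pdf"])`; it returns the list of written TXT paths and the list of skipped files with the error message. In this simple version the table cells also show up in the normal page text, and there is no multiprocessing.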
<br>
I created this with my own brain and the help of ChatGPT, so sorry, I will not fulfill any wishes unless there are real errors.<br>
The GUI, the functionality, and on top of that compiling it are really hard for me.<br>
<br>
Now have fun and leave a comment if you like ;)