kalle07
/

pdf2txt_parser_converter

Update README.md

4e8efba verified 6 months ago

	---
	language:
	- en
	- de
	tags:
	- parser
	- parsing
	- PDF
	- pdfplumber
	- txt
	- tables
	- python
	- windows
	- RAG
	---

	---

	# <b>PDF to TXT converter, ready to chunk for your RAG</b>
	<b>WINDOWS ONLY</b><br>
	<b>EXE and PY available (English and German)</b><br>

	<b>⇨</b> Give me a ❤️ if you like it ;)<br><br>

	Most LLM applications convert your PDF to plain text and nothing more, which is like saving the PDF as a TXT file. Text blocks often end up mixed together and tables become unreadable, so it works a bit better to convert with the help of a <b>parser</b>.<br>
	I work with "pdfplumber/pdfminer" without OCR, so it is very fast! I also have a "docling" parser with OCR in progress, but I think it will only be available as a Python file, not compiled.<br>
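
To illustrate what text extraction without OCR looks like, here is a minimal sketch using pdfplumber. It is not the code of this tool; the function name `extract_page_texts` is hypothetical and chosen only for this example. It reads the text layer already embedded in the PDF, which is why it is fast and why scanned pages without a text layer come back empty.

```python
def extract_page_texts(pdf_path):
    """Return the embedded text of every page as a list of strings (no OCR)."""
    # Third-party dependency (pip install pdfplumber), imported lazily so the
    # module itself loads even where pdfplumber is not installed.
    import pdfplumber

    with pdfplumber.open(str(pdf_path)) as pdf:
        # extract_text() reads the existing text layer only; pages without
        # one yield no text, so fall back to an empty string.
        return [page.extract_text() or "" for page in pdf.pages]
```

Calling `extract_page_texts("example.pdf")` (the file name is a placeholder) would give one string per page, in page order.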
	<ul style="line-height: 1.05;">
	<li>Works with a single PDF, a list of multiple PDFs, or a whole folder</li>
	<li>Intelligent multiprocessing</li>
	<li>Error tolerant: if a PDF cannot be converted, it is skipped</li>
	<li>Instant view of the result</li>
	<li>Converts some common tables to JSON inside the TXT files</li>
	<li>Adds the absolute PAGE number to each page</li>
	<li>All TXT files are created in the original folder of the PDF</li>
	</ul>
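
The features above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation of this tool: the function names, the `--- PAGE n ---` marker format, the one-JSON-line-per-table layout, and the default of 4 workers are all made up for the example, and "absolute page number" is assumed to mean the 1-based position of the page in the file.

```python
import json
from multiprocessing import Pool
from pathlib import Path


def render_page(page_number, text, tables):
    """Format one page: page marker, page text, then each table as a JSON line."""
    parts = [f"--- PAGE {page_number} ---", text.strip()]  # marker format is an assumption
    for table in tables:
        # A table is a list of rows; each row is a list of cell values.
        parts.append(json.dumps(table, ensure_ascii=False))
    return "\n".join(part for part in parts if part)


def convert_pdf(pdf_path):
    """Write <name>.txt next to the PDF; return its path, or None if skipped."""
    import pdfplumber  # third-party: pip install pdfplumber

    pdf_path = Path(pdf_path)
    try:
        with pdfplumber.open(str(pdf_path)) as pdf:
            pages = [
                render_page(number, page.extract_text() or "", page.extract_tables())
                for number, page in enumerate(pdf.pages, start=1)
            ]
    except Exception:
        # Error tolerant: a PDF that cannot be converted is skipped.
        return None
    out_path = pdf_path.with_suffix(".txt")  # same folder as the source PDF
    out_path.write_text("\n\n".join(pages), encoding="utf-8")
    return str(out_path)


def convert_folder(folder, workers=4):
    """Convert every *.pdf in a folder in parallel; return the written TXT paths."""
    pdf_paths = sorted(str(path) for path in Path(folder).glob("*.pdf"))
    with Pool(processes=workers) as pool:
        results = pool.map(convert_pdf, pdf_paths)
    return [result for result in results if result is not None]
```

On Windows, `convert_folder` should be called from under an `if __name__ == "__main__":` guard, because `multiprocessing` starts each worker by re-importing the main module.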
	<br>
	I created this with my own brain and the help of ChatGPT, so sorry, I will not fulfill any wishes unless there are real errors.<br>
	Combining the GUI with the functions, and then compiling it all, is really hard for me.<br>
	<br>
	Now have fun, and leave a comment if you like ;)