Stefano Fiorucci PRO

anakin87

AI & ML interests

Contributing to Haystack LLM framework ๐Ÿ—๏ธ. Language Models: orchestration, post-training, synthetic data...

Recent Activity

reacted to as-cle-bert's post with โค๏ธ 5 days ago
One of the biggest challenges I've been facing since I started developing [๐๐๐Ÿ๐ˆ๐ญ๐ƒ๐จ๐ฐ๐ง](https://github.com/AstraBert/PdfItDown) was handling correctly the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks๐Ÿซฃ That's why today I'm excited to introduce ๐ซ๐ž๐š๐๐ž๐ซ๐ฌ, the new feature of PdfItDown v1.4.0!๐ŸŽ‰ With ๐˜ณ๐˜ฆ๐˜ข๐˜ฅ๐˜ฆ๐˜ณ๐˜ด, you can choose among three (for now๐Ÿ‘€) flavors of text extraction and conversion to PDF: - ๐——๐—ผ๐—ฐ๐—น๐—ถ๐—ป๐—ด, which does a fantastic work with presentations, spreadsheets and word documents๐Ÿฆ† - ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ๐—ฃ๐—ฎ๐—ฟ๐˜€๐—ฒ by LlamaIndex, suitable for more complex and articulated documents, with mixture of texts, images and tables๐Ÿฆ™ - ๐— ๐—ฎ๐—ฟ๐—ธ๐—œ๐˜๐——๐—ผ๐˜„๐—ป by Microsoft, not the best at handling highly structured documents, by extremly flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)โœ’๏ธ You can use this new feature in your python scripts (check the attached code snippet!๐Ÿ˜‰) and in the command line interface as well!๐Ÿ Have fun and don't forget to star the repo on GitHub โžก๏ธ https://github.com/AstraBert/PdfItDown
upvoted a collection 7 days ago
Qwen Scheduler GRPO
View all activity

Organizations

deepset's profile picture Blog-explorers's profile picture ZeroGPU Explorers's profile picture Hugging Face Discord Community's profile picture

anakin87's activity