as-cle-bert
1584 followers · 40 following
AI & ML interests
Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source
Recent Activity
posted an update about 2 months ago
Hey there, ingest-anything v1.0.0 just dropped with major changes:
- Embeddings: now works with Sentence Transformers, Jina AI, Cohere, OpenAI, and Model2Vec, all powered via Chonkie's AutoEmbeddings. No more local-only limitations.
- Vector DBs: now supports all LlamaIndex-compatible backends (think Qdrant, Pinecone, Weaviate, Milvus, etc.). No more bottlenecks.
- File parsing: now plugs into any LlamaIndex-compatible data loader. Using LlamaParse, Docling or your own setup? You're covered.
Curious to know more? Try it out! https://github.com/AstraBert/ingest-anything
posted an update about 2 months ago
One of the biggest challenges I've faced since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) has been correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks 🫣
That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0!
With readers, you can choose among three (for now) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents with a mix of text, images and tables
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)
You can use this new feature in your Python scripts (check the attached code snippet!) and in the command line interface as well!
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
Organizations
Viewer • Updated Dec 30, 2024 • 20 • 39 • 4
as-cle-bert/architecture_vs_normal_image_prompts • Viewer • Updated Nov 8, 2024 • 6k • 15 • 1
Viewer • Updated Jun 3, 2024 • 2.43k • 46
as-cle-bert/saccaromyces-cerevisiae-base • Viewer • Updated Apr 16, 2024 • 368 • 16 • 1
as-cle-bert/AMR-Gene-Families • Viewer • Updated Apr 1, 2024 • 1.5k • 22 • 1
as-cle-bert/scerevisiae-proteins-reduced • Viewer • Updated Apr 1, 2024 • 600 • 18
as-cle-bert/plastic-enzymes • Viewer • Updated Apr 1, 2024 • 1.64k • 15 • 1
as-cle-bert/scerevisiae-transcripts-biotypes • Viewer • Updated Mar 31, 2024 • 6.72k • 31 • 1
as-cle-bert/breastcancer-semantic-segmentation • Viewer • Updated Mar 31, 2024 • 40 • 18
as-cle-bert/banana-disease-classification • Viewer • Updated Mar 31, 2024 • 777 • 64 • 2
as-cle-bert/breastcancer-auto-objdetect • Viewer • Updated Mar 30, 2024 • 547 • 49 • 1
as-cle-bert/breastcancer-auto-segmentation • Viewer • Updated Mar 30, 2024 • 547 • 18 • 1
as-cle-bert/breastcanc-ultrasound-class • Viewer • Updated Mar 29, 2024 • 647 • 68 • 2
as-cle-bert/VirBiCla-training • Viewer • Updated Mar 20, 2024 • 60k • 11 • 1
as-cle-bert/genetics-arxiv-wiki • Viewer • Updated Mar 7, 2024 • 23.3k • 19 • 2