Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 98
Article How to build a custom text classifier without days of human labeling By sdiazlor • Oct 17, 2024 • 55
Unifying Multimodal Retrieval via Document Screenshot Embedding Paper • 2406.11251 • Published Jun 17, 2024 • 10
Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 72
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data Paper • 2402.15343 • Published Feb 23, 2024 • 13
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22, 2024 • 70
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 173
Article Synthetic dataset generation techniques: generating custom sentence similarity data By davanstrien • May 23, 2024 • 16
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer Paper • 2311.08526 • Published Nov 14, 2023 • 9
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30, 2024 • 47
Automated Unit Test Improvement using Large Language Models at Meta Paper • 2402.09171 • Published Feb 14, 2024 • 5
Zero-shot text classification models Collection Collection of the best zero-shot text classification models. Fine-tune them with a few examples using LiqFit - https://github.com/Knowledgator/LiqFit. • 9 items • Updated Sep 10, 2024 • 10
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent Paper • 2304.09542 • Published Apr 19, 2023 • 4
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 13
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 243
Seven Failure Points When Engineering a Retrieval Augmented Generation System Paper • 2401.05856 • Published Jan 11, 2024 • 2
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 12 items • Updated 12 days ago • 123
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks Paper • 2305.05862 • Published May 10, 2023 • 4