Article How to build a custom text classifier without days of human labeling By sdiazlor • 27 days ago • 54
Unifying Multimodal Retrieval via Document Screenshot Embedding Paper • 2406.11251 • Published Jun 17 • 9
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data Paper • 2402.15343 • Published Feb 23 • 12
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22 • 63
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 156
Article Synthetic dataset generation techniques: generating custom sentence similarity data By davanstrien • May 23 • 15
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer Paper • 2311.08526 • Published Nov 14, 2023 • 9
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 47
Automated Unit Test Improvement using Large Language Models at Meta Paper • 2402.09171 • Published Feb 14 • 5
Zero-shot text classification models Collection Collection of the best zero-shot text classification models. Fine-tune them with few examples using LiqFit - https://github.com/Knowledgator/LiqFit. • 9 items • Updated Sep 10 • 9
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent Paper • 2304.09542 • Published Apr 19, 2023 • 4
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 13
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 242
Seven Failure Points When Engineering a Retrieval Augmented Generation System Paper • 2401.05856 • Published Jan 11 • 2
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 112
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks Paper • 2305.05862 • Published May 10, 2023 • 4
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 181