IrokoBench Collection A human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU, and MGSM • 6 items • Updated May 31 • 17
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20 • 19
Pretrained Text-Generation Models Below 250M Parameters Collection Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters. • 8 items • Updated Aug 10 • 7
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 30
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with the best evaluations on the LLM leaderboard • 264 items • Updated Jun 22 • 392
Trained Models 🏋️ Collection They may be small, but they're training like giants! • 8 items • Updated May 13 • 16
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation Paper • 2310.08185 • Published Oct 12, 2023 • 6
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023 • 36
ChatGPT-Mini Collection A collection of fine-tuned GPT-2 models each designed to deploy a ChatGPT-like model at home. These models can also be deployed on an old computer. • 8 items • Updated Nov 16, 2023 • 4
smol llama Collection 🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated Apr 29 • 6
Indic language fine-tunes Collection Halted State: Attempting to create acceptable quality fine-tunes of different models • 1 item • Updated Nov 23, 2023 • 1
PIC (Partner-in-Crime) project Collection Empathetic, small, really useful personalised models. • 3 items • Updated Dec 10, 2023 • 2
Cramp(ed) Models Collection Smaller models trained locally on my 2xA6000 Lambda Vector • 3 items • Updated Oct 10, 2023 • 1
Shrink Llama - V1 Collection Parts of Meta's LlamaV2 models, chopped up and trained. CoreX means the first X layers were kept. • 2 items • Updated Sep 12, 2023 • 2
GPT2-Linear Collection GPT2 Models using Linear layers instead of Conv layers for convenience. • 6 items • Updated Sep 9, 2023 • 1
read papers Collection This is a collection of some papers I've read in the past few months • 10 items • Updated Nov 21, 2023 • 47
Instruction-Following Evaluation for Large Language Models Paper • 2311.07911 • Published Nov 14, 2023 • 19
KAI Large Language Models Collection All of the KAI LLMs in one collection. The KAI models are a series of lightweight LLMs ranging from 1 Billion parameters to 7 Billion parameters • 5 items • Updated Nov 14, 2023 • 2
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 490
TinyKAI Large Language Models Collection All of the TinyKAI LLMs in one collection. The TinyKAI models are a series of extremely lightweight LLMs under 5 Billion parameters. • 3 items • Updated Nov 14, 2023 • 2
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf Paper • 2309.04658 • Published Sep 9, 2023 • 2
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models Paper • 2307.09793 • Published Jul 19, 2023 • 46
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models Paper • 2310.20499 • Published Oct 31, 2023 • 7
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning Paper • 2310.20587 • Published Oct 31, 2023 • 16
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 20
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation Paper • 2311.00272 • Published Nov 1, 2023 • 9
Controllable Music Production with Diffusion Models and Guidance Gradients Paper • 2311.00613 • Published Nov 1, 2023 • 24
De-Diffusion Makes Text a Strong Cross-Modal Interface Paper • 2311.00618 • Published Nov 1, 2023 • 21
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 40
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Paper • 2311.00945 • Published Nov 2, 2023 • 14