hllj's Collections

Architectures

updated May 1, 2024

  • Larimar: Large Language Models with Episodic Memory Control

    Paper • 2403.11901 • Published Mar 18, 2024 • 34

  • Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

    Paper • 2212.05055 • Published Dec 9, 2022 • 5

  • Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

    Paper • 2404.02258 • Published Apr 2, 2024 • 106

  • Multi-Head Mixture-of-Experts

    Paper • 2404.15045 • Published Apr 23, 2024 • 61

  • KAN: Kolmogorov-Arnold Networks

    Paper • 2404.19756 • Published Apr 30, 2024 • 113