Article Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques By jmamou and 8 others • about 21 hours ago • 8
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published Feb 13 • 16
Article Universal Assisted Generation: Faster Decoding with Any Assistant Model By danielkorat and 7 others • Oct 29, 2024 • 52
Article Faster Assisted Generation with Dynamic Speculation By jmamou and 6 others • Oct 8, 2024 • 46
CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity Paper • 2404.10513 • Published Apr 16, 2024 • 2
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 37
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 18
Article Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon By danielkorat and 5 others • Apr 3, 2024 • 11
Article A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake By juliensimon and 5 others • Mar 20, 2024 • 6
Article CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG By peterizsak and 5 others • Mar 15, 2024 • 9