Andrew (mendeza)
0 followers · 3 following
http://andrewmendez.me
AndrewMendez19
interactivetech
AI & ML interests: Computer Vision, Deep Learning
Recent Activity
upvoted a paper 16 days ago: Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
reacted to akhaliq's post with ❤️ 20 days ago:
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (https://huggingface.co/papers/2404.07143). This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component of the proposed approach is a new attention technique dubbed Infini-attention. Infini-attention incorporates a compressive memory into the vanilla attention mechanism, and builds both masked local attention and long-term linear attention into a single Transformer block. The authors demonstrate the effectiveness of the approach on long-context language modeling benchmarks, a 1M-token passkey retrieval task, and a 500K-token book summarization task with 1B and 8B LLMs. The approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
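The core idea in that abstract (local masked attention within a segment, plus a compressive memory that accumulates past segments' keys and values for linear-attention retrieval) can be sketched in a few lines of NumPy. Everything below is an illustrative assumption, not the paper's code: the function names are made up, the ELU+1 feature map follows standard linear-attention practice, and a fixed 0.5 mix stands in for the paper's learned gate.

```python
import numpy as np

def elu_plus_one(x):
    # Feature map sigma(x) = ELU(x) + 1, commonly used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(q, k, v, M, z):
    """Process one segment of a long stream.

    q, k, v: (seg_len, d) arrays for this segment.
    M: (d, d) compressive memory accumulated over previous segments.
    z: (d,) running normalizer for the memory.
    Returns the segment output and the updated (M, z).
    """
    sq, sk = elu_plus_one(q), elu_plus_one(k)

    # Long-term retrieval: linear attention against the compressive memory
    mem_out = (sq @ M) / (sq @ z + 1e-6)[:, None]

    # Local causal (masked) softmax attention within the segment
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -np.inf
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    local_out = w @ v

    # Fold this segment's keys/values into the memory for future segments
    M = M + sk.T @ v
    z = z + sk.sum(axis=0)

    # Combine local and long-term outputs (the paper learns this gate)
    return 0.5 * local_out + 0.5 * mem_out, M, z
```

Because the memory is a fixed (d, d) matrix updated per segment, cost stays bounded no matter how many segments stream through, which is what enables the 1M-token contexts the post mentions.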
upvoted an article 23 days ago: Tiny Agents: a MCP-powered agent in 50 lines of code
Models (3)
mendeza/opt-125m-synthetic-finetuned3 · Text Generation · 0.1B params · Updated Feb 14 · 42 downloads
mendeza/opt-125m-synthetic-finetuned · Text Generation · 0.1B params · Updated Feb 13 · 8 downloads
mendeza/Llama-3.2-3B-Instruct · Text Generation · 3B params · Updated Feb 12 · 10 downloads
Datasets (0)
None public yet