Writer

Enterprise
company
Verified
Activity Feed

AI & ML interests

AGI, LLMs, Knowledge Graph, Palmyra, Domain Specific LLM

Recent Activity

Writer's activity

melisaΒ 
in Writer/FailSafeQA 8 days ago

Fix paper link

#3 opened 8 days ago by
nielsr
samjulienΒ 
posted an update 3 months ago
view post
Post
1511
πŸ”₯ RAG in just a few lines of code?!

Try out our Hacker News Listener with new built-in RAG capabilities and Palmyra X 004 from the team at Writer!

This Writer Framework app:

- Scrapes up to 500 HN stories and comments
- Uploads them to a Knowledge Graph
- Enables interactive chat with the content using graph-based RAG
- Provides source attribution with every response

The best part? Setting up RAG is now incredibly simple - just a few lines of code to connect your Knowledge Graph as a tool with Palmyra X 004.

πŸ€— Space: samjulien/hacker-news-listener
πŸ’» Code: https://github.com/writer/framework-tutorials/tree/main/hacker-news-social-listener
melisaΒ 
posted an update 6 months ago
view post
Post
3063
πŸ”₯ Introducing "Writing in the Margins (WiM)" - better inference pattern for long context LLMs that solves the Lost-in-the-Middle problem πŸ”₯

Paper page: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)

TL;DR
Make your model write "margin notes" as you chunk prefill the KV cache. Then ask it reread all notes before it speaks up.
Works with humans, works with AI πŸ€–

WiM leverages the chunked prefill of the key-value cache, which concurrently generates query-based extractive summaries at each step of the prefill that are subsequently reintegrated at the end of the computation. We term these intermediate outputs β€œmargins”, drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs long context reasoning capabilities.

Think: Every chunk has a chance to be attended to/ be at the end of the context at least once. πŸŽ‰

πŸ“Š Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1mln tokens to an LLMs and waiting 6 min for the first token. 🀯

πŸ‘©β€πŸ’»πŸ§‘β€πŸ’» Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill
  • 2 replies
Β·