Jaward Sesay

Jaward

AI & ML interests

Building Lectūra AI | CS Grad Student @BIT | AI/ML Research: Autonomous Agents, LLMs | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

upvoted a paper about 3 hours ago

WebSailor: Navigating Super-human Reasoning for Web Agent

posted an update 5 days ago

I played around with the new RXTX paper (XX^T) and was able to train nanogpt with 4x4 RXTX matmuls in both attention layer and optimizer🤕 It just works (well I had to add some guardrails) but still saves 5% of memory usage: The Patch: - Computes attention scores with a 4x4 blockwise RXTX matmuls (no pytorch dot prod) - Handles arbitrary sequence lengths by padding to the nearest multiple of 4. - An RXTX variant of shampoo with params reshaped into 4x4 blocks during each optimizer step. - Uses 5% less ops Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb Paper: https://arxiv.org/pdf/2505.09814

liked a model 8 days ago

google/gemma-3n-E4B-it

View all activity

Organizations

upvoted a paper about 3 hours ago

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published 3 days ago • 67

posted an update 5 days ago

Post

1965

I played around with the new RXTX paper (XX^T) and was able to train nanogpt with 4x4 RXTX matmuls in both attention layer and optimizer🤕
It just works (well I had to add some guardrails) but still saves 5% of memory usage:
The Patch:
- Computes attention scores with a 4x4 blockwise RXTX matmuls (no pytorch dot prod)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of shampoo with params reshaped into 4x4 blocks during each optimizer step.
- Uses 5% less ops
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814

liked a model 8 days ago

google/gemma-3n-E4B-it

Image-Text-to-Text • 8B • Updated 3 days ago • 197k • 465

posted an update 9 days ago

Post

2289

Mind2Web 2 is out - this time featuring eval and benchmark for deep research🔥
Paper: Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge (2506.21506)
Project: https://osu-nlp-group.github.io/Mind2Web-2/

upvoted a paper 9 days ago

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Paper • 2506.21506 • Published 9 days ago • 45

upvoted a paper 10 days ago

DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning

Paper • 2506.16012 • Published 17 days ago • 21

posted an update 12 days ago

Post

3406

Awesome intro to LLM course "Language Modeling from Scratch" by stanford. love the aesthetics behind the lecture notes, notes-in-code genius idea👍
Course site: https://stanford-cs336.github.io/spring2025/
Repo: https://github.com/stanford-cs336/spring2025-lectures
Videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_

2 replies

posted an update 17 days ago

Post

1421

not sure of what to make of this but solving autonomous/selective reflection seems like a big deal in current agent frameworks. We did hit on this with iterative self-refinement in our AutoAgents framework (https://ijcai.org/proceedings/2024/0003.pdf). Nice read, looking forward to the code.
Paper: Scaling Test-time Compute for LLM Agents (2506.12928)

upvoted a paper 17 days ago

Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published 20 days ago • 60

upvoted a paper 18 days ago

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

Paper • 2506.13642 • Published 19 days ago • 26

updated a Space 21 days ago

Lectura Demo

🔥

Lectūra: Your AI Genie for Self-taught Mastery.

replied to their post 22 days ago

will cook a deep dive tutorial on dfms sometime next week, the math is nolonger scary after taking this course:)
https://diffusion.csail.mit.edu/

posted an update 22 days ago

Post

1388

You can now edit operations with a discrete flow model, supercool👍! It's amazing to see the progress on DFM within one year since its introduction - literally my litmus test for how fast the field is progressing:
1st Introduced (2024): https://arxiv.org/abs/2402.04997
Discrete Flow Matching (2024): https://arxiv.org/abs/2407.15595
Edit Discrete Flow (2025): https://arxiv.org/pdf/2506.09018
Looking forward to a SaaS level reach like that of dLLMs e.g Mercury by inception labs 🚀

1 reply

upvoted a paper 26 days ago

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Paper • 2506.07491 • Published 27 days ago • 38

upvoted an article about 1 month ago

Article

KV Cache from scratch in nanoVLM

and 4 others •

Jun 4

• 81

posted an update about 1 month ago

Post

1169

bumped into one of the OG reads today!! handwriting generation & synthesis is still my favorite application of RNNs - supper amazed at how such a small model (3.6M params), trained overnight on cpu could reach such peak performance. Huge credit to the data (IAM-OnDB🔥) which was meticulously curated using an infra-red device to track pen position.
Try demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis