Building Lectūra AI | CS Grad Student @BIT | AI/ML Research: Autonomous Agents, LLMs | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model: Karpathy
I played around with the new RXTX paper (XX^T) and was able to train nanoGPT with 4x4 RXTX matmuls in both the attention layer and the optimizer 🤕 It just works (well, I had to add some guardrails) and still saves 5% of memory usage.

The Patch:
- Computes attention scores with 4x4 blockwise RXTX matmuls (no PyTorch dot product)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4
- An RXTX variant of Shampoo with params reshaped into 4x4 blocks during each optimizer step
- Uses 5% fewer ops
- See the sketch below for the blockwise XX^T structure this builds on

Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814
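For intuition, here's a minimal sketch of that blockwise XX^T structure: pad rows to a multiple of 4, split into a 4x4 grid of blocks, and use the symmetry of C = XX^T to skip the redundant upper-triangular block products. This is only the symmetric baseline, not the actual RXTX recursion (26 block multiplications instead of 38 - that's in the paper and notebook above); `xxt_blockwise` is my name, and I'm assuming the attention scores are arranged so the matmul really has XX^T form (e.g. tied Q/K projections).

```python
import torch

def xxt_blockwise(X: torch.Tensor) -> torch.Tensor:
    """Compute C = X @ X.T via a 4x4 block partition, mirroring
    symmetric blocks instead of recomputing them. A stand-in for the
    RXTX kernel, which additionally replaces the block products
    themselves with a cheaper 26-multiplication recursion."""
    n = X.shape[0]
    pad = (-n) % 4                               # pad rows to a multiple of 4
    if pad:
        X = torch.nn.functional.pad(X, (0, 0, 0, pad))
    m = X.shape[0] // 4                          # rows per block
    rows = [X[i * m:(i + 1) * m] for i in range(4)]
    C = X.new_zeros(4 * m, 4 * m)
    for i in range(4):
        for j in range(i + 1):                   # lower triangle only
            Cij = rows[i] @ rows[j].T
            C[i * m:(i + 1) * m, j * m:(j + 1) * m] = Cij
            if i != j:                           # mirror into the upper triangle
                C[j * m:(j + 1) * m, i * m:(i + 1) * m] = Cij.T
    return C[:n, :n]                             # strip the padding back off

# e.g. self-attention scores with tied Q/K projections: scores = Q @ Q.T
scores = xxt_blockwise(torch.randn(10, 64))
```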
You can now generate sequences via edit operations with a discrete flow model, supercool👍! It's amazing to see the progress on DFMs within one year of their introduction - literally my litmus test for how fast the field is progressing:

First introduced (2024): https://arxiv.org/abs/2402.04997
Discrete Flow Matching (2024): https://arxiv.org/abs/2407.15595
Edit Flows (2025): https://arxiv.org/pdf/2506.09018

Looking forward to SaaS-level reach like that of dLLMs, e.g. Mercury by Inception Labs 🚀
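To see what a DFM actually does at sampling time, here's a toy sketch of the mask-based sampling loop in the spirit of Discrete Flow Matching: start fully masked and progressively commit tokens from the model's predictions as t runs from 0 to 1. `model`, `mask_id`, and the dummy logits are placeholders of mine; Edit Flows generalizes this in-place unmasking to insert/delete/substitute operations.

```python
import torch

def dfm_sample(model, seq_len, mask_id, steps=16):
    # start fully masked (t = 0); everything is revealed by t = 1
    x = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for k in range(steps):
        logits = model(x)                                  # (1, seq_len, vocab)
        pred = torch.distributions.Categorical(logits=logits).sample()
        # reveal each still-masked position with prob 1/(steps - k),
        # so positions unmask uniformly over the remaining steps
        reveal = torch.rand(1, seq_len) < 1.0 / (steps - k)
        x = torch.where(x.eq(mask_id) & reveal, pred, x)
    return x

# usage with a stand-in "model" that just emits random logits
vocab, mask_id = 100, 0
dummy = lambda x: torch.randn(x.shape[0], x.shape[1], vocab)
print(dfm_sample(dummy, seq_len=12, mask_id=mask_id))
```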
Bumped into one of the OG reads today!! Handwriting generation & synthesis is still my favorite application of RNNs - super amazed that such a small model (3.6M params), trained overnight on CPU, could reach such peak performance. Huge credit to the data (IAM-OnDB🔥), which was meticulously curated using an infrared device to track pen position.

Try the demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis
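The core recipe (from Graves' 2013 "Generating Sequences With Recurrent Neural Networks", which the repo implements) is an RNN whose output head parameterizes a mixture of bivariate Gaussians over the next pen offset (dx, dy) plus an end-of-stroke probability. A minimal sketch below - the class name and layer sizes are my own illustrative choices, not the repo's, and the full synthesis model also adds an attention window over the target text:

```python
import torch
import torch.nn as nn

class HandwritingRNN(nn.Module):
    """LSTM -> mixture density head over pen offsets (Graves-style).
    Each input step is (dx, dy, pen_lifted)."""
    def __init__(self, hidden=400, mixtures=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        # per mixture: weight, mu_x, mu_y, sigma_x, sigma_y, rho -> 6*M, plus 1 pen bit
        self.head = nn.Linear(hidden, 6 * mixtures + 1)
        self.m = mixtures

    def forward(self, strokes):                    # strokes: (B, T, 3)
        h, _ = self.lstm(strokes)
        out = self.head(h)
        pi, mu, log_sigma, rho, pen = torch.split(
            out, [self.m, 2 * self.m, 2 * self.m, self.m, 1], dim=-1)
        return (pi.log_softmax(-1),                # mixture log-weights
                mu,                                # means of (dx, dy)
                log_sigma.exp(),                   # std devs, kept positive
                rho.tanh(),                        # correlation in (-1, 1)
                pen.sigmoid())                     # end-of-stroke probability
```

Training minimizes the negative log-likelihood of each next offset under the predicted mixture; sampling just rolls the net forward one pen step at a time.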