emerging trend: models that can understand image + text and generate image + text
don't miss out ⤵️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱
bumped into one of the OG reads today!! handwriting generation & synthesis is still my favorite application of RNNs - super amazed at how such a small model (3.6M params), trained overnight on CPU, could reach such peak performance. Huge credit to the data (IAM-OnDB 🔥), which was meticulously curated using an infra-red device to track pen position.
Try the demo here: https://www.calligrapher.ai/
Code: https://github.com/sjvasquez/handwriting-synthesis