20

Abakar Sylla

abakrsylla

AI & ML interests

LLM, zeroth order optimization

Recent Activity

upvoted a paper 26 days ago

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

upvoted a paper 26 days ago

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

upvoted a paper 26 days ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Organizations

None yet

upvoted 20 papers 26 days ago

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17 • 40

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Paper • 2507.13332 • Published Jul 17 • 48

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

Paper • 2507.13344 • Published Jul 17 • 55

π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Paper • 2507.13347 • Published Jul 17 • 64

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17 • 72

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

Paper • 2507.11097 • Published Jul 15 • 63

Streaming 4D Visual Geometry Transformer

Paper • 2507.11539 • Published Jul 15 • 14

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published Jul 22 • 62

Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

Paper • 2507.16713 • Published Jul 22 • 21

Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22 • 33

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Paper • 2507.16815 • Published Jul 22 • 37

Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

Paper • 2507.08422 • Published Jul 11 • 35

Step-Audio 2 Technical Report

Paper • 2507.16632 • Published Jul 22 • 61

TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance

Paper • 2507.18192 • Published about 1 month ago • 7

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published 29 days ago • 29

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

Paper • 2507.20880 • Published 27 days ago • 10

Met^2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems

Paper • 2507.17189 • Published Jul 23 • 12

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Paper • 2507.21049 • Published 26 days ago • 40

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published 29 days ago • 141

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Paper • 2507.20939 • Published 27 days ago • 56