John Leon's picture

44 1

John Leon

1john

AI & ML interests

vision models & TTS

Organizations

None yet

1john's activity

upvoted 44 papers about 2 months ago

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29 • 33

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Paper • 2407.20179 • Published Jul 29 • 45

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 53

Meltemi: The first open Large Language Model for Greek

Paper • 2407.20743 • Published Jul 30 • 67

Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Paper • 2407.20729 • Published Jul 30 • 25

Harvesting Textual and Structured Data from the HAL Publication Repository

Paper • 2407.20595 • Published Jul 30 • 21

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30 • 23

Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models

Paper • 2407.19914 • Published Jul 29 • 12

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29 • 47

CodePlan: Repository-level Coding using LLMs and Planning

Paper • 2309.12499 • Published Sep 21, 2023 • 73

Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77

FACET: Fairness in Computer Vision Evaluation Benchmark

Paper • 2309.00035 • Published Aug 31, 2023 • 16

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 65

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 47

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Paper • 2403.18118 • Published Mar 26 • 9

FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing

Paper • 2403.18605 • Published Mar 27 • 6

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 51

Scaling Open-Vocabulary Object Detection

Paper • 2306.09683 • Published Jun 16, 2023 • 13

Robot Learning with Sensorimotor Pre-training

Paper • 2306.10007 • Published Jun 16, 2023 • 13

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Paper • 2306.10012 • Published Jun 16, 2023 • 35

Teach LLMs to Personalize -- An Approach inspired by Writing Education

Paper • 2308.07968 • Published Aug 15, 2023 • 25

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Paper • 2308.08545 • Published Aug 16, 2023 • 33

GARField: Group Anything with Radiance Fields

Paper • 2401.09419 • Published Jan 17 • 17

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 58

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Paper • 2403.09977 • Published Mar 15 • 9

Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding

Paper • 2403.10395 • Published Mar 15 • 7

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 30

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Paper • 2403.04746 • Published Mar 7 • 22

StableDrag: Stable Dragging for Point-based Image Editing

Paper • 2403.04437 • Published Mar 7 • 25

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 61

Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18 • 17

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper • 2406.13897 • Published May 30 • 12

Shape of Motion: 4D Reconstruction from a Single Video

Paper • 2407.13764 • Published Jul 18 • 19

The Vision of Autonomic Computing: Can LLMs Make It a Reality?

Paper • 2407.14402 • Published Jul 19 • 13

SciCode: A Research Coding Benchmark Curated by Scientists

Paper • 2407.13168 • Published Jul 18 • 13

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17 • 18

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Paper • 2407.10960 • Published Jul 15 • 10

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 38

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Paper • 2407.14561 • Published Jul 18 • 33

Cross Anything: General Quadruped Robot Navigation through Complex Terrains

Paper • 2407.16412 • Published Jul 23 • 4

A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data

Paper • 2407.16680 • Published Jul 23 • 11

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Paper • 2407.14505 • Published Jul 19 • 24

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

Paper • 2407.17438 • Published Jul 24 • 23

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Paper • 2407.16154 • Published Jul 23 • 20