BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 4
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure Paper • 2504.10049 • Published 25 days ago • 3
Post: Super grateful to @marriola for the release of the block diffusion code and model. I'm generating text with diffusion locally! Couldn't be more pleased.
COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition Paper • 2503.07259 • Published Mar 10
WebGames: Challenging General-Purpose Web-Browsing AI Agents Paper • 2502.18356 • Published Feb 25 • 12
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
Post: Tired: shitposting on bsky. Wired: shitposting on hf.
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 97
RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation Paper • 2412.11317 • Published Dec 15, 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 149
DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published Dec 17, 2024 • 2
Post: The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here's all of them on a plot.
Post: The Lichess database of games, puzzles, and engine evaluations is now on the Hub: Lichess. Billions of chess data points to download, query, and stream, and we're excited to see what you'll build with it!
- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 15
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks Paper • 2411.01192 • Published Nov 2, 2024 • 3
GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems Paper • 2410.20643 • Published Oct 28, 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization Paper • 2410.15017 • Published Oct 19, 2024 • 2
RoQLlama: A Lightweight Romanian Adapted Language Model Paper • 2410.04269 • Published Oct 5, 2024