On Domain-Specific Post-Training for Multimodal Large Language Models • arXiv:2411.19930 • Published Nov 29, 2024
Instruction Pre-Training: Language Models are Supervised Multitask Learners • arXiv:2406.14491 • Published Jun 20, 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning • arXiv:2405.12130 • Published May 20, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • Published Feb 27, 2024
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models • arXiv:2402.13064 • Published Feb 20, 2024
BitNet: Scaling 1-bit Transformers for Large Language Models • arXiv:2310.11453 • Published Oct 17, 2023
Retentive Network: A Successor to Transformer for Large Language Models • arXiv:2307.08621 • Published Jul 17, 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens • arXiv:2307.02486 • Published Jul 5, 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World • arXiv:2306.14824 • Published Jun 26, 2023
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding • arXiv:2305.09148 • Published May 16, 2023
Pre-training Language Model as a Multi-perspective Course Learner • arXiv:2305.03981 • Published May 6, 2023