Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
Updated a dataset 2 days ago: hamishivi/polaris_53k
Published a dataset 2 days ago: hamishivi/polaris_53k
Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
TESS 2
Models associated with the paper "TESS-2: A Large-Scale, Generalist Diffusion Language Model". Code: https://github.com/hamishivi/tess-2
Models (40)
hamishivi/qwen2_5_openthoughts2 • 2.03k
hamishivi/olmo2_lc_ot2 • 156
hamishivi/qwen3_openthoughts2 • 13
hamishivi/Qwen-2.5-7b-tokenizer • Text Generation • 2B • 84
hamishivi/general-verifier • Text Generation • 2B • 17
hamishivi/qwen2.5_orz_upload
hamishivi/s1k_seq_orig_hyper__42__1740446762 • 10
hamishivi/tulu_3_long_finetune_qwen_7b_reg_system_prompt • 15
hamishivi/tulu-2-wildchat-326k-sft • 7B • 17
hamishivi/tulu-2-arena-hard-326k-sft • 7B • 13
Datasets (105)
hamishivi/polaris_53k • 53.3k
hamishivi/hamishivi_rlvr_orz_math_57k_collected_all_filtered_hamishivi_qwen2_5_openthoughts2 • 8.5k
hamishivi/allenai_IF_multi_constraints_upto5_all_filtered_qwen2_5_openthoughts2 • 41k
hamishivi/saurabh5_rlvr_acecoder_all_filtered_qwen2_5_openthoughts2 • 26.2k
hamishivi/tulu_3_rewritten_400k_string_f1_only_v2_all_filtered_qwen2_5_openthoughts2 • 38.1k
hamishivi/open_scholar_rl_long_only • 34.5k • 40
hamishivi/open_scholar_rl_long_only_no_prompt • 34.5k • 40
hamishivi/open_scholar_rl_no_prompt • 37k • 157
hamishivi/open_scholar_rl_no_refs • 60.2k • 120
hamishivi/WebInstruct-verified-general-verifier-judge • 233k • 185