jp-parallel-gloss

jp-parallel-gloss predicts the similarity of Japanese-to-English glosses (definitions). It is a sentence-transformers model fine-tuned on 4M+ parallel/non-parallel gloss pairs from the JMDict database plus antonym/synonym pairs from WordNet, starting from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. See its application in Kotoba Tag.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: English

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
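
For reference, the same three-module stack (Transformer encoder, mean pooling, L2 normalization) can be assembled by hand with sentence-transformers building blocks. This is a minimal sketch rather than the exact setup used here; the base checkpoint name is taken from the Model Description above.

from sentence_transformers import SentenceTransformer, models

# Transformer encoder -> mean pooling -> L2 normalization, mirroring the architecture above
word_embedding_model = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2",  # base model listed in the Model Description
    max_seq_length=256,
)
pooling = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 384
    pooling_mode="mean",
)
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding_model, pooling, normalize])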

Usage

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'dearest',
    'to become verminous',
    "having an (overly) strong attachment to one's mother",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

Metric                     Value
cosine_accuracy            0.9897545950802664
cosine_accuracy_threshold  0.4331962466239929
cosine_f1                  0.9685565783209015
cosine_f1_threshold        0.4324696958065033
cosine_precision           0.9696722939424032
cosine_recall              0.9674434272579558
cosine_ap                  0.9934008701351884
cosine_mcc                 0.9624377824608901
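
A hedged sketch of how these numbers translate to inference: treat a gloss pair as parallel when its cosine similarity clears the reported cosine_accuracy_threshold (~0.433). The pair below is illustrative, not taken from the dataset.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nphach/jp-parallel-gloss")

# Decision threshold taken from the Binary Classification metrics above
THRESHOLD = 0.4332

text1 = "dearest"          # illustrative gloss pair
text2 = "beloved person"

emb = model.encode([text1, text2])
score = float(model.similarity(emb[0:1], emb[1:2])[0][0])
print(score, "parallel" if score >= THRESHOLD else "not parallel")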

Training Details

Training Dataset

  • Size: 4,404,844 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
    • text1 (string): min 3 tokens, mean 5.65 tokens, max 27 tokens
    • text2 (string): min 3 tokens, mean 5.64 tokens, max 31 tokens
    • label (int): False ~91.80%, True ~8.20%
  • Samples (text1 | text2 | label):
    • based on | making up (a deficiency) | False
    • folk (esp. music) | if possible | False
    • to start | to die | False
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Evaluation Dataset

  • Size: 550,605 evaluation samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
    • text1 (string): min 3 tokens, mean 5.74 tokens, max 28 tokens
    • text2 (string): min 3 tokens, mean 5.7 tokens, max 32 tokens
    • label (int): False ~91.60%, True ~8.40%
  • Samples (text1 | text2 | label):
    • taking one's children along (to an event, into a new marriage, etc.) | disconnect | False
    • to thunder | sheet | False
    • throwing event (e.g. javelin, discus, shot put) | extinctive prescription | False
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
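
Metrics like the dev_cosine_ap values in the training logs below are the kind of numbers produced by sentence-transformers' BinaryClassificationEvaluator on pairs such as these. A minimal sketch, assuming the evaluation split is available as three parallel lists:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("nphach/jp-parallel-gloss")

# Hypothetical held-out pairs; in practice these come from the 550,605-pair evaluation split
sentences1 = ["to thunder", "throwing event (e.g. javelin, discus, shot put)"]
sentences2 = ["sheet", "extinctive prescription"]
labels = [0, 0]  # 1 = parallel glosses, 0 = not parallel

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="dev")
print(evaluator(model))  # cosine accuracy, F1, precision, recall, average precision, etc.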
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 32
  • weight_decay: 0.01
  • num_train_epochs: 8
  • warmup_ratio: 0.1
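
A minimal, hedged sketch of how these non-default hyperparameters combine with the CoSENTLoss configuration above in a sentence-transformers training run (the dataset construction and output directory are assumptions, not the author's exact script):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical pair data; the real run used ~4.4M gloss pairs with the same column names
pairs = {
    "text1": ["to start", "folk (esp. music)"],
    "text2": ["to die", "if possible"],
    "label": [0.0, 0.0],
}
train_dataset = Dataset.from_dict(pairs)
eval_dataset = Dataset.from_dict(pairs)

# CoSENTLoss with scale=20.0; pairwise cosine similarity is its default similarity function
loss = CoSENTLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="jp-parallel-gloss",          # hypothetical output directory
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    num_train_epochs=8,
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()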

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 8
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss dev_cosine_ap
-1 -1 - - 0.8061
0.0145 500 7.2395 - -
0.0291 1000 7.2421 - -
0.0436 1500 6.5757 - -
0.0581 2000 5.96 - -
0.0726 2500 5.5217 - -
0.0872 3000 5.3224 - -
0.1017 3500 5.2104 - -
0.1162 4000 5.0525 - -
0.1308 4500 5.1228 - -
0.1453 5000 5.0317 1.5742 0.8818
0.1598 5500 4.9875 - -
0.1744 6000 4.85 - -
0.1889 6500 4.9348 - -
0.2034 7000 4.7928 - -
0.2179 7500 4.8412 - -
0.2325 8000 4.8304 - -
0.2470 8500 4.8031 - -
0.2615 9000 4.7567 - -
0.2761 9500 4.7847 - -
0.2906 10000 4.7743 1.3281 0.9066
0.3051 10500 4.6624 - -
0.3196 11000 4.6653 - -
0.3342 11500 4.6047 - -
0.3487 12000 4.5972 - -
0.3632 12500 4.6678 - -
0.3778 13000 4.5873 - -
0.3923 13500 4.6007 - -
0.4068 14000 4.526 - -
0.4214 14500 4.576 - -
0.4359 15000 4.5587 1.1674 0.9213
0.4504 15500 4.4398 - -
0.4649 16000 4.529 - -
0.4795 16500 4.4231 - -
0.4940 17000 4.5204 - -
0.5085 17500 4.508 - -
0.5231 18000 4.4563 - -
0.5376 18500 4.4922 - -
0.5521 19000 4.3455 - -
0.5666 19500 4.393 - -
0.5812 20000 4.3754 1.1346 0.9267
0.5957 20500 4.3033 - -
0.6102 21000 4.4046 - -
0.6248 21500 4.4623 - -
0.6393 22000 4.3426 - -
0.6538 22500 4.3791 - -
0.6684 23000 4.4055 - -
0.6829 23500 4.3898 - -
0.6974 24000 4.3318 - -
0.7119 24500 4.3469 - -
0.7265 25000 4.39 1.1003 0.9304
0.7410 25500 4.2806 - -
0.7555 26000 4.3901 - -
0.7701 26500 4.3526 - -
0.7846 27000 4.2083 - -
0.7991 27500 4.4242 - -
0.8136 28000 4.3139 - -
0.8282 28500 4.2971 - -
0.8427 29000 4.2024 - -
0.8572 29500 4.2684 - -
0.8718 30000 4.3175 0.9830 0.9365
0.8863 30500 4.2168 - -
0.9008 31000 4.1969 - -
0.9154 31500 4.248 - -
0.9299 32000 4.1886 - -
0.9444 32500 4.269 - -
0.9589 33000 4.1733 - -
0.9735 33500 4.1176 - -
0.9880 34000 4.2357 - -
1.0025 34500 4.0826 - -
1.0171 35000 3.6937 0.9222 0.9416
1.0316 35500 3.9462 - -
1.0461 36000 3.8201 - -
1.0606 36500 3.8564 - -
1.0752 37000 3.8252 - -
1.0897 37500 3.8981 - -
1.1042 38000 3.8162 - -
1.1188 38500 3.742 - -
1.1333 39000 3.7388 - -
1.1478 39500 3.852 - -
1.1624 40000 3.7787 0.8873 0.9440
1.1769 40500 3.6863 - -
1.1914 41000 3.7342 - -
1.2059 41500 3.7647 - -
1.2205 42000 3.7589 - -
1.2350 42500 3.7183 - -
1.2495 43000 3.8539 - -
1.2641 43500 3.7406 - -
1.2786 44000 3.7291 - -
1.2931 44500 3.729 - -
1.3076 45000 3.6944 0.8696 0.9457
1.3222 45500 3.8864 - -
1.3367 46000 3.7167 - -
1.3512 46500 3.7737 - -
1.3658 47000 3.7781 - -
1.3803 47500 3.7873 - -
1.3948 48000 3.6664 - -
1.4094 48500 3.8184 - -
1.4239 49000 3.6521 - -
1.4384 49500 3.7833 - -
1.4529 50000 3.7294 0.8075 0.9504
1.4675 50500 3.7328 - -
1.4820 51000 3.7784 - -
1.4965 51500 3.6691 - -
1.5111 52000 3.6275 - -
1.5256 52500 3.7145 - -
1.5401 53000 3.6423 - -
1.5546 53500 3.6464 - -
1.5692 54000 3.6415 - -
1.5837 54500 3.7093 - -
1.5982 55000 3.6996 0.7741 0.9527
1.6128 55500 3.6644 - -
1.6273 56000 3.6496 - -
1.6418 56500 3.6891 - -
1.6564 57000 3.7227 - -
1.6709 57500 3.6413 - -
1.6854 58000 3.6085 - -
1.6999 58500 3.4957 - -
1.7145 59000 3.5888 - -
1.7290 59500 3.6562 - -
1.7435 60000 3.6091 0.7441 0.9549
1.7581 60500 3.4945 - -
1.7726 61000 3.5744 - -
1.7871 61500 3.6632 - -
1.8016 62000 3.5322 - -
1.8162 62500 3.4866 - -
1.8307 63000 3.5391 - -
1.8452 63500 3.4714 - -
1.8598 64000 3.4245 - -
1.8743 64500 3.4765 - -
1.8888 65000 3.4499 0.7203 0.9563
1.9034 65500 3.5459 - -
1.9179 66000 3.6055 - -
1.9324 66500 3.5734 - -
1.9469 67000 3.5724 - -
1.9615 67500 3.5344 - -
1.9760 68000 3.4783 - -
1.9905 68500 3.5332 - -
2.0051 69000 3.1724 - -
2.0196 69500 2.8641 - -
2.0341 70000 2.7543 0.7252 0.9577
2.0486 70500 2.8778 - -
2.0632 71000 2.5721 - -
2.0777 71500 2.7482 - -
2.0922 72000 2.8025 - -
2.1068 72500 2.8993 - -
2.1213 73000 2.9477 - -
2.1358 73500 2.8873 - -
2.1504 74000 2.9593 - -
2.1649 74500 2.8642 - -
2.1794 75000 2.9113 0.7252 0.9582
2.1939 75500 2.8282 - -
2.2085 76000 2.9086 - -
2.2230 76500 2.7911 - -
2.2375 77000 2.9013 - -
2.2521 77500 2.9883 - -
2.2666 78000 2.7996 - -
2.2811 78500 2.9005 - -
2.2956 79000 2.8725 - -
2.3102 79500 2.9003 - -
2.3247 80000 3.0029 0.6799 0.9607
2.3392 80500 2.9904 - -
2.3538 81000 2.9155 - -
2.3683 81500 2.933 - -
2.3828 82000 2.8691 - -
2.3973 82500 3.003 - -
2.4119 83000 2.9573 - -
2.4264 83500 2.8678 - -
2.4409 84000 3.0882 - -
2.4555 84500 2.8722 - -
2.4700 85000 2.9527 0.6760 0.9610
2.4845 85500 3.1515 - -
2.4991 86000 2.9227 - -
2.5136 86500 2.9474 - -
2.5281 87000 2.9981 - -
2.5426 87500 2.8989 - -
2.5572 88000 2.8141 - -
2.5717 88500 3.0488 - -
2.5862 89000 2.8426 - -
2.6008 89500 2.7394 - -
2.6153 90000 3.0399 0.6430 0.9628
2.6298 90500 2.9426 - -
2.6443 91000 2.7746 - -
2.6589 91500 2.9781 - -
2.6734 92000 2.8177 - -
2.6879 92500 2.6764 - -
2.7025 93000 2.8852 - -
2.7170 93500 2.8658 - -
2.7315 94000 2.9031 - -
2.7461 94500 2.9051 - -
2.7606 95000 2.9715 0.6347 0.9636
2.7751 95500 2.8294 - -
2.7896 96000 2.9833 - -
2.8042 96500 2.8931 - -
2.8187 97000 2.866 - -
2.8332 97500 2.7796 - -
2.8478 98000 2.7783 - -
2.8623 98500 2.9983 - -
2.8768 99000 2.965 - -
2.8913 99500 2.9125 - -
2.9059 100000 2.8308 0.6162 0.9649
2.9204 100500 2.7666 - -
2.9349 101000 2.8829 - -
2.9495 101500 2.7808 - -
2.9640 102000 3.0559 - -
2.9785 102500 2.8531 - -
2.9931 103000 2.8534 - -
3.0076 103500 2.3948 - -
3.0221 104000 1.9878 - -
3.0366 104500 2.204 - -
3.0512 105000 2.0951 0.6358 0.9651
3.0657 105500 2.1723 - -
3.0802 106000 2.096 - -
3.0948 106500 2.1398 - -
3.1093 107000 2.1534 - -
3.1238 107500 2.0605 - -
3.1383 108000 1.9515 - -
3.1529 108500 2.1798 - -
3.1674 109000 2.1395 - -
3.1819 109500 2.0357 - -
3.1965 110000 2.0579 0.6275 0.9656
3.2110 110500 2.2834 - -
3.2255 111000 2.1215 - -
3.2401 111500 2.3135 - -
3.2546 112000 2.1642 - -
3.2691 112500 2.1095 - -
3.2836 113000 2.1022 - -
3.2982 113500 2.2954 - -
3.3127 114000 2.2834 - -
3.3272 114500 2.2489 - -
3.3418 115000 2.2317 0.6205 0.9663
3.3563 115500 2.234 - -
3.3708 116000 2.1769 - -
3.3853 116500 2.1369 - -
3.3999 117000 2.1962 - -
3.4144 117500 2.1586 - -
3.4289 118000 2.2802 - -
3.4435 118500 2.2446 - -
3.4580 119000 2.3673 - -
3.4725 119500 2.1549 - -
3.4871 120000 2.2963 0.5948 0.9672
3.5016 120500 2.331 - -
3.5161 121000 2.2441 - -
3.5306 121500 2.0613 - -
3.5452 122000 2.2732 - -
3.5597 122500 2.1462 - -
3.5742 123000 2.2862 - -
3.5888 123500 2.466 - -
3.6033 124000 2.1136 - -
3.6178 124500 2.2851 - -
3.6323 125000 2.2898 0.5887 0.9677
3.6469 125500 2.1318 - -
3.6614 126000 2.2125 - -
3.6759 126500 2.2985 - -
3.6905 127000 2.2355 - -
3.7050 127500 2.1965 - -
3.7195 128000 2.2711 - -
3.7341 128500 2.2094 - -
3.7486 129000 2.1588 - -
3.7631 129500 2.3413 - -
3.7776 130000 2.1223 0.5878 0.9683
3.7922 130500 2.1582 - -
3.8067 131000 2.3648 - -
3.8212 131500 2.2182 - -
3.8358 132000 2.1239 - -
3.8503 132500 2.0056 - -
3.8648 133000 2.1289 - -
3.8793 133500 2.223 - -
3.8939 134000 2.3067 - -
3.9084 134500 2.2172 - -
3.9229 135000 2.2992 0.5534 0.9699
3.9375 135500 2.1945 - -
3.9520 136000 2.2532 - -
3.9665 136500 2.3272 - -
3.9811 137000 2.2678 - -
3.9956 137500 2.2451 - -
4.0101 138000 1.506 - -
4.0246 138500 1.552 - -
4.0392 139000 1.5056 - -
4.0537 139500 1.5867 - -
4.0682 140000 1.4977 0.5668 0.9697
4.0828 140500 1.5145 - -
4.0973 141000 1.571 - -
4.1118 141500 1.5091 - -
4.1263 142000 1.5696 - -
4.1409 142500 1.6053 - -
4.1554 143000 1.5816 - -
4.1699 143500 1.6723 - -
4.1845 144000 1.5638 - -
4.1990 144500 1.5457 - -
4.2135 145000 1.5442 0.5663 0.9698
4.2281 145500 1.6303 - -
4.2426 146000 1.4715 - -
4.2571 146500 1.5385 - -
4.2716 147000 1.6144 - -
4.2862 147500 1.4881 - -
4.3007 148000 1.8148 - -
4.3152 148500 1.5511 - -
4.3298 149000 1.6536 - -
4.3443 149500 1.5755 - -
4.3588 150000 1.6997 0.5608 0.9702
4.3733 150500 1.6931 - -
4.3879 151000 1.5777 - -
4.4024 151500 1.7588 - -
4.4169 152000 1.5043 - -
4.4315 152500 1.5527 - -
4.4460 153000 1.5128 - -
4.4605 153500 1.5893 - -
4.4751 154000 1.6465 - -
4.4896 154500 1.6211 - -
4.5041 155000 1.5675 0.5623 0.9704
4.5186 155500 1.752 - -
4.5332 156000 1.8182 - -
4.5477 156500 1.5368 - -
4.5622 157000 1.6635 - -
4.5768 157500 1.5425 - -
4.5913 158000 1.5988 - -
4.6058 158500 1.7011 - -
4.6203 159000 1.5353 - -
4.6349 159500 1.625 - -
4.6494 160000 1.5483 0.5426 0.9714
4.6639 160500 1.6127 - -
4.6785 161000 1.6512 - -
4.6930 161500 1.7213 - -
4.7075 162000 1.5976 - -
4.7221 162500 1.5711 - -
4.7366 163000 1.5911 - -
4.7511 163500 1.6364 - -
4.7656 164000 1.6361 - -
4.7802 164500 1.7027 - -
4.7947 165000 1.6462 0.5388 0.9717
4.8092 165500 1.7102 - -
4.8238 166000 1.6149 - -
4.8383 166500 1.5491 - -
4.8528 167000 1.6389 - -
4.8673 167500 1.5092 - -
4.8819 168000 1.6771 - -
4.8964 168500 1.6812 - -
4.9109 169000 1.6414 - -
4.9255 169500 1.6066 - -
4.9400 170000 1.4729 0.5236 0.9724
4.9545 170500 1.6032 - -
4.9691 171000 1.6274 - -
4.9836 171500 1.8478 - -
4.9981 172000 1.6356 - -
5.0126 172500 1.1942 - -
5.0272 173000 1.1838 - -
5.0417 173500 1.0514 - -
5.0562 174000 1.0647 - -
5.0708 174500 1.0718 - -
5.0853 175000 1.0162 0.5385 0.9720
5.0998 175500 1.0253 - -
5.1143 176000 1.115 - -
5.1289 176500 1.0504 - -
5.1434 177000 1.1573 - -
5.1579 177500 1.0937 - -
5.1725 178000 1.0939 - -
5.1870 178500 1.0392 - -
5.2015 179000 1.0852 - -
5.2161 179500 1.165 - -
5.2306 180000 1.1048 0.5291 0.9723
5.2451 180500 1.1814 - -
5.2596 181000 1.2639 - -
5.2742 181500 1.1395 - -
5.2887 182000 1.1452 - -
5.3032 182500 1.2131 - -
5.3178 183000 1.236 - -
5.3323 183500 1.1449 - -
5.3468 184000 1.1425 - -
5.3613 184500 1.2328 - -
5.3759 185000 1.1114 0.5252 0.9727

Framework Versions

  • Python: 3.9.21
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
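
To reproduce this environment, the listed versions can be pinned at install time (a convenience sketch; newer releases will usually also work):

pip install sentence-transformers==3.4.0 transformers==4.48.1 torch==2.5.1 accelerate==1.3.0 datasets==3.2.0 tokenizers==0.21.0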

Model Size

  • 22.7M parameters (F32, Safetensors)