SentenceTransformer based on allenai/specter2_aug2023refresh_base

This is a sentence-transformers model fine-tuned from allenai/specter2_aug2023refresh_base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: allenai/specter2_aug2023refresh_base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
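
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes the token vectors and the attention mask are already available as plain lists; the toy 4-dimensional values are hypothetical, and this is not the library's implementation, only the arithmetic that mean pooling followed by L2 normalization performs.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: three 4-dimensional token vectors; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = l2_normalize(mean_pool(tokens, mask))
```

Because the output has unit length, the cosine similarity between two such vectors reduces to their dot product.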

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("m7n/discipline-tuned_specter_2_009")
# Run inference
sentences = [
    'After the tragedy of September , , the Central Asian countries were "catapulted" into the international spotlight. Post / , U.S. interests radically changed with what the George W. Bush administration identified as the "war on terrorism," the first stage of which focused on Afghanistan. At the same time, the United States increased its presence in Central Asia, when the military campaign against the Taliban regime in Afghanistan was launched in October . This article first briefly examines Kazakhstan\'s role in the war on terrorism, and then analyzes the primary geopolitical interests of the United States in Central Asia, namely North Atlantic Treaty Organization (NATO) expansion, the export of Caspian energy supplies, and political and economic reform.',
    'Kazakhstan\'s internal and external policies are of growing concern to Russia. At question is not only the future of the Russian ethnic minority, but also the disposition of former Soviet infrastructure (such as the Baikanur Space Launch Center) and strategic weaponry. The reorganization of the Kazakh armed forces also has raised concerns in Russia. Improved RussianKazakh relations may lead to "Eurasian Unity" and also to the vitally important ChristianMuslim understanding in the former Soviet geopolitical space.',
    'Creep strain equations of Grade steel which is used in boilers and piping systems of ultra-supercritical (USC) thermal power plants were developed based on the results of creep tests using smooth round bar specimens of three different sources of Grade steels. In these equations, primary creep behavior was represented by a power-law function of time and tertiary creep behavior was described by an exponential function of time. Parameters in these equations were determined as a function of creep rupture time which was obtained from each creep rupture curve. The creep strain equations were able to express the creep deformation behavior of each test material with a satisfactory accuracy for a wide range of temperature and stress.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
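
A similarity matrix like the one above is typically used to rank documents against a query. The following sketch shows that step in plain Python, assuming unit-length embeddings (as produced by the Normalize module); the 2-dimensional vectors are hypothetical stand-ins for model.encode(...) output, and rank_by_similarity is an illustrative helper, not part of the library.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical unit-length vectors standing in for real embeddings.
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_by_similarity(query, corpus)
```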

Evaluation

Metrics

Triplet

Datasets: specter_2_ and discipline-tuned_specter_2_009
Evaluated with TripletEvaluator

Metric	specter_2_	discipline-tuned_specter_2_009
cosine_accuracy	0.9007	0.9011
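
As a rough illustration of what cosine_accuracy measures: it is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes that fraction on hypothetical toy vectors; it is a simplified stand-in, not the TripletEvaluator code itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Share of triplets where the positive is closer to the anchor."""
    hits = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return hits / len(triplets)

# Hypothetical toy triplets of (anchor, positive, negative) vectors.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),  # negative is closer: not counted
]
accuracy = triplet_cosine_accuracy(toy_triplets)
```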

Training Details

Training Dataset

Unnamed Dataset

Size: 48,000 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 82 tokens mean: 227.84 tokens max: 512 tokens	min: 79 tokens mean: 232.22 tokens max: 512 tokens	min: 87 tokens mean: 238.66 tokens max: 512 tokens

Samples:

anchor	positive	negative
The quantum version of the Bochkov-Kuzovlev identity is derived on the basis of the appropriate definition of work as the difference of the measured internal energies of a quantum system at the beginning and the end of an external action on the system given by a prescribed protocol. According to the spirit of the original Bochkov-Kuzovlev approach, we adopt the 'exclusive' viewpoint, meaning that the coupling to the external work source is not counted as part of the internal energy. The corresponding canonical and microcanonical quantum fluctuation theorems are derived as well, and are compared with the respective theorems obtained within the 'inclusive' approach. The relations between the quantum inclusive work w, the exclusive work w( ) and the dissipated work w(dis), are discussed and clarified. We show by an explicit example that w( ) and w(dis) are distinct stochastic quantities obeying different statistics.	When a large number of similar entities interact among each other and with their environment at a low scale, unexpected outcomes at higher spatio-temporal scales might spontaneously arise. This non-trivial phenomenon, known as emergence, characterizes a broad range of distinct complex systems-from physical to biological and social-and is often related to collective behaviour. It is ubiquitous, from non-living entities such as oscillators that under specific conditions synchronize, to living ones, such as birds flocking or fish schooling. Despite the ample phenomenological evidence of the existence of systems' emergent properties, central theoretical questions to the study of emergence remain unanswered, such as the lack of a widely accepted, rigorous definition of the phenomenon or the identification of the essential physical conditions that favour emergence. We offer here a general overview of the phenomenon of emergence and sketch current and future challenges on the topic. Our short...	Using the Maxwellian macroscopic approach and analysing the formulation of the dielectric constant, it is shown that the concept of energy has not been properly incorporated into the current kinetic plasma theory. The difficulties are due to the Boltzmann collisional term (F/t)coll which accounts for a change in the velocity distribution due to collisions alone. If one attempts to rephrase the Boltzmann-Vlasov theory in terms of the Maxwcllian macroscopic formulation, one obtains an expression for energy which is not consistent with the meaning of this concept in generalized dynamics. In a revised version developed in this analysis the Boltzmann collisional term has been eliminated and an appropriate collisional operator is introduced which is believed to describe more adequately collisional processes in a plasma. It is assumed that the collisional operator can be applied directly to the electrical intensity of the field interacting with the plasma and is effective in transforming the ...
The emerging services of 0G and beyond (0G/B0G) have imposed challenging resiliency demands that make reliable slicing in radio access networks (RAN) an imperative concern. In previous literature, service protection schemes for one single domain are well studied, e.g., lightpath protection for transport domain. However, 0G/B0G will evolve towards a converged network where data transport and function processing coordinate together, thus the single-domain protection is inadequate. In this paper, we are dedicated to investigating a multi-domain protection scheme, which can achieve customized protection for specific reliability demands and reduce backup redundancy. To this end, we propose a topology-level based protection scheme (TLPS) to customize virtual networks (VN) for slices to guarantee reliability while reducing redundancy. The designed VNs are then embedded onto the substrate network with our proposed mixed integer linear programming (MILP) model and heuristic, which aim at minimi...	SDN and NFV have recently changed the way we operate networks. By decoupling control and data plane operations and virtualising their components, they have opened up new frontiers towards reducing network ownership costs and improving usability and efficiency. Recently, their applicability has moved towards public telecommunications networks, with concepts such as the cloud-CO that have pioneered its use in access and metro networks: an idea that has quickly attracted the interest of network operators. By merging mobile, residential and enterprise services into a common framework, built around commoditised data centre types of architectures, future embodiments of this CO virtualisation concept could achieve significant capital and operational cost savings, while providing customised network experience to high-capacity and low-latency future applications. This tutorial provides an overview of the various frameworks and architectures outlining current network disaggregation trends that a...	SMART (Semantic web information Management with automated Reasoning Tool) is an open-source project, which aims to provide intuitive tools for life scientists for represent, integrate, manage and query heterogeneous and distributed biological knowledge. SMART was designed with interoperability and extensibility in mind and uses AJAX, SVG and JSF technologies, RDF, OWL, SPARQL semantic web languages, triple stores (i.e. Jena) and DL reasoners (i.e. Pellet) for the automated reasoning. Features include semantic query composition and validation using DL reasoners, a graphical representation of the query, a mapping of DL queries to SPARQL, and the retrieval of pre-computed inferences from an RDF triple store. With a use case scenario, we illustrate how a biological scientist can intuitively query the yeast knowledge base and navigate the results. Continued development of this web-based resource for the biological semantic web will enable new information retrieval opportunities for the life...
The DouglasRachford method has been employed successfully to solve many kinds of nonconvex feasibility problems. In particular, recent research has shown surprising stability for the method when it is applied to finding the intersections of hypersurfaces. Motivated by these discoveries, we reformulate a second order boundary value problem (BVP) as a feasibility problem where the sets are hypersurfaces. We show that such a problem may always be reformulated as a feasibility problem on no more than three sets and is well suited to parallelization. We explore the stability of the method by applying it to several BVPs, including cases where the traditional Newton's method fails.	We propose a new adaptive and composite BarzilaiBorwein (BB) step size by integrating the advantages of such existing step sizes. Particularly, the proposed step size is an optimal weighted mean of two classical BB step sizes and the weights are updated at each iteration in accordance with the quality of the classical BB step sizes. Combined with the steepest descent direction, the adaptive and composite BB step size is incorporated into the development of an algorithm such that it is efficient to solve large-scale optimization problems. We prove that the developed algorithm is globally convergent and it R-linearly converges when applied to solve strictly convex quadratic minimization problems. Compared with the state-of-the-art algorithms available in the literature, the proposed step size is more efficient in solving ill-posed or large-scale benchmark test problems.	Nonconvex optimization is becoming more and more important in machine learning and operations research. In spite of recent progresses, the development of provably efficient algorithm for optimization with nonconvex functional constraints remains open. Such problems have potential applications in risk-averse machine learning, semisupervised learning and robust optimization among others. In this paper, we introduce a new proximal point type method for solving this important class of nonconvex problems by transforming them into a sequence of convex constrained subproblems. We establish the convergence and rate of convergence of this algorithm to the KKT point under different types of constraint qualifications. In particular, we prove that our algorithm will converge to an \epsilon -KKT point in O( /\epsilon) iterations under a properly defined condition. For practical use, we present inexact variants of this approach, in which approximate solutions of the subproblems are computed by eithe...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.3
}
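
To make these parameters concrete, here is a small sketch of the triplet objective for a single triplet, assuming cosine distance is defined as 1 minus cosine similarity and that the loss is the hinge max(d(a, p) - d(a, n) + margin, 0). The function names and toy vectors are hypothetical; in training, the per-triplet values would be averaged over a batch.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity, so identical directions give 0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: zero once the negative is at least `margin` farther away."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Easy triplet: positive matches the anchor, negative is orthogonal.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Hard triplet: the roles are swapped, so the loss is large.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

With a margin of 0.3, a triplet stops contributing to the gradient once the negative's cosine distance exceeds the positive's by at least 0.3.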

Evaluation Dataset

Unnamed Dataset

Size: 2,400 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 84 tokens mean: 235.04 tokens max: 512 tokens	min: 81 tokens mean: 230.99 tokens max: 512 tokens	min: 83 tokens mean: 242.95 tokens max: 512 tokens

Samples:

anchor	positive	negative
To assess the effects of a gluten-free diet on bone structure in children with celiac disease using fractal analysis on panoramic radiographs.A total of patients with celiac disease aged to years, separated into two groups as previously and newly diagnosed, and a control group of healthy individuals were evaluated. In previously and newly diagnosed patients with celiac disease, body mass index Z-scores were calculated, calcium, alkaline phosphatase, vitamin D0, and parathormone levels were measured, and bone mineral density Z-scores were obtained from dual energy x-ray absorptiometry. In all patients, the fractal dimensions of the right and left temporomandibular condyles were evaluated with the fractal analysis method on panoramic radiographs.The mean values of serum biomarker levels and the body mass index and bone mineral density Z-scores for both celiac groups were within the normal reference range. No statistically significant difference was determined between right and left condy...	The International Caries Detection and Assessment System (ICDAS II) and the Caries Classification System (CCS) are caries stage description systems proposed for adoption into clinical practice. This pilot study investigated clinicians' training in and use of these systems for detection of early caries and recommendations for individual tooth treatment. Patient participants (N = ) with a range of noncavitated lesions (CCS ranks and and ICDAS II ranks - ) identified by a team of calibrated examiners were recruited from the New York University College of Dentistry clinic. Eighteen dentists- from the Practitioners Engaged in Applied Research and Learning (PEARL) Network and recruited from the Academy of General Dentistry-were randomly assigned to of groups: dentists used only visual-tactile (VT) examination, were trained in the ICDAS II, and were trained in the CCS. Lesion stage for each tooth was determined by the ICDAS II and CCS groups, and recommended treatment was decided by all group...	Abstract Six procedures were evaluated for aspartate aminotransferase (EC .0.0) isoenzyme assay in human serum and tissue homogenates. Results of procedures based on immunochemical precipitation by use of antibodies directed against either the mitochondrial or (with greater precision) soluble isoenzyme correlated well with those by a differential kinetic assay involving both different pH conditions and adipate inhibition. Results with a DEAE-Sephadex ion-exchange chromatographic procedure correlated well with these techniques for specimens containing purified isoenzymes, but showed substantial positive bias for determination of the mitochondrial isoenzyme in human serum. An assay based on the differential effects of pH alone discriminated between the isoenzymes with less bias than did the chromatographic assay. Precision of the two differential pH assays was limited by significant reagent blank activity resulting from destruction of NADH at pH or . An electrophoretic procedure in which...
Heating by electricity, replacing coal by electricity, is regarded as an effective way to solve the environment problem. Thus electric heating load is growing rapidly, which may result in some undesired problem to distribution grid due to its randomness and dispersed integration. However, electric heating load may be a kind of energy storage system by optimal control of its operation. So the optimal modeling of electric heating load characteristic considering its randomness is of important for grid planning and construction. In this paper, the heating loads of distributed residential users in a certain area are modeled based on Fanger Thermal Comfort Equation (FTCE) and the PMV thermal comfort index calculation method. Different temperatures are taken into consideration during modeling users' heating loads. According to the time- varying equation of inside temperature, the heat load demand curve is estimated. And then a multi-objective optimization model for electric heating load with ...	Cretaceous bituminous coals of known rank R0 max, vitrinite reflectance) have been examined by ESR (electron spin resonance) and ENDOR (electron nuclear double resonance) techniques. Both highly oxidised (outcrop) and unoxidised minerun Balmer coal from the Crowsnest field have been subjected to heat treatment (000000C), and the matrix proton ENDOR signal studied as a function of applied microwave and rf power. Changes in ENDOR line shape and intensity are described with particular emphasis on the presoftening region of the unoxidised coal. A comparative study of the carbonization of hvb and 0vb coking coal from the Crowsnest is reported.	The pathogen of leaf spots found in Strelitzia reginae was identified as Cylindrocladium colhounii peerall colhounii. Its hypha grew differently on different media. Richards medium was found the best. Conidia were not produced on Richards medium and PCA medium. The optimum temperature and pH value for hypha growth were - and - . The optimum temperature for conidium growth was - . Among carbon sources, D-glucose, sucrose and amylum were better than maltose, D-fructose and lactose. Among nitrogen sources, yeast extract, beef extract and ammonium nitrate were better than peptone, ammonium sulphate and Laspartic acid.
Four experiments were conducted to examine the hypothesis that when incorrect strategies for solving domain-specific problems were contradicted, a domain-general rule would be induced and would subsequently facilitate transfer to problems outside of the original domain. Experiments involved examining transfer from problems designed to elicit the "permission" and the "causal" schemata described by P. W. Cheng and K. J. Holyoak ( ). Results indicated that (a) training might have led to the construction of a domain-independent rule only when source problems were causal, (b) transfer was more likely when source problems were causal than when source problems were permissions, and (c) transfer from causal problems was weakly related to IQ, whereas transfer from permissions was strongly related to IQ	We used a probe procedure to show that a goal established earlier in a text is active in memory at the point of its achievement. An initial experiment demonstrated that a goal category (began an investigation to nab the THIEF) is accessible, relative to a control condition, following the processing of a goal-achievement sentence (had the PURSER brought to his office). The remaining experiments provided evidence against several explanations of this result: (a) that the goal category's accessibility is due to an advantage in the strength of its initial encoding; (b) that the goal category is maintained in memory from the point at which the goal is established; or (c) that the goal category is reinstated at the point of goal achievement as the result of a high-level inference. The results suggest that the goal category is reinstated as the result of a low-level inference similar to the type that links an anaphor and its antecedent.	Introduction: Emerging evidences have inconsistently reported that dietary components mediate the relationship between low socioeconomic status (SES) and higher cardiometabolic risk, but the findings are limited in Korean. Hypothesis: We tested the hypothesis whether the education level as a proxy for SES is associated with the prevalence of metabolic syndrome and this association is mediated by dietary pattern. Methods: We used nationally representative data from the Korea National Health and Nutritional Examination Survey ( - ) for cross-sectional analyses (Total number of subjects= , - yrs). Dietary data were assessed using food frequency questionnaire including food items and were categorized into seven food groups based on the Korea nutrient database. Metabolic syndrome was defined using revised National Cholesterol Education Program criteria. The possible mediating effect of dietary components (fruit, vegetable, red meat, milk, and soft-drink) on the association between education...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.3
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
learning_rate: 1e-05
weight_decay: 0.01
num_train_epochs: 1
warmup_ratio: 0.1
batch_sampler: no_duplicates
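
The interaction of learning_rate, warmup_ratio, and the linear scheduler can be sketched as follows. This is an illustrative reimplementation of a linear warmup and decay schedule, not the trainer's own code; the total of 6,000 steps assumes one epoch over 48,000 samples at a batch size of 8 on a single device, and linear_warmup_lr is a hypothetical helper name.

```python
import math

def linear_warmup_lr(step, total_steps, peak_lr=1e-05, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: single device, batch size 8, one pass over 48,000 samples.
total_steps = 48_000 // 8
schedule = {step: linear_warmup_lr(step, total_steps) for step in (0, 300, 600, 3300, 6000)}
```

Under these assumptions the rate peaks at step 600 and falls back to zero at step 6,000.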

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	specter_2__cosine_accuracy	discipline-tuned_specter_2_009_cosine_accuracy
0	0	-	-	0.8076	-
0.0167	100	0.2146	0.1750	0.8258	-
0.0333	200	0.1355	0.1224	0.8461	-
0.05	300	0.1119	0.1071	0.8630	-
0.0667	400	0.0924	0.1001	0.8749	-
0.0833	500	0.0859	0.0978	0.8749	-
0.1	600	0.0864	0.0935	0.8803	-
0.1167	700	0.0895	0.0914	0.8862	-
0.1333	800	0.0815	0.0896	0.8853	-
0.15	900	0.0797	0.0892	0.8838	-
0.1667	1000	0.0814	0.0899	0.8872	-
0.1833	1100	0.0905	0.0872	0.8892	-
0.2	1200	0.0788	0.0861	0.8917	-
0.2167	1300	0.0812	0.0833	0.8916	-
0.2333	1400	0.0805	0.0835	0.8916	-
0.25	1500	0.0775	0.0829	0.8932	-
0.2667	1600	0.0792	0.0824	0.8954	-
0.2833	1700	0.0735	0.0820	0.8936	-
0.3	1800	0.0866	0.0823	0.8980	-
0.3167	1900	0.0847	0.0834	0.8975	-
0.3333	2000	0.0865	0.0853	0.8950	-
0.35	2100	0.0768	0.0829	0.8979	-
0.3667	2200	0.0752	0.0797	0.9007	-
0.3833	2300	0.0737	-	-	0.9011
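
The Epoch column above is consistent with 6,000 optimizer steps per epoch, which is what 48,000 training samples at per_device_train_batch_size 8 and gradient_accumulation_steps 1 would give on a single device. The device count is an assumption, and epoch_fraction is a hypothetical helper used only to check the arithmetic.

```python
def epoch_fraction(step, dataset_size=48_000, batch_size=8):
    """Fraction of one epoch completed after `step` optimizer steps."""
    steps_per_epoch = dataset_size // batch_size
    return step / steps_per_epoch

# Compare against the first and last logged rows.
first_logged = round(epoch_fraction(100), 4)
last_logged = round(epoch_fraction(2300), 4)
```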

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.49.0.dev0
PyTorch: 2.5.1+cu121
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
