SentenceTransformer based on sentence-transformers/allenai-specter

This is a sentence-transformers model fine-tuned from sentence-transformers/allenai-specter. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/allenai-specter
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
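
The Pooling module above has pooling_mode_cls_token set to True and every other mode set to False, so the sentence embedding is the hidden state at the first ([CLS]) position rather than an average over tokens. The following is a minimal sketch of that difference in plain Python, using toy hidden states of size 3 instead of 768. The names `cls_pool` and `mean_pool` are illustrative and are not library functions.

```python
from typing import List

Matrix = List[List[float]]  # one row per token, one column per hidden dimension


def cls_pool(token_embeddings: Matrix) -> List[float]:
    """Return the vector at position 0, i.e. the [CLS] token."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: Matrix, attention_mask: List[int]) -> List[float]:
    """Average the vectors of non-padding tokens (shown for contrast only)."""
    kept = [row for row, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(row[d] for row in kept) / len(kept) for d in range(dim)]


# Toy stand-in for the Transformer output: 4 tokens, hidden size 3.
hidden = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 3.0, 3.0],
    [5.0, 0.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, masked out below
]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(hidden)          # what this model's Pooling config selects
mean_vec = mean_pool(hidden, mask)  # what mean pooling would give instead
print(cls_vec, mean_vec)
```

With CLS pooling the padding mask plays no role in the pooled vector, since only position 0 is read.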

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nadrajak/allenai-specter-ft")
# Run inference
sentences = [
    'Let G be a free group in a variety of groups, but G is not absolutely free. We prove that the group of automorphisms Aut(G) is linear iff G is a virtually nilpotent group.',
    'An orthogonal array OA(q^{2n-1},q^{2n-2}, q,2) is constructed from the action of a subset of PGL(n+1,q^2) on some non--degenerate Hermitian varieties in PG(n,q^2). It is also shown that the rows of this orthogonal array correspond to some blocks of an affine design, which for q> 2 is a non--classical model of the affine space AG(2n-1,q).',
    'Suppose that a target function is monotonic, namely, weakly increasing, and an original estimate of the target function is available, which is not weakly increasing. Many common estimation methods used in statistics produce such estimates. We show that these estimates can always be improved with no harm using rearrangement techniques: The rearrangement methods, univariate and multivariate, transform the original estimate to a monotonic estimate, and the resulting estimate is closer to the true curve in common metrics than the original estimate. We illustrate the results with a computational example and an empirical example dealing with age-height growth charts.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
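
For semantic search, each row of such a similarity matrix can be sorted to rank documents against a query. The sketch below reproduces that idea in plain Python on toy 2-dimensional vectors, assuming cosine similarity as listed under Model Description. The helper names `cosine` and `rank` are illustrative and not part of the Sentence Transformers API.

```python
import math
from typing import List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query: List[float], corpus: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs, most similar first."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for the output of model.encode(...).
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.1]

ranking = rank(query, corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

In practice the vectors would come from `model.encode`, and the same ranking applies to the 768-dimensional embeddings.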

Evaluation

Metrics

Triplet

Dataset: triplet_eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.932

Triplet

Dataset: triplet_eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.94
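
As a rough illustration of what cosine_accuracy measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative under cosine similarity, and reports the fraction of correct triplets. It assumes a zero margin and a strict inequality. The function names and toy vectors are illustrative and do not reproduce the TripletEvaluator implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)


# Toy (anchor, positive, negative) embeddings.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: wrong
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),  # positive is closer: correct
]

accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```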

Training Details

Training Dataset

Unnamed Dataset

Size: 9,702 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 37 tokens mean: 175.25 tokens max: 512 tokens	min: 36 tokens mean: 172.87 tokens max: 512 tokens	min: 37 tokens mean: 162.78 tokens max: 451 tokens

Samples:

anchor	positive	negative
We study the notion of the scaled entropy of a filtration of $\sigma$-fields (= decreasing sequence of $\sigma$-fields) introduced by the first author ({V4}). We suggest a method for computing this entropy for the sequence of $\sigma$-fields of pasts of a Markov process determined by a random walk over the trajectories of a Bernoulli action of a commutative or nilpotent countable group (Theorems5,6). Since the scaled entropy is a metric invariant of the filtration, it follows that the sequences of $\sigma$-fields of pasts of random walks over the trajectories of Bernoulli actions of lattices (groups ${\Bbb Z}^d$) are metrically nonisomorphic for different dimensions $d$, and for the same $d$ but different values of the entropy of the Bernoulli scheme. We give a brief survey of the metric theory of filtrations, in particular, formulate the standardness criterion and describe its connections with the scaled entropy and the notion of a tower of measures.	`In this paper we complete a classification of finite linear spaces $\cS$ with line size at most 12 admitting a line-transitive point-imprimitive subgroup of automorphisms. The examples are the Desarguesian projective planes of orders $4,7, 9$ and 11, two designs on 91 points with line size 6, and 467 designs on 729 points with line size 8.`	We show that the combined data from solar, long-baseline and reactor neutrino experiments can exclude the generalized bicycle model of Lorentz noninvariant direction-dependent and/or direction-independent oscillations of massless neutrinos. This model has five parameters, which is more than is needed in standard oscillation phenomenology with neutrino masses. Solar data alone are sufficient to exclude the pure direction-dependent case. The combination of solar and long-baseline data rules out the pure direction-independent case. With the addition of KamLAND data, a mixture of direction-dependent and direction-independent terms in the effective Hamiltonian is also excluded.
We discuss a numerical model for black hole growth and its associated feedback processes that for the first time allows cosmological simulations of structure formation to self-consistently follow the build up of the cosmic population of galaxies and active galactic nuclei. Our model assumes that seed black holes are present at early cosmic epochs at the centres of forming halos. We then track their growth from gas accretion and mergers with other black holes in the course of cosmic time. For black holes that are active, we distinguish between two distinct modes of feedback, depending on the black hole accretion rate itself. Black holes that accrete at high rates are assumed to be in a `quasar regime', where we model their feedback by thermally coupling a small fraction of their bolometric luminosity to the surrounding gas. For black holes with low accretion rates, we conjecture that most of their feedback occurs in mechanical form, where AGN-driven bubbles are injected into a gaseous e...	Context: L'-band (3.8 micron) images of the Galactic Center show a large number of thin filaments in the mini-spiral, located west of the mini-cavity and along the inner edge of the Northern Arm. One possible mechanism that could produce such structures is the interaction of a central wind with the mini-spiral. Additionally, we identify similar features that appear to be associated with stars. Aims: We present the first proper motion measurements of the thin dust filaments observed in the central parsec around SgrA* and investigate possible mechanisms that could be responsible for the observed motions. Methods: The observations have been carried out using the NACO adaptive optics system at the ESO VLT. The images have been transformed to a common coordinate system and features of interest were extracted. Then a cross-correlation technique could be performed in order to determine the offsets between the features with respect to their position in the reference epoch. Results: We derive t...	`Energy resolution, alpha/beta ratio, pulse-shape discrimination for gamma rays and alpha particles, temperature dependence of scintillation properties, and radioactive contamination were studied with CaMoO4 crystal scintillators. A high sensitivity experiment to search for neutrinoless double beta decay of 100-Mo by using CaMoO4 scintillators is discussed.`
From a macroscopic point of view phase transitions as surface melting or two dimensional (2D) towards three dimensional (3D) growth mode (Stranski-Krastanov transition) can be described in terms of Gibbs excess quantity duly amended by size effects (since usual Gibbs excess quantities are only well defined for semi-infinite systems). The aim of this study is to consider such amended quantities to describe surface melting and Stranski-Krastanov transition of epitaxial layers. the so-introduced size effects allows us to predict the equilibrium thickness of the wetting layer of the Stranski-Krastanov growth mode and to describe and classify two different melting cases: the incomplete melting relayed by a first order transition and the continuous premelting relayed by continuous overheating	We tailor the shape and phase of the pump pulse spectrum in order to study the coherent lattice dynamics in tellurium. Employing the coherent control via splitting the pump pulse into a two-pulse sequence, we show that the oscillations due to A1 coherent phonons can be cancelled but not enhanced as compared to single pulse excitation. We further demonstrate that a decisive factor for the coherent phonon generation is the bandwidth of the pulse spectrum and not the steepness of the pulse envelope. We also observe that the coherent amplitude for long pump pulses decreases exponentially independent of the shape of the pulse spectrum. Finally, by varying the pulse chirp, we show that the coherent amplitude is independent of while the oscillation lifetime is dependent on the chirp sign.	From the spectral plot of the (normalized) graph Laplacian, the essential qualitative properties of a network can be simultaneously deduced. Given a class of empirical networks, reconstruction schemes for elucidating the evolutionary dynamics leading to those particular data can then be developed. This method is exemplified for protein-protein interaction networks. Traces of their evolutionary history of duplication and divergence processes are identified. In particular, we can identify typical specific features that robustly distinguish protein-protein interaction networks from other classes of networks, in spite of possible statistical fluctuations of the underlying data.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
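
With the Euclidean distance metric and a margin of 5, the triplet loss for one (anchor, positive, negative) triple is max(d(a, p) - d(a, n) + 5, 0), where d is the Euclidean distance between embeddings. The sketch below works through that formula on toy 2-dimensional vectors in plain Python. It is an illustration of the formula only, and the function names are not taken from the library.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin: float = 5.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
hard_negative = [0.0, 1.0]  # distance 1: closer than the positive
far_negative = [9.0, 12.0]  # distance 15: beyond positive distance + margin

loss_hard = triplet_loss(anchor, positive, hard_negative)  # 5 - 1 + 5 = 9
loss_far = triplet_loss(anchor, positive, far_negative)    # max(5 - 15 + 5, 0) = 0
print(loss_hard, loss_far)
```

The loss reaches zero only once the negative is at least 5 units farther from the anchor than the positive, which is what the margin enforces.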

Evaluation Dataset

Unnamed Dataset

Size: 2,389 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 39 tokens mean: 169.07 tokens max: 485 tokens	min: 37 tokens mean: 168.4 tokens max: 512 tokens	min: 39 tokens mean: 165.13 tokens max: 478 tokens

Samples:

anchor	positive	negative
`We give axioms which characterize the local Reidemeister trace for orientable differentiable manifolds. The local Reidemeister trace in fixed point theory is already known, and we provide both uniqueness and existence results for the local Reidemeister trace in coincidence theory.`	We derive a unified stochastic picture for the duality of a resampling-selection model with a branching-coalescing particle process (cf. http://www.ams.org/mathscinet-getitem?mr=MR2123250) and for the self-duality of Feller's branching diffusion with logistic growth (cf. math/0509612). The two dual processes are approximated by particle processes which are forward and backward processes in a graphical representation. We identify duality relations between the basic building blocks of the particle processes which lead to the two dualities mentioned above.	CLIC is a linear $e^+e^-$ ($\gamma\gamma$) collider project which uses a drive beam to accelerate the main beam. The drive beam provides RF power for each corresponding unit of the main linac through energy extracting RF structures. CLIC has a wide range of center-of-mass energy options from 150 GeV to 3 TeV. The present paper contains optimization of Free Electron Laser (FEL) using one bunch of CLIC drive beam in order to provide polarized light amplification using appropriate wiggler and luminosity spectrum of $\gamma\gamma$ collider for $E_{cm}$=0.5 TeV. Then amplified laser can be converted to a polarized high-energy $\gamma$ beam at the Conversion point (CP-prior to electron positron interaction point) in the process of Compton backscattering. At the CP a powerful laser pulse (FEL) focused to main linac electrons (positrons). Here this scheme described and it is show that CLIC drive beam parameters satisfy the requirement of FEL additionally essential undulator parameters has been...
We determine the quantum phase diagram of the one-dimensional Hubbard model with bond-charge interaction X in addition to the usual Coulomb repulsion U at half-filling. For large enough X and positive U the model shows three phases. For large U the system is in the spin-density wave phase already known in the usual Hubbard model. As U decreases, there is first a spin transition to a spontaneously dimerized bond-ordered wave phase and then a charge transition to a novel phase in which the dominant correlations at large distances correspond to an incommensurate singlet superconductor.	Vortex-antivortex pairs are localized excitations and have been found to be spontaneously created in magnetic elements. In the case that the vortex and the antivortex have opposite polarities the pair has a nonzero topological charge, and it behaves as a rotating vortex dipole. We find theoretically, and confirm numerically, the form of the energy as a function of the angular momentum of the system and the associated rotation frequencies. We discuss the process of annihilation of the pair which changes the topological charge of the system by unity while its energy is monotonically decreasing. Such a change in the topological charge affects profoundly the dynamics in the magnetic system. We finally discuss the connection of our results with Bloch Points (BP) and the implications for BP dynamics.	`We present results of simulations of a muon content in the air showers induced by very high energy cosmic rays. Muon energy distributions and muon densities at ground level are given. We discuss a prompt muon component generated by decays of charm mesons. The method combines standard Monte Carlo generators incorporated in the CORSIKA code and phenomenological estimates of the charm hadroproduction.`
We discuss quantum evolution of a decaying state in relation to a recent experiment of Katz et al. Based on exact analytical and numerical solutions of a simple model, we identify a regime where qubit retains coherence over a finite time interval independently of the rates of three competing decoherence processes. In this regime, the quantum decay process can be continuously monitored via a ``weak'' measurement without affecting the qubit coherence.	We investigate the physical property of the kappa parameter and the kappa-distribution in the kappa-deformed statistics, based on Kaniadakis entropy, for a relativistic gas in an electromagnetic field. We derive two relations for the relativistic gas in the framework of kappa-deformed statistics, which describe the physical situation represented by the relativistic kappa-distribution function, provide a reasonable connection between the parameter kappa, the temperature four-gradient and the four-vector potential gradient, and thus present for the case kappa different from zero a clearly physical meaning. It is shown that such a physical situation is a meta-equilibrium state of the system, but has a new physical characteristic.	We analyze 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005, corresponding to 27 different zip codes. These analyses confirm the existence of a real-estate bubble, defined as a price acceleration faster than exponential, which is found however to be confined to a rather limited time interval in the recent past from approximately 2003 to mid-2004 and has progressively transformed into a more normal growth rate comparable to pre-bubble levels in 2005. There has been no bubble till 2002 except for a medium-sized surge in 1990. In addition, we have identified a strong yearly periodicity which provides a good potential for fine-tuned prediction from month to month. A monthly monitoring using a model that we have developed could confirm, by testing the intra-year structure, if indeed the market has returned to ``normal'' or if more turbulence is expected ahead. We predict the evolution of the indexes one year ahead, which is validated with new data up to Sep. 2006. The present...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
learning_rate: 2e-05
warmup_ratio: 0.1
fp16: True
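
To make the warmup setting concrete, the sketch below works out the step counts and a linear warmup and decay curve from values reported in this card: 9,702 training samples, a per-device batch size of 8, 3 epochs, a peak learning rate of 2e-05 and a warmup ratio of 0.1. It assumes a single training process, no gradient accumulation, and that the warmup step count is rounded up. The function `linear_warmup_lr` is an illustrative reimplementation, not the trainer's own code.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from this card; single-process training is an assumption.
steps_per_epoch = math.ceil(9702 / 8)   # 1213 optimizer steps per epoch
total_steps = steps_per_epoch * 3       # 3639 steps over 3 epochs
warmup_steps = math.ceil(total_steps * 0.1)

# Step 500 as a fraction of an epoch, for comparison with the Training Logs.
epoch_at_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch, total_steps, warmup_steps, epoch_at_500)
for step in (0, 182, warmup_steps, 2000, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions, step 500 falls at epoch 0.4122, which agrees with the first logged row in Training Logs.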

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	triplet_eval_cosine_accuracy
-1	-1	-	-	0.8210
0.4122	500	1.4856	1.2697	0.8910
0.8244	1000	0.897	0.9961	0.9250
1.2366	1500	0.5647	1.0038	0.9210
1.6488	2000	0.3959	0.8957	0.9330
2.0610	2500	0.3289	0.8055	0.9220
2.4732	3000	0.1267	0.7920	0.9290
2.8854	3500	0.096	0.8040	0.9320
-1	-1	-	-	0.9400

Framework Versions

Python: 3.11.13
Sentence Transformers: 4.1.0
Transformers: 4.52.4
PyTorch: 2.6.0+cu124
Accelerate: 1.8.1
Datasets: 2.14.4
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
