metadata
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:40482
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: List the deadliest viruses in the world.
sentences:
- >-
Mediator is a large multiprotein complex conserved in all eukaryotes,
which has
a crucial coregulator function in transcription by RNA polymerase II
(Pol II).
However, the molecular mechanisms of its action in vivo remain to be
understood.
Med17 is an essential and central component of the Mediator head module.
In this
work, we utilised our large collection of conditional
temperature-sensitive
med17 mutants to investigate Mediator's role in coordinating
preinitiation
complex (PIC) formation in vivo at the genome level after a transfer to
a
non-permissive temperature for 45 minutes. The effect of a yeast
mutation
proposed to be equivalent to the human Med17-L371P responsible for
infantile
cerebral atrophy was also analyzed. The ChIP-seq results demonstrate
that med17
mutations differentially affected the global presence of several PIC
components
including Mediator, TBP, TFIIH modules and Pol II. Our data show that
Mediator
stabilizes TFIIK kinase and TFIIH core modules independently, suggesting
that
the recruitment or the stability of TFIIH modules is regulated
independently on
yeast genome. We demonstrate that Mediator selectively contributes to
TBP
recruitment or stabilization to chromatin. This study provides an
extensive
genome-wide view of Mediator's role in PIC formation, suggesting that
Mediator
coordinates multiple steps of a PIC assembly pathway.
- >-
mTOR complex 2 (mTORC2) signaling is upregulated in multiple types of
human
cancer, but the molecular mechanisms underlying its activation and
regulation
remain elusive. Here, we show that microRNA-mediated upregulation of
Rictor, an
mTORC2-specific component, contributes to tumor progression. Rictor is
upregulated via the repression of the miR-424/503 cluster in human
prostate and
colon cancer cell lines that harbor c-Src upregulation and in
Src-transformed
cells. The tumorigenicity and invasive activity of these cells were
suppressed
by re-expression of miR-424/503. Rictor upregulation promotes formation
of
mTORC2 and induces activation of mTORC2, resulting in promotion of tumor
growth
and invasion. Furthermore, downregulation of miR-424/503 is associated
with
Rictor upregulation in colon cancer tissues. These findings suggest that
the
miR-424/503-Rictor pathway plays a crucial role in tumor progression.
- >-
This year marks the 100th anniversary of the deadliest event in human
history.
In 1918-1919, pandemic influenza appeared nearly simultaneously around
the globe
and caused extraordinary mortality (an estimated 50-100 million deaths)
associated with unexpected clinical and epidemiological features. The
descendants of the 1918 virus remain today; as endemic influenza
viruses, they
cause significant mortality each year. Although the ability to predict
influenza
pandemics remains no better than it was a century ago, numerous
scientific
advances provide an important head start in limiting severe disease and
death
from both current and future influenza viruses: identification and
substantial
characterization of the natural history and pathogenesis of the 1918
causative
virus itself, as well as hundreds of its viral descendants; development
of
moderately effective vaccines; improved diagnosis and treatment of
influenza-associated pneumonia; and effective prevention and control
measures.
Remaining challenges include development of vaccines eliciting
significantly
broader protection (against antigenically different influenza viruses)
that can
prevent or significantly downregulate viral replication; more complete
characterization of natural history and pathogenesis emphasizing the
protective
role of mucosal immunity; and biomarkers of impending
influenza-associated
pneumonia.
- source_sentence: Where is X-ray free electron laser used?
sentences:
- >-
BACKGROUND: After tooth loss, the posterior maxilla is usually
characterized by
limited bone height secondary to pneumatization of the maxillary sinus
and/or
collapse of the alveolar ridge that preclude in many instances the
installation
of dental implants. In order to compensate for the lack of bone height,
several
treatment options have been proposed. These treatment alternatives aimed
at the
installation of dental implants with or without the utilization of bone
grafting
materials avoiding the perforation of the Schneiderian membrane.
Nevertheless,
membrane perforations represent the most common complication among
these
procedures. Consequently, the present review aimed at the elucidation of
the
relevance of this phenomenon on implant survival and complications.
MATERIAL AND METHODS: Electronic and manual literature searches were
performed
by two independent reviewers in several databases, including MEDLINE,
EMBASE,
and Cochrane Oral Health Group Trials Register, for articles up to
January 2018
reporting outcome of implant placement perforating the sinus floor
without
regenerative procedure (lateral sinus lift or transalveolar technique)
and graft
material. The intrusion of the implants can occur during drilling or
implant
placement, with and without punch out Schneiderian. Only studies with at
least
6 months of follow-up were included in the qualitative assessment.
RESULTS: Eight studies provided information on the survival rate, with a
global
sample of 493 implants, being the weighted mean survival rate 95.6% (IC
95%),
after 52.7 months of follow-up. The level of implant penetration (≤ 4 mm
or
> 4 mm) did not report statistically significant differences in survival
rate
(p = 0.403). Seven studies provided information on the rate of clinical
complications, being the mean complication rate 3.4% (IC 95%). The most
frequent
clinical complication was epistaxis, without finding significant
differences
according to the level of penetration. Five studies provide information
on the
radiographic complication; the most common complication was thickening
of the
Schneiderian membrane. The weighted complication rate was 14.8% (IC
95%), and
penetration level affects the rate of radiological complications, being
these of
5.29% in implant penetrating ≤4 mm and 29.3% in implant penetrating
> 4 mm,
without reaching statistical significant difference (p = 0.301).
CONCLUSION: The overall survival rate of the implants into the sinus
cavity was
95.6%, without statistical differences according to the level of
penetration.
The clinical and radiological complications were 3.4% and 14.8%
respectively.
The most frequent clinical complication was the epistaxis, and the
radiological
complication was thickening of the Schneiderian membrane, without
reaching
statistical significant difference according to the level of implant
penetration
inside the sinus.
- >-
Ultrashort X-ray pulses from free-electron laser X-ray sources make it
feasible
to conduct small- and wide-angle scattering experiments on biomolecular
samples
in solution at sub-picosecond timescales. During these so-called
fluctuation
scattering experiments, the absence of rotational averaging, typically
induced
by Brownian motion in classic solution-scattering experiments, increases
the
information content of the data. In order to perform shape
reconstruction or
structure refinement from such data, it is essential to compute the
theoretical
profiles from three-dimensional models. Based on the three-dimensional
Zernike
polynomial expansion models, a fast method to compute the theoretical
fluctuation scattering profiles has been derived. The theoretical
profiles have
been validated against simulated results obtained from 300 000
scattering
patterns for several representative biomolecular species.
- >-
Hemophilic Pseudotumor is a rare complication of hemophilia. It is an
encapsulated haematoma in patients with haemophilia which has a
tendency to progress and produce clinical symptoms related to its
anatomical location. The lesion most frequently occurs in the long
bones, pelvis, small bones of the hands and feet, or rarely in the
maxillofacial region.
- source_sentence: For the constructions of which organs has 3D printing been tested?
sentences:
- >-
The ability to three-dimensionally interweave biological tissue with
functional
electronics could enable the creation of bionic organs possessing
enhanced
functionalities over their human counterparts. Conventional electronic
devices
are inherently two-dimensional, preventing seamless multidimensional
integration
with synthetic biology, as the processes and materials are very
different. Here,
we present a novel strategy for overcoming these difficulties via
additive
manufacturing of biological cells with structural and nanoparticle
derived
electronic elements. As a proof of concept, we generated a bionic ear
via 3D
printing of a cell-seeded hydrogel matrix in the anatomic geometry of a
human
ear, along with an intertwined conducting polymer consisting of infused
silver
nanoparticles. This allowed for in vitro culturing of cartilage tissue
around an
inductive coil antenna in the ear, which subsequently enables readout
of
inductively-coupled signals from cochlea-shaped electrodes. The printed
ear
exhibits enhanced auditory sensing for radio frequency reception, and
complementary left and right ears can listen to stereo audio music.
Overall, our
approach suggests a means to intricately merge biologic and
nanoelectronic
functionalities via 3D printing.
- >-
A case of heterochromia iridis and Horner's syndrome is reported in a
7-year old
girl with paravertebral neurilemmoma. These clinical findings can be
useful in
the early diagnosis of mediastinal tumors in the paravertebral axis.
While
typically associated with neuroblastoma, these findings can be due to
tumors
which are inately benign--in this case neurilemmoma. The mechanism for
heterochromia is briefly discussed.
- >-
The creation of complex neuronal networks relies on ligand-receptor
interactions
that mediate attraction or repulsion towards specific targets.
Roundabouts
comprise a family of single-pass transmembrane receptors facilitating
this
process upon interaction with the soluble extracellular ligand Slit
protein
family emanating from the midline. Due to the complexity and flexible
nature of
Robo receptors , their overall structure has remained elusive until now.
Recent
structural studies of the Robo 1 and Robo 2 ectodomains have provided
the basis
for a better understanding of their signalling mechanism. These
structures
reveal how Robo receptors adopt an auto-inhibited conformation on the
cell
surface that can be further stabilised by cis and/or trans
oligmerisation
arrays. Upon Slit -N binding Robo receptors must undergo a
conformational change
for Ig4 mediated dimerisation and signaling, probably via endocytosis.
Furthermore, it's become clear that Robo receptors do not only act
alone, but as
large and more complex cell surface receptor assemblies to manifest
directional
and growth effects in a concerted fashion. These context dependent
assemblies
provide a mechanism to fine tune attractive and repulsive signals in a
combinatorial manner required during neuronal development. While a
mechanistic
understanding of Slit mediated Robo signaling has advanced significantly
further
structural studies on larger assemblies are required for the design of
new
experiments to elucidate their role in cell surface receptor complexes.
These
will be necessary to understand the role of Slit -Robo signaling in
neurogenesis, angiogenesis, organ development and cancer progression. In
this
chapter, we provide a review of the current knowledge in the field with
a
particular focus on the Roundabout receptor family.
- source_sentence: For the constructions of which organs has 3D printing been tested?
sentences:
- >-
Objective:To evaluate the value of improved Mallampati grading combined
with
NoSAS questionnaire in screening for obstructive sleep apnea (OSA).
Method:A
total of 344 patients admitted to our hospital for sleep disorders were
studied.
All patients were measured for their height, weight, neck circumference
and
other parameters. NoSAS scores, improved Mallampati grading and
polysomnography
(PSG) were performed in these patients. According to AHI in PSG
monitoring
results, patients were divided into non-osa group (AHI<5) 93 cases and
OSA group
251 cases. The OSA group were divided into mild (AHI 5-15), moderate(AHI
16-30)
and severe OSA group(AHI>30) according to the PSG result. The ROC curve
was
plotted to evaluate the screening value of NoSAS and improved Mallampati
grading
combined with NoSAS for OSA. Result:With the NoSAS score of 8 or 9 as
cutoffs
for analysis, the sensitivity for OSA was 0.733 and 0.701; the
specificity for
OSA was 0.538 and 0.624, respectively. The sensitivity and specificity
of NoSAS
combined with improved Mallampati grading for screening OSA were 0.813
and
0.710, respectively. Conclusion:As a new screening tool, NoSAS
questionnaire is
simple and convenient, and has certain screening value to OSA. The
improved
Mallampati grading combined with NoSAS questionnaire can obviously
improve the
screening sensitivity and specificity of Osa, and has higher application
value.
- >-
The morphology and the functionality of the murid glandular complex,
composed of
the submandibular and sublingual salivary glands (SSC), were the object
of
several studies conducted mainly using magnetic resonance imaging (MRI).
Using a
4.7 T scanner and a manganese-based contrast agent, we improved the
signal-to-noise ratio of the SSC relating to the surrounding anatomical
structures allowing to obtain high-contrast 3D images of the SSC. In the
last
few years, the large development in resin melting techniques opened the
way for
printing 3D objects starting from a 3D stack of images. Here, we
demonstrate the
feasibility of the 3D printing technique of soft tissues such as the SSC
in the
rat with the aim to improve the visualization of the organs. This
approach is
useful to preserve the real in vivo morphology of the SCC in living
animals
avoiding the anatomical shape changes due to the lack of relationships
with the
surrounding organs in case of extraction. It is also harmless,
repeatable and
can be applied to explore volumetric changes occurring during body
growth,
excretory duct obstruction, tumorigenesis and regeneration processes.
3D
printing allows to obtain a solid object with the same shape of the
organ of
interest, which can be observed, freely rotated and manipulated. To
increase the
visibility of the details, it is possible to print the organs with a
selected
zoom factor, useful as in case of tiny organs in small mammalia. An
immediate
application of this technique is represented by educational classes.
- >-
Mobile phone use and risk of acoustic neuroma: results of the
interphone
case-control study in five north European countries [corrected].
- source_sentence: What is known about the Digit Ratio (2D:4D) cancer?
sentences:
- >-
Proteins undergo conformational changes during their biological
function. As
such, a high-resolution structure of a protein's resting conformation
provides a
starting point for elucidating its reaction mechanism, but provides no
direct
information concerning the protein's conformational dynamics. Several
X-ray
methods have been developed to elucidate those conformational changes
that occur
during a protein's reaction, including time-resolved Laue diffraction
and
intermediate trapping studies on three-dimensional protein crystals,
and
time-resolved wide-angle X-ray scattering and X-ray absorption studies
on
proteins in the solution phase. This review emphasizes the scope and
limitations
of these complementary experimental approaches when seeking to
understand
protein conformational dynamics. These methods are illustrated using a
limited
set of examples including myoglobin and haemoglobin in complex with
carbon
monoxide, the simple light-driven proton pump bacteriorhodopsin, and
the
superoxide scavenger superoxide reductase. In conclusion, likely future
developments of these methods at synchrotron X-ray sources and the
potential
impact of emerging X-ray free-electron laser facilities are speculated
upon.
- >-
Extensive messenger RNA editing generates transcript and protein
diversity in genes involved in neural excitability, as previously
described, as well as in genes participating in a broad range of other
cellular functions.
- >-
BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D)
is a
marker of prenatal exposure to sex hormones, with low 2D:4D being
indicative of
high prenatal androgen action. Recent studies have reported a strong
association
between 2D:4D and risk of prostate cancer.
METHODS: A total of 6258 men participating in the Melbourne
Collaborative Cohort
Study had 2D:4D assessed. Of these men, we identified 686 incident
prostate
cancer cases. Hazard ratios (HRs) and confidence intervals (CIs) were
estimated
for a standard deviation increase in 2D:4D.
RESULTS: No association was observed between 2D:4D and prostate cancer
risk
overall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left).
We
observed a weak inverse association between 2D:4D and risk of prostate
cancer
for age <60, however 95% CIs included unity for all observed ages.
CONCLUSION: Our results are not consistent with an association between
2D:4D and
overall prostate cancer risk, but we cannot exclude a weak inverse
association
between 2D:4D and early onset prostate cancer risk.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: Biomedical MRL
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.7397454031117398
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8472418670438473
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8925035360678925
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9292786421499293
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7397454031117398
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.6058462989156059
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.5295615275813296
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.41103253182461097
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.22757153438103173
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.39389351666156774
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.4953500769443452
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.626185395476178
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7036538830306982
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8041815406030398
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6499688056459438
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.7326732673267327
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.842998585572843
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8882602545968883
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9151343705799151
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7326732673267327
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5964167845355963
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.5278642149929279
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.40990099009900993
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.21918993091456265
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.38673218299790596
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.4915208575777972
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.6229670136489501
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6971415938662006
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7968989245863362
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6403253251933015
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.7227722772277227
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8373408769448374
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8769448373408769
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9108910891089109
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7227722772277227
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5893446487505893
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.5131541725601132
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.4048090523338048
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.2165092120706659
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.3843563311047163
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.4706508437641641
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.6082103871285517
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.6857315358161504
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7889281785321389
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.6255397978739031
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.7072135785007072
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8076379066478077
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8458274398868458
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8967468175388967
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7072135785007072
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5605846298915607
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.4876944837340877
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.38189533239038187
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.2131717638221153
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.3571863197583239
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.44275724893253604
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.5763830904405497
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.651957768079385
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7681035450483825
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.5861399094808066
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.6435643564356436
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7666195190947667
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8048090523338048
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8415841584158416
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6435643564356436
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5115511551155115
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.45007072135785003
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.3510608203677511
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.18506567524592368
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.3180821001225782
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.3926270123067019
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.5118404409971898
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.5894018468562044
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7115219685233828
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.5197323616049745
name: Cosine Map@100
Biomedical MRL
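This is a trained sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, embeddings can also be truncated to 512, 256, 128, or 64 dimensions; retrieval quality at each size is reported under Evaluation.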
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
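The architecture above takes the hidden state of the first ([CLS]) token as the sentence embedding and then L2-normalizes it. A minimal pure-Python sketch of those two steps, assuming toy 4-dimensional hidden states rather than the real 768-dimensional BERT outputs (the function name `cls_pool_and_normalize` is hypothetical, not part of the library):

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Return the L2-normalized embedding of the first ([CLS]) token."""
    cls_vector = token_embeddings[0]  # CLS pooling: keep only the first token
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid division by zero for an all-zero vector
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional hidden states (illustrative values only).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.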
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
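from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("potsu-potsu/bge-base-mrl-train40k")
# Sentences to encode: one query followed by two candidate passages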
sentences = [
'What is known about the Digit Ratio (2D:4D) cancer?',
'BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D) is a \nmarker of prenatal exposure to sex hormones, with low 2D:4D being indicative of \nhigh prenatal androgen action. Recent studies have reported a strong association \nbetween 2D:4D and risk of prostate cancer.\nMETHODS: A total of 6258 men participating in the Melbourne Collaborative Cohort \nStudy had 2D:4D assessed. Of these men, we identified 686 incident prostate \ncancer cases. Hazard ratios (HRs) and confidence intervals (CIs) were estimated \nfor a standard deviation increase in 2D:4D.\nRESULTS: No association was observed between 2D:4D and prostate cancer risk \noverall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left). We \nobserved a weak inverse association between 2D:4D and risk of prostate cancer \nfor age <60, however 95% CIs included unity for all observed ages.\nCONCLUSION: Our results are not consistent with an association between 2D:4D and \noverall prostate cancer risk, but we cannot exclude a weak inverse association \nbetween 2D:4D and early onset prostate cancer risk.',
"Proteins undergo conformational changes during their biological function. As \nsuch, a high-resolution structure of a protein's resting conformation provides a \nstarting point for elucidating its reaction mechanism, but provides no direct \ninformation concerning the protein's conformational dynamics. Several X-ray \nmethods have been developed to elucidate those conformational changes that occur \nduring a protein's reaction, including time-resolved Laue diffraction and \nintermediate trapping studies on three-dimensional protein crystals, and \ntime-resolved wide-angle X-ray scattering and X-ray absorption studies on \nproteins in the solution phase. This review emphasizes the scope and limitations \nof these complementary experimental approaches when seeking to understand \nprotein conformational dynamics. These methods are illustrated using a limited \nset of examples including myoglobin and haemoglobin in complex with carbon \nmonoxide, the simple light-driven proton pump bacteriorhodopsin, and the \nsuperoxide scavenger superoxide reductase. In conclusion, likely future \ndevelopments of these methods at synchrotron X-ray sources and the potential \nimpact of emerging X-ray free-electron laser facilities are speculated upon.",
]
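# Encode the sentences into dense vectors
embeddings = model.encode(sentences)
print(embeddings.shape)
# expected shape: (3, 768)

# Compute pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# expected shape: (3, 3)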
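Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, a shorter embedding can be obtained by keeping the leading components and re-normalizing. The sketch below illustrates that idea on toy 4-dimensional vectors in pure Python; the helper names are hypothetical. Recent versions of Sentence Transformers also accept a `truncate_dim` argument when loading a model, which should achieve the same effect without manual slicing.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy unit vectors standing in for full-size embeddings (illustrative values only).
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, 0.5]

short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)

print(cosine(full_a, full_b))    # similarity using all dimensions
print(cosine(short_a, short_b))  # similarity using only the first two
```

Truncation changes the similarity scores, so an index and its queries should use the same dimensionality.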
Evaluation
Metrics
Information Retrieval
| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--|:--|:--|:--|:--|:--|
| cosine_accuracy@1 | 0.7397 | 0.7327 | 0.7228 | 0.7072 | 0.6436 |
| cosine_accuracy@3 | 0.8472 | 0.843 | 0.8373 | 0.8076 | 0.7666 |
| cosine_accuracy@5 | 0.8925 | 0.8883 | 0.8769 | 0.8458 | 0.8048 |
| cosine_accuracy@10 | 0.9293 | 0.9151 | 0.9109 | 0.8967 | 0.8416 |
| cosine_precision@1 | 0.7397 | 0.7327 | 0.7228 | 0.7072 | 0.6436 |
| cosine_precision@3 | 0.6058 | 0.5964 | 0.5893 | 0.5606 | 0.5116 |
| cosine_precision@5 | 0.5296 | 0.5279 | 0.5132 | 0.4877 | 0.4501 |
| cosine_precision@10 | 0.411 | 0.4099 | 0.4048 | 0.3819 | 0.3511 |
| cosine_recall@1 | 0.2276 | 0.2192 | 0.2165 | 0.2132 | 0.1851 |
| cosine_recall@3 | 0.3939 | 0.3867 | 0.3844 | 0.3572 | 0.3181 |
| cosine_recall@5 | 0.4954 | 0.4915 | 0.4707 | 0.4428 | 0.3926 |
| cosine_recall@10 | 0.6262 | 0.623 | 0.6082 | 0.5764 | 0.5118 |
| cosine_ndcg@10 | 0.7037 | 0.6971 | 0.6857 | 0.652 | 0.5894 |
| cosine_mrr@10 | 0.8042 | 0.7969 | 0.7889 | 0.7681 | 0.7115 |
| cosine_map@100 | 0.65 | 0.6403 | 0.6255 | 0.5861 | 0.5197 |
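The metrics above follow the usual information-retrieval definitions. As a rough illustration of how accuracy@k, precision@k, recall@k, and MRR@k relate for a single query, here is a minimal sketch; the function name, document ids, and relevance set are all hypothetical, and the evaluator that produced the reported numbers averages such values over every query.

```python
def retrieval_metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute per-query retrieval metrics over the top-k ranked documents."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    # Rank (1-based) of the first relevant document in the top k, if any.
    first_hit_rank = next(
        (rank for rank, hit in enumerate(hits, start=1) if hit), None
    )
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,      # any relevant doc in top k
        "precision": num_hits / k,                     # share of top k that is relevant
        "recall": num_hits / len(relevant_ids),        # share of relevant docs found
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Hypothetical ranking for one query with four relevant documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8", "d5"}
metrics = retrieval_metrics_at_k(ranked, relevant, k=3)
print(metrics)
```

Under these definitions, recall@1 can never exceed 1 divided by the number of relevant documents for a query, which is one plausible reason the recall figures sit well below the accuracy figures when queries have several relevant passages.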
Training Details
Training Dataset
Unnamed Dataset
- Size: 40,482 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|:--|:--|:--|
| type | string | string |
| details | min: 6 tokens; mean: 16.0 tokens; max: 32 tokens | min: 4 tokens; mean: 287.89 tokens; max: 512 tokens |
- Samples:

| anchor | positive |
|:--|:--|
| What is the implication of histone lysine methylation in medulloblastoma? | Aberrant patterns of H3K4, H3K9, and H3K27 histone lysine methylation were shown to result in histone code alterations, which induce changes in gene expression, and affect the proliferation rate of cells in medulloblastoma. |
| What is the implication of histone lysine methylation in medulloblastoma? | Recent studies showed frequent mutations in histone H3 lysine 27 (H3K27) demethylases in medulloblastomas of Group 3 and Group 4, suggesting a role for H3K27 methylation in these tumors. Indeed, trimethylated H3K27 (H3K27me3) levels were shown to be higher in Group 3 and 4 tumors compared to WNT and SHH medulloblastomas, also in tumors without detectable mutations in demethylases. Here, we report that polycomb genes, required for H3K27 methylation, are consistently upregulated in Group 3 and 4 tumors. These tumors show high expression of the homeobox transcription factor OTX2. Silencing of OTX2 in D425 medulloblastoma cells resulted in downregulation of polycomb genes such as EZH2, EED, SUZ12 and RBBP4 and upregulation of H3K27 demethylases KDM6A, KDM6B, JARID2 and KDM7A. This was accompanied by decreased H3K27me3 and increased H3K27me1 levels in promoter regions. Strikingly, the decrease of H3K27me3 was most prominent in promoters that bind OTX2. OTX2-bound promoters showe... |
| What is the implication of histone lysine methylation in medulloblastoma? | We used high-resolution SNP genotyping to identify regions of genomic gain and loss in the genomes of 212 medulloblastomas, malignant pediatric brain tumors. We found focal amplifications of 15 known oncogenes and focal deletions of 20 known tumor suppressor genes (TSG), most not previously implicated in medulloblastoma. Notably, we identified previously unknown amplifications and homozygous deletions, including recurrent, mutually exclusive, highly focal genetic events in genes targeting histone lysine methylation, particularly that of histone 3, lysine 9 (H3K9). Post-translational modification of histone proteins is critical for regulation of gene expression, can participate in determination of stem cell fates and has been implicated in carcinogenesis. Consistent with our genetic data, restoration of expression of genes controlling H3K9 methylation greatly diminishes proliferation of medulloblastoma in vitro. Copy number aberrations of genes with critical roles in writing... |
- Loss: MatryoshkaLoss with these parameters:
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 512, 256, 128, 64],
      "matryoshka_weights": [1, 1, 1, 1, 1],
      "n_dims_per_step": -1
  }
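To make the loss configuration above concrete, here is a minimal pure-Python sketch of the idea: an in-batch-negatives ranking loss is evaluated on the first d components of every embedding, for each d in matryoshka_dims, and the results are combined with matryoshka_weights. This is an illustration under stated assumptions, not the Sentence Transformers implementation. The similarity scale of 20 is an assumed value, the toy vectors are invented, and n_dims_per_step = -1 is read here as "use every listed dimension at each step".

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i] and
    every other positive in the batch acts as a negative.
    The scale value is an assumption for this sketch."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy 4-dimensional "embeddings"; the real model uses dims 768 down to 64.
anchors = [[1.0, 0.0, 0.5, 0.1], [0.0, 1.0, 0.1, 0.5]]
positives = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.8, 0.0, 0.6]]

print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Training every prefix against the same objective is what lets the leading dimensions carry most of the ranking signal, so the embedding can later be cut to a shorter length.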
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
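The batch size, gradient accumulation, epoch count and warmup ratio above jointly determine the effective batch size and the learning-rate curve. The sketch below works through that arithmetic and a linear-warmup, cosine-decay schedule. It assumes a single training device, which the card does not state, and it ignores any effect the no_duplicates sampler may have on the exact number of batches.

```python
import math

num_samples = 40_482        # training set size from this card
per_device_batch = 32
grad_accum = 16
epochs = 4
warmup_ratio = 0.1
peak_lr = 2e-5
num_devices = 1             # assumption: device count is not given in the card

effective_batch = per_device_batch * grad_accum * num_devices                   # 512
batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))   # 1266
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)                     # 80
total_steps = steps_per_epoch * epochs                                          # 320
warmup_steps = math.ceil(total_steps * warmup_ratio)                            # 32

def lr_at(step):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 16, 32, 160, 320):
    print(step, lr_at(step))
```

Under these assumptions the run has 80 optimizer steps per epoch and 320 in total, which agrees with the step counts reported in the training logs.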
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
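The batch_sampler setting matters for this dataset because the same anchor question appears with several different positives, as the training samples show. With an in-batch-negatives loss, two such pairs in one batch would treat a genuine positive as a negative. The sketch below shows one simple greedy way to build batches in which no text repeats. It is a hypothetical illustration of the idea only, not the sampler used by Sentence Transformers, and the function name and toy pairs are invented.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily group (anchor, positive) pairs so that no anchor or positive
    text appears twice within the same batch."""
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, leftover = [], set(), []
        for anchor, positive in remaining:
            fits = len(batch) < batch_size
            if fits and anchor not in seen and positive not in seen:
                batch.append((anchor, positive))
                seen.update((anchor, positive))
            else:
                # Defer conflicting or overflowing pairs to a later batch.
                leftover.append((anchor, positive))
        batches.append(batch)
        remaining = leftover
    return batches

# Toy data: "q1" has three positives, mirroring the repeated anchor in the samples.
pairs = [("q1", "a"), ("q1", "b"), ("q1", "c"), ("q2", "d"), ("q3", "e")]
for batch in no_duplicate_batches(pairs, batch_size=2):
    print(batch)
```

In this toy run the three "q1" pairs end up in three different batches, so none of them is ever scored as a negative for the same question.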
Training Logs
| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|-------|------|---------------|------------------------|------------------------|------------------------|------------------------|-----------------------|
| 0.1264 | 10 | 65.1116 | - | - | - | - | - |
| 0.2528 | 20 | 52.0541 | - | - | - | - | - |
| 0.3791 | 30 | 36.0158 | - | - | - | - | - |
| 0.5055 | 40 | 26.0258 | - | - | - | - | - |
| 0.6319 | 50 | 24.2254 | - | - | - | - | - |
| 0.7583 | 60 | 21.8763 | - | - | - | - | - |
| 0.8847 | 70 | 18.0685 | - | - | - | - | - |
| 1.0 | 80 | 17.7443 | 0.7094 | 0.7054 | 0.6895 | 0.6487 | 0.5783 |
| 1.1264 | 90 | 14.5363 | - | - | - | - | - |
| 1.2528 | 100 | 14.1097 | - | - | - | - | - |
| 1.3791 | 110 | 13.5251 | - | - | - | - | - |
| 1.5055 | 120 | 13.3574 | - | - | - | - | - |
| 1.6319 | 130 | 13.3079 | - | - | - | - | - |
| 1.7583 | 140 | 12.926 | - | - | - | - | - |
| 1.8847 | 150 | 12.0388 | - | - | - | - | - |
| 2.0 | 160 | 10.9161 | 0.7063 | 0.7005 | 0.6880 | 0.6514 | 0.5886 |
| 2.1264 | 170 | 10.7059 | - | - | - | - | - |
| 2.2528 | 180 | 10.1178 | - | - | - | - | - |
| 2.3791 | 190 | 10.4664 | - | - | - | - | - |
| 2.5055 | 200 | 10.4824 | - | - | - | - | - |
| 2.6319 | 210 | 10.2784 | - | - | - | - | - |
| 2.7583 | 220 | 9.2031 | - | - | - | - | - |
| 2.8847 | 230 | 8.9788 | - | - | - | - | - |
| 3.0 | 240 | 7.5905 | 0.7027 | 0.6964 | 0.6855 | 0.6515 | 0.5881 |
| 3.1264 | 250 | 8.4637 | - | - | - | - | - |
| 3.2528 | 260 | 9.4921 | - | - | - | - | - |
| 3.3791 | 270 | 9.0615 | - | - | - | - | - |
| 3.5055 | 280 | 9.0181 | - | - | - | - | - |
| 3.6319 | 290 | 8.6193 | - | - | - | - | - |
| 3.7583 | 300 | 8.3741 | - | - | - | - | - |
| 3.8847 | 310 | 8.9504 | - | - | - | - | - |
| 4.0 | 320 | 7.4761 | 0.7037 | 0.6971 | 0.6857 | 0.652 | 0.5894 |
- The bold row denotes the saved checkpoint.
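The per-dimension NDCG@10 columns in the training logs describe what happens when embeddings are shortened. As a rough sketch of how a shortened embedding is typically used, the code below keeps the first components of a vector, re-normalizes it, and then computes how much of the full-size NDCG@10 each dimension retains at the final logged step (epoch 4.0). The helper name and the toy vector are illustrative assumptions; the scores are copied from the table above.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length so that
    dot products between truncated vectors are cosine similarities."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Toy 8-dimensional vector standing in for a 768-dimensional embedding.
embedding = [0.4, -0.2, 0.7, 0.1, 0.05, -0.3, 0.2, 0.1]
short = truncate_and_normalize(embedding, 4)
print(short)

# NDCG@10 at epoch 4.0, step 320, taken from the training logs.
final_ndcg = {768: 0.7037, 512: 0.6971, 256: 0.6857, 128: 0.652, 64: 0.5894}
retention = {dim: score / final_ndcg[768] for dim, score in final_ndcg.items()}
for dim, kept in retention.items():
    print(f"dim {dim:>3}: {kept:.1%} of full-size NDCG@10")
```

On these logged figures, the 256-dimensional prefix keeps roughly 97% of the 768-dimensional NDCG@10 and the 64-dimensional prefix about 84%, which is the storage-versus-quality trade-off the Matryoshka objective is meant to offer.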
Framework Versions
- Python: 3.12.6
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}