language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:36470
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: What are the targets of avapritinib?
    sentences:
      - >-
        Origins of DNA replication on eukaryotic genomes have been observed to
        fire 

        during S phase in a coordinated manner. Studies in yeast indicate that
        origin 

        firing is affected by several factors, including checkpoint regulators
        and 

        chromatin modifiers. However, it is unclear what the mechanisms
        orchestrating 

        this coordinated process are. Recent studies have identified factors
        that 

        regulate the timing of origin activation, including Rif1 which plays
        crucial 

        roles in the regulation of the replication timing program in yeast as
        well as in 

        higher eukaryotes. In mammalian cells, Rif1 appears to regulate the
        structures 

        of replication timing domains through its ability to organize chromatin
        loop 

        structures. Regulation of chromatin architecture by Rif1 may be linked
        to other 

        chromosome transactions including recombination, repair, or
        transcription. This 

        review summarizes recent progress in the effort to elucidate the
        regulatory 

        mechanisms of replication timing of eukaryotic replicons.
      - >-
        Avapritinib (AYVAKIT™) is a potent and selective tyrosine kinase
        inhibitor of 

        platelet-derived growth factor receptor alpha (PDGFRA) and KIT
        activation loop 

        mutants. It is being developed by Blueprint Medicines for the treatment
        of 

        gastrointestinal stromal tumours (GIST), solid tumours and systemic 

        mastocytosis. Avapritinib is approved in the USA for PDGFRA exon 18
        (including 

        D842V) mutant GIST and is undergoing regulatory assessment in the USA as
        a 

        4th-line treatment for GIST. Avapritinib is also undergoing regulatory 

        assessment in the EU for PDGFRA D842V mutant GIST. This article
        summarizes the 

        milestones in the development of avapritinib leading to this first
        approval for 

        the treatment of adults with unresectable or metastatic GIST harbouring
        a PDGFRA 

        exon 18 mutation, including PDGFRA D842V mutations. Clinical development
        of 

        avapritinib is also underway for the treatment of systemic mastocytosis
        and 

        late-stage solid tumours in several countries.
      - >-
        PURPOSE: Primary chemotherapy provides an ideal opportunity to correlate
        gene 

        expression with response to treatment. We used paraffin-embedded core
        biopsies 

        from a completed phase II trial to identify genes that correlate with
        response 

        to primary chemotherapy.

        PATIENTS AND METHODS: Patients with newly diagnosed stage II or III
        breast 

        cancer were treated with sequential doxorubicin 75 mg/M2 q2 wks x 3 and 

        docetaxel 40 mg/M2 weekly x 6; treatment order was randomly assigned. 

        Pretreatment core biopsy samples were interrogated for genes that might 

        correlate with pathologic complete response (pCR). In addition to the
        individual 

        genes, the correlation of the Oncotype DX Recurrence Score with pCR was 

        examined.

        RESULTS: Of 70 patients enrolled in the parent trial, core biopsies
        samples with 

        sufficient RNA for gene analyses were available from 45 patients; 9
        (20%) had 

        inflammatory breast cancer (IBC). Six (14%) patients achieved a pCR.
        Twenty-two 

        of the 274 candidate genes assessed correlated with pCR (p < 0.05).
        Genes 

        correlating with pCR could be grouped into three large clusters: 

        angiogenesis-related genes, proliferation related genes, and
        invasion-related 

        genes. Expression of estrogen receptor (ER)-related genes and Recurrence
        Score 

        did not correlate with pCR. In an exploratory analysis we compared gene 

        expression in IBC to non-inflammatory breast cancer; twenty-four (9%) of
        the 

        genes were differentially expressed (p < 0.05), 5 were upregulated and
        19 were 

        downregulated in IBC.

        CONCLUSION: Gene expression analysis on core biopsy samples is feasible
        and 

        identifies candidate genes that correlate with pCR to primary
        chemotherapy. Gene 

        expression in IBC differs significantly from noninflammatory breast
        cancer.
  - source_sentence: List markers for autophagy.
    sentences:
      - >-
        C/EBPbeta is an intrinsically repressed transcription factor that
        regulates 

        genes involved in differentiation, proliferation, tumorigenesis, and
        apoptosis. 

        C/EBPbeta acts as a repressor that is turned into an activator by the
        Ras 

        oncoprotein through phosphorylation of a MAPK site. C/EBPbeta activation
        is 

        accompanied by a conformational change. Active and repressive C/EBPbeta 

        interacts with multisubunit Mediator complexes through the CRSP130/Sur2
        subunit. 

        The CRSP130/Sur2 subunit is common to two distinct types of Mediator
        complexes, 

        characterized by CRSP70 and CDK8 proteins as transcriptionally active
        and 

        inactive Mediator, respectively. Knockdown of CRSP130/Sur2 prevents
        Mediator 

        binding and transactivation through C/EBPbeta. Oncogenic Ras signaling
        or 

        activating mutations in C/EBPbeta selects the transcriptionally active
        Mediator 

        complex that also associates with RNA polymerase II. These results show
        that a 

        Ras-induced structural alteration of C/EBPbeta determines differential
        gene 

        activation through selective interaction with distinct Mediator
        complexes.
      - >-
        Sporadic inclusion body myositis (sIBM) and polymyositis (PM) are
        characterized 

        by muscle inflammation, with sIBM showing additional degenerative
        alterations. 

        In this study we investigated human beta defensins and associated TLRs
        to 

        elucidate the role of the innate immune system in idiopathic
        inflammatory 

        myopathies (IIM), and its association with inflammatory and
        degenerative 

        alterations. Expression levels of human beta-defensin (HBD)-1, HBD-2,
        HBD-3 and 

        TLR2, 3, 4, 7 and 9 were analysed by quantitative real-time PCR in
        skeletal 

        muscle tissue. Localization of HBD-3, collagen 6, dystrophin,
        CD8-positive 

        T-cells, CD-68-positive macrophages, β-amyloid, the autophagy marker
        LC3, and 

        TLR3 were detected by immunofluorescence and co-localization was
        quantified. 

        HBD-3 and all TLRs except for TLR9 were overexpressed in both IIM with 

        significant overexpression of TLR3 in sIBM. HBD-3 showed characteristic 

        intracellular accumulations near deposits of β-amyloid, LC3 and TLR3 in
        sIBM, 

        and was detected in inflammatory infiltrations and macrophages invading
        necrotic 

        muscle fibres in both IIM. The characteristic intracellular localization
        of 

        HBD-3 near markers of degeneration and autophagy, and overexpression of 

        endosomal TLR3 in sIBM hint at different pathogenetic mechanisms in
        sIBM 

        compared with PM. This descriptive study serves as a first approach to
        the role 

        of the innate immune system in sIBM and PM.
      - >-
        Circular RNAs (circRNAs) are a large type of noncoding RNAs
        characterized by 

        their circular shape resulting from covalently closed continuous loops.
        They are 

        known to regulate gene expression in mammals. These tissue-specific
        transcripts 

        are largely generated from exonic or intronic sequences of their host
        genes. 

        Although several models of circRNA biogenesis have been proposed, the 

        understanding of their origin is far from complete. Unlike other
        noncoding RNAs, 

        circRNAs are widely expressed, highly conserved and stable in cytoplasm,
        which 

        confer special functionalities to them. They are known to serve as
        microRNA 

        (miRNA) sponges, regulators of alternative splicing, transcription
        factors and 

        encode for proteins. The expression of circRNAs is associated with
        several 

        pathological states and may potentially serve as novel diagnostic or
        predictive 

        biomarkers. CircRNAs are known to regulate the expression of numerous 

        cancer-related miRNAs. The circRNA-miRNA-mRNA axis is a known regulatory
        pattern 

        of several cancer-associated pathways, with both agonist and antagonist
        effects 

        on carcinogenesis. In consideration of their potential clinical
        relevance, 

        circRNAs are at the center of ongoing research initiatives on cancer
        prevention 

        and treatment. In this review, we discuss the current understanding of
        circRNAs 

        and the prospects for their potential clinical application in the
        management of 

        cancer patients.
  - source_sentence: Where is X-ray free electron laser used?
    sentences:
      - >-
        The phase problem is inherent to crystallographic, astronomical and
        optical 

        imaging where only the intensity of the scattered signal is detected and
        the 

        phase information is lost and must somehow be recovered to reconstruct
        the 

        object's structure. Modern imaging techniques at the molecular scale
        rely on 

        utilizing novel coherent light sources like X-ray free electron lasers
        for the 

        ultimate goal of visualizing such objects as individual biomolecules
        rather than 

        crystals. Here, unlike in the case of crystals where structures can be
        solved by 

        model building and phase refinement, the phase distribution of the wave 

        scattered by an individual molecule must directly be recovered. There
        are two 

        well-known solutions to the phase problem: holography and coherent
        diffraction 

        imaging (CDI). Both techniques have their pros and cons. In holography,
        the 

        reconstruction of the scattered complex-valued object wave is directly
        provided 

        by a well-defined reference wave that must cover the entire detector
        area which 

        often is an experimental challenge. CDI provides the highest possible,
        only 

        wavelength limited, resolution, but the phase recovery is an iterative
        process 

        which requires some pre-defined information about the object and whose
        outcome 

        is not always uniquely-defined. Moreover, the diffraction patterns must
        be 

        recorded under oversampling conditions, a pre-requisite to be able to
        solve the 

        phase problem. Here, we report how holography and CDI can be merged into
        one 

        superior technique: holographic coherent diffraction imaging (HCDI). An
        inline 

        hologram can be recorded by employing a modified CDI experimental
        scheme. We 

        demonstrate that the amplitude of the Fourier transform of an inline
        hologram is 

        related to the complex-valued visibility, thus providing information on
        both, 

        the amplitude and the phase of the scattered wave in the plane of the 

        diffraction pattern. With the phase information available, the condition
        of 

        oversampling the diffraction patterns can be relaxed, and the phase
        problem can 

        be solved in a fast and unambiguous manner. We demonstrate the
        reconstruction of 

        various diffraction patterns of objects recorded with visible light as
        well as 

        with low-energy electrons. Although we have demonstrated our HCDI method
        using 

        laser light and low-energy electrons, it can also be applied to any
        other 

        coherent radiation such as X-rays or high-energy electrons.
      - >-
        We study, using simulated experiments inspired by thin-film magnetic
        domain 

        patterns, the feasibility of phase retrieval in x-ray diffractive
        imaging in the 

        presence of intrinsic charge scattering given only photon-shot-noise
        limited 

        diffraction data. We detail a reconstruction algorithm to recover the
        sample's 

        magnetization distribution under such conditions and compare its
        performance 

        with that of Fourier transform holography. Concerning the design of
        future 

        experiments, we also chart out the reconstruction limits of diffractive
        imaging 

        when photon-shot-noise and the intensity of charge scattering noise are 

        independently varied. This work is directly relevant to the
        time-resolved 

        imaging of magnetic dynamics using coherent and ultrafast radiation from
        x-ray 

        free-electron lasers and also to broader classes of diffractive imaging 

        experiments which suffer noisy data, missing data, or both.
      - >-
        INTRODUCTION: Most cases of Charcot-Marie-Tooth (CMT) disease are caused
        by 

        mutations in the peripheral myelin protein 22 gene (PMP22), including 

        heterozygous duplications (CMT1A), deletions (HNPP), and point
        mutations 

        (CMT1E).

        METHODS: Single-nucleotide polymorphism (SNP) arrays were used to study
        PMP22 

        mutations based on the results of multiplex ligation-dependent probe 

        amplification (MLPA) and polymerase chain reaction-restriction fragment
        length 

        polymorphism methods in 77 Chinese Han families with CMT1. PMP22
        sequencing was 

        performed in MLPA-negative probands. Clinical characteristics were
        collected for 

        all CMT1A/HNPP probands and their family members.

        RESULTS: Twenty-one of 77 CMT1 probands (27.3%) carried
        duplication/deletion 

        (dup/del) copynumber variants. No point mutations were detected. SNP
        array and 

        MLPA seem to have similar sensitivity. Fifty-seven patients from 19
        CMT1A 

        families had the classical CMT phenotype, except for 1 with concomitant
        CIDP. 

        Two HNPP probands presented with acute ulnar nerve palsy or recurrent
        sural 

        nerve palsy, respectively.

        CONCLUSIONS: The SNP array has wide coverage, high sensitivity, and
        high 

        resolution and can be used as a screening tool to detect PMP22 dup/del
        as shown 

        in this Chinese Han population.
  - source_sentence: Which syndromes are associated with heterochromia iridum?
    sentences:
      - >-
        BACKGROUND: The three-dimensional (3D) structure of the genome plays a
        crucial 

        role in gene expression regulation. Chromatin conformation capture
        technologies 

        (Hi-C) have revealed that the genome is organized in a hierarchy of 

        topologically associated domains (TADs), sub-TADs, and chromatin loops. 

        Identifying such hierarchical structures is a critical step in
        understanding 

        genome regulation. Existing tools for TAD calling are frequently
        sensitive to 

        biases in Hi-C data, depend on tunable parameters, and are
        computationally 

        inefficient.

        METHODS: To address these challenges, we developed a novel sliding
        window-based 

        spectral clustering framework that uses gaps between consecutive
        eigenvectors 

        for TAD boundary identification.

        RESULTS: Our method, implemented in an R package, SpectralTAD, detects 

        hierarchical, biologically relevant TADs, has automatic parameter
        selection, is 

        robust to sequencing depth, resolution, and sparsity of Hi-C data.
        SpectralTAD 

        outperforms four state-of-the-art TAD callers in simulated and
        experimental 

        settings. We demonstrate that TAD boundaries shared among multiple
        levels of the 

        TAD hierarchy were more enriched in classical boundary marks and more
        conserved 

        across cell lines and tissues. In contrast, boundaries of TADs that
        cannot be 

        split into sub-TADs showed less enrichment and conservation, suggesting
        their 

        more dynamic role in genome regulation.

        CONCLUSION: SpectralTAD is available on Bioconductor, 

        http://bioconductor.org/packages/SpectralTAD/ .
      - >-
        Holoprosencephaly (HPE) is a congenital defect of the brain, median
        structures, 

        and face resulting from an incomplete cleavage of the primitive brain
        during 

        early embryogenesis. The authors report a case of trisomy 13 syndrome
        diagnosed 

        at prenatal follow up. The preterm newborn lived only 5 hours, and died
        because 

        of severe respiratory failure. The autopsy findings disclosed facial,
        skull, 

        limbs, cardiac, and cerebral malformations. Among the latter, the
        presence of 

        alobar HPE, the central theme of this report, was evident. The most
        common 

        nonrandom chromosomal abnormality in patients with HPE is trisomy 13.
        The most 

        severe variant, namely alobar HPE, is shown in this case report.
        Discussion on 

        this severe anomaly, along with the case report with details of Patau's 

        syndrome, is the goal of this report.
      - >-
        BACKGROUND: Heterochromia iridis, asymmetry of iris pigmentation, has
        been well 

        described with congenital Horner syndrome. Acquired heterochromia
        associated 

        with lesions in the ocular sympathetic pathways in adulthood, however,
        is rare.

        METHODS: Two cases are reported in which sympathectomy in adults
        resulted in 

        ipsilateral Horner syndrome with heterochromia. In each case,
        pharmacologic 

        testing with cocaine and hydroxyamphetamine was performed.

        RESULTS: In both cases, sympathectomy occurred at the level of the
        second order 

        neuron, but hydroxyamphetamine testing suggested at least partial third
        order 

        neuron involvement.

        CONCLUSION: Acquired heterochromia can occur in adults. The partial
        response to 

        hydroxyamphetamine in the two cases presented may reflect
        trans-synaptic 

        degeneration of the postganglionic neuron. A reduction in trophic
        influences on 

        iris melanocytes may have contributed to the observed heterochromia.
  - source_sentence: What is Pseudomelanosis duodeni?
    sentences:
      - >-
        Pseudomelanosis duodeni is a rare condition in which dark pigment
        accumulates in 

        macrophages located in the lamina propria of the duodenal mucosa. Three
        cases 

        are reported here and the literature is reviewed. No clinical
        association can be 

        found that points clearly to the underlying etiology. Electron probe
        x-ray 

        microanalysis was used to study the pigment in macrophage granules in 2
        of our 

        patients and demonstrated high iron and sulfur content. Iron
        accumulation in 

        ferritinlike particles was detected in absorptive cell lysosomes. A
        possible 

        mechanism for the accumulation of absorbed iron by macrophages is
        considered.
      - >-
        Initiation of eukaryotic DNA replication requires phosphorylation of the
        MCM 

        complex by Dbf4-dependent kinase (DDK), composed of Cdc7 kinase and its 

        activator, Dbf4. We report here that budding yeast Rif1
        (Rap1-interacting factor 

        1) controls DNA replication genome-wide and describe how Rif1 opposes
        DDK 

        function by directing Protein Phosphatase 1 (PP1)-mediated
        dephosphorylation of 

        the MCM complex. Deleting RIF1 partially compensates for the limited
        DDK 

        activity in a cdc7-1 mutant strain by allowing increased, premature 

        phosphorylation of Mcm4. PP1 interaction motifs within the Rif1
        N-terminal 

        domain are critical for its repressive effect on replication. We confirm
        that 

        Rif1 interacts with PP1 and that PP1 prevents premature Mcm4
        phosphorylation. 

        Remarkably, our results suggest that replication repression by Rif1 is
        itself 

        also DDK-regulated through phosphorylation near the PP1-interacting
        motifs. 

        Based on our findings, we propose that Rif1 is a novel PP1 substrate
        targeting 

        subunit that counteracts DDK-mediated phosphorylation during
        replication. 

        Fission yeast and mammalian Rif1 proteins have also been implicated in 

        regulating DNA replication. Since PP1 interaction sites are
        evolutionarily 

        conserved within the Rif1 sequence, it is likely that replication
        control by 

        Rif1 through PP1 is a conserved mechanism.
      - >-
        This year marks the 100th anniversary of the deadliest event in human
        history. 

        In 1918-1919, pandemic influenza appeared nearly simultaneously around
        the globe 

        and caused extraordinary mortality (an estimated 50-100 million deaths) 

        associated with unexpected clinical and epidemiological features. The 

        descendants of the 1918 virus remain today; as endemic influenza
        viruses, they 

        cause significant mortality each year. Although the ability to predict
        influenza 

        pandemics remains no better than it was a century ago, numerous
        scientific 

        advances provide an important head start in limiting severe disease and
        death 

        from both current and future influenza viruses: identification and
        substantial 

        characterization of the natural history and pathogenesis of the 1918
        causative 

        virus itself, as well as hundreds of its viral descendants; development
        of 

        moderately effective vaccines; improved diagnosis and treatment of 

        influenza-associated pneumonia; and effective prevention and control
        measures. 

        Remaining challenges include development of vaccines eliciting
        significantly 

        broader protection (against antigenically different influenza viruses)
        that can 

        prevent or significantly downregulate viral replication; more complete 

        characterization of natural history and pathogenesis emphasizing the
        protective 

        role of mucosal immunity; and biomarkers of impending
        influenza-associated 

        pneumonia.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: Biomedical MRL
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7454031117397454
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8500707213578501
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8910891089108911
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9236209335219236
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7454031117397454
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.6025459688826026
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5270155586987271
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.4107496463932107
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2250855612939271
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.39083578086577686
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4920587987757972
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.626230051288883
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7029780298857469
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8069851372892393
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6478394926216507
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.7340876944837341
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8415841584158416
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8826025459688827
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9193776520509194
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7340876944837341
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.6001885902876002
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5230551626591231
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.40947666195190946
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.22084064700657996
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.38663251424675177
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.48277567466168336
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6228226830239426
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6974451148765582
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.797282840528951
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6413853118418739
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.7213578500707214
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8373408769448374
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8727015558698727
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9066478076379066
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7213578500707214
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5898161244695899
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5106082036775106
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.40282885431400284
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.21851800409940159
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.38143229614225316
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4684035311435285
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.605079189964237
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6838557382875731
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7865545227992186
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6256997609817881
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.6987270155586988
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8161244695898161
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.842998585572843
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8925035360678925
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6987270155586988
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5605846298915605
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4862800565770863
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.37850070721357854
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2107728574016154
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3586858510427449
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.437764794946033
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5727124785732842
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6485243360567318
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7642615792191463
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5843572398023539
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6393210749646393
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7637906647807637
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8132956152758133
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8472418670438473
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6393210749646393
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5134370579915134
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4517680339462518
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3490806223479491
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1844044968212152
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3196269161523018
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.398801159559495
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5072426828594248
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5876796250069416
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7114153252059898
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5187521840685396
            name: Cosine Map@100

Biomedical MRL

This is a sentence-transformers model trained on 36,470 anchor-positive pairs with MatryoshkaLoss. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
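
The three modules listed above amount to: run the BERT encoder, take the first ([CLS]) token vector as the sentence embedding (pooling_mode_cls_token is True), then L2-normalize it. A minimal sketch of the last two steps with toy numbers in plain Python follows; the helper names `cls_pool` and `l2_normalize` and the 4-dimensional vectors are illustrative assumptions, not library code.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy encoder output for one sentence: 3 tokens, 4 dimensions each
# (the real model produces 768 dimensions per token).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output is unit length, the dot product of two embeddings equals their cosine similarity.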

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("potsu-potsu/bge-base-mrl-train36k")
# Run inference
sentences = [
    'What is Pseudomelanosis duodeni?',
    'Pseudomelanosis duodeni is a rare condition in which dark pigment accumulates in \nmacrophages located in the lamina propria of the duodenal mucosa. Three cases \nare reported here and the literature is reviewed. No clinical association can be \nfound that points clearly to the underlying etiology. Electron probe x-ray \nmicroanalysis was used to study the pigment in macrophage granules in 2 of our \npatients and demonstrated high iron and sulfur content. Iron accumulation in \nferritinlike particles was detected in absorptive cell lysosomes. A possible \nmechanism for the accumulation of absorbed iron by macrophages is considered.',
    'This year marks the 100th anniversary of the deadliest event in human history. \nIn 1918-1919, pandemic influenza appeared nearly simultaneously around the globe \nand caused extraordinary mortality (an estimated 50-100 million deaths) \nassociated with unexpected clinical and epidemiological features. The \ndescendants of the 1918 virus remain today; as endemic influenza viruses, they \ncause significant mortality each year. Although the ability to predict influenza \npandemics remains no better than it was a century ago, numerous scientific \nadvances provide an important head start in limiting severe disease and death \nfrom both current and future influenza viruses: identification and substantial \ncharacterization of the natural history and pathogenesis of the 1918 causative \nvirus itself, as well as hundreds of its viral descendants; development of \nmoderately effective vaccines; improved diagnosis and treatment of \ninfluenza-associated pneumonia; and effective prevention and control measures. \nRemaining challenges include development of vaccines eliciting significantly \nbroader protection (against antigenically different influenza viruses) that can \nprevent or significantly downregulate viral replication; more complete \ncharacterization of natural history and pathogenesis emphasizing the protective \nrole of mucosal immunity; and biomarkers of impending influenza-associated \npneumonia.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
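
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, the leading dimensions of an embedding can be used on their own. The sketch below shows the idea in plain Python on toy 8-dimensional vectors: keep the first `dim` values and rescale to unit length. The helper `truncate_and_normalize` is a hypothetical stand-in written for illustration; recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, which may be the more convenient route in practice.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional stand-ins for the 768-dimensional model outputs.
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.0]
passage = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, 0.0]

# Compare the same pair at full size and at two truncated sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Smaller sizes save storage and search time at some cost in quality: in the training logs, NDCG@10 at the final epoch falls from 0.703 at 768 dimensions to 0.5877 at 64.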

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.7454
cosine_accuracy@3 0.8501
cosine_accuracy@5 0.8911
cosine_accuracy@10 0.9236
cosine_precision@1 0.7454
cosine_precision@3 0.6025
cosine_precision@5 0.527
cosine_precision@10 0.4107
cosine_recall@1 0.2251
cosine_recall@3 0.3908
cosine_recall@5 0.4921
cosine_recall@10 0.6262
cosine_ndcg@10 0.703
cosine_mrr@10 0.807
cosine_map@100 0.6478

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.7341
cosine_accuracy@3 0.8416
cosine_accuracy@5 0.8826
cosine_accuracy@10 0.9194
cosine_precision@1 0.7341
cosine_precision@3 0.6002
cosine_precision@5 0.5231
cosine_precision@10 0.4095
cosine_recall@1 0.2208
cosine_recall@3 0.3866
cosine_recall@5 0.4828
cosine_recall@10 0.6228
cosine_ndcg@10 0.6974
cosine_mrr@10 0.7973
cosine_map@100 0.6414

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.7214
cosine_accuracy@3 0.8373
cosine_accuracy@5 0.8727
cosine_accuracy@10 0.9066
cosine_precision@1 0.7214
cosine_precision@3 0.5898
cosine_precision@5 0.5106
cosine_precision@10 0.4028
cosine_recall@1 0.2185
cosine_recall@3 0.3814
cosine_recall@5 0.4684
cosine_recall@10 0.6051
cosine_ndcg@10 0.6839
cosine_mrr@10 0.7866
cosine_map@100 0.6257

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.6987
cosine_accuracy@3 0.8161
cosine_accuracy@5 0.843
cosine_accuracy@10 0.8925
cosine_precision@1 0.6987
cosine_precision@3 0.5606
cosine_precision@5 0.4863
cosine_precision@10 0.3785
cosine_recall@1 0.2108
cosine_recall@3 0.3587
cosine_recall@5 0.4378
cosine_recall@10 0.5727
cosine_ndcg@10 0.6485
cosine_mrr@10 0.7643
cosine_map@100 0.5844

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.6393
cosine_accuracy@3 0.7638
cosine_accuracy@5 0.8133
cosine_accuracy@10 0.8472
cosine_precision@1 0.6393
cosine_precision@3 0.5134
cosine_precision@5 0.4518
cosine_precision@10 0.3491
cosine_recall@1 0.1844
cosine_recall@3 0.3196
cosine_recall@5 0.3988
cosine_recall@10 0.5072
cosine_ndcg@10 0.5877
cosine_mrr@10 0.7114
cosine_map@100 0.5188
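
For readers unfamiliar with these metric names, the sketch below computes accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for a single query with binary relevance, using the common textbook definitions. The function `retrieval_metrics` and the toy document ids are assumptions made for illustration; the reported values are averages over all evaluation queries and come from the library's own evaluator, which this sketch does not reproduce exactly.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance ranking metrics for one query, cut off at rank k."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document within the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Discounted cumulative gain, normalized by the best achievable ordering.
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, hit in enumerate(hits, start=1) if hit)
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,  # any relevant doc in top k
        "precision": num_hits / k,                 # share of top k that is relevant
        "recall": num_hits / len(relevant_ids),    # share of relevant docs found
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# Toy query: three relevant passages, one of them retrieved at rank 2.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}
print(retrieval_metrics(ranked, relevant, k=3))
```

Under these definitions, recall@1 can be at most 1 divided by the number of relevant passages for a query, which would explain why recall@1 in the tables is much lower than accuracy@1 when queries have several relevant passages.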

Training Details

Training Dataset

Unnamed Dataset

  • Size: 36,470 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor: type string; min 6 tokens, mean 15.86 tokens, max 32 tokens
    positive: type string; min 31 tokens, mean 316.54 tokens, max 512 tokens
  • Samples:
    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: Recent studies showed frequent mutations in histone H3 lysine 27 (H3K27)
    demethylases in medulloblastomas of Group 3 and Group 4, suggesting a role for
    H3K27 methylation in these tumors. Indeed, trimethylated H3K27 (H3K27me3) levels
    were shown to be higher in Group 3 and 4 tumors compared to WNT and SHH
    medulloblastomas, also in tumors without detectable mutations in demethylases.
    Here, we report that polycomb genes, required for H3K27 methylation, are
    consistently upregulated in Group 3 and 4 tumors. These tumors show high
    expression of the homeobox transcription factor OTX2. Silencing of OTX2 in D425
    medulloblastoma cells resulted in downregulation of polycomb genes such as EZH2,
    EED, SUZ12 and RBBP4 and upregulation of H3K27 demethylases KDM6A, KDM6B, JARID2
    and KDM7A. This was accompanied by decreased H3K27me3 and increased H3K27me1
    levels in promoter regions. Strikingly, the decrease of H3K27me3 was most
    prominent in promoters that bind OTX2. OTX2-bound promoters showe...

    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: We used high-resolution SNP genotyping to identify regions of genomic gain and
    loss in the genomes of 212 medulloblastomas, malignant pediatric brain tumors.
    We found focal amplifications of 15 known oncogenes and focal deletions of 20
    known tumor suppressor genes (TSG), most not previously implicated in
    medulloblastoma. Notably, we identified previously unknown amplifications and
    homozygous deletions, including recurrent, mutually exclusive, highly focal
    genetic events in genes targeting histone lysine methylation, particularly that
    of histone 3, lysine 9 (H3K9). Post-translational modification of histone
    proteins is critical for regulation of gene expression, can participate in
    determination of stem cell fates and has been implicated in carcinogenesis.
    Consistent with our genetic data, restoration of expression of genes controlling
    H3K9 methylation greatly diminishes proliferation of medulloblastoma in vitro.
    Copy number aberrations of genes with critical roles in writing...

    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: Recent sequencing efforts have described the mutational landscape of the
    pediatric brain tumor medulloblastoma. Although MLL2 is among the most frequent
    somatic single nucleotide variants (SNV), the clinical and biological
    significance of these mutations remains uncharacterized. Through targeted
    re-sequencing, we identified mutations of MLL2 in 8 % (14/175) of MBs, the
    majority of which were loss of function. Notably, we also report mutations
    affecting the MLL2-binding partner KDM6A, in 4 % (7/175) of tumors. While MLL2
    mutations were independent of age, gender, histological subtype, M-stage or
    molecular subgroup, KDM6A mutations were most commonly identified in Group 4
    MBs, and were mutually exclusive with MLL2 mutations. Immunohistochemical
    staining for H3K4me3 and H3K27me3, the chromatin effectors of MLL2 and KDM6A
    activity, respectively, demonstrated alterations of the histone code in 24 %
    (53/220) of MBs across all subgroups. Correlating these MLL2- and KDM6A-driven
    h...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
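
As a rough illustration of what these parameters mean, the sketch below sums an in-batch ranking loss over several truncated embedding sizes, each with its own weight. It is a plain-Python approximation on toy 8-dimensional vectors, not the library implementation: the function names, the toy data, the dimension list `[8, 4, 2]` and the similarity scale of 20 are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the match for anchors[i] and
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding sizes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        total += weight * in_batch_ranking_loss(
            truncated_anchors, truncated_positives)
    return total

# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1],
           [0.1, 1.0, 0.2, 0.0, 0.0, 0.3, 0.1, 0.2]]
positives = [[0.9, 0.2, 0.1, 0.1, 0.0, 0.1, 0.2, 0.0],
             [0.2, 0.8, 0.1, 0.1, 0.1, 0.2, 0.0, 0.3]]

print(matryoshka_loss(anchors, positives, dims=[8, 4, 2], weights=[1, 1, 1]))
```

With equal weights, as in the configuration above, every embedding size contributes equally to the total, so the leading dimensions are pushed to be useful on their own.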
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
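
These settings can be related to the step counts in the training logs with a little arithmetic. The sketch below assumes a single device and standard linear-warmup-then-cosine behaviour; the variable names, the `num_devices` value and the `learning_rate` helper are assumptions for illustration, and the batch sampler may shift the exact batch count slightly.

```python
import math

train_samples = 36470
per_device_batch = 32
grad_accum = 16
epochs = 4
warmup_ratio = 0.1
peak_lr = 2e-5
num_devices = 1  # assumption: the card does not state the device count

# 32 pairs per batch, 16 batches accumulated per optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

def learning_rate(step):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_updates:
        return peak_lr * step / warmup_updates
    progress = (step - warmup_updates) / (total_updates - warmup_updates)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, batches_per_epoch, updates_per_epoch,
      total_updates, warmup_updates)
# 512 1140 72 288 29
```

The 72 updates per epoch and 288 in total agree with the evaluation rows in the training logs (steps 72, 144, 216 and 288).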

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1404 10 69.3215 - - - - -
0.2807 20 52.3318 - - - - -
0.4211 30 36.7248 - - - - -
0.5614 40 28.2319 - - - - -
0.7018 50 23.4969 - - - - -
0.8421 60 21.3192 - - - - -
0.9825 70 19.6236 - - - - -
1.0 72 - 0.7044 0.7002 0.6855 0.6454 0.5808
1.1123 80 16.3603 - - - - -
1.2526 90 16.2618 - - - - -
1.3930 100 14.1553 - - - - -
1.5333 110 15.0068 - - - - -
1.6737 120 13.5377 - - - - -
1.8140 130 12.34 - - - - -
1.9544 140 12.8821 - - - - -
2.0 144 - 0.7026 0.6977 0.6823 0.6471 0.5865
2.0842 150 10.9923 - - - - -
2.2246 160 10.6518 - - - - -
2.3649 170 10.9113 - - - - -
2.5053 180 9.8746 - - - - -
2.6456 190 10.8115 - - - - -
2.7860 200 10.3438 - - - - -
2.9263 210 10.729 - - - - -
3.0 216 - 0.7053 0.6971 0.6835 0.6466 0.5846
3.0561 220 9.1357 - - - - -
3.1965 230 9.7226 - - - - -
3.3368 240 9.5715 - - - - -
3.4772 250 9.3667 - - - - -
3.6175 260 9.4973 - - - - -
3.7579 270 8.6693 - - - - -
3.8982 280 9.2358 - - - - -
4.0 288 - 0.703 0.6974 0.6839 0.6485 0.5877
  • The saved checkpoint is the final row (epoch 4.0, step 288); its scores match the evaluation metrics reported above.

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}