language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:40482
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: List the deadliest viruses in the world.
    sentences:
      - >-
        Mediator is a large multiprotein complex conserved in all eukaryotes,
        which has 

        a crucial coregulator function in transcription by RNA polymerase II
        (Pol II). 

        However, the molecular mechanisms of its action in vivo remain to be
        understood. 

        Med17 is an essential and central component of the Mediator head module.
        In this 

        work, we utilised our large collection of conditional
        temperature-sensitive 

        med17 mutants to investigate Mediator's role in coordinating
        preinitiation 

        complex (PIC) formation in vivo at the genome level after a transfer to
        a 

        non-permissive temperature for 45 minutes. The effect of a yeast
        mutation 

        proposed to be equivalent to the human Med17-L371P responsible for
        infantile 

        cerebral atrophy was also analyzed. The ChIP-seq results demonstrate
        that med17 

        mutations differentially affected the global presence of several PIC
        components 

        including Mediator, TBP, TFIIH modules and Pol II. Our data show that
        Mediator 

        stabilizes TFIIK kinase and TFIIH core modules independently, suggesting
        that 

        the recruitment or the stability of TFIIH modules is regulated
        independently on 

        yeast genome. We demonstrate that Mediator selectively contributes to
        TBP 

        recruitment or stabilization to chromatin. This study provides an
        extensive 

        genome-wide view of Mediator's role in PIC formation, suggesting that
        Mediator 

        coordinates multiple steps of a PIC assembly pathway.
      - >-
        mTOR complex 2 (mTORC2) signaling is upregulated in multiple types of
        human 

        cancer, but the molecular mechanisms underlying its activation and
        regulation 

        remain elusive. Here, we show that microRNA-mediated upregulation of
        Rictor, an 

        mTORC2-specific component, contributes to tumor progression. Rictor is 

        upregulated via the repression of the miR-424/503 cluster in human
        prostate and 

        colon cancer cell lines that harbor c-Src upregulation and in
        Src-transformed 

        cells. The tumorigenicity and invasive activity of these cells were
        suppressed 

        by re-expression of miR-424/503. Rictor upregulation promotes formation
        of 

        mTORC2 and induces activation of mTORC2, resulting in promotion of tumor
        growth 

        and invasion. Furthermore, downregulation of miR-424/503 is associated
        with 

        Rictor upregulation in colon cancer tissues. These findings suggest that
        the 

        miR-424/503-Rictor pathway plays a crucial role in tumor progression.
      - >-
        This year marks the 100th anniversary of the deadliest event in human
        history. 

        In 1918-1919, pandemic influenza appeared nearly simultaneously around
        the globe 

        and caused extraordinary mortality (an estimated 50-100 million deaths) 

        associated with unexpected clinical and epidemiological features. The 

        descendants of the 1918 virus remain today; as endemic influenza
        viruses, they 

        cause significant mortality each year. Although the ability to predict
        influenza 

        pandemics remains no better than it was a century ago, numerous
        scientific 

        advances provide an important head start in limiting severe disease and
        death 

        from both current and future influenza viruses: identification and
        substantial 

        characterization of the natural history and pathogenesis of the 1918
        causative 

        virus itself, as well as hundreds of its viral descendants; development
        of 

        moderately effective vaccines; improved diagnosis and treatment of 

        influenza-associated pneumonia; and effective prevention and control
        measures. 

        Remaining challenges include development of vaccines eliciting
        significantly 

        broader protection (against antigenically different influenza viruses)
        that can 

        prevent or significantly downregulate viral replication; more complete 

        characterization of natural history and pathogenesis emphasizing the
        protective 

        role of mucosal immunity; and biomarkers of impending
        influenza-associated 

        pneumonia.
  - source_sentence: Where is X-ray free electron laser used?
    sentences:
      - >-
        BACKGROUND: After tooth loss, the posterior maxilla is usually
        characterized by 

        limited bone height secondary to pneumatization of the maxillary sinus
        and/or 

        collapse of the alveolar ridge that preclude in many instances the
        installation 

        of dental implants. In order to compensate for the lack of bone height,
        several 

        treatment options have been proposed. These treatment alternatives aimed
        at the 

        installation of dental implants with or without the utilization of bone
        grafting 

        materials avoiding the perforation of the Schneiderian membrane.
        Nevertheless, 

        membrane perforations represent the most common complication among
        these 

        procedures. Consequently, the present review aimed at the elucidation of
        the 

        relevance of this phenomenon on implant survival and complications.

        MATERIAL AND METHODS: Electronic and manual literature searches were
        performed 

        by two independent reviewers in several databases, including MEDLINE,
        EMBASE, 

        and Cochrane Oral Health Group Trials Register, for articles up to
        January 2018 

        reporting outcome of implant placement perforating the sinus floor
        without 

        regenerative procedure (lateral sinus lift or transalveolar technique)
        and graft 

        material. The intrusion of the implants can occur during drilling or
        implant 

        placement, with and without punch out Schneiderian. Only studies with at
        least 

        6months of follow-up were included in the qualitative assessment.

        RESULTS: Eight studies provided information on the survival rate, with a
        global 

        sample of 493 implants, being the weighted mean survival rate 95.6% (IC
        95%), 

        after 52.7months of follow-up. The level of implant penetration (≤4mm
        or 

        >4mm) did not report statistically significant differences in survival
        rate 

        (p=0.403). Seven studies provided information on the rate of clinical 

        complications, being the mean complication rate 3.4% (IC 95%). The most
        frequent 

        clinical complication was epistaxis, without finding significant
        differences 

        according to the level of penetration. Five studies provide information
        on the 

        radiographic complication; the most common complication was thickening
        of the 

        Schneiderian membrane. The weighted complication rate was 14.8% (IC
        95%), and 

        penetration level affects the rate of radiological complications, being
        these of 

        5.29% in implant penetrating ≤4mm and 29.3% in implant penetrating
        >4mm, 

        without reaching statistical significant difference (p=0.301).

        CONCLUSION: The overall survival rate of the implants into the sinus
        cavity was 

        95.6%, without statistical differences according to the level of
        penetration. 

        The clinical and radiological complications were 3.4% and 14.8%
        respectively. 

        The most frequent clinical complication was the epistaxis, and the
        radiological 

        complication was thickening of the Schneiderian membrane, without
        reaching 

        statistical significant difference according to the level of implant
        penetration 

        inside the sinus.
      - >-
        Ultrashort X-ray pulses from free-electron laser X-ray sources make it
        feasible 

        to conduct small- and wide-angle scattering experiments on biomolecular
        samples 

        in solution at sub-picosecond timescales. During these so-called
        fluctuation 

        scattering experiments, the absence of rotational averaging, typically
        induced 

        by Brownian motion in classic solution-scattering experiments, increases
        the 

        information content of the data. In order to perform shape
        reconstruction or 

        structure refinement from such data, it is essential to compute the
        theoretical 

        profiles from three-dimensional models. Based on the three-dimensional
        Zernike 

        polynomial expansion models, a fast method to compute the theoretical 

        fluctuation scattering profiles has been derived. The theoretical
        profiles have 

        been validated against simulated results obtained from 300000
        scattering 

        patterns for several representative biomolecular species.
      - >-
        Hemophilic Pseudotumor is a rare complication of hemophilia. It  is an
        encapsulated haematoma in patients with haemophilia  which has a
        tendency to progress and produce clinical symptoms related to its
        anatomical location. The lesion most frequently occurs in the long
        bones, pelvis, small bones of the hands and feet, or rarely in the
        maxillofacial region.
  - source_sentence: For the constructions of which organs has 3D printing been tested?
    sentences:
      - >-
        The ability to three-dimensionally interweave biological tissue with
        functional 

        electronics could enable the creation of bionic organs possessing
        enhanced 

        functionalities over their human counterparts. Conventional electronic
        devices 

        are inherently two-dimensional, preventing seamless multidimensional
        integration 

        with synthetic biology, as the processes and materials are very
        different. Here, 

        we present a novel strategy for overcoming these difficulties via
        additive 

        manufacturing of biological cells with structural and nanoparticle
        derived 

        electronic elements. As a proof of concept, we generated a bionic ear
        via 3D 

        printing of a cell-seeded hydrogel matrix in the anatomic geometry of a
        human 

        ear, along with an intertwined conducting polymer consisting of infused
        silver 

        nanoparticles. This allowed for in vitro culturing of cartilage tissue
        around an 

        inductive coil antenna in the ear, which subsequently enables readout
        of 

        inductively-coupled signals from cochlea-shaped electrodes. The printed
        ear 

        exhibits enhanced auditory sensing for radio frequency reception, and 

        complementary left and right ears can listen to stereo audio music.
        Overall, our 

        approach suggests a means to intricately merge biologic and
        nanoelectronic 

        functionalities via 3D printing.
      - >-
        A case of heterochromia iridis and Horner's syndrome is reported in a
        7-year old 

        girl with paravertebral neurilemmoma. These clinical findings can be
        useful in 

        the early diagnosis of mediastinal tumors in the paravertebral axis.
        While 

        typically associated with neuroblastoma, these findings can be due to
        tumors 

        which are inately benign--in this case neurilemmoma. The mechanism for 

        heterochromia is briefly discussed.
      - >-
        The creation of complex neuronal networks relies on ligand-receptor
        interactions 

        that mediate attraction or repulsion towards specific targets.
        Roundabouts 

        comprise a family of single-pass transmembrane receptors facilitating
        this 

        process upon interaction with the soluble extracellular ligand Slit
        protein 

        family emanating from the midline. Due to the complexity and flexible
        nature of 

        Robo receptors , their overall structure has remained elusive until now.
        Recent 

        structural studies of the Robo 1 and Robo 2 ectodomains have provided
        the basis 

        for a better understanding of their signalling mechanism. These
        structures 

        reveal how Robo receptors adopt an auto-inhibited conformation on the
        cell 

        surface that can be further stabilised by cis and/or trans
        oligmerisation 

        arrays. Upon Slit -N binding Robo receptors must undergo a
        conformational change 

        for Ig4 mediated dimerisation and signaling, probably via endocytosis. 

        Furthermore, it's become clear that Robo receptors do not only act
        alone, but as 

        large and more complex cell surface receptor assemblies to manifest
        directional 

        and growth effects in a concerted fashion. These context dependent
        assemblies 

        provide a mechanism to fine tune attractive and repulsive signals in a 

        combinatorial manner required during neuronal development. While a
        mechanistic 

        understanding of Slit mediated Robo signaling has advanced significantly
        further 

        structural studies on larger assemblies are required for the design of
        new 

        experiments to elucidate their role in cell surface receptor complexes.
        These 

        will be necessary to understand the role of Slit -Robo signaling in 

        neurogenesis, angiogenesis, organ development and cancer progression. In
        this 

        chapter, we provide a review of the current knowledge in the field with
        a 

        particular focus on the Roundabout receptor family.
  - source_sentence: For the constructions of which organs has 3D printing been tested?
    sentences:
      - >-
        Objective:To evaluate the value of improved Mallampati grading combined
        with 

        NoSAS questionnaire in screening for obstructive sleep apnea (OSA).
        Method:A 

        total of 344 patients admitted to our hospital for sleep disorders were
        studied. 

        All patients were measured for their height, weight, neck circumference
        and 

        other parameters. NoSAS scores, improved Mallampati grading and
        polysomnography 

        (PSG) were performed in these patients. According to AHI in PSG
        monitoring 

        results, patients were divided into non-osa group (AHI<5) 93 cases and
        OSA group 

        251 cases. The OSA group were divided into mild (AHI 5-15), moderate(AHI
        16-30 

        and severe OSA group(AHI>30) according to the PSG result. The ROC curve
        was 

        plotted to evaluate the screening value of NoSAS and improved Mallampati
        grading 

        combined with NoSAS for OSA. Result:With the NoSAS score of 8 or 9 as
        cutoffs 

        for analysis, the sensitivity for OSA was 0.733 and 0.701; the
        specificity for 

        OSA was 0.538 and 0.624, respectively. The sensitivity and specificity
        of NoSAS 

        combined with improved Mallampati grading for screening OSA were 0.813
        and 

        0.710, respectively. Conclusion:As a new screening tool, NoSAS
        questionnaire is 

        simple and convenient, and has certain screening value to OSA. The
        improved 

        Mallampati grading combined with NoSAS questionnaire can obviously
        improve the 

        screening sensitivity and specificity of Osa, and has higher application
        value.
      - >-
        The morphology and the functionality of the murid glandular complex,
        composed of 

        the submandibular and sublingual salivary glands (SSC), were the object
        of 

        several studies conducted mainly using magnetic resonance imaging (MRI).
        Using a 

        4.7 T scanner and a manganese-based contrast agent, we improved the 

        signal-to-noise ratio of the SSC relating to the surrounding anatomical 

        structures allowing to obtain high-contrast 3D images of the SSC. In the
        last 

        few years, the large development in resin melting techniques opened the
        way for 

        printing 3D objects starting from a 3D stack of images. Here, we
        demonstrate the 

        feasibility of the 3D printing technique of soft tissues such as the SSC
        in the 

        rat with the aim to improve the visualization of the organs. This
        approach is 

        useful to preserve the real in vivo morphology of the SCC in living
        animals 

        avoiding the anatomical shape changes due to the lack of relationships
        with the 

        surrounding organs in case of extraction. It is also harmless,
        repeatable and 

        can be applied to explore volumetric changes occurring during body
        growth, 

        excretory duct obstruction, tumorigenesis and regeneration processes.
        3D 

        printing allows to obtain a solid object with the same shape of the
        organ of 

        interest, which can be observed, freely rotated and manipulated. To
        increase the 

        visibility of the details, it is possible to print the organs with a
        selected 

        zoom factor, useful as in case of tiny organs in small mammalia. An
        immediate 

        application of this technique is represented by educational classes.
      - >-
        Mobile phone use and risk of acoustic neuroma: results of the
        interphone 

        case-control study in five north European countries [corrected].
  - source_sentence: What is known about the Digit Ratio (2D:4D) cancer?
    sentences:
      - >-
        Proteins undergo conformational changes during their biological
        function. As 

        such, a high-resolution structure of a protein's resting conformation
        provides a 

        starting point for elucidating its reaction mechanism, but provides no
        direct 

        information concerning the protein's conformational dynamics. Several
        X-ray 

        methods have been developed to elucidate those conformational changes
        that occur 

        during a protein's reaction, including time-resolved Laue diffraction
        and 

        intermediate trapping studies on three-dimensional protein crystals,
        and 

        time-resolved wide-angle X-ray scattering and X-ray absorption studies
        on 

        proteins in the solution phase. This review emphasizes the scope and
        limitations 

        of these complementary experimental approaches when seeking to
        understand 

        protein conformational dynamics. These methods are illustrated using a
        limited 

        set of examples including myoglobin and haemoglobin in complex with
        carbon 

        monoxide, the simple light-driven proton pump bacteriorhodopsin, and
        the 

        superoxide scavenger superoxide reductase. In conclusion, likely future 

        developments of these methods at synchrotron X-ray sources and the
        potential 

        impact of emerging X-ray free-electron laser facilities are speculated
        upon.
      - >-
        Extensive messenger RNA editing generates transcript and protein
        diversity in genes involved in neural excitability, as previously
        described, as well as in genes participating in a broad range of other
        cellular functions. 
      - >-
        BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D)
        is a 

        marker of prenatal exposure to sex hormones, with low 2D:4D being
        indicative of 

        high prenatal androgen action. Recent studies have reported a strong
        association 

        between 2D:4D and risk of prostate cancer.

        METHODS: A total of 6258 men participating in the Melbourne
        Collaborative Cohort 

        Study had 2D:4D assessed. Of these men, we identified 686 incident
        prostate 

        cancer cases. Hazard ratios (HRs) and confidence intervals (CIs) were
        estimated 

        for a standard deviation increase in 2D:4D.

        RESULTS: No association was observed between 2D:4D and prostate cancer
        risk 

        overall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left).
        We 

        observed a weak inverse association between 2D:4D and risk of prostate
        cancer 

        for age <60, however 95% CIs included unity for all observed ages.

        CONCLUSION: Our results are not consistent with an association between
        2D:4D and 

        overall prostate cancer risk, but we cannot exclude a weak inverse
        association 

        between 2D:4D and early onset prostate cancer risk.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: Biomedical MRL
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7397454031117398
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8472418670438473
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8925035360678925
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9292786421499293
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7397454031117398
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.6058462989156059
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5295615275813296
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.41103253182461097
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.22757153438103173
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.39389351666156774
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4953500769443452
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.626185395476178
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7036538830306982
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8041815406030398
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6499688056459438
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.7326732673267327
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.842998585572843
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8882602545968883
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9151343705799151
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7326732673267327
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5964167845355963
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5278642149929279
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.40990099009900993
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.21918993091456265
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.38673218299790596
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4915208575777972
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6229670136489501
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6971415938662006
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7968989245863362
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6403253251933015
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.7227722772277227
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8373408769448374
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8769448373408769
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9108910891089109
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7227722772277227
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5893446487505893
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5131541725601132
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.4048090523338048
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2165092120706659
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3843563311047163
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4706508437641641
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6082103871285517
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6857315358161504
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7889281785321389
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6255397978739031
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.7072135785007072
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8076379066478077
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8458274398868458
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8967468175388967
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7072135785007072
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5605846298915607
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4876944837340877
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.38189533239038187
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2131717638221153
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3571863197583239
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.44275724893253604
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5763830904405497
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.651957768079385
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7681035450483825
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5861399094808066
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6435643564356436
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7666195190947667
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8048090523338048
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8415841584158416
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6435643564356436
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5115511551155115
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.45007072135785003
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3510608203677511
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.18506567524592368
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3180821001225782
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.3926270123067019
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5118404409971898
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5894018468562044
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7115219685233828
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5197323616049745
            name: Cosine Map@100

Biomedical MRL

This is a sentence-transformers model trained with MatryoshkaLoss and MultipleNegativesRankingLoss on 40,482 samples. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0
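
The card metadata reports retrieval metrics at 768, 512, 256, 128, and 64 dimensions, which suggests the embeddings are meant to stay useful when truncated. The following is a minimal sketch of that idea in plain Python, assuming truncation means keeping the leading components and re-normalizing. The helper names `truncate_and_normalize` and `cosine_similarity` and the toy vectors are hypothetical and not part of the model or the library.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for real 768-dimensional embeddings.
query = [0.5, 0.1, -0.3, 0.7, 0.2, -0.1, 0.05, 0.4]
passage = [0.4, 0.2, -0.2, 0.6, -0.3, 0.3, 0.1, -0.2]

full_score = cosine_similarity(query, passage)
small_score = cosine_similarity(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(passage, 4),
)
print(f"full: {full_score:.3f}, truncated to 4 dims: {small_score:.3f}")
```

With the real model, recent sentence-transformers releases accept a `truncate_dim` argument on `SentenceTransformer(...)` for the same purpose; check the documentation of the installed version before relying on it.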

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
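
The printed architecture can be read as three steps: the BERT encoder produces one vector per token, the pooling layer keeps only the first ([CLS]) token's vector (`pooling_mode_cls_token` is the only mode set to `True`), and `Normalize()` rescales that vector to unit length. A rough sketch of the last two steps in plain Python follows, using a hypothetical helper `cls_pool_and_normalize` and made-up token vectors rather than real model outputs.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Return the first token's vector scaled to unit L2 norm.

    `token_embeddings` is a list of per-token vectors for one input,
    with the [CLS] token assumed to be at position 0.
    """
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # leave an all-zero vector unchanged
    return [x / norm for x in cls_vector]


# Three made-up token vectors of width 4 (the real model uses width 768).
tokens = [
    [1.0, 2.0, 2.0, 0.0],   # [CLS]
    [0.3, -0.1, 0.8, 0.5],  # a word piece
    [0.0, 0.9, -0.4, 0.2],  # [SEP]
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.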

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("potsu-potsu/bge-base-mrl-train40k")
# Run inference
sentences = [
    'What is known about the Digit Ratio (2D:4D) cancer?',
    'BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D) is a \nmarker of prenatal exposure to sex hormones, with low 2D:4D being indicative of \nhigh prenatal androgen action. Recent studies have reported a strong association \nbetween 2D:4D and risk of prostate cancer.\nMETHODS: A total of 6258 men participating in the Melbourne Collaborative Cohort \nStudy had 2D:4D assessed. Of these men, we identified 686 incident prostate \ncancer cases. Hazard ratios (HRs) and confidence intervals (CIs) were estimated \nfor a standard deviation increase in 2D:4D.\nRESULTS: No association was observed between 2D:4D and prostate cancer risk \noverall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left). We \nobserved a weak inverse association between 2D:4D and risk of prostate cancer \nfor age <60, however 95% CIs included unity for all observed ages.\nCONCLUSION: Our results are not consistent with an association between 2D:4D and \noverall prostate cancer risk, but we cannot exclude a weak inverse association \nbetween 2D:4D and early onset prostate cancer risk.',
    "Proteins undergo conformational changes during their biological function. As \nsuch, a high-resolution structure of a protein's resting conformation provides a \nstarting point for elucidating its reaction mechanism, but provides no direct \ninformation concerning the protein's conformational dynamics. Several X-ray \nmethods have been developed to elucidate those conformational changes that occur \nduring a protein's reaction, including time-resolved Laue diffraction and \nintermediate trapping studies on three-dimensional protein crystals, and \ntime-resolved wide-angle X-ray scattering and X-ray absorption studies on \nproteins in the solution phase. This review emphasizes the scope and limitations \nof these complementary experimental approaches when seeking to understand \nprotein conformational dynamics. These methods are illustrated using a limited \nset of examples including myoglobin and haemoglobin in complex with carbon \nmonoxide, the simple light-driven proton pump bacteriorhodopsin, and the \nsuperoxide scavenger superoxide reductase. In conclusion, likely future \ndevelopments of these methods at synchrotron X-ray sources and the potential \nimpact of emerging X-ray free-electron laser facilities are speculated upon.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
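
Since the model was trained with MatryoshkaLoss over 768, 512, 256, 128, and 64 dimensions, a leading prefix of each embedding can serve as a smaller embedding. The sketch below shows the idea on toy vectors: keep the first `dim` values, rescale to unit length, then compare with cosine similarity. The helpers `truncate_and_normalize` and `cosine` are hypothetical names for illustration. Recent Sentence Transformers versions also accept a `truncate_dim` argument when loading a model, which may be the more convenient route in practice.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the prefix to unit length."""
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 768-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
doc = [0.5, 0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]

# Similarity can change as dimensions are dropped; smaller prefixes trade
# accuracy for storage and speed.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```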

Evaluation

Metrics

Information Retrieval (dim_768: full 768-dimensional embeddings)

Metric Value
cosine_accuracy@1 0.7397
cosine_accuracy@3 0.8472
cosine_accuracy@5 0.8925
cosine_accuracy@10 0.9293
cosine_precision@1 0.7397
cosine_precision@3 0.6058
cosine_precision@5 0.5296
cosine_precision@10 0.411
cosine_recall@1 0.2276
cosine_recall@3 0.3939
cosine_recall@5 0.4954
cosine_recall@10 0.6262
cosine_ndcg@10 0.7037
cosine_mrr@10 0.8042
cosine_map@100 0.65

Information Retrieval (dim_512: embeddings truncated to 512 dimensions)

Metric Value
cosine_accuracy@1 0.7327
cosine_accuracy@3 0.843
cosine_accuracy@5 0.8883
cosine_accuracy@10 0.9151
cosine_precision@1 0.7327
cosine_precision@3 0.5964
cosine_precision@5 0.5279
cosine_precision@10 0.4099
cosine_recall@1 0.2192
cosine_recall@3 0.3867
cosine_recall@5 0.4915
cosine_recall@10 0.623
cosine_ndcg@10 0.6971
cosine_mrr@10 0.7969
cosine_map@100 0.6403

Information Retrieval (dim_256: embeddings truncated to 256 dimensions)

Metric Value
cosine_accuracy@1 0.7228
cosine_accuracy@3 0.8373
cosine_accuracy@5 0.8769
cosine_accuracy@10 0.9109
cosine_precision@1 0.7228
cosine_precision@3 0.5893
cosine_precision@5 0.5132
cosine_precision@10 0.4048
cosine_recall@1 0.2165
cosine_recall@3 0.3844
cosine_recall@5 0.4707
cosine_recall@10 0.6082
cosine_ndcg@10 0.6857
cosine_mrr@10 0.7889
cosine_map@100 0.6255

Information Retrieval (dim_128: embeddings truncated to 128 dimensions)

Metric Value
cosine_accuracy@1 0.7072
cosine_accuracy@3 0.8076
cosine_accuracy@5 0.8458
cosine_accuracy@10 0.8967
cosine_precision@1 0.7072
cosine_precision@3 0.5606
cosine_precision@5 0.4877
cosine_precision@10 0.3819
cosine_recall@1 0.2132
cosine_recall@3 0.3572
cosine_recall@5 0.4428
cosine_recall@10 0.5764
cosine_ndcg@10 0.652
cosine_mrr@10 0.7681
cosine_map@100 0.5861

Information Retrieval (dim_64: embeddings truncated to 64 dimensions)

Metric Value
cosine_accuracy@1 0.6436
cosine_accuracy@3 0.7666
cosine_accuracy@5 0.8048
cosine_accuracy@10 0.8416
cosine_precision@1 0.6436
cosine_precision@3 0.5116
cosine_precision@5 0.4501
cosine_precision@10 0.3511
cosine_recall@1 0.1851
cosine_recall@3 0.3181
cosine_recall@5 0.3926
cosine_recall@10 0.5118
cosine_ndcg@10 0.5894
cosine_mrr@10 0.7115
cosine_map@100 0.5197
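
For readers unfamiliar with the metric names, the sketch below computes accuracy@k, precision@k, recall@k, MRR@k, and NDCG@k for a single query with binary relevance. It follows the standard textbook definitions; the function name `ir_metrics` and the toy document IDs are assumptions for illustration, and the library's evaluator may differ in details. The values reported above are averages over all evaluation queries.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics at cutoff k, assuming binary relevance."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    num_hits = sum(hits)
    # 1-based rank of the first relevant document within the top k, if any.
    first_hit_rank = next((i + 1 for i, h in enumerate(hits) if h), None)
    # Discounted cumulative gain and its ideal value (all relevant docs first).
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# Toy query with four relevant documents, two of which are retrieved early.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8", "d5"}

m1 = ir_metrics(ranked, relevant, k=1)
m3 = ir_metrics(ranked, relevant, k=3)
print(m1)
print(m3)
```

Two patterns in the tables follow directly from these definitions: accuracy@1 always equals precision@1, and recall@1 stays low whenever queries have several relevant passages, because a single retrieved document can cover only one of them.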

Training Details

Training Dataset

Unnamed Dataset

  • Size: 40,482 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor (type: string): min 6 tokens, mean 16.0 tokens, max 32 tokens
    positive (type: string): min 4 tokens, mean 287.89 tokens, max 512 tokens
  • Samples:
    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: Aberrant patterns of H3K4, H3K9, and H3K27 histone lysine methylation were shown to result in histone code alterations, which induce changes in gene expression, and affect the proliferation rate of cells in medulloblastoma.

    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: Recent studies showed frequent mutations in histone H3 lysine 27 (H3K27)
    demethylases in medulloblastomas of Group 3 and Group 4, suggesting a role for
    H3K27 methylation in these tumors. Indeed, trimethylated H3K27 (H3K27me3) levels
    were shown to be higher in Group 3 and 4 tumors compared to WNT and SHH
    medulloblastomas, also in tumors without detectable mutations in demethylases.
    Here, we report that polycomb genes, required for H3K27 methylation, are
    consistently upregulated in Group 3 and 4 tumors. These tumors show high
    expression of the homeobox transcription factor OTX2. Silencing of OTX2 in D425
    medulloblastoma cells resulted in downregulation of polycomb genes such as EZH2,
    EED, SUZ12 and RBBP4 and upregulation of H3K27 demethylases KDM6A, KDM6B, JARID2
    and KDM7A. This was accompanied by decreased H3K27me3 and increased H3K27me1
    levels in promoter regions. Strikingly, the decrease of H3K27me3 was most
    prominent in promoters that bind OTX2. OTX2-bound promoters showe...

    anchor: What is the implication of histone lysine methylation in medulloblastoma?
    positive: We used high-resolution SNP genotyping to identify regions of genomic gain and
    loss in the genomes of 212 medulloblastomas, malignant pediatric brain tumors.
    We found focal amplifications of 15 known oncogenes and focal deletions of 20
    known tumor suppressor genes (TSG), most not previously implicated in
    medulloblastoma. Notably, we identified previously unknown amplifications and
    homozygous deletions, including recurrent, mutually exclusive, highly focal
    genetic events in genes targeting histone lysine methylation, particularly that
    of histone 3, lysine 9 (H3K9). Post-translational modification of histone
    proteins is critical for regulation of gene expression, can participate in
    determination of stem cell fates and has been implicated in carcinogenesis.
    Consistent with our genetic data, restoration of expression of genes controlling
    H3K9 methylation greatly diminishes proliferation of medulloblastoma in vitro.
    Copy number aberrations of genes with critical roles in writing...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
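
To make the loss configuration concrete, here is a small pure-Python sketch of the idea: an in-batch-negatives ranking loss (each anchor should score its own positive above every other positive in the batch) is evaluated on each embedding prefix listed in `matryoshka_dims`, and the results are combined using `matryoshka_weights`. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the similarity `scale` of 20.0 is an assumed value that the card does not list, and the toy vectors use dimensions 4 and 2 in place of 768 down to 64.

```python
import math

def normalize(v):
    """Rescale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates; positive i is the target for anchor i."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        loss += weight * mnr_loss(truncated_anchors, truncated_positives)
    return loss

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.9, 0.0, 0.2]]

total = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(round(total, 6))
```

With equal weights, as configured here, every prefix length contributes equally to the gradient, which is what lets the shorter prefixes remain usable on their own.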
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
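
The arithmetic implied by these settings can be checked with a short script. It is a sketch that assumes a single training device (the card does not state the device count) and a linear-warmup, half-cosine-decay schedule; `lr_at` and `epoch_at` are hypothetical helper names. Under those assumptions the numbers agree with the training log, which reports 80 optimizer steps per epoch, 320 steps in total, and epoch 0.1264 at step 10.

```python
import math

num_samples = 40_482
per_device_batch = 32
grad_accum = 16
epochs = 4
base_lr = 2e-5
warmup_ratio = 0.1
num_devices = 1  # assumption: the card does not state the device count

effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = updates_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer updates."""
    return step / (batches_per_epoch / grad_accum)

def lr_at(step):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, updates_per_epoch, total_steps, warmup_steps)
for step in (10, 32, 160, 320):
    print(step, round(epoch_at(step), 4), f"{lr_at(step):.2e}")
```

The effective batch size matters for this loss in particular: with in-batch negatives, each forward pass of 32 pairs supplies the negatives, while gradient accumulation only averages gradients across 16 such passes.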

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1264 10 65.1116 - - - - -
0.2528 20 52.0541 - - - - -
0.3791 30 36.0158 - - - - -
0.5055 40 26.0258 - - - - -
0.6319 50 24.2254 - - - - -
0.7583 60 21.8763 - - - - -
0.8847 70 18.0685 - - - - -
1.0 80 17.7443 0.7094 0.7054 0.6895 0.6487 0.5783
1.1264 90 14.5363 - - - - -
1.2528 100 14.1097 - - - - -
1.3791 110 13.5251 - - - - -
1.5055 120 13.3574 - - - - -
1.6319 130 13.3079 - - - - -
1.7583 140 12.926 - - - - -
1.8847 150 12.0388 - - - - -
2.0 160 10.9161 0.7063 0.7005 0.6880 0.6514 0.5886
2.1264 170 10.7059 - - - - -
2.2528 180 10.1178 - - - - -
2.3791 190 10.4664 - - - - -
2.5055 200 10.4824 - - - - -
2.6319 210 10.2784 - - - - -
2.7583 220 9.2031 - - - - -
2.8847 230 8.9788 - - - - -
3.0 240 7.5905 0.7027 0.6964 0.6855 0.6515 0.5881
3.1264 250 8.4637 - - - - -
3.2528 260 9.4921 - - - - -
3.3791 270 9.0615 - - - - -
3.5055 280 9.0181 - - - - -
3.6319 290 8.6193 - - - - -
3.7583 300 8.3741 - - - - -
3.8847 310 8.9504 - - - - -
4.0 320 7.4761 0.7037 0.6971 0.6857 0.652 0.5894
  • The bold row denotes the saved checkpoint.
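
One way to read the final row of the log is as the fraction of full-dimension quality retained at each truncation level. The snippet below computes that ratio from the epoch 4.0 NDCG@10 values; the variable names are illustrative.

```python
# NDCG@10 at epoch 4.0 (step 320), copied from the training log above.
final_ndcg = {768: 0.7037, 512: 0.6971, 256: 0.6857, 128: 0.652, 64: 0.5894}

full = final_ndcg[768]
retention = {dim: score / full for dim, score in final_ndcg.items()}

for dim, ratio in retention.items():
    print(f"dim {dim:>3}: NDCG@10 {final_ndcg[dim]:.4f} ({ratio:.1%} of 768-d)")
```

On these numbers, the 256-dimensional prefix keeps roughly 97% of the full model's NDCG@10 at one third of the storage, while the 64-dimensional prefix keeps roughly 84%.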

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}