metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:46338
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m-v2.0
widget:
  - source_sentence: >-
      What role does ESMA play in the development of guidelines and regulatory
      technical standards related to cooperation arrangements with third
      countries as mentioned in the text?
    sentences:
      - >-
        If a planned change is implemented notwithstanding the first and second
        subparagraphs, or if an unplanned change has taken place pursuant to
        which the AIFM’s management of the AIF would no longer comply with this
        Directive or the AIFM otherwise would no longer comply with this
        Directive, the competent authorities of the home Member State of the
        AIFM shall take all due measures in accordance with Article 46,
        including, if necessary, the express prohibition of marketing of the
        AIF.


        If the changes are acceptable because they do not affect the compliance
        of the AIFM’s management of the AIF with this Directive, or the
        compliance by the AIFM with this Directive otherwise, the competent
        authorities of the home Member State of the AIFM shall, without delay,
        inform ESMA in so far as the changes concern the termination of the
        marketing of certain AIFs or additional AIFs marketed and, if
        applicable, the competent authorities of the host Member States of the
        AIFM of those changes.


        11.


        The Commission shall adopt, by means of delegated acts in accordance
        with Article 56 and subject to the conditions of Articles 57 and 58,
        measures regarding the cooperation arrangements referred to in point (a)
        of paragraph 2 in order to design a common framework to facilitate the
        establishment of those cooperation arrangements with third countries.


        12.


        In order to ensure uniform application of this Article, ESMA may develop
        guidelines to determine the conditions of application of the measures
        adopted by the Commission regarding the cooperation arrangements
        referred to in point (a) of paragraph 2.


        13.


        ESMA shall develop draft regulatory technical standards to determine the
        minimum content of the cooperation arrangements referred to in point (a)
        of paragraph 2 so as to ensure that both the competent authorities of
        the home and the host Member States receive sufficient information in
        order to be able to exercise their supervisory and investigatory powers
        under this Directive.


        Power is delegated to the Commission to adopt the regulatory technical
        standards referred to in the first subparagraph in accordance with
        Article 10 to 14 of Regulation (EU) No 1095/2010.


        14.
      - >-
        (23) This Regulation should also apply to Union institutions, bodies,
        offices and agencies when acting as a provider or deployer of an AI
        system.
      - >-
        An operator that is a natural person or a microenterprise may mandate
        the next operator or trader further down the supply chain that is not a
        natural person or a microenterprise to act as an authorised
        representative. Such next operator or trader further down the supply
        chain shall not place or make available relevant products on the market
        or export them without submitting the due diligence statement pursuant
        to Article 4(2) on behalf of that operator. In such cases, the operator
        that is a natural person or a microenterprise shall retain
        responsibility for compliance of the relevant product with Article 3,
        and shall communicate to that next operator or trader further down the
        supply chain all information necessary to confirm that due
  - source_sentence: >-
      A review is scheduled for June 2019 to determine if the regulations
      regarding hazardous substances should be broadened, based on practical
      experiences. Additionally, the Commission aims to promote alternatives to
      animal testing by reassessing testing requirements, potentially leading to
      amendments that prioritize health and environmental safety.
    sentences:
      - >-
        18 June 1994, until such plant and machinery is disposed of; (b) in the
        case of the maintenance of plant and machinery already in service within
        a Member State on 18 June 1994. For the purposes of point (a) Member
        States may, on grounds of human health protection and environmental
        protection, prohibit within their territory the use of such plant or
        machinery before it is disposed of. 25. Monomethyl-dichloro-diphenyl
        methane Trade name: Ugilec 121 Ugilec 21 Shall not be placed on the
        market, or used, as a substance or in mixtures. Articles containing the
        substance shall not be placed on the market. 26.
        Monomethyl-dibromo-diphenyl methane bromobenzylbromotoluene, mixture of
        isomers Trade name: DBBT CAS No 99688-47-8 Shall not be placed on
      - >-
        (35) |  The fight against litter is a shared effort between competent
        authorities, producers and consumers. Public authorities, including the
        Union institutions, should lead by example.
      - >-
        7.


        By 1 June 2013 the Commission shall carry out a review to assess whether
        or not, taking into account latest developments in scientific knowledge,
        to extend the scope of Article 60(3) to substances identified under
        Article 57(f) as having endocrine disrupting properties. On the basis of
        that review the Commission may, if appropriate, present legislative
        proposals.


        8.


        By 1 June 2019, the Commission shall carry out a review to assess
        whether or not to extend the scope of Article 33 to cover other
        dangerous substances, taking into account the practical experience in
        implementing that Article. On the basis of that review, the Commission
        may, if appropriate, present legislative proposals to extend that
        obligation.


        9.


        In accordance with the objective of promoting non-animal testing and the
        replacement, reduction or refinement of animal testing required under
        this Regulation, the Commission shall review the testing requirements of
        Section 8.7 of Annex VIII by 1 June 2019. On the basis of this review,
        while ensuring a high level of protection of health and the environment,
        the Commission may propose an amendment in accordance with the procedure
        referred to in Article 133(4).


        Article 139


        Repeals


        Directive 91/155/EEC shall be repealed.


        Directives 93/105/EC and 2000/21/EC and Regulations (EEC) No 793/93 and
        (EC) No 1488/94 shall be repealed with effect from 1 June 2008.


        Directive 93/67/EEC shall be repealed with effect from 1 August 2008.


        Directive 76/769/EEC shall be repealed with effect from 1 June 2009.


        References to the repealed acts shall be construed as references to this
        Regulation.


        Article 140


        Amendment of Directive 1999/45/EC


        Article 14 of Directive 1999/45/EC shall be deleted.


        Article 141


        Entry into force and application


        1.


        This Regulation shall enter into force on 1 June 2007.


        2.


        Titles II, III, V, VI, VII, XI and XII as well as Articles 128 and 136
        shall apply from 1 June 2008.


        3.


        Article 135 shall apply from 1 August 2008.


        4.


        Title VIII and Annex XVII shall apply from 1 June 2009.


        This Regulation shall be binding in its entirety and directly applicable
        in all Member States.


        LIST OF ANNEXES


        ANNEX I GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING
        CHEMICAL SAFETY REPORTS ANNEX II REQUIREMENTS FOR THE COMPILATION OF
        SAFETY DATA SHEETS ANNEX III CRITERIA FOR SUBSTANCES REGISTERED IN
        QUANTITIES BETWEEN 1 AND 10 TONNES ANNEX IV EXEMPTIONS FROM THE
        OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE 2(7)(a) ANNEX V
        EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE
        2(7)(b) ANNEX VI INFORMATION REQUIREMENTS REFERRED TO IN ARTICLE 10
        ANNEX VII STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED
        OR IMPORTED IN QUANTITIES OF ONE TONNE OR MORE ANNEX VIII STANDARD
        INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN
        QUANTITIES OF 10 TONNES OR MORE ANNEX IX STANDARD INFORMATION
        REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES OF
        100 TONNES OR MORE ANNEX X STANDARD INFORMATION REQUIREMENTS FOR
        SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES OF 1 000 TONNES OR
        MORE ANNEX XI GENERAL RULES FOR ADAPTATION OF THE STANDARD TESTING
        REGIME SET OUT IN ANNEXES VII TO X ANNEX XII GENERAL PROVISIONS FOR
        DOWNSTREAM USERS TO ASSESS SUBSTANCES AND PREPARE CHEMICAL SAFETY
        REPORTS ANNEX XIII CRITERIA FOR THE IDENTIFICATION OF PERSISTENT,
        BIOACCUMULATIVE AND TOXIC SUBSTANCES, AND VERY PERSISTENT AND VERY
        BIOACCUMULATIVE SUBSTANCES ANNEX XIV LIST OF SUBSTANCES SUBJECT TO
        AUTHORISATION ANNEX XV DOSSIERS ANNEX XVI SOCIO-ECONOMIC ANALYSIS ANNEX
        XVII RESTRICTIONS ON THE MANUFACTURE, PLACING ON THE MARKET AND USE OF
        CERTAIN DANGEROUS SUBSTANCES, MIXTURES AND ARTICLES


        ANNEX I


        GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL
        SAFETY REPORTS


        0. INTRODUCTION


        ▼M51
  - source_sentence: >-
      What actions must the Commission take if the economic operator does not
      provide commitments or if the provided commitments are deemed
      inappropriate or insufficient to address the distortion?
    sentences:
      - >-
        2.


        Where the economic operator concerned does not offer commitments or
        where the Commission considers that the commitments referred to in
        paragraph 1 are neither appropriate nor sufficient to fully and
        effectively remedy the distortion, the Commission shall adopt an
        implementing act in the form of a decision prohibiting the award of the
        contract to the economic operator concerned (‘decision prohibiting the
        award of the contract’). That implementing act shall be adopted in
        accordance with the advisory procedure referred to in Article 48(2).
        Following that decision, the contracting authority or contracting entity
        shall reject the tender.


        3.
      - >-
        6,5 8,9 (1)  The values for biogas production from manure include
        negative emissions for emissions saved from raw manure management. The
        value of esca considered is equal to – 45 g CO2eq/MJ manure used in
        anaerobic digestion. (2)  Maize whole plant means maize harvested as
        fodder and ensiled for preservation. (3) Transport of agricultural raw
        materials to the transformation plant is, according to the methodology
        provided in the Commission's report of 25 February 2010 on
        sustainability requirements for the use of solid and gaseous biomass
        sources in electricity, heating and cooling, included in the
        ‘cultivation’ value. The value for transport of maize silage accounts
        for 0,4 g CO2eq/MJ biogas.
      - >-
        reduction in the consumption of lightweight plastic carrier bags. It
        should be possible for Member States, while observing the general rules
        laid down in the TFEU and acting in accordance with this Regulation, to
        adopt provisions which go beyond the minimum waste prevention targets
        set out in this Regulation. When implementing such measures, Member
        States should be aware of the risk of a shift from heavier to lighter
        packaging materials and should prioritise measures that minimise that
        risk.
  - source_sentence: >-
      The content provides a comprehensive overview of numerous chemical
      substances, including their structural formulas and potential
      applications. It emphasizes the significance of specific compounds like
      acrylamide and thioacetamide, while also addressing mixtures derived from
      coal tar. The information reflects the intricate nature of chemical
      synthesis and the importance of understanding the properties and uses of
      these compounds in various industrial contexts.
    sentences:
      - >-
        2.


        Each Member State shall ensure that a producer as defined in Article
        3(1)(f)(iv) and established on its territory, which sells EEE to another
        Member State in which it is not established, appoints an authorised
        representative in that Member State as the person responsible for
        fulfilling the obligations of that producer, pursuant to this Directive,
        on the territory of that Member State.


        3.


        Appointment of an authorised representative shall be by written mandate.


        Article 18


        Administrative cooperation and exchange of information
      - >-
        (a) display to customers and potential customers, in a visible manner,
        the labels provided in accordance with Article 32(1), point (b) or (c);
        (b) make reference to the information included on the labels provided in
        accordance with Article 32(1), point (b) or (c), in visual
        advertisements or in technical promotional material for a specific
        model, in accordance with the applicable delegated acts adopted pursuant
        to Article 4; and --- --- (c) not provide or display other labels,
        marks, symbols or inscriptions that are likely to mislead or confuse
        customers and potential customers with regard to the information
        included on the label regarding ecodesign requirements. --- ---


        Article 32


        Obligations related to labels
      - >-
        [2] 612-196-00-0 202-441-6 [1] 221-627-8 [2] 95-69-2 [1] 3165-93-3 [2]
        ►M5 — ◄ 2,4,5-Trimethylaniline [1] 2,4,5-trimethylaniline hydrochloride
        [2] 612-197-00-6 205-282-0 [1] -[2] 137-17-7 [1] 21436-97-5 [2] ►M5 — ◄
        4,4'-Thiodianiline [1] and its salts 612-198-00-1 205-370-9 [1] 139-65-1
        [1] ►M5 — ◄ 4,4'-Oxydianiline [1] and its salts p-Aminophenyl ether [1]
        612-199-00-7 202-977-0 [1] 101-80-4 [1] ►M5 — ◄ 2,4-Diaminoanisole [1]
        4-methoxy-m-phenylenediamine 2,4-diaminoanisole sulphate [2]
        612-200-00-0 210-406-1 [1] 254-323-9 [2] 615-05-4 [1] 39156-41-7 [2] N,
        N,N',N'-tetramethyl-4,4'-methylendianiline 612-201-00-6 202-959-2
        101-61-1 C.I. Basic Violet 3 with ≥ 0,1 % of Michler's ketone (EC No
        202-027-5) 612-205-00-8 208-953-6 548-62-9 ►M5 — ◄ 6-Methoxy-m-toluidine
        p-cresidine 612-209-00-X 204-419-1 120-71-8 ►M5 — ◄
        [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
        "32012R0109: INSERTED") Biphenyl-3,3′,4,4′-tetrayltetraamine;
        Diaminobenzidine 612-239-00-3 202-110-6 91-95-2
        (2-chloroethyl)(3-hydroxypropyl)ammonium chloride 612-246-00-1 429-740-6
        40722-80-3 3-Amino-9-ethyl carbazole; 9-Ethylcarbazol-3-ylamine
        612-280-00-7 205-057-7 132-32-1
        [▼M49](./../../../legal-content/EN/AUTO/?uri=celex:32018R0675
        "32018R0675: INSERTED") Reaction products of paraformaldehyde and
        2-hydroxypropylamine (ratio 3:2); [formaldehyde released from
        3,3′-methylenebis[5-methyloxazolidine]; formaldehyde released from
        oxazolidin]; [MBO] 612-290-00-1 — — Reaction products of
        paraformaldehyde with 2-hydroxypropylamine (ratio 1:1); [formaldehyde
        released from
        α,α,α-trimethyl-1,3,5-triazine-1,3,5(2H,4H,6H)-triethanol]; [HPT]
        612-291-00-7 — — Methylhydrazine 612-292-00-2 200-471-4 60-34-4
        [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
        "32006R1907R(01): REPLACED") Ethyleneimine; aziridine 613-001-00-1
        205-793-9 151-56-4 2-Methylaziridine; propyleneimine 613-033-00-6
        200-878-7 75-55-8 ►M5 — ◄ Captafol (ISO);
        1,2,3,6-tetrahydro-N-(1,1,2,2-tetrachloroethylthio) phthalimide
        613-046-00-7 219-363-3 2425-06-1 Carbadox (INN); methyl
        3-(quinoxalin-2-ylmethylene)carbazate 1,4-dioxide;
        2-(methoxycarbonylhydrazonomethyl) quinoxaline 1,4-dioxide 613-050-00-9
        229-879-0 6804-07-5 A mixture of:
        1,3,5-tris(3-aminomethylphenyl)-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione;
        a mixture of oligomers of
        3,5-bis(3-aminomethylphenyl)-1-poly[3,5-bis(3-aminomethylphenyl)-2,4,6-trioxo-1,3,5-(1H,3H,5H)-triazin-1-yl]-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione
        613-199-00-X 421-550-1 —
        [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
        "32012R0109: INSERTED") Quinoline 613-281-00-5 202-051-6 91-22-5
        [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
        "32006R1907R(01): REPLACED") Acrylamide 616-003-00-0 201-173-7 79-06-1
        [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
        "32021R2204: INSERTED") Butanone oxime; ethyl methyl ketoxime; ethyl
        methyl ketone oxime 616-014-00-0 202-496-6 96-29-7
        [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
        "32006R1907R(01): REPLACED") Thioacetamide 616-026-00-6 200-541-4
        62-55-5 A mixture of:
        N-[3-hydroxy-2-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide;
        N-[2,3-Bis-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide;
        methacrylamide;
        2-methyl-N-(2-methyl-acryloylaminomethoxymethyl)-acrylamide;
        N-2,3-dihydroxypropoxymethyl)-2-methylacrylamide 616-057-00-5 412-790-8
        — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
        "32012R0109: INSERTED")
        N-[6,9-dihydro-9-[[2-hydroxy-1-(hydroxymethyl)ethoxy]methyl]-6-oxo-1H-purin-2-yl]acetamide
        616-148-00-X 424-550-1 84245-12-5
        [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
        "32021R2204: INSERTED") N-(hydroxymethyl)acrylamide; methylolacrylamide;
        [NMA] 616-230-00-5 213-103-2 924-42-5
        [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
        "32006R1907R(01): REPLACED") Distillates (coal tar), benzole fraction;
        Light oil (A complex combination of hydrocarbons obtained by the
        distillation of coal tar. It consists of hydrocarbons having carbon
        numbers primarily in the range of C4 to C10 and distilling in the
        approximate range of 80 to 160 °C.) 648-001-00-0 283-482-7 84650-02-2
        Tar oils, brown-coal; Light oil (The distillate from lignite tar boiling
        in the range of approximately 80 to 250 °C. Composed primarily of
        aliphatic and aromatic hydrocarbons and monobasic phenols.) 648-002-00-6
        302-674-4 94114-40-6 J Benzol forerunnings (coal); Light oil
        redistillate, low boiling
  - source_sentence: >-
      How does the new Eurostat methodology differ in scope from the indicators
      used in this Directive for calculating energy consumption?
    sentences:
      - >-
        (29) The methodology for calculation of primary energy consumption and
        final energy consumption is aligned with the new Eurostat methodology,
        but the indicators used for the purpose of this Directive have a
        different scope, in that they exclude ambient energy and include energy
        consumption in international aviation for the targets in primary energy
        consumption and final energy consumption. The use of new indicators also
        implies that any changes in energy consumption of blast furnaces are now
        only reflected in primary energy consumption.
      - >-
        (92) InvestEU is the Union flagship programme to boost investment,
        especially the green and digital transition, by providing financing and
        technical assistance, for instance through blending mechanisms. Such an
        approach contributes to crowd in additional public and private capital.
        Moreover, Member States are encouraged to contribute to the InvestEU
        Member State compartment to support financial products available to
        net-zero technology manufacturing, without prejudice to applicable State
        aid rules.
      - >-
        be used, filled or transported through the system; --- --- (iii) specify
        the terms and conditions for proper handling and packaging use; --- ---
        (iv) specify detailed requirements for packaging reconditioning; --- ---
        (v) specify the requirements for packaging collection; --- --- (vi)
        specify the requirements for packaging storage; --- --- (vii) specify
        the requirements for packaging filling or uploading; --- --- (viii)
        specify rules to ensure the effective and efficient collection of
        reusable packaging, including by providing for incentives for end users
        to return the packaging to the collection points or grouped collection
        system; --- --- (ix) specify rules to ensure equal and fair access to
        the re-use system, including for vulnerable
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.7136198860693941
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9243915069911963
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9589159330226135
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.981874676333506
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7136198860693941
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.30813050233039874
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1917831866045227
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09818746763335057
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7136198860693941
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9243915069911963
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9589159330226135
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.981874676333506
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8626251072928146
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8227635844026309
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8236564067385257
            name: Cosine Map@100

SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0

This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-m-v2.0. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-m-v2.0
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
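
Read top to bottom, the module list says: the transformer emits one 768-dimensional vector per token, the pooling layer keeps only the first ([CLS]) token's vector (pooling_mode_cls_token is the only mode set to True), and Normalize() rescales that vector to unit length. The following is a minimal pure-Python sketch of the last two steps on toy 4-dimensional vectors. The function names are hypothetical and are not part of the Sentence Transformers API.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-in for the transformer output: 3 tokens, 4 dimensions each
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # first token, playing the role of [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
print(dot(sentence_embedding, sentence_embedding))
```

Because every output vector has unit length, the cosine similarity listed under Model Description reduces to a plain dot product.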

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

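# Download from the 🤗 Hub.
# "sentence_transformers_model_id" is a placeholder: replace it with this model's Hub ID.
# The GteModel backbone may ship as custom code; if loading fails, try trust_remote_code=True.
model = SentenceTransformer("sentence_transformers_model_id")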
# Run inference
sentences = [
    'How does the new Eurostat methodology differ in scope from the indicators used in this Directive for calculating energy consumption?',
    '(29) The methodology for calculation of primary energy consumption and final energy consumption is aligned with the new Eurostat methodology, but the indicators used for the purpose of this Directive have a different scope, in that they exclude ambient energy and include energy consumption in international aviation for the targets in primary energy consumption and final energy consumption. The use of new indicators also implies that any changes in energy consumption of blast furnaces are now only reflected in primary energy consumption.',
    '(92) InvestEU is the Union flagship programme to boost investment, especially the green and digital transition, by providing financing and technical assistance, for instance through blending mechanisms. Such an approach contributes to crowd in additional public and private capital. Moreover, Member States are encouraged to contribute to the InvestEU Member State compartment to support financial products available to net-zero technology manufacturing, without prejudice to applicable State aid rules.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
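
Since the Training Details of this card list MatryoshkaLoss with dimensions 768, 512, 256, 128 and 64, embeddings can plausibly be shortened to one of those sizes by keeping the leading components and re-normalizing. The sketch below shows that step on toy 8-dimensional vectors. The helper name is hypothetical, and the numbers are made up for illustration. Recent Sentence Transformers releases also accept a truncate_dim argument when constructing the model; check the installed version before relying on it.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional vectors standing in for real 768-dimensional embeddings
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1]
passage = [0.6, 0.3, 0.3, 0.1, 0.2, 0.0, 0.1, 0.1]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(dot(q, p), 4))
```

Shorter vectors reduce index size and search cost, usually at some loss in retrieval quality; this card reports metrics only for the full 768 dimensions.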

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.7136
cosine_accuracy@3 0.9244
cosine_accuracy@5 0.9589
cosine_accuracy@10 0.9819
cosine_precision@1 0.7136
cosine_precision@3 0.3081
cosine_precision@5 0.1918
cosine_precision@10 0.0982
cosine_recall@1 0.7136
cosine_recall@3 0.9244
cosine_recall@5 0.9589
cosine_recall@10 0.9819
cosine_ndcg@10 0.8626
cosine_mrr@10 0.8228
cosine_map@100 0.8237
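
In the figures above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9244 / 3 ≈ 0.3081). That pattern is what one would expect when every query has exactly one relevant passage. Under that assumption, the per-query metrics can be sketched as follows; the function names and the toy rankings are hypothetical and this is not the evaluator code used to produce the table.

```python
import math

def single_query_metrics(ranked_ids, relevant_id, k):
    """Metrics at cutoff k for one query with exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": 1.0 / k if hit else 0.0,
        "recall": 1.0 if hit else 0.0,
        "mrr": 1.0 / rank if hit else 0.0,
        # With one relevant document the ideal DCG is 1, so NDCG equals DCG
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,
    }

def mean_metrics(runs, k):
    """Average the per-query metrics over (ranking, relevant_id) pairs."""
    per_query = [single_query_metrics(r, rel, k) for r, rel in runs]
    return {name: sum(m[name] for m in per_query) / len(per_query)
            for name in per_query[0]}

# Toy rankings: relevant document at rank 1, at rank 2, and missing
runs = [
    (["d1", "d7", "d3"], "d1"),
    (["d4", "d2", "d9"], "d2"),
    (["d5", "d6", "d8"], "d3"),
]
avg = mean_metrics(runs, k=3)
print(avg)
```

With a single relevant passage per query, precision@k can never exceed 1/k, so the low precision@10 value reflects the evaluation setup rather than poor ranking.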

Training Details

Training Dataset

Unnamed Dataset

  • Size: 46,338 training samples
  • Columns: query_text and doc_text
  • Approximate statistics based on the first 1000 samples:
    • query_text (string): min 9 tokens, mean 39.44 tokens, max 311 tokens
    • doc_text (string): min 7 tokens, mean 233.15 tokens, max 1900 tokens
  • Samples:
    Sample 1
    query_text: The regulation's applicability extends to various stakeholders involved in AI systems, including providers, deployers, importers, and manufacturers, regardless of their location. It specifically addresses high-risk AI systems and outlines the limitations of its scope, particularly concerning national security and military applications. Additionally, it clarifies that it does not interfere with the responsibilities of member states regarding national security or the operations of public authorities and international organizations in specific contexts.
    doc_text: (180) The European Data Protection Supervisor and the European Data Protection Board were consulted in accordance with Article 42(1) and (2) of Regulation (EU) 2018/1725 and delivered their joint opinion on 18 June 2021,

    HAVE ADOPTED THIS REGULATION:

    CHAPTER I

    GENERAL PROVISIONS

    Article 1

    Subject matter

    1. The purpose of this Regulation is to improve the functioning of the internal market and promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental rights enshrined in the Charter, including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union and supporting innovation.

    2. This Regulation lays down:

    (a) harmonised rules for the placing on the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; --- --- (c) specific requirements for high-risk AI systems and oblig...

    Sample 2
    query_text: How should loans with unknown use of proceeds be allocated in terms of sectors and alignment metrics?
    doc_text: instruments. For loans whose use of proceeds is known, the value shall be included for the relevant sector and alignment metric. For loans whose use of proceeds is unknown, the gross carrying amount of the exposure shall be allocated to the relevant sectors and alignment metrics based on the counterparties’ activity distribution, including by counterparties’ turnover by activity. Institutions shall add a row in the template for each relevant combination of sectors disclosed in column (b) and alignment metrics included in column (d). ---

    Sample 3
    query_text: What measures must AIFMs implement to ensure they do not rely solely on credit ratings for assessing the creditworthiness of AIFs' assets?
    doc_text: ▼M1

    The measures specifying the risk-management systems referred to in point (a) of the first subparagraph shall ensure that the AIFMs are prevented from relying solely or mechanistically on credit ratings, as referred to in the first subparagraph of paragraph 2, for assessing the creditworthiness of the AIFs’ assets.

    ▼B

    Article 16

    Liquidity management

    1.

    AIFMs shall, for each AIF that they manage which is not an unleveraged closed- ended AIF, employ an appropriate liquidity management system and adopt procedures which enable them to monitor the liquidity risk of the AIF and to ensure that the liquidity profile of the investments of the AIF complies with its underlying obligations.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
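
To make the loss configuration above concrete, here is a rough pure-Python sketch of how a Matryoshka wrapper around an in-batch-negatives ranking loss can be computed. It assumes cosine similarity with a scale factor of 20.0 (the card does not list the scale, so that value is an assumption), treats passage i as the only positive for query i, and uses all dimensions at every step, which is how n_dims_per_step: -1 is read here. The function names and toy vectors are hypothetical; this is not the library's implementation.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_batch_ranking_loss(queries, docs, scale=20.0):
    """Cross-entropy over scaled cosine scores; doc i is the positive for query i."""
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [scale * dot(qi, dj) for dj in d]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(q)

def matryoshka_loss(queries, docs, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    return sum(
        w * in_batch_ranking_loss([v[:dim] for v in queries], [v[:dim] for v in docs])
        for dim, w in zip(dims, weights)
    )

# Toy batch of two (query, passage) pairs with 8-dimensional embeddings
queries = [[1.0, 0.2, 0.0, 0.1, 0.3, 0.0, 0.1, 0.0],
           [0.1, 1.0, 0.2, 0.0, 0.0, 0.3, 0.0, 0.1]]
docs = [[0.9, 0.1, 0.1, 0.0, 0.2, 0.1, 0.0, 0.0],
        [0.0, 0.8, 0.1, 0.1, 0.1, 0.2, 0.0, 0.2]]

aligned = matryoshka_loss(queries, docs, dims=[8, 4, 2], weights=[1, 1, 1])
swapped = matryoshka_loss(queries, docs[::-1], dims=[8, 4, 2], weights=[1, 1, 1])
print(aligned, swapped)
```

The loss is small when each query is closest to its own passage at every prefix length and grows when the pairs are mismatched, which is the pressure that makes the leading dimensions useful on their own.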
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|-------|------|---------------|----------------|
| -1 | -1 | - | 0.7763 |
| 0.0863 | 500 | 0.2343 | - |
| 0.1726 | 1000 | 0.1259 | 0.814 |
| 0.2589 | 1500 | 0.1027 | - |
| 0.3452 | 2000 | 0.0757 | 0.8288 |
| 0.4316 | 2500 | 0.0617 | - |
| 0.5179 | 3000 | 0.0651 | 0.8288 |
| 0.6042 | 3500 | 0.0863 | - |
| 0.6905 | 4000 | 0.06 | 0.8376 |
| 0.7768 | 4500 | 0.0579 | - |
| 0.8631 | 5000 | 0.0593 | 0.8342 |
| 0.9494 | 5500 | 0.0485 | - |
| 1.0357 | 6000 | 0.0465 | 0.8384 |
| 1.1220 | 6500 | 0.0276 | - |
| 1.2084 | 7000 | 0.0353 | 0.8392 |
| 1.2947 | 7500 | 0.0335 | - |
| 1.3810 | 8000 | 0.0292 | 0.8436 |
| 1.4673 | 8500 | 0.0276 | - |
| 1.5536 | 9000 | 0.0404 | 0.8485 |
| 1.6399 | 9500 | 0.0476 | - |
| 1.7262 | 10000 | 0.0265 | 0.8601 |
| 1.8125 | 10500 | 0.017 | - |
| 1.8988 | 11000 | 0.0217 | 0.8549 |
| 1.9852 | 11500 | 0.0329 | - |
| 2.0715 | 12000 | 0.0207 | 0.8577 |
| 2.1578 | 12500 | 0.0199 | - |
| 2.2441 | 13000 | 0.015 | 0.8544 |
| 2.3304 | 13500 | 0.0143 | - |
| 2.4167 | 14000 | 0.0117 | 0.8574 |
| 2.5030 | 14500 | 0.0204 | - |
| 2.5893 | 15000 | 0.0141 | 0.8595 |
| 2.6756 | 15500 | 0.0123 | - |
| 2.7620 | 16000 | 0.0211 | 0.8538 |
| 2.8483 | 16500 | 0.0207 | - |
| 2.9346 | 17000 | 0.0134 | 0.8562 |
| 3.0209 | 17500 | 0.0276 | - |
| 3.1072 | 18000 | 0.0106 | 0.8552 |
| 3.1935 | 18500 | 0.0129 | - |
| 3.2798 | 19000 | 0.0157 | 0.8582 |
| 3.3661 | 19500 | 0.0164 | - |
| 3.4524 | 20000 | 0.0192 | 0.8614 |
| 3.5388 | 20500 | 0.0138 | - |
| 3.6251 | 21000 | 0.0141 | 0.8601 |
| 3.7114 | 21500 | 0.0109 | - |
| 3.7977 | 22000 | 0.0178 | 0.8605 |
| 3.8840 | 22500 | 0.0088 | - |
| 3.9703 | 23000 | 0.0255 | 0.8626 |
  • The bold row denotes the saved checkpoint.
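
For readers unfamiliar with the `cosine_ndcg@10` column, the sketch below shows how NDCG@10 is commonly computed for a single query once documents have been ranked by cosine similarity. It is a generic illustration with hypothetical function names, not the evaluator code used to produce the logs; the reported figure would be an average of such per-query scores. The input is assumed to be the relevance label of every candidate document in ranked order.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ranking.

    ranked_relevances lists the relevance of each candidate document in
    the order the model ranked them (1 = relevant, 0 = not relevant).
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant document exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


if __name__ == "__main__":
    # One relevant document, retrieved at rank 1 and at rank 2.
    print(ndcg_at_k([1, 0, 0, 0]))
    print(ndcg_at_k([0, 1, 0, 0]))
```

With a single relevant document per query, the score reduces to 1 / log2(rank + 1) when the document appears in the top 10 and to 0 otherwise.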

Framework Versions

  • Python: 3.10.15
  • Sentence Transformers: 4.0.2
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu126
  • Accelerate: 0.26.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}