Add new CrossEncoder model
- README.md +79 -79
- config.json +34 -31
- onnx/model.onnx +3 -0
- special_tokens_map.json +35 -5
    	
        README.md
    CHANGED
    
    | @@ -1,80 +1,80 @@ | |
| 1 | 
            -
            ---
         | 
| 2 | 
            -
            license: apache-2.0
         | 
| 3 | 
            -
            datasets:
         | 
| 4 | 
            -
            - sentence-transformers/msmarco
         | 
| 5 | 
            -
            language:
         | 
| 6 | 
            -
            - en
         | 
| 7 | 
            -
            base_model:
         | 
| 8 | 
            -
            - cross-encoder/ms-marco-MiniLM-L12-v2
         | 
| 9 | 
            -
            pipeline_tag: text-ranking
         | 
| 10 | 
            -
            library_name: sentence-transformers
         | 
| 11 | 
            -
            tags:
         | 
| 12 | 
            -
            - transformers
         | 
| 13 | 
            -
            ---
         | 
| 14 | 
            -
            # Cross-Encoder for MS Marco
         | 
| 15 | 
            -
             | 
| 16 | 
            -
            This model was trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.
         | 
| 17 | 
            -
             | 
| 18 | 
            -
            The model can be used for Information Retrieval: Given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in decreasing order of score. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details. The training code is available here: [SBERT.net Training MS Marco](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/ms_marco)
         | 
| 19 | 
            -
             | 
| 20 | 
            -
             | 
| 21 | 
            -
            ## Usage with SentenceTransformers
         | 
| 22 | 
            -
             | 
| 23 | 
            -
            The usage is easy when you have [SentenceTransformers](https://www.sbert.net/) installed. Then you can use the pre-trained models like this:
         | 
| 24 | 
            -
            ```python
         | 
| 25 | 
            -
            from sentence_transformers import CrossEncoder
         | 
| 26 | 
            -
             | 
| 27 | 
            -
            model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 28 | 
            -
            scores = model.predict([
         | 
| 29 | 
            -
                ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
         | 
| 30 | 
            -
                ("How many people live in Berlin?", "Berlin is well known for its museums."),
         | 
| 31 | 
            -
            ])
         | 
| 32 | 
            -
            print(scores)
         | 
| 33 | 
            -
            # [ 8.607138 -4.320078]
         | 
| 34 | 
            -
            ```
         | 
| 35 | 
            -
             | 
| 36 | 
            -
             | 
| 37 | 
            -
            ## Usage with Transformers
         | 
| 38 | 
            -
             | 
| 39 | 
            -
            ```python
         | 
| 40 | 
            -
            from transformers import AutoTokenizer, AutoModelForSequenceClassification
         | 
| 41 | 
            -
            import torch
         | 
| 42 | 
            -
             | 
| 43 | 
            -
            model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 44 | 
            -
            tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 45 | 
            -
             | 
| 46 | 
            -
            features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],  padding=True, truncation=True, return_tensors="pt")
         | 
| 47 | 
            -
             | 
| 48 | 
            -
            model.eval()
         | 
| 49 | 
            -
            with torch.no_grad():
         | 
| 50 | 
            -
                scores = model(**features).logits
         | 
| 51 | 
            -
                print(scores)
         | 
| 52 | 
            -
            ```
         | 
| 53 | 
            -
             | 
| 54 | 
            -
             | 
| 55 | 
            -
            ## Performance
         | 
| 56 | 
            -
            In the following table, we provide various pre-trained Cross-Encoders together with their performance on the [TREC Deep Learning 2019](https://microsoft.github.io/TREC-2019-Deep-Learning/) and the [MS Marco Passage Reranking](https://github.com/microsoft/MSMARCO-Passage-Ranking/) dataset. 
         | 
| 57 | 
            -
             | 
| 58 | 
            -
             | 
| 59 | 
            -
            | Model-Name        | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev)  | Docs / Sec |
         | 
| 60 | 
            -
            | ------------- |:-------------| -----| --- | 
         | 
| 61 | 
            -
            | **Version 2 models** | | | 
         | 
| 62 | 
            -
            | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000
         | 
| 63 | 
            -
            | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100
         | 
| 64 | 
            -
            | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500
         | 
| 65 | 
            -
            | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800
         | 
| 66 | 
            -
            | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960
         | 
| 67 | 
            -
            | **Version 1 models** | | | 
         | 
| 68 | 
            -
            | cross-encoder/ms-marco-TinyBERT-L2  | 67.43 | 30.15  | 9000
         | 
| 69 | 
            -
            | cross-encoder/ms-marco-TinyBERT-L4  | 68.09 | 34.50  | 2900
         | 
| 70 | 
            -
            | cross-encoder/ms-marco-TinyBERT-L6 |  69.57 | 36.13  | 680
         | 
| 71 | 
            -
            | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340
         | 
| 72 | 
            -
            | **Other models** | | | 
         | 
| 73 | 
            -
            | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 
         | 
| 74 | 
            -
            | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 
         | 
| 75 | 
            -
            | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 
         | 
| 76 | 
            -
            | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 
         | 
| 77 | 
            -
            | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 
         | 
| 78 | 
            -
            | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720
         | 
| 79 | 
            -
             
         | 
| 80 | 
             
             Note: Runtime was computed on a V100 GPU.
         | 
|  | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            license: apache-2.0
         | 
| 3 | 
            +
            datasets:
         | 
| 4 | 
            +
            - sentence-transformers/msmarco
         | 
| 5 | 
            +
            language:
         | 
| 6 | 
            +
            - en
         | 
| 7 | 
            +
            base_model:
         | 
| 8 | 
            +
            - cross-encoder/ms-marco-MiniLM-L12-v2
         | 
| 9 | 
            +
            pipeline_tag: text-ranking
         | 
| 10 | 
            +
            library_name: sentence-transformers
         | 
| 11 | 
            +
            tags:
         | 
| 12 | 
            +
            - transformers
         | 
| 13 | 
            +
            ---
         | 
| 14 | 
            +
            # Cross-Encoder for MS Marco
         | 
| 15 | 
            +
             | 
| 16 | 
            +
            This model was trained on the [MS Marco Passage Ranking](https://github.com/microsoft/MSMARCO-Passage-Ranking) task.
         | 
| 17 | 
            +
             | 
| 18 | 
            +
            The model can be used for Information Retrieval: Given a query, encode the query with all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in decreasing order of score. See [SBERT.net Retrieve & Re-rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for more details. The training code is available here: [SBERT.net Training MS Marco](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/ms_marco)
         | 
| 19 | 
            +
             | 
| 20 | 
            +
             | 
| 21 | 
            +
            ## Usage with SentenceTransformers
         | 
| 22 | 
            +
             | 
| 23 | 
            +
            The usage is easy when you have [SentenceTransformers](https://www.sbert.net/) installed. Then you can use the pre-trained models like this:
         | 
| 24 | 
            +
            ```python
         | 
| 25 | 
            +
            from sentence_transformers import CrossEncoder
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 28 | 
            +
            scores = model.predict([
         | 
| 29 | 
            +
                ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
         | 
| 30 | 
            +
                ("How many people live in Berlin?", "Berlin is well known for its museums."),
         | 
| 31 | 
            +
            ])
         | 
| 32 | 
            +
            print(scores)
         | 
| 33 | 
            +
            # [ 8.607138 -4.320078]
         | 
| 34 | 
            +
            ```
         | 
| 35 | 
            +
             | 
| 36 | 
            +
             | 
| 37 | 
            +
            ## Usage with Transformers
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            ```python
         | 
| 40 | 
            +
            from transformers import AutoTokenizer, AutoModelForSequenceClassification
         | 
| 41 | 
            +
            import torch
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 44 | 
            +
            tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L6-v2')
         | 
| 45 | 
            +
             | 
| 46 | 
            +
            features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'], ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],  padding=True, truncation=True, return_tensors="pt")
         | 
| 47 | 
            +
             | 
| 48 | 
            +
            model.eval()
         | 
| 49 | 
            +
            with torch.no_grad():
         | 
| 50 | 
            +
                scores = model(**features).logits
         | 
| 51 | 
            +
                print(scores)
         | 
| 52 | 
            +
            ```
         | 
| 53 | 
            +
             | 
| 54 | 
            +
             | 
| 55 | 
            +
            ## Performance
         | 
| 56 | 
            +
            In the following table, we provide various pre-trained Cross-Encoders together with their performance on the [TREC Deep Learning 2019](https://microsoft.github.io/TREC-2019-Deep-Learning/) and the [MS Marco Passage Reranking](https://github.com/microsoft/MSMARCO-Passage-Ranking/) dataset. 
         | 
| 57 | 
            +
             | 
| 58 | 
            +
             | 
| 59 | 
            +
            | Model-Name        | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev)  | Docs / Sec |
         | 
| 60 | 
            +
            | ------------- |:-------------| -----| --- | 
         | 
| 61 | 
            +
            | **Version 2 models** | | | 
         | 
| 62 | 
            +
            | cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000
         | 
| 63 | 
            +
            | cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100
         | 
| 64 | 
            +
            | cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500
         | 
| 65 | 
            +
            | cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800
         | 
| 66 | 
            +
            | cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960
         | 
| 67 | 
            +
            | **Version 1 models** | | | 
         | 
| 68 | 
            +
            | cross-encoder/ms-marco-TinyBERT-L2  | 67.43 | 30.15  | 9000
         | 
| 69 | 
            +
            | cross-encoder/ms-marco-TinyBERT-L4  | 68.09 | 34.50  | 2900
         | 
| 70 | 
            +
            | cross-encoder/ms-marco-TinyBERT-L6 |  69.57 | 36.13  | 680
         | 
| 71 | 
            +
            | cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340
         | 
| 72 | 
            +
            | **Other models** | | | 
         | 
| 73 | 
            +
            | nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 
         | 
| 74 | 
            +
            | nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 
         | 
| 75 | 
            +
            | nboost/pt-bert-large-msmarco | 73.36 | 36.48 | 100 
         | 
| 76 | 
            +
            | Capreolus/electra-base-msmarco | 71.23 | 36.89 | 340 
         | 
| 77 | 
            +
            | amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 
         | 
| 78 | 
            +
            | sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720
         | 
| 79 | 
            +
             
         | 
| 80 | 
             
             Note: Runtime was computed on a V100 GPU.
         | 
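
The README describes the re-ranking flow (pair the query with every candidate passage, score each pair, sort by decreasing score) but its snippets stop at printing raw scores. A minimal sketch of the sorting step follows. The names `rerank` and `toy_overlap_scorer` are hypothetical and not part of this commit; the toy scorer only stands in for a real model so the example runs offline.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages by decreasing score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in scorer: counts whitespace-separated words shared by both texts."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(float(len(query_words & passage_words)))
    return scores


query = "How many people live in Berlin?"
passages = [
    "Berlin is well known for its museums.",
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
]
ranked = rerank(query, passages, toy_overlap_scorer)
for passage, score in ranked:
    print(f"{score:.1f}  {passage}")
```

With the model from the README snippet loaded, passing `model.predict` in place of `toy_overlap_scorer` should give the same flow, assuming `predict` accepts a list of pairs as shown in that snippet.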
    	
        config.json
    CHANGED
    
    | @@ -1,31 +1,34 @@ | |
| 1 | 
            -
            {
         | 
| 2 | 
            -
              " | 
| 3 | 
            -
             | 
| 4 | 
            -
             | 
| 5 | 
            -
               | 
| 6 | 
            -
              " | 
| 7 | 
            -
              "gradient_checkpointing": false,
         | 
| 8 | 
            -
              "hidden_act": "gelu",
         | 
| 9 | 
            -
              "hidden_dropout_prob": 0.1,
         | 
| 10 | 
            -
              "hidden_size": 384,
         | 
| 11 | 
            -
              "id2label": {
         | 
| 12 | 
            -
                "0": "LABEL_0"
         | 
| 13 | 
            -
              },
         | 
| 14 | 
            -
              "initializer_range": 0.02,
         | 
| 15 | 
            -
              "intermediate_size": 1536,
         | 
| 16 | 
            -
              "label2id": {
         | 
| 17 | 
            -
                "LABEL_0": 0
         | 
| 18 | 
            -
              },
         | 
| 19 | 
            -
              "layer_norm_eps": 1e-12,
         | 
| 20 | 
            -
              "max_position_embeddings": 512,
         | 
| 21 | 
            -
              "model_type": "bert",
         | 
| 22 | 
            -
              "num_attention_heads": 12,
         | 
| 23 | 
            -
              "num_hidden_layers": 6,
         | 
| 24 | 
            -
              "pad_token_id": 0,
         | 
| 25 | 
            -
              "position_embedding_type": "absolute",
         | 
| 26 | 
            -
              " | 
| 27 | 
            -
             | 
| 28 | 
            -
             | 
| 29 | 
            -
               | 
| 30 | 
            -
              " | 
| 31 | 
            -
             | 
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            {
         | 
| 2 | 
            +
              "architectures": [
         | 
| 3 | 
            +
                "BertForSequenceClassification"
         | 
| 4 | 
            +
              ],
         | 
| 5 | 
            +
              "attention_probs_dropout_prob": 0.1,
         | 
| 6 | 
            +
              "classifier_dropout": null,
         | 
| 7 | 
            +
              "gradient_checkpointing": false,
         | 
| 8 | 
            +
              "hidden_act": "gelu",
         | 
| 9 | 
            +
              "hidden_dropout_prob": 0.1,
         | 
| 10 | 
            +
              "hidden_size": 384,
         | 
| 11 | 
            +
              "id2label": {
         | 
| 12 | 
            +
                "0": "LABEL_0"
         | 
| 13 | 
            +
              },
         | 
| 14 | 
            +
              "initializer_range": 0.02,
         | 
| 15 | 
            +
              "intermediate_size": 1536,
         | 
| 16 | 
            +
              "label2id": {
         | 
| 17 | 
            +
                "LABEL_0": 0
         | 
| 18 | 
            +
              },
         | 
| 19 | 
            +
              "layer_norm_eps": 1e-12,
         | 
| 20 | 
            +
              "max_position_embeddings": 512,
         | 
| 21 | 
            +
              "model_type": "bert",
         | 
| 22 | 
            +
              "num_attention_heads": 12,
         | 
| 23 | 
            +
              "num_hidden_layers": 6,
         | 
| 24 | 
            +
              "pad_token_id": 0,
         | 
| 25 | 
            +
              "position_embedding_type": "absolute",
         | 
| 26 | 
            +
              "sentence_transformers": {
         | 
| 27 | 
            +
                "activation_fn": "torch.nn.modules.linear.Identity",
         | 
| 28 | 
            +
                "version": "4.1.0.dev0"
         | 
| 29 | 
            +
              },
         | 
| 30 | 
            +
              "transformers_version": "4.52.0.dev0",
         | 
| 31 | 
            +
              "type_vocab_size": 2,
         | 
| 32 | 
            +
              "use_cache": true,
         | 
| 33 | 
            +
              "vocab_size": 30522
         | 
| 34 | 
            +
            }
         | 
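
The added `sentence_transformers.activation_fn` entry is an identity function, which suggests the scores come back as raw logits; the README's example output (`8.607138`, `-4.320078`) is consistent with that. If values between 0 and 1 are more convenient, one option is to apply a sigmoid afterwards. This is an assumption for illustration, not something this commit does, and whether the result is a calibrated probability depends on how the model was trained, which the diff does not state.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Raw scores copied from the README example output.
raw_scores = [8.607138, -4.320078]
squashed = [sigmoid(score) for score in raw_scores]

# Sigmoid is monotonic, so the ranking order is unchanged.
order_before = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
order_after = sorted(range(len(squashed)), key=lambda i: squashed[i], reverse=True)
print(squashed, order_before == order_after)
```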
    	
        onnx/model.onnx
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
|  | |
|  | |
|  | 
|  | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:5d3e70fd0c9ff14b9b5169a51e957b7a9c74897afd0a35ce4bd318150c1d4d4a
         | 
| 3 | 
            +
            size 91011230
         | 
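
The three added lines are a Git LFS pointer, not the ONNX weights themselves: the repository stores the spec version, a SHA-256 object ID, and the byte size, while the binary lives in LFS storage. A small sketch of reading those fields (the helper name `parse_lfs_pointer` is hypothetical):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the onnx/model.onnx hunk above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:5d3e70fd0c9ff14b9b5169a51e957b7a9c74897afd0a35ce4bd318150c1d4d4a
size 91011230
"""

pointer = parse_lfs_pointer(pointer_text)
hash_algorithm, _, digest = pointer["oid"].partition(":")
size_mib = int(pointer["size"]) / (1024 * 1024)
print(hash_algorithm, len(digest), f"{size_mib:.1f} MiB")
```

The `size` field puts the exported model at just under 87 MiB.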
    	
        special_tokens_map.json
    CHANGED
    
    | @@ -1,7 +1,37 @@ | |
| 1 | 
             
            {
         | 
| 2 | 
            -
              "cls_token":  | 
| 3 | 
            -
             | 
| 4 | 
            -
             | 
| 5 | 
            -
             | 
| 6 | 
            -
             | 
| 7 | 
             
            }
         | 
|  | |
| 1 | 
             
            {
         | 
| 2 | 
            +
              "cls_token": {
         | 
| 3 | 
            +
                "content": "[CLS]",
         | 
| 4 | 
            +
                "lstrip": false,
         | 
| 5 | 
            +
                "normalized": false,
         | 
| 6 | 
            +
                "rstrip": false,
         | 
| 7 | 
            +
                "single_word": false
         | 
| 8 | 
            +
              },
         | 
| 9 | 
            +
              "mask_token": {
         | 
| 10 | 
            +
                "content": "[MASK]",
         | 
| 11 | 
            +
                "lstrip": false,
         | 
| 12 | 
            +
                "normalized": false,
         | 
| 13 | 
            +
                "rstrip": false,
         | 
| 14 | 
            +
                "single_word": false
         | 
| 15 | 
            +
              },
         | 
| 16 | 
            +
              "pad_token": {
         | 
| 17 | 
            +
                "content": "[PAD]",
         | 
| 18 | 
            +
                "lstrip": false,
         | 
| 19 | 
            +
                "normalized": false,
         | 
| 20 | 
            +
                "rstrip": false,
         | 
| 21 | 
            +
                "single_word": false
         | 
| 22 | 
            +
              },
         | 
| 23 | 
            +
              "sep_token": {
         | 
| 24 | 
            +
                "content": "[SEP]",
         | 
| 25 | 
            +
                "lstrip": false,
         | 
| 26 | 
            +
                "normalized": false,
         | 
| 27 | 
            +
                "rstrip": false,
         | 
| 28 | 
            +
                "single_word": false
         | 
| 29 | 
            +
              },
         | 
| 30 | 
            +
              "unk_token": {
         | 
| 31 | 
            +
                "content": "[UNK]",
         | 
| 32 | 
            +
                "lstrip": false,
         | 
| 33 | 
            +
                "normalized": false,
         | 
| 34 | 
            +
                "rstrip": false,
         | 
| 35 | 
            +
                "single_word": false
         | 
| 36 | 
            +
              }
         | 
| 37 | 
             
            }
         | 

