Created multiberts-seed_3-step_900k
- README.md +88 -0
- config.json +25 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tf_model.h5 +3 -0
- tokenizer_config.json +1 -0
- vocab.txt +0 -0
    	
        README.md
    ADDED
    
    | @@ -0,0 +1,88 @@ | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            language: en
         | 
| 3 | 
            +
            tags:
         | 
| 4 | 
            +
            -   multiberts
         | 
| 5 | 
            +
            -   multiberts-seed_3
         | 
| 6 | 
            +
            -   multiberts-seed_3-step_900k
         | 
| 7 | 
            +
            license: apache-2.0
         | 
| 8 | 
            +
            ---
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            # MultiBERTs, Intermediate Checkpoint - Seed 3, Step 900k
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            MultiBERTs is a collection of checkpoints and a statistical library to support
         | 
| 13 | 
            +
            robust research on BERT. We provide 25 BERT-base models trained with
         | 
| 14 | 
            +
            hyper-parameters similar to those of
         | 
| 15 | 
            +
            [the original BERT model](https://github.com/google-research/bert) but
         | 
| 16 | 
            +
            with different random seeds, which causes variations in the initial weights and order of
         | 
| 17 | 
            +
            training instances. The aim is to distinguish findings that apply to a specific
         | 
| 18 | 
            +
            artifact (i.e., a particular instance of the model) from those that apply to the
         | 
| 19 | 
            +
            more general procedure.
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            We also provide 140 intermediate checkpoints captured
         | 
| 22 | 
            +
            during the course of pre-training (we saved 28 checkpoints for each of the first 5 runs).
         | 
| 23 | 
            +
             | 
| 24 | 
            +
            The models were originally released through
         | 
| 25 | 
            +
            [http://goo.gle/multiberts](http://goo.gle/multiberts). We describe them in our
         | 
| 26 | 
            +
            paper
         | 
| 27 | 
            +
            [The MultiBERTs: BERT Reproductions for Robustness Analysis](https://arxiv.org/abs/2106.16163).
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            This is model #3, captured at step 900k (max: 2000k, i.e., 2M steps).
         | 
| 30 | 
            +
             | 
| 31 | 
            +
            ## Model Description
         | 
| 32 | 
            +
             | 
| 33 | 
            +
            This model was captured during a reproduction of
         | 
| 34 | 
            +
            [BERT-base uncased](https://github.com/google-research/bert), for English: it
         | 
| 35 | 
            +
            is a Transformers model pretrained on a large corpus of English data, using the
         | 
| 36 | 
            +
            Masked Language Modelling (MLM) and the Next Sentence Prediction (NSP)
         | 
| 37 | 
            +
            objectives.
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            The intended uses, limitations, training data and training procedure for the fully trained model are similar
         | 
| 40 | 
            +
            to [BERT-base uncased](https://github.com/google-research/bert). Two major
         | 
| 41 | 
            +
            differences from the original model:
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            *   We pre-trained the MultiBERTs models for 2 million steps using sequence
         | 
| 44 | 
            +
                length 512 (instead of 1 million steps using sequence length 128 then 512).
         | 
| 45 | 
            +
            *   We used an alternative version of Wikipedia and Books Corpus, initially
         | 
| 46 | 
            +
                collected for [Turc et al., 2019](https://arxiv.org/abs/1908.08962).
         | 
| 47 | 
            +
             | 
| 48 | 
            +
            This is a best-effort reproduction, and so it is probable that differences with
         | 
| 49 | 
            +
            the original model have gone unnoticed. The performance of MultiBERTs on GLUE after full training is often comparable to that of the original
         | 
| 50 | 
            +
            BERT, but we found significant differences on the dev set of SQuAD (MultiBERTs outperforms the original BERT).
         | 
| 51 | 
            +
            See our [technical report](https://arxiv.org/abs/2106.16163) for more details.
         | 
| 52 | 
            +
             | 
| 53 | 
            +
            ### How to use
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            Using code from
         | 
| 56 | 
            +
            [BERT-base uncased](https://huggingface.co/bert-base-uncased), here is an example based on
         | 
| 57 | 
            +
            TensorFlow:
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            ```python
         | 
| 60 | 
            +
            from transformers import BertTokenizer, TFBertModel
         | 
| 61 | 
            +
            tokenizer = BertTokenizer.from_pretrained('google/multiberts-seed_3-step_900k')
         | 
| 62 | 
            +
            model = TFBertModel.from_pretrained("google/multiberts-seed_3-step_900k")
         | 
| 63 | 
            +
            text = "Replace me by any text you'd like."
         | 
| 64 | 
            +
            encoded_input = tokenizer(text, return_tensors='tf')
         | 
| 65 | 
            +
            output = model(encoded_input)
         | 
| 66 | 
            +
            ```
         | 
| 67 | 
            +
             | 
| 68 | 
            +
            PyTorch version:
         | 
| 69 | 
            +
             | 
| 70 | 
            +
            ```python
         | 
| 71 | 
            +
            from transformers import BertTokenizer, BertModel
         | 
| 72 | 
            +
            tokenizer = BertTokenizer.from_pretrained('google/multiberts-seed_3-step_900k')
         | 
| 73 | 
            +
            model = BertModel.from_pretrained("google/multiberts-seed_3-step_900k")
         | 
| 74 | 
            +
            text = "Replace me by any text you'd like."
         | 
| 75 | 
            +
            encoded_input = tokenizer(text, return_tensors='pt')
         | 
| 76 | 
            +
            output = model(**encoded_input)
         | 
| 77 | 
            +
            ```
         | 
| 78 | 
            +
             | 
| 79 | 
            +
            ## Citation info
         | 
| 80 | 
            +
             | 
| 81 | 
            +
            ```bibtex
         | 
| 82 | 
            +
            @article{sellam2021multiberts,
         | 
| 83 | 
            +
              title={The MultiBERTs: BERT Reproductions for Robustness Analysis},
         | 
| 84 | 
            +
              author={Thibault Sellam and Steve Yadlowsky and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Turc and Jacob Eisenstein and Dipanjan Das and Ian Tenney and Ellie Pavlick},
         | 
| 85 | 
            +
              journal={arXiv preprint arXiv:2106.16163},
         | 
| 86 | 
            +
              year={2021}
         | 
| 87 | 
            +
            }
         | 
| 88 | 
            +
            ```
         | 
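The README above mentions 140 intermediate checkpoints, 28 for each of the first 5 runs, and loads this one as `google/multiberts-seed_3-step_900k`. The sketch below shows how the full set of repository IDs could be enumerated. Only the ID pattern and the counts come from the README. The checkpoint schedule (every 20k steps up to 200k, then every 100k steps up to 2000k) and the seed numbering 0 to 4 are assumptions, chosen because they reproduce the stated counts; check them against the paper before relying on them.

```python
def checkpoint_steps_k():
    """Return the ASSUMED checkpoint schedule, in thousands of steps."""
    early = list(range(20, 201, 20))    # 20k, 40k, ..., 200k (10 checkpoints)
    late = list(range(300, 2001, 100))  # 300k, 400k, ..., 2000k (18 checkpoints)
    return early + late


def checkpoint_ids(seeds=range(5), org="google"):
    """Build repository IDs following the pattern used in this README.

    `seeds=range(5)` assumes the "first 5 runs" are numbered 0 through 4.
    """
    return [
        f"{org}/multiberts-seed_{seed}-step_{step}k"
        for seed in seeds
        for step in checkpoint_steps_k()
    ]


if __name__ == "__main__":
    ids = checkpoint_ids()
    print(len(ids), "checkpoint IDs, for example", ids[0])
```

Any of the resulting strings can be passed to `from_pretrained` in place of the ID used in the snippets above, provided a repository with that name actually exists.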
    	
        config.json
    ADDED
    
    | @@ -0,0 +1,25 @@ | |
| 1 | 
            +
            {
         | 
| 2 | 
            +
              "_name_or_path": "tmp-model",
         | 
| 3 | 
            +
              "architectures": [
         | 
| 4 | 
            +
                "BertForPreTraining"
         | 
| 5 | 
            +
              ],
         | 
| 6 | 
            +
              "attention_probs_dropout_prob": 0.1,
         | 
| 7 | 
            +
              "classifier_dropout": null,
         | 
| 8 | 
            +
              "hidden_act": "gelu",
         | 
| 9 | 
            +
              "hidden_dropout_prob": 0.1,
         | 
| 10 | 
            +
              "hidden_size": 768,
         | 
| 11 | 
            +
              "initializer_range": 0.02,
         | 
| 12 | 
            +
              "intermediate_size": 3072,
         | 
| 13 | 
            +
              "layer_norm_eps": 1e-12,
         | 
| 14 | 
            +
              "max_position_embeddings": 512,
         | 
| 15 | 
            +
              "model_type": "bert",
         | 
| 16 | 
            +
              "num_attention_heads": 12,
         | 
| 17 | 
            +
              "num_hidden_layers": 12,
         | 
| 18 | 
            +
              "pad_token_id": 0,
         | 
| 19 | 
            +
              "position_embedding_type": "absolute",
         | 
| 20 | 
            +
              "torch_dtype": "float32",
         | 
| 21 | 
            +
              "transformers_version": "4.12.3",
         | 
| 22 | 
            +
              "type_vocab_size": 2,
         | 
| 23 | 
            +
              "use_cache": true,
         | 
| 24 | 
            +
              "vocab_size": 30522
         | 
| 25 | 
            +
            }
         | 
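As a sanity check on the configuration above, the following sketch estimates the number of weights in the encoder from the values in `config.json`. It assumes the standard BERT layout (word, position and token-type embeddings, then identical Transformer layers, then a pooler) and deliberately leaves out the pre-training heads, so it is an estimate for the bare encoder rather than for everything stored in the checkpoint.

```python
# Values copied from the config.json shown above.
CONFIG = {
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
}


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def bert_encoder_param_count(cfg):
    """Estimate encoder parameters, assuming the standard BERT layout."""
    h = cfg["hidden_size"]
    layer_norm = 2 * h  # scale and shift vectors

    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + layer_norm
    )
    # Query, key, value and output projections, then a LayerNorm.
    attention = 4 * linear(h, h) + layer_norm
    # Two dense layers of the feed-forward block, then a LayerNorm.
    feed_forward = (
        linear(h, cfg["intermediate_size"])
        + linear(cfg["intermediate_size"], h)
        + layer_norm
    )
    pooler = linear(h, h)
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


if __name__ == "__main__":
    total = bert_encoder_param_count(CONFIG)
    print(f"{total:,} parameters, about {total * 4:,} bytes in float32")
```

At 4 bytes per float32 weight this is somewhat less than the size recorded for `pytorch_model.bin`, which is plausible given that `architectures` lists `BertForPreTraining`, so the file presumably also holds the MLM and NSP heads.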
    	
        pytorch_model.bin
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:a717f3adf469d9adadbac94b96995620d5ba4dd9b8018175af0b740d06439b8e
         | 
| 3 | 
            +
            size 440509027
         | 
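The three lines above are a Git LFS pointer: the repository stores this small text file, while the weights themselves live in LFS storage and are identified by their SHA-256 digest and byte size. A minimal sketch of how a downloaded file could be checked against such a pointer is shown below. The function names are hypothetical, and the parser handles only the three keys that appear here.

```python
import hashlib
import io

# Pointer text copied from the pytorch_model.bin diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a717f3adf469d9adadbac94b96995620d5ba4dd9b8018175af0b740d06439b8e
size 440509027
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the 'algo:digest' oid field."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def stream_matches_pointer(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and digest to the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    hasher = hashlib.sha256()
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
    # An in-memory stand-in; a real check would open the downloaded file in "rb" mode.
    print(stream_matches_pointer(io.BytesIO(b"not the model"), pointer))
```

Reading in chunks keeps memory use flat, which matters for a file of roughly 440 MB.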
    	
        special_tokens_map.json
    ADDED
    
    | @@ -0,0 +1 @@ | |
| 1 | 
            +
            {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
         | 
    	
        tf_model.h5
    ADDED
    
    | @@ -0,0 +1,3 @@ | |
|  | |
| 1 | 
            +
            version https://git-lfs.github.com/spec/v1
         | 
| 2 | 
            +
            oid sha256:3349902f4764dd8f80cf882c034e4eb5346a77cf06c5147fe421be86bc2cc1c4
         | 
| 3 | 
            +
            size 536063536
         | 
    	
        tokenizer_config.json
    ADDED
    
    | @@ -0,0 +1 @@ | |
| 1 | 
            +
            {"do_lower_case": true, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "tokenizer_class": "BertTokenizer"}
         | 
    	
        vocab.txt
    ADDED
    
    | The diff for this file is too large to render. |

