ZoeYou committed · Commit 0aec2ed · verified · 1 parent: e2a82bc

Update README.md

Files changed (1)
  1. README.md +7 -8
README.md CHANGED
@@ -7,11 +7,11 @@ base_model:
  ---
  # PatentBERT - PyTorch

- BERT model specialized for patent classification using the **real CPC (Cooperative Patent Classification) system**. (PyTorch version of the original [PatentBert](https://github.com/jiehsheng/PatentBERT/) model.)

  ## 📊 Specifications

- - **Output classes**: 656 (real CPC labels)
  - **Classification system**: CPC (Cooperative Patent Classification)
  - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
  - **Vocabulary**: 30,522 tokens
@@ -32,7 +32,7 @@ The model predicts classes according to the authentic CPC system used in PatentB
  - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
  - **Y (9 classes)**: General Tagging of New Technological Developments

- ### Example Real Classes

  - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
  - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
@@ -63,7 +63,7 @@ with torch.no_grad():
  predicted_class_id = predictions.argmax().item()
  confidence = predictions.max().item()

- # Use model labels (real CPC codes)
  predicted_label = model.config.id2label[str(predicted_class_id)]

  print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
@@ -73,21 +73,20 @@ print(f"Confidence: {confidence:.2%}")
  ## 📁 Included Files

  - `model.safetensors`: Model weights (420 MB)
- - `config.json`: Configuration with integrated real CPC labels
  - `vocab.txt`: Tokenizer vocabulary
  - `tokenizer_config.json`: Tokenizer configuration
- - `labels.json`: Complete real CPC label mapping (656 authentic labels)
  - `README.md`: This documentation

  ## 🔬 Performance

- This model was trained on a large patent corpus to automatically classify documents according to the real CPC system, using the exact same 656 CPC codes from the original PatentBERT training data.

  ## 📖 References

  - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
  - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)
- - [Hugging Face Transformers](https://huggingface.co/transformers/)

  ## 📝 Citation
 
 
  ---
  # PatentBERT - PyTorch

+ BERT model specialized for patent classification using the **CPC (Cooperative Patent Classification) system**. (PyTorch version of the original [PatentBert](https://github.com/jiehsheng/PatentBERT/) model.)

  ## 📊 Specifications

+ - **Output classes**: 656 (CPC subclass labels)
  - **Classification system**: CPC (Cooperative Patent Classification)
  - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
  - **Vocabulary**: 30,522 tokens
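The specifications above fix the vocabulary size, hidden width, depth and number of output classes. As a rough consistency check, the sketch below counts the parameters of such a model and converts the total to an fp32 size. It assumes standard BERT-base settings that the README does not state (512 position embeddings, 2 token types, a 3,072-wide feed-forward layer, a pooler, and one linear classification head), so the result is an estimate, not a description of the actual checkpoint.

```python
# Rough parameter count for a BERT-base sequence classifier.
# ASSUMED (not stated in the README): 512 positions, 2 token types,
# feed-forward width 3072, a pooler layer, and a single linear head.


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(hidden: int) -> int:
    """Scale (gamma) plus shift (beta) vectors."""
    return 2 * hidden


def bert_classifier_params(vocab: int = 30_522, hidden: int = 768,
                           layers: int = 12, ffn: int = 3_072,
                           max_pos: int = 512, type_vocab: int = 2,
                           num_labels: int = 656) -> int:
    # Word, position and token-type embeddings, followed by a LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm(hidden)
    per_layer = (
        4 * linear(hidden, hidden)  # Q, K, V and attention output projections
        + layer_norm(hidden)
        + linear(hidden, ffn)       # feed-forward expansion
        + linear(ffn, hidden)       # feed-forward projection back to hidden
        + layer_norm(hidden)
    )
    pooler = linear(hidden, hidden)
    head = linear(hidden, num_labels)  # one logit per CPC label
    return embeddings + layers * per_layer + pooler + head


total = bert_classifier_params()
size_mib = total * 4 / 2**20  # 4 bytes per fp32 parameter
print(f"{total:,} parameters, about {size_mib:.0f} MiB in fp32")
```

Under these assumptions the total is 109,986,704 parameters, about 420 MiB at 4 bytes each, which is in line with the `model.safetensors` size given in the file list if that figure is in binary megabytes.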
 
  - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
  - **Y (9 classes)**: General Tagging of New Technological Developments

+ ### Example of CPC Subclasses

  - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
  - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
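This commit relabels the 656 outputs as CPC subclass labels, such as the `A01B` and `B25J` examples above. A CPC subclass symbol is hierarchical: a section letter (A to H, or Y), a two-digit class number, and a subclass letter, so `A01B` belongs to class `A01` in section `A`. The sketch below splits a label into those levels and tallies labels per section, the same grouping used by the per-section counts in the README. It assumes the label strings are bare subclass symbols; `split_cpc_subclass` and `count_by_section` are hypothetical helpers, not part of the model repository.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Section letter (A-H, or Y), two-digit class number, one subclass letter.
CPC_SUBCLASS = re.compile(r"([A-HY])(\d{2})([A-Z])")


class CpcSubclass(NamedTuple):
    section: str   # e.g. "A"
    cls: str       # e.g. "A01"
    subclass: str  # e.g. "A01B"


def split_cpc_subclass(label: str) -> CpcSubclass:
    """Hypothetical helper: split a symbol like 'B25J' into section, class, subclass."""
    symbol = label.strip().upper()
    match = CPC_SUBCLASS.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a CPC subclass symbol: {label!r}")
    section, digits, _ = match.groups()
    return CpcSubclass(section, section + digits, symbol)


def count_by_section(labels: Iterable[str]) -> Counter:
    """Hypothetical helper: number of subclass labels in each CPC section."""
    return Counter(split_cpc_subclass(label).section for label in labels)


print(split_cpc_subclass("A01B"))
print(count_by_section(["A01B", "B25J", "H04L", "H01M"]))
```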
 
  predicted_class_id = predictions.argmax().item()
  confidence = predictions.max().item()

+ # Use model labels (CPC codes)
  predicted_label = model.config.id2label[str(predicted_class_id)]

  print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
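The snippet in this hunk picks the highest-scoring class, reports its score as the confidence, and maps the index to a CPC code through `model.config.id2label`. A dependency-free sketch of those steps follows, assuming `predictions` holds softmax probabilities over the output classes; it uses made-up logits and a toy three-label mapping instead of real model output. The hypothetical `lookup_label` helper accepts both integer keys and the string keys stored in `config.json`, because recent `transformers` versions convert `id2label` keys to integers when loading a config, in which case indexing with `str(predicted_class_id)` may raise a `KeyError`.

```python
import math
from typing import List, Mapping, Sequence, Tuple, Union


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lookup_label(id2label: Mapping[Union[int, str], str], class_id: int) -> str:
    """Hypothetical helper: accept id2label with int keys or JSON-style str keys."""
    if class_id in id2label:
        return id2label[class_id]
    return id2label[str(class_id)]


def predict(logits: Sequence[float],
            id2label: Mapping[Union[int, str], str]) -> Tuple[str, int, float]:
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return lookup_label(id2label, class_id), class_id, probs[class_id]


# Toy example: made-up logits and labels, not real model output.
labels = {0: "A01B", 1: "B25J", 2: "H04L"}
label, class_id, confidence = predict([0.2, 2.5, -1.0], labels)
print(f"Predicted CPC class: {label} (ID: {class_id})")
print(f"Confidence: {confidence:.2%}")
```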
 
  ## 📁 Included Files

  - `model.safetensors`: Model weights (420 MB)
+ - `config.json`: Configuration with integrated CPC labels
  - `vocab.txt`: Tokenizer vocabulary
  - `tokenizer_config.json`: Tokenizer configuration
+ - `labels.json`: Complete CPC label mapping (656 authentic labels)
  - `README.md`: This documentation

  ## 🔬 Performance

+ This model was trained on a large patent corpus to automatically classify documents according to the CPC system, using the exact same 656 CPC codes from the original PatentBERT training data.

  ## 📖 References

  - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
  - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)

  ## 📝 Citation