Update README.md
README.md
CHANGED
@@ -7,11 +7,11 @@ base_model:
 ---
 # PatentBERT - PyTorch

-BERT model specialized for patent classification using the **

 ## Specifications

-- **Output classes**: 656 (
 - **Classification system**: CPC (Cooperative Patent Classification)
 - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
 - **Vocabulary**: 30,522 tokens
@@ -32,7 +32,7 @@ The model predicts classes according to the authentic CPC system used in PatentB
 - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
 - **Y (9 classes)**: General Tagging of New Technological Developments

-### Example

 - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
 - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
@@ -63,7 +63,7 @@ with torch.no_grad():
 predicted_class_id = predictions.argmax().item()
 confidence = predictions.max().item()

-# Use model labels (
 predicted_label = model.config.id2label[str(predicted_class_id)]

 print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
@@ -73,21 +73,20 @@ print(f"Confidence: {confidence:.2%}")
 ## Included Files

 - `model.safetensors`: Model weights (420 MB)
-- `config.json`: Configuration with integrated
 - `vocab.txt`: Tokenizer vocabulary
 - `tokenizer_config.json`: Tokenizer configuration
-- `labels.json`: Complete
 - `README.md`: This documentation

 ## Performance

-This model was trained on a large patent corpus to automatically classify documents according to the

 ## References

 - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
 - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)
-- [Hugging Face Transformers](https://huggingface.co/transformers/)

 ## Citation

@@ -7,11 +7,11 @@ base_model:
 ---
 # PatentBERT - PyTorch

+BERT model specialized for patent classification using the **CPC (Cooperative Patent Classification) system**. (PyTorch version of the original [PatentBert](https://github.com/jiehsheng/PatentBERT/) model.)

 ## Specifications

+- **Output classes**: 656 (CPC subclass labels)
 - **Classification system**: CPC (Cooperative Patent Classification)
 - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
 - **Vocabulary**: 30,522 tokens
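
As a rough cross-check of the specification figures in this hunk, the sketch below estimates the parameter count of a BERT-base encoder with a 656-way classification head. The hidden size, layer count, vocabulary size, and class count come from the README. The 512 position embeddings, 2 token types, and 3,072-wide feed-forward layer are standard BERT-base defaults assumed here, not values stated in the README.

```python
# Sketch: estimate parameters for BERT-base plus a linear classification head.
HIDDEN = 768          # from the README
LAYERS = 12           # from the README
VOCAB = 30_522        # from the README
NUM_LABELS = 656      # from the README
MAX_POSITIONS = 512   # assumption: standard BERT-base default
TYPE_VOCAB = 2        # assumption: standard BERT-base default
INTERMEDIATE = 3072   # assumption: standard BERT-base default


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def count_parameters():
    embeddings = (VOCAB + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections, followed by a LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    encoder = LAYERS * (attention + feed_forward)
    pooler = linear(HIDDEN, HIDDEN)
    classifier = linear(HIDDEN, NUM_LABELS)
    return embeddings + encoder + pooler + classifier


total = count_parameters()
size_mib = total * 4 / 2**20  # 4 bytes per float32 parameter
print(f"{total:,} parameters, about {size_mib:.0f} MiB in float32")
```

Under these assumptions the estimate comes to roughly 110 million parameters, which at 4 bytes each is close to the 420 MB the README lists for `model.safetensors`.
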
@@ -32,7 +32,7 @@ The model predicts classes according to the authentic CPC system used in PatentB
 - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
 - **Y (9 classes)**: General Tagging of New Technological Developments

+### Example of CPC Subclasses

 - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
 - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
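
The per-section counts listed in this hunk (for example, 51 subclasses under H and 9 under Y) follow from the first letter of each CPC subclass code. A minimal sketch of that grouping, using a small hypothetical sample of codes rather than the model's full 656-label set; `count_by_section` is an illustrative helper, not part of the repository:

```python
from collections import Counter


def count_by_section(subclass_codes):
    """Count CPC subclass codes per section (the leading letter of each code)."""
    return Counter(code[0].upper() for code in subclass_codes if code)


# Hypothetical sample; the real codes would come from the model's label mapping.
sample_codes = ["A01B", "B25J", "H01L", "H04L", "Y02E"]
section_counts = count_by_section(sample_codes)
print(dict(section_counts))
```

Running the same function over the values of the full label mapping should reproduce the per-section totals in the README, assuming every label is a four-character subclass code.
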
@@ -63,7 +63,7 @@ with torch.no_grad():
 predicted_class_id = predictions.argmax().item()
 confidence = predictions.max().item()

+# Use model labels (CPC codes)
 predicted_label = model.config.id2label[str(predicted_class_id)]

 print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
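
One caveat about the `id2label` lookup in this hunk: the keys may be integers rather than strings, depending on how the configuration was loaded. JSON stores them as strings, but Transformers typically converts them to `int` when it builds the config object, in which case indexing with `str(predicted_class_id)` would raise a `KeyError`. A defensive sketch, where `lookup_label` is a hypothetical helper and the plain dicts stand in for `model.config.id2label`:

```python
def lookup_label(id2label, class_id):
    """Return the label for class_id whether the mapping uses int or str keys."""
    for key in (int(class_id), str(class_id)):
        if key in id2label:
            return id2label[key]
    raise KeyError(f"No label found for class id {class_id!r}")


# Stand-ins for model.config.id2label with the two possible key types.
labels_int_keys = {0: "A01B", 1: "B25J"}
labels_str_keys = {"0": "A01B", "1": "B25J"}

print(lookup_label(labels_int_keys, 1))
print(lookup_label(labels_str_keys, 1))
```

Trying both key types keeps the snippet working whether the mapping was read directly from `labels.json` or taken from a loaded config.
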
@@ -73,21 +73,20 @@ print(f"Confidence: {confidence:.2%}")
 ## Included Files

 - `model.safetensors`: Model weights (420 MB)
+- `config.json`: Configuration with integrated CPC labels
 - `vocab.txt`: Tokenizer vocabulary
 - `tokenizer_config.json`: Tokenizer configuration
+- `labels.json`: Complete CPC label mapping (656 authentic labels)
 - `README.md`: This documentation

 ## Performance

+This model was trained on a large patent corpus to automatically classify documents according to the CPC system, using the exact same 656 CPC codes from the original PatentBERT training data.

 ## References

 - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
 - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)

 ## Citation
