---
library_name: transformers
datasets:
- SoFairOA/sofair_softcite_somesci
metrics:
- recall
- precision
- f1
- accuracy
base_model:
- answerdotai/ModernBERT-base
---

# SoFair ModernBERT base filter

A fine-tuned ModernBERT model that identifies candidate documents for software mention extraction.

It was trained on [SoFairOA/sofair_softcite_somesci](https://huggingface.co/datasets/SoFairOA/sofair_softcite_somesci) (sofair_softcite_somesci_documents) to classify whether a given document contains at least one annotation.

## Usage
We created [https://github.com/SoFairOA/filter](https://github.com/SoFairOA/filter), a simple command-line tool to use this model for processing a collection of documents.
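
For readers who prefer to call the model directly from Python, the following is a minimal sketch rather than the method used by the command-line tool. It assumes the model loads through the standard `transformers` text-classification pipeline, that the positive class is named `LABEL_1` (check `id2label` in the model's `config.json`), and that `model_id` is this model's Hub repository id, which you must supply yourself. The helper names `load_classifier` and `keep_candidates` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: name of the "contains at least one annotation" class.
# Verify it against id2label in the model's config.json.
POSITIVE_LABEL = "LABEL_1"

Classifier = Callable[[List[str]], List[Dict]]


def load_classifier(model_id: str) -> Classifier:
    """Wrap a transformers text-classification pipeline for the given Hub id."""
    # Imported lazily so the filtering logic below works without transformers.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    # truncation=True cuts documents that exceed the model's maximum length.
    return lambda texts: clf(texts, truncation=True)


def keep_candidates(
    docs: Iterable[str], classify: Classifier, threshold: float = 0.5
) -> List[str]:
    """Return the documents predicted as positive with score >= threshold."""
    docs = list(docs)
    predictions = classify(docs)
    return [
        doc
        for doc, pred in zip(docs, predictions)
        if pred["label"] == POSITIVE_LABEL and pred["score"] >= threshold
    ]
```

A call would look like `keep_candidates(documents, load_classifier(model_id))`. Keeping the classifier as a plain callable makes it easy to swap in a stub for testing or to batch requests differently.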

## Evaluation
We evaluated this model on the test set of the [SoFairOA/sofair_softcite_somesci](https://huggingface.co/datasets/SoFairOA/sofair_softcite_somesci) (sofair_softcite_somesci_documents) dataset:

<table>
  <tr>
    <th>precision</th>
    <td>0.8625730994152047</td>
  </tr>
  <tr>
    <th>recall</th>
    <td>0.9104938271604939</td>
  </tr>
  <tr>
    <th>f1</th>
    <td>0.8858858858858859</td>
  </tr>
  <tr>
    <th>accuracy</th>
    <td>0.9268527430221367</td>
  </tr>
</table>
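
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet below is only an illustrative sketch; the function name `f1_score` is ours and is not taken from the evaluation code.

```python
# Values copied from the evaluation table.
precision = 0.8625730994152047
recall = 0.9104938271604939


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.8859
```

The result agrees with the reported f1 value. Recall is higher than precision, so the filter misses fewer documents that contain annotations than it wrongly passes through.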