---
library_name: transformers
tags:
- software engineering
- software traceability
---
# Model Card for nl-bert
Provides the TAPT (Task-Adaptive Pretraining) model from "Enhancing Automated Software Traceability by
Transfer Learning from Open-World Data".
## Model Details
### Model Description
This model was trained to predict trace links between issues and commits, using GitHub data from 2016-2021.
- **Developed by:** Jinfeng Lin, University of Notre Dame
- **Shared by:** Alberto Rodriguez, University of Notre Dame
- **Model type:** BertForSequenceClassification
- **Language(s) (NLP):** EN
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/thearod5/se-models
- **Paper:** https://arxiv.org/abs/2207.01084
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## Training Details
Please see the cited paper for full training details.
## Evaluation
Please see the cited paper for full evaluation details.
### Results
The model achieved a MAP score improvement of over 20% compared to baseline models. See cited paper for full details.
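Mean Average Precision (MAP) averages, over all source artifacts, the precision at each rank where a true trace link appears in the model's ranked candidate list. As a minimal illustrative sketch (not the authors' evaluation code), it can be computed as:

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 flags,
    ordered by model score, marking which candidates are true links."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of AP over all source artifacts (queries)."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Toy example: two issues, each with four ranked commit candidates.
queries = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(mean_average_precision(queries))  # (5/6 + 1/2) / 2 = 2/3
```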
## Environmental Impact
- **Hardware Type:** Distributed machine pool
- **Hours used:** 72
## Technical Specifications
### Model Architecture and Objective
The model uses a Single-BERT architecture from the TBERT framework, which performs well on traceability tasks by encoding concatenated source and target artifacts.
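The concatenated encoding described above follows BERT's standard sequence-pair format. As a minimal sketch (the special tokens and segment ids are BERT conventions, not details taken from this card), a source artifact and a target artifact become one input sequence:

```python
def build_single_bert_input(source, target):
    """Concatenate a source artifact (e.g. an issue) and a target
    artifact (e.g. a commit message) into one BERT-style sequence.
    Whitespace splitting stands in for real WordPiece tokenization."""
    tokens = ["[CLS]"] + source.split() + ["[SEP]"] + target.split() + ["[SEP]"]
    # Segment ids distinguish the two artifacts (0 = source, 1 = target).
    first_sep = tokens.index("[SEP]")
    segment_ids = [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)
    return tokens, segment_ids

tokens, segments = build_single_bert_input("fix login bug", "patch auth flow")
print(tokens)
# ['[CLS]', 'fix', 'login', 'bug', '[SEP]', 'patch', 'auth', 'flow', '[SEP]']
```

A classification head over the `[CLS]` position then scores whether the pair constitutes a trace link.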
### Compute Infrastructure
#### Hardware
300 servers in a distributed machine pool
#### Software
- Transformers library
- PyTorch
- HTCondor for distributed computation
## Citation
**BibTeX:**
```bibtex
@misc{lin2022enhancing,
  title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data},
  author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
  year={2022},
  eprint={2207.01084},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}
```
## Model Card Authors
Alberto Rodriguez
## Model Card Contact
Alberto Rodriguez ([email protected])