---
library_name: transformers
tags:
- software engineering
- software traceability
---

# Model Card for nl-bert

Provides the TAPT (task-adaptive pretraining) model from the paper "Enhancing Automated Software Traceability by
Transfer Learning from Open-World Data".

## Model Details

### Model Description

This model was trained to predict trace links between issues and commits, using GitHub data from 2016 to 2021.


- **Developed by:** Jinfeng Lin, University of Notre Dame
- **Shared by [optional]:** Alberto Rodriguez, University of Notre Dame
- **Model type:** BertForSequenceClassification
- **Language(s) (NLP):** EN
- **License:** MIT

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/thearod5/se-models
- **Paper:** https://arxiv.org/abs/2207.01084

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.


## Training Details
Please see the cited paper for full training details.

## Evaluation

Please see cited paper for full evaluation.

### Results

The model improved mean average precision (MAP) by over 20% compared to baseline models. See the cited paper for full details.
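
For readers unfamiliar with the metric, here is a minimal sketch of how MAP is commonly computed for ranked trace-link candidates. The function names and the toy data are illustrative assumptions, and the paper's exact evaluation protocol (for example, any rank cutoff) may differ.

```python
from typing import Dict, List


def average_precision(ranked_labels: List[int]) -> float:
    """Average precision for one source artifact.

    ranked_labels[i] is 1 if the candidate at rank i + 1 is a true link, else 0.
    Assumes every true link appears somewhere in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Dict[str, List[int]]) -> float:
    """Mean of the per-artifact average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(r) for r in queries.values()) / len(queries)


# Toy example (hypothetical data): two issues with ranked candidate commits.
rankings = {"issue-1": [1, 0, 1], "issue-2": [0, 1]}
print(round(mean_average_precision(rankings), 3))  # prints 0.667
```

In the toy data, `issue-1` scores (1/1 + 2/3) / 2 and `issue-2` scores 1/2, so the mean is 2/3.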

## Environmental Impact

- **Hardware Type:** Distributed machine pool
- **Hours used:** 72

## Technical Specifications [optional]

### Model Architecture and Objective
The model uses a Single-BERT architecture from the TBERT framework, which performs well on traceability tasks by encoding concatenated source and target artifacts.
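
The pair encoding described above can be sketched with the Transformers library as follows. This is a minimal sketch, not the authors' inference code: `MODEL_ID` is a hypothetical placeholder for this model's Hub repository, the 512-token limit is assumed from standard BERT, and treating label index 1 as "linked" is an assumption that should be checked against `model.config.id2label`.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical placeholder: replace with the actual Hub repository of this model.
MODEL_ID = "your-namespace/nl-bert"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and the sequence-classification model in eval mode."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def score_pair(tokenizer, model, source: str, target: str) -> float:
    """Return the predicted probability that source and target are linked.

    Assumes label index 1 means "linked"; verify with model.config.id2label.
    """
    # Passing two texts yields one sequence: [CLS] source [SEP] target [SEP].
    inputs = tokenizer(
        source, target, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()


# Example usage (downloads weights, so it needs network access):
# tokenizer, model = load_model()
# issue = "Crash when saving a file with a non-ASCII name"
# commit = "Fix UnicodeEncodeError in save dialog path handling"
# print(score_pair(tokenizer, model, issue, commit))
```

Scoring every candidate commit for an issue and sorting by this probability gives a ranked list of trace-link candidates.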

### Compute Infrastructure

#### Hardware

300 servers in a distributed machine pool

#### Software
- Transformers library
- PyTorch
- HTCondor for distributed computation

## Citation

**BibTeX:**

@misc{lin2022enhancing,
      title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data}, 
      author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
      year={2022},
      eprint={2207.01084},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}

## Model Card Authors
Alberto Rodriguez

## Model Card Contact
Alberto Rodriguez (arodri39@nd.edu)