---
library_name: transformers
tags:
- software engineering
- software traceability
---

# Model Card for nl-bert

This repository provides the TAPT (Task-Adaptive Pretraining) model from "Enhancing Automated Software Traceability by Transfer Learning from Open-World Data".

## Model Details

### Model Description

This model was trained to predict trace links between issues and commits, using GitHub data from 2016 to 2021.


- **Developed by:** Jinfeng Lin, University of Notre Dame
- **Shared by:** Alberto Rodriguez, University of Notre Dame
- **Model type:** BertForSequenceClassification
- **Language(s) (NLP):** EN
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/thearod5/se-models
- **Paper:** https://arxiv.org/abs/2207.01084

## Uses

### Direct Use

The model can be applied directly to score candidate trace links between software artifacts, for example ranking commits against a GitHub issue by their predicted link probability.
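A minimal usage sketch, assuming the model is loaded through the Transformers library, is shown below. The hub id `thearod5/nl-bert` is a hypothetical placeholder, and the assumption that label index 1 corresponds to the "linked" class should be checked against the model configuration:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical hub id; substitute the actual path of this model.
MODEL_ID = "thearod5/nl-bert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Source artifact (issue text) and target artifact (commit message).
issue = "Login page crashes when the password field is left empty."
commit = "Add null check for password input before authenticating."

# Single-BERT style: both artifacts are encoded as one concatenated sequence.
inputs = tokenizer(issue, commit, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: index 1 is the "linked" class; verify via model.config.id2label.
probs = torch.softmax(logits, dim=-1)
print(f"P(trace link) = {probs[0, 1].item():.3f}")
```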

### Downstream Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.


## Training Details

Please see the cited paper for full training details.

## Evaluation

Please see the cited paper for full evaluation details.

### Results

The model achieved a mean average precision (MAP) improvement of over 20% compared to baseline models. See the cited paper for full details.
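For reference, MAP averages, over all query artifacts, the precision measured at each rank where a true link is retrieved. A minimal illustrative sketch of the computation (not the paper's evaluation code) follows:

```python
def average_precision(ranked_labels):
    """AP for one query: ranked_labels are 0/1 relevance flags,
    ordered by descending model score (1 = true trace link)."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of average precision over all query artifacts."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two issues, each with four candidate commits ranked by model score.
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 0, 0]]))  # ~0.667
```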

## Environmental Impact

- **Hardware Type:** Distributed machine pool
- **Hours used:** 72

## Technical Specifications

### Model Architecture and Objective

The model uses a Single-BERT architecture from the TBERT framework, which performs well on traceability tasks by encoding concatenated source and target artifacts.
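As an illustrative sketch of this input layout (using the stock `bert-base-uncased` tokenizer as a stand-in), the two artifacts are packed into a single sequence separated by `[SEP]` tokens, and the classification head scores the pair from the pooled `[CLS]` representation:

```python
from transformers import AutoTokenizer

# Stand-in tokenizer for illustration; the model ships its own vocabulary.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

source = "Fix crash on empty password"     # source artifact (issue)
target = "Added validation to login form"  # target artifact (commit)

# Single-BERT concatenates both artifacts into one input sequence:
# [CLS] <source tokens> [SEP] <target tokens> [SEP]
encoded = tokenizer(source, target)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```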

### Compute Infrastructure

#### Hardware

300 servers in a distributed machine pool

#### Software

- Transformers library
- PyTorch
- HTCondor for distributed computation

## Citation

**BibTeX:**

```bibtex
@misc{lin2022enhancing,
      title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data},
      author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
      year={2022},
      eprint={2207.01084},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```

## Model Card Authors

Alberto Rodriguez

## Model Card Contact

Alberto Rodriguez ([email protected])