---
library_name: transformers
tags:
- software engineering
- software traceability
---
# Model Card for nl-bert
Provides the TAPT (Task-Adaptive Pretraining) model from "Enhancing Automated Software Traceability by
Transfer Learning from Open-World Data".
## Model Details
### Model Description
This model was trained to predict trace links between issues and commits, using GitHub data from 2016-2021.
- **Developed by:** Jinfeng Lin, University of Notre Dame
- **Shared by:** Alberto Rodriguez, University of Notre Dame
- **Model type:** BertForSequenceClassification
- **Language(s) (NLP):** EN
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/thearod5/se-models
- **Paper:** https://arxiv.org/abs/2207.01084
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
## Training Details
Please see the cited paper for full training details.
## Evaluation
Please see the cited paper for full evaluation details.
### Results
The model achieved a MAP score improvement of over 20% compared to baseline models. See cited paper for full details.
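Mean Average Precision (MAP) averages, over all source artifacts, the precision at each rank where a true trace link appears in the model's ranked candidate list. As a minimal illustrative sketch (not the authors' evaluation code), it can be computed as:

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 flags,
    ordered by model score, marking which candidates are true links."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of AP over all source artifacts (queries)."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Toy example: two issues, each with four ranked commit candidates.
queries = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(mean_average_precision(queries))  # (5/6 + 1/2) / 2 = 2/3
```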
## Environmental Impact
- **Hardware Type:** Distributed machine pool
- **Hours used:** 72
## Technical Specifications
### Model Architecture and Objective
The model uses a Single-BERT architecture from the TBERT framework, which performs well on traceability tasks by encoding concatenated source and target artifacts.
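The concatenated encoding described above follows BERT's standard sequence-pair format. As a minimal sketch (the special tokens and segment ids are BERT conventions, not details taken from this card), a source artifact and a target artifact become one input sequence:

```python
def build_single_bert_input(source, target):
    """Concatenate a source artifact (e.g. an issue) and a target
    artifact (e.g. a commit message) into one BERT-style sequence.
    Whitespace splitting stands in for real WordPiece tokenization."""
    tokens = ["[CLS]"] + source.split() + ["[SEP]"] + target.split() + ["[SEP]"]
    # Segment ids distinguish the two artifacts (0 = source, 1 = target).
    first_sep = tokens.index("[SEP]")
    segment_ids = [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)
    return tokens, segment_ids

tokens, segments = build_single_bert_input("fix login bug", "patch auth flow")
print(tokens)
# ['[CLS]', 'fix', 'login', 'bug', '[SEP]', 'patch', 'auth', 'flow', '[SEP]']
```

A classification head over the `[CLS]` position then scores whether the pair constitutes a trace link.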
### Compute Infrastructure
#### Hardware
300 servers in a distributed machine pool
#### Software
- Transformers library
- PyTorch
- HTCondor for distributed computation
## Citation
**BibTeX:**
```bibtex
@misc{lin2022enhancing,
  title={Enhancing Automated Software Traceability by Transfer Learning from Open-World Data},
  author={Jinfeng Lin and Amrit Poudel and Wenhao Yu and Qingkai Zeng and Meng Jiang and Jane Cleland-Huang},
  year={2022},
  eprint={2207.01084},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}
```
## Model Card Authors
Alberto Rodriguez
## Model Card Contact
Alberto Rodriguez ([email protected])