model base: https://huggingface.co/google-bert/bert-base-uncased

dataset: https://github.com/ramybaly/Article-Bias-Prediction

training parameters:

  • batch_size: 100
  • epochs: 5
  • dropout: 0.05
  • max_length: 512
  • learning_rate: 3e-5
  • warmup_steps: 100
  • random_state: 239
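The parameters above can be collected into a small config object. The sketch below also shows how `warmup_steps` might shape the learning rate. The card does not name a scheduler, so linear warmup followed by linear decay is an assumption here, and `TrainingConfig`, `learning_rate_at`, and `total_steps` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the parameter list above.
    batch_size: int = 100
    epochs: int = 5
    dropout: float = 0.05
    max_length: int = 512
    learning_rate: float = 3e-5
    warmup_steps: int = 100
    random_state: int = 239


def learning_rate_at(step: int, total_steps: int, config: TrainingConfig) -> float:
    """Assumed schedule: linear warmup to the peak rate, then linear decay to zero."""
    if step < config.warmup_steps:
        # Ramp up proportionally to the number of completed warmup steps.
        return config.learning_rate * step / config.warmup_steps
    remaining = max(total_steps - step, 0)
    decay_span = max(total_steps - config.warmup_steps, 1)
    return config.learning_rate * remaining / decay_span
```

Under this assumption the rate reaches its peak of 3e-5 at step 100 and falls to zero at `total_steps`, which depends on the size of the train split and is not stated in the card.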

training methodology:

  • sanitize the dataset following a specific rule set, and use the random split provided with the dataset
  • train on the train split and evaluate on the validation split after each epoch
  • evaluate the test split only with the model that achieved the lowest validation loss
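The selection step in the list above can be sketched as a loop that records the validation loss after every epoch and remembers the best one. This is a minimal illustration, not the original training script: `train_and_select` and its two callables are hypothetical stand-ins for the real training and evaluation code.

```python
from typing import Callable, List, Tuple


def train_and_select(
    epochs: int,
    train_one_epoch: Callable[[int], None],
    validation_loss: Callable[[int], float],
) -> Tuple[int, List[float]]:
    """Train for `epochs` epochs; return (best_epoch, losses), epochs numbered from 1."""
    losses: List[float] = []
    best_epoch = 0
    best_loss = float("inf")
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)         # fit on the train split
        loss = validation_loss(epoch)  # evaluate on the validation split
        losses.append(loss)
        if loss < best_loss:           # strict comparison keeps the earliest epoch on ties
            best_epoch, best_loss = epoch, loss
    return best_epoch, losses
```

Only the checkpoint saved at `best_epoch` would then be scored on the test split, so the test data plays no part in model selection.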

result summary:

  • across the five training epochs, the model from the second epoch achieved the lowest validation loss (0.3314)
  • on the test split, the second-epoch model achieved an F1 score of 0.9041
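For readers who want to compute a comparable metric on their own predictions, here is a minimal pure-Python F1 sketch. The card does not say how the F1 score was averaged across classes, so macro averaging is an assumption, and `macro_f1` is a hypothetical helper name.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

If the reported score used micro or weighted averaging instead, the per-class scores would need to be combined differently.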

usage:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline


def main(repository: str) -> None:
    # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
    model = AutoModelForSequenceClassification.from_pretrained(repository)
    tokenizer = AutoTokenizer.from_pretrained(repository)

    # Wrap both in a text-classification pipeline and classify one sentence.
    nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(nlp("the masses are controlled by media."))


if __name__ == "__main__":
    main(repository="premsa/political-bias-prediction-allsides-BERT")