Description

Profile text gender classification using a spaCy textcat component. A very small model with no static vectors; internally it uses an ensemble of a bag-of-words model and single-headed attention. Transformer-based models gave me better scores, but they are much slower to train and run on CPU and much larger, so I'm sticking to the basics for now.

The model is trained on profile fields rendered into a fixed template, "[name] ([login], [email]) in [location]", so it is critical to format the data with the provided format_input helper before applying the model to the text.
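
For illustration, a minimal sketch of what the helper should produce for a single profile (the field values are made up, and the exact output string is an assumption based on the template above):

from en_textcat_gender.utils import format_input

# Hypothetical profile; the expected output assumes the
# "[name] ([login], [email]) in [location]" template described above.
text = format_input({
  "login": "jdoe",
  "name": "Jane Doe",
  "location": "Berlin",
  "email": "jane@example.com",
})
print(text)  # e.g. "Jane Doe (jdoe, jane@example.com) in Berlin"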

Usage

from typing import Literal

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

from en_textcat_gender import load as load_gender_model
from en_textcat_gender.utils import format_input

gender_model = load_gender_model()
router = APIRouter()  # route registration is app-specific; defined here so the example is self-contained

class PredictGenderByProfileInput(BaseModel):
  login: str
  name: str
  location: str | None = None
  email: str | None = None

type Gender = Literal["Male", "Female", "Neutral"]

# Example of model use in the context of a FastAPI route
@router.post("/predictGenderByProfile")
async def predictGenderByProfile(body: list[PredictGenderByProfileInput]) -> list[Gender]:
  # Render each profile with the same template the model was trained on
  texts = [format_input({
    "login": input.login,
    "name": input.name,
    "location": input.location,
    "email": input.email,
  }) for input in body]
  docs = gender_model.pipe(texts)
  genders: list[Gender] = []
  for doc in docs:
    # Pick the label with the highest textcat score
    match max(doc.cats, key=doc.cats.get):
      case "FEMALE": genders.append("Female")
      case "MALE": genders.append("Male")
      case "NEUTRAL": genders.append("Neutral")
      case _: raise HTTPException(500, "Invalid enum value")
  return genders
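
A quick way to exercise the route above is FastAPI's test client (a rough sketch; the app wiring and the example profile are assumptions, adjust them to your project layout):

from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()
app.include_router(router)
client = TestClient(app)

# The endpoint accepts a list of profiles and returns a list of labels
response = client.post("/predictGenderByProfile", json=[
  {"login": "jdoe", "name": "Jane Doe", "location": "Berlin", "email": "jane@example.com"},
])
print(response.json())  # e.g. ["Female"]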

Feature           Description
Name              en_textcat_gender
Version           0.0.3
spaCy             >=3.8.7,<3.9.0
Default Pipeline  textcat
Components        textcat
Vectors           0 keys, 0 unique vectors (0 dimensions)
Sources           n/a
License           n/a
Author            n/a

Label Scheme

Component  Labels
textcat    MALE, FEMALE, NEUTRAL
  • MALE is predicted for male human profiles
  • FEMALE is predicted for female human profiles
  • NEUTRAL is predicted for gender-neutral names, unintelligible sequences ("Foo Bar"), and non-human profiles (organizations, companies)
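
As a sanity check, the raw labels can be read off the loaded pipeline directly (a small sketch; the expected values simply restate the scheme above, and the example text follows the input template from the Description):

from en_textcat_gender import load as load_gender_model

nlp = load_gender_model()
print(nlp.pipe_names)                  # expected: ['textcat']
print(nlp.get_pipe("textcat").labels)  # expected: ('MALE', 'FEMALE', 'NEUTRAL')

# doc.cats maps each raw label to a score; the highest one is the prediction
doc = nlp("Jane Doe (jdoe, jane@example.com) in Berlin")
print(doc.cats)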

Accuracy

Type                     Score
CATS_SCORE               93.37
CATS_MICRO_P             93.19
CATS_MICRO_R             93.19
CATS_MICRO_F             93.19
CATS_MACRO_P             93.75
CATS_MACRO_R             93.04
CATS_MACRO_F             93.37
CATS_MACRO_AUC           98.76
CATS_MACRO_AUC_PER_TYPE  0.00
TEXTCAT_LOSS             105.21