---
license: cc-by-4.0
---

Support vector machine classifiers (one per OCEAN trait) trained on Qwen3-8B embeddings of Pennebaker and King's Essays dataset. Training used an 80/20 train/test split (`test_size=0.2`, `random_state=42`).

Accuracy on the held-out test set:

* Openness (O): 65.6%
* Conscientiousness (C): 58.9%
* Extraversion (E): 60.5%
* Agreeableness (A): 61.1%
* Neuroticism (N): 57.9%
* Mean accuracy: 60.81%

Download all `.joblib` files and use them like this:

```py
from sentence_transformers import SentenceTransformer
from joblib import load

# Embedding model used to featurize the input texts
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-8B", device="cuda")

# One SVM classifier per OCEAN trait
traits = ["o", "c", "e", "a", "n"]
classifiers = {trait: load(f"models_qwen3_8b/{trait}.joblib") for trait in traits}

def predict_personality(texts, batch_size=8):
    # Embed all texts in batches; normalized embeddings match the training setup
    embeddings = embedder.encode(
        texts,
        batch_size=batch_size,
        convert_to_numpy=True,
        normalize_embeddings=True,
        show_progress_bar=True,
    )
    results = []
    for embedding in embeddings:
        # Binary prediction (0/1) for each of the five traits
        scores = []
        for trait in traits:
            pred = classifiers[trait].predict(embedding.reshape(1, -1))[0]
            scores.append(int(pred))
        results.append(scores)
    return results

texts = [
    "I enjoy working in solitude and reflecting deeply.",
    "I love going out, meeting new people, and trying new things!",
]

predictions = predict_personality(texts)
for text, profile in zip(texts, predictions):
    print(f"\nText: {text}\nOCEAN: {profile}")
```

We ran the script on an A40 with 32GB of VRAM; with that setup, the embedding step handles batches of up to 8 texts at a time.
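For reference, below is a minimal sketch of how per-trait SVM classifiers like these could be trained and exported. The split parameters (`test_size=0.2`, `random_state=42`) come from this card; the SVM hyperparameters (scikit-learn `SVC` defaults), the binary trait labels, and the synthetic stand-in features are assumptions — the real pipeline embeds the Essays dataset with Qwen3-Embedding-8B first.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from joblib import dump

rng = np.random.default_rng(42)

# Synthetic stand-ins for the real text embeddings and per-trait binary labels.
# Real features would be Qwen3-8B embeddings of the Essays dataset.
X = rng.normal(size=(200, 64)).astype(np.float32)
traits = ["o", "c", "e", "a", "n"]
labels = {t: rng.integers(0, 2, size=200) for t in traits}

accuracies = {}
for trait in traits:
    # Same split parameters as reported in this card
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels[trait], test_size=0.2, random_state=42
    )
    clf = SVC()  # hyperparameters are an assumption; defaults shown here
    clf.fit(X_train, y_train)
    accuracies[trait] = clf.score(X_test, y_test)
    dump(clf, f"{trait}.joblib")  # one .joblib file per trait

print(accuracies)
```

Each trait gets its own independent binary classifier, which is why inference loads five separate `.joblib` files and predicts each trait in turn.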