---
license: cc-by-4.0
---

Support vector machine classifiers (one per OCEAN trait) trained on Qwen3-8B embeddings of Pennebaker and King's Essays dataset. Training used an 80/20 train/test split (`test_size=0.2`, `random_state=42`).

Accuracy on the held-out test set:

* Openness (O): 65.6%
* Conscientiousness (C): 58.9%
* Extraversion (E): 60.5%
* Agreeableness (A): 61.1%
* Neuroticism (N): 57.9%
* Mean accuracy: 60.81%

Download all `.joblib` files and use them like this:

```py
from sentence_transformers import SentenceTransformer
from joblib import load

# Embedding model used to featurize the input texts
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-8B", device="cuda")

# One SVM classifier per OCEAN trait
traits = ["o", "c", "e", "a", "n"]
classifiers = {trait: load(f"models_qwen3_8b/{trait}.joblib") for trait in traits}

def predict_personality(texts, batch_size=8):
    # Embed all texts in batches; normalized embeddings match the training setup
    embeddings = embedder.encode(
        texts,
        batch_size=batch_size,
        convert_to_numpy=True,
        normalize_embeddings=True,
        show_progress_bar=True,
    )
    results = []
    for embedding in embeddings:
        # Binary prediction (0/1) for each of the five traits
        scores = []
        for trait in traits:
            pred = classifiers[trait].predict(embedding.reshape(1, -1))[0]
            scores.append(int(pred))
        results.append(scores)
    return results

texts = [
    "I enjoy working in solitude and reflecting deeply.",
    "I love going out, meeting new people, and trying new things!",
]

predictions = predict_personality(texts)
for text, profile in zip(texts, predictions):
    print(f"\nText: {text}\nOCEAN: {profile}")
```

We ran the script on an A40 with 32GB of VRAM; with that setup, the embedding step handles batches of up to 8 texts at a time.
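For reference, below is a minimal sketch of how per-trait SVM classifiers like these could be trained and exported. The split parameters (`test_size=0.2`, `random_state=42`) come from this card; the SVM hyperparameters (scikit-learn `SVC` defaults), the binary trait labels, and the synthetic stand-in features are assumptions — the real pipeline embeds the Essays dataset with Qwen3-Embedding-8B first.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from joblib import dump

rng = np.random.default_rng(42)

# Synthetic stand-ins for the real text embeddings and per-trait binary labels.
# Real features would be Qwen3-8B embeddings of the Essays dataset.
X = rng.normal(size=(200, 64)).astype(np.float32)
traits = ["o", "c", "e", "a", "n"]
labels = {t: rng.integers(0, 2, size=200) for t in traits}

accuracies = {}
for trait in traits:
    # Same split parameters as reported in this card
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels[trait], test_size=0.2, random_state=42
    )
    clf = SVC()  # hyperparameters are an assumption; defaults shown here
    clf.fit(X_train, y_train)
    accuracies[trait] = clf.score(X_test, y_test)
    dump(clf, f"{trait}.joblib")  # one .joblib file per trait

print(accuracies)
```

Each trait gets its own independent binary classifier, which is why inference loads five separate `.joblib` files and predicts each trait in turn.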