---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# sb_clustering_topics

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model. 
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets. 

## Usage 

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("Thabet/sb_clustering_topics")

topic_model.get_topic_info()
```
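Once loaded, the model can also assign topics to new documents. The helper below is a minimal sketch, not part of this model card: `summarize_predictions` is a hypothetical name, and it only relies on BERTopic's `transform` and `get_topic` methods. It takes an already loaded model so that it can be tried with any object exposing those two methods.

```python
def summarize_predictions(topic_model, docs, top_n=5):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to behave like a fitted BERTopic instance:
    `transform(docs)` returns (topics, probabilities) and
    `get_topic(topic_id)` returns a list of (word, score) pairs.
    """
    topics, _probs = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic may return a falsy value for an unknown topic ID.
        words = topic_model.get_topic(topic_id) or []
        keywords = [word for word, _score in words[:top_n]]
        results.append({"document": doc, "topic": topic_id, "keywords": keywords})
    return results
```

With the real model this could be called as `summarize_predictions(BERTopic.load("Thabet/sb_clustering_topics"), ["finale de la coupe de basket"])`, where the input sentence is made up for illustration. If the saved model does not carry its embedding model, `BERTopic.load` may need an `embedding_model` argument; which embedding model was used is not stated on this card.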

## Topic overview

* Number of topics: 40 (including the outlier topic -1)
* Number of training documents: 1636

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | jt - actu - fte - ville - 57 | 11 | -1_jt_actu_fte_ville | 
| 0 | vef - invit - invite - portrait - diff | 613 | 0_vef_invit_invite_portrait | 
| 1 | foot - football - ligue - fc - match | 62 | 1_foot_football_ligue_fc | 
| 2 | festival - jazz - dition - pommiers - jazz pommiers | 59 | 2_festival_jazz_dition_pommiers | 
| 3 | renvoi - college - collge - lyce - lcole | 59 | 3_renvoi_college_collge_lyce | 
| 4 | tribunal - proces - procs - affaire - permis conduire | 56 | 4_tribunal_proces_procs_affaire | 
| 5 | tourisme - weekend - ascension - lascension - weekend ascension | 48 | 5_tourisme_weekend_ascension_lascension | 
| 6 | urgences - non - soignants non - soignants - non vaccins | 48 | 6_urgences_non_soignants non_soignants | 
| 7 | muse - expo - chteau - chateau - monument | 44 | 7_muse_expo_chteau_chateau | 
| 8 | eau - deau - leau - eaux - qualite | 42 | 8_eau_deau_leau_eaux | 
| 9 | culture - teaser chronique - chronique - teaser - jour | 39 | 9_culture_teaser chronique_chronique_teaser | 
| 10 | homophobie - contre - lgbt - contre lhomophobie - lhomophobie | 36 | 10_homophobie_contre_lgbt_contre lhomophobie | 
| 11 | basket - d69 basket - asvel - fminin villeneuve - finale | 33 | 11_basket_d69 basket_asvel_fminin villeneuve | 
| 12 | rugby - mont marsan - marsan - dublin - finale dublin | 31 | 12_rugby_mont marsan_marsan_dublin | 
| 13 | roues folie - roues - srie roues - folie - srie | 29 | 13_roues folie_roues_srie roues_folie | 
| 14 | grve - sncf - brve - sncf dijon - breve | 27 | 14_grve_sncf_brve_sncf dijon | 
| 15 | rue - rue pierre - parking - mauroy - pierre mauroy | 26 | 15_rue_rue pierre_parking_mauroy | 
| 16 | ouvrier - ouvrier france - serie - france - meilleur ouvrier | 23 | 16_ouvrier_ouvrier france_serie_france | 
| 17 | feux - agricoles - vols - agriculteurs - vols gps | 23 | 17_feux_agricoles_vols_agriculteurs | 
| 18 | vertbaudet - centre - commerants - commerce - centreville | 22 | 18_vertbaudet_centre_commerants_commerce | 
| 19 | archives - policier - policiers - congrs ps - politique | 22 | 19_archives_policier_policiers_congrs ps | 
| 20 | recyclage - made in - made - transforme - carton | 22 | 20_recyclage_made in_made_transforme | 
| 21 | cvdl - trail - routes - route - cvdl invite | 21 | 21_cvdl_trail_routes_route | 
| 22 | sniors - ans - dune - maison - secondaires | 20 | 22_sniors_ans_dune_maison | 
| 23 | sports - sport - loc sport - loc - aim | 20 | 23_sports_sport_loc sport_loc | 
| 24 | cannes - festival cannes - festival - cannes festival - d06 | 18 | 24_cannes_festival cannes_festival_cannes festival | 
| 25 | maires - maire - dmission - maire veyrac - dep dmission | 17 | 25_maires_maire_dmission_maire veyrac | 
| 26 | solidaire - bo - bouquinerie solidaire - rdvcv - rdvcv bo | 16 | 26_solidaire_bo_bouquinerie solidaire_rdvcv | 
| 27 | armada - vins - mer - larmada - maritime | 15 | 27_armada_vins_mer_larmada | 
| 28 | accident - accident mortel - mortel - fayssal - mortel minibus | 15 | 28_accident_accident mortel_mortel_fayssal | 
| 29 | dunkerque - jours dunkerque - jours - tape - dunkerque tape | 14 | 29_dunkerque_jours dunkerque_jours_tape | 
| 30 | armes anciennes - participants - twirling bton - twirling - convention | 13 | 30_armes anciennes_participants_twirling bton_twirling | 
| 31 | cpop - carmina - eyes - planete - savaoo application | 12 | 31_cpop_carmina_eyes_planete | 
| 32 | secheresse - scurit - levage - limplantation - poules salmagne | 12 | 32_secheresse_scurit_levage_limplantation | 
| 33 | bio - aides - d86 - hugues bioret - dossier presse | 12 | 33_bio_aides_d86_hugues bioret | 
| 34 | collectif - camping - collecte - infimiers libraux - sr d51 | 12 | 34_collectif_camping_collecte_infimiers libraux | 
| 35 | grand prix - prix - grand - pau - race | 11 | 35_grand prix_prix_grand_pau | 
| 36 | boom - technique - entreprise - emploi open - futurs | 11 | 36_boom_technique_entreprise_emploi open | 
| 37 | championnat - escrime - 57 - championnat escrime - 57 championnat | 11 | 37_championnat_escrime_57_championnat escrime | 
| 38 | oiseaux - population - oiseaux lpo - lpo - gl69 oorion | 11 | 38_oiseaux_population_oiseaux lpo_lpo |
  
</details>
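The table can be read with two small calculations. The sketch below uses hypothetical helper names (`share_of_documents`, `make_label`) and a few rows copied from the table; it assumes, based on the rows shown, that each label is the topic ID followed by the first four keywords joined with underscores.

```python
TOTAL_DOCS = 1636  # number of training documents stated above

# A few (topic ID -> frequency) rows copied from the table.
FREQUENCIES = {-1: 11, 0: 613, 1: 62, 2: 59}


def share_of_documents(count, total=TOTAL_DOCS):
    """Return the percentage of training documents assigned to a topic."""
    return 100.0 * count / total


def make_label(topic_id, keywords, n_words=4):
    """Rebuild a label such as '1_foot_football_ligue_fc' from its parts."""
    return "_".join([str(topic_id), *keywords[:n_words]])


for topic_id, count in FREQUENCIES.items():
    print(f"topic {topic_id}: {share_of_documents(count):.1f}% of documents")

print(make_label(1, ["foot", "football", "ligue", "fc", "match"]))
```

Topic 0 alone accounts for roughly 37.5% of the training documents, while the outlier topic -1 holds under 1%.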

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
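
As a rough sketch, the settings above map onto keyword arguments of the `BERTopic` constructor. `HYPERPARAMS` and `build_model` are hypothetical names introduced here. The card does not list the embedding, dimensionality reduction, clustering, or vectorizer sub-models, so a model built this way uses BERTopic's defaults for those and is not expected to reproduce this model exactly.

```python
# Values copied from the hyperparameter list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model(**overrides):
    """Create an untrained BERTopic model with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMS, **overrides})
```

Calling `build_model(verbose=True)` would return an unfitted model with logging switched on; it still needs `fit_transform` on a corpus before it can be used.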

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.15.0
* Python: 3.10.12