---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# sb_clustering_topics
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework for generating easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
```python
from bertopic import BERTopic
topic_model = BERTopic.load("Thabet/sb_clustering_topics")
topic_model.get_topic_info()
```
## Topic overview
* Number of topics: 40 (including the outlier topic -1)
* Number of training documents: 1636
<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | jt - actu - fte - ville - 57 | 11 | -1_jt_actu_fte_ville |
| 0 | vef - invit - invite - portrait - diff | 613 | 0_vef_invit_invite_portrait |
| 1 | foot - football - ligue - fc - match | 62 | 1_foot_football_ligue_fc |
| 2 | festival - jazz - dition - pommiers - jazz pommiers | 59 | 2_festival_jazz_dition_pommiers |
| 3 | renvoi - college - collge - lyce - lcole | 59 | 3_renvoi_college_collge_lyce |
| 4 | tribunal - proces - procs - affaire - permis conduire | 56 | 4_tribunal_proces_procs_affaire |
| 5 | tourisme - weekend - ascension - lascension - weekend ascension | 48 | 5_tourisme_weekend_ascension_lascension |
| 6 | urgences - non - soignants non - soignants - non vaccins | 48 | 6_urgences_non_soignants non_soignants |
| 7 | muse - expo - chteau - chateau - monument | 44 | 7_muse_expo_chteau_chateau |
| 8 | eau - deau - leau - eaux - qualite | 42 | 8_eau_deau_leau_eaux |
| 9 | culture - teaser chronique - chronique - teaser - jour | 39 | 9_culture_teaser chronique_chronique_teaser |
| 10 | homophobie - contre - lgbt - contre lhomophobie - lhomophobie | 36 | 10_homophobie_contre_lgbt_contre lhomophobie |
| 11 | basket - d69 basket - asvel - fminin villeneuve - finale | 33 | 11_basket_d69 basket_asvel_fminin villeneuve |
| 12 | rugby - mont marsan - marsan - dublin - finale dublin | 31 | 12_rugby_mont marsan_marsan_dublin |
| 13 | roues folie - roues - srie roues - folie - srie | 29 | 13_roues folie_roues_srie roues_folie |
| 14 | grve - sncf - brve - sncf dijon - breve | 27 | 14_grve_sncf_brve_sncf dijon |
| 15 | rue - rue pierre - parking - mauroy - pierre mauroy | 26 | 15_rue_rue pierre_parking_mauroy |
| 16 | ouvrier - ouvrier france - serie - france - meilleur ouvrier | 23 | 16_ouvrier_ouvrier france_serie_france |
| 17 | feux - agricoles - vols - agriculteurs - vols gps | 23 | 17_feux_agricoles_vols_agriculteurs |
| 18 | vertbaudet - centre - commerants - commerce - centreville | 22 | 18_vertbaudet_centre_commerants_commerce |
| 19 | archives - policier - policiers - congrs ps - politique | 22 | 19_archives_policier_policiers_congrs ps |
| 20 | recyclage - made in - made - transforme - carton | 22 | 20_recyclage_made in_made_transforme |
| 21 | cvdl - trail - routes - route - cvdl invite | 21 | 21_cvdl_trail_routes_route |
| 22 | sniors - ans - dune - maison - secondaires | 20 | 22_sniors_ans_dune_maison |
| 23 | sports - sport - loc sport - loc - aim | 20 | 23_sports_sport_loc sport_loc |
| 24 | cannes - festival cannes - festival - cannes festival - d06 | 18 | 24_cannes_festival cannes_festival_cannes festival |
| 25 | maires - maire - dmission - maire veyrac - dep dmission | 17 | 25_maires_maire_dmission_maire veyrac |
| 26 | solidaire - bo - bouquinerie solidaire - rdvcv - rdvcv bo | 16 | 26_solidaire_bo_bouquinerie solidaire_rdvcv |
| 27 | armada - vins - mer - larmada - maritime | 15 | 27_armada_vins_mer_larmada |
| 28 | accident - accident mortel - mortel - fayssal - mortel minibus | 15 | 28_accident_accident mortel_mortel_fayssal |
| 29 | dunkerque - jours dunkerque - jours - tape - dunkerque tape | 14 | 29_dunkerque_jours dunkerque_jours_tape |
| 30 | armes anciennes - participants - twirling bton - twirling - convention | 13 | 30_armes anciennes_participants_twirling bton_twirling |
| 31 | cpop - carmina - eyes - planete - savaoo application | 12 | 31_cpop_carmina_eyes_planete |
| 32 | secheresse - scurit - levage - limplantation - poules salmagne | 12 | 32_secheresse_scurit_levage_limplantation |
| 33 | bio - aides - d86 - hugues bioret - dossier presse | 12 | 33_bio_aides_d86_hugues bioret |
| 34 | collectif - camping - collecte - infimiers libraux - sr d51 | 12 | 34_collectif_camping_collecte_infimiers libraux |
| 35 | grand prix - prix - grand - pau - race | 11 | 35_grand prix_prix_grand_pau |
| 36 | boom - technique - entreprise - emploi open - futurs | 11 | 36_boom_technique_entreprise_emploi open |
| 37 | championnat - escrime - 57 - championnat escrime - 57 championnat | 11 | 37_championnat_escrime_57_championnat escrime |
| 38 | oiseaux - population - oiseaux lpo - lpo - gl69 oorion | 11 | 38_oiseaux_population_oiseaux lpo_lpo |
</details>
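
The Label column follows BERTopic's default naming scheme: the topic ID followed by the first four keywords, joined with underscores. Topic -1 collects outlier documents that were not assigned to any cluster. The sketch below uses only the standard library and a few rows copied from the table. It rebuilds a label from its keywords and converts a Topic Frequency count into a share of the 1,636 training documents. The helper names `default_label` and `topic_share` are hypothetical and are not part of BERTopic.

```python
TRAINING_DOCS = 1636  # "Number of training documents" from this card

# A few (topic_id, keywords, frequency) rows copied from the topic table.
ROWS = [
    (-1, ["jt", "actu", "fte", "ville", "57"], 11),
    (0, ["vef", "invit", "invite", "portrait", "diff"], 613),
    (1, ["foot", "football", "ligue", "fc", "match"], 62),
    (6, ["urgences", "non", "soignants non", "soignants", "non vaccins"], 48),
]


def default_label(topic_id, keywords, n_words=4):
    """Rebuild a label in the table's format: the ID plus the first n_words keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def topic_share(frequency, total=TRAINING_DOCS):
    """Return the fraction of training documents assigned to a topic."""
    return frequency / total


for topic_id, keywords, freq in ROWS:
    print(f"{default_label(topic_id, keywords):40} {freq:4d} docs  {topic_share(freq):6.1%}")
```

Multi-word keywords such as "soignants non" keep their internal space in the label. Topic 0 alone holds 613 of the 1,636 documents (about 37.5%).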
## Training hyperparameters
* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
## Framework versions
* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.15.0
* Python: 3.10.12
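
When loading errors occur, it may help to check how closely your local environment matches these versions. The standard-library sketch below compares the installed distributions against the list. It is a convenience script, not part of BERTopic. The distribution names (`umap-learn` for UMAP, `scikit-learn`, `sentence-transformers`) are the usual PyPI names and are an assumption about how the packages were installed.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card (PyPI names assumed).
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def compare_versions(expected):
    """Return {dist: (expected, installed or None, exact_match)} for each distribution."""
    report = {}
    for dist, want in expected.items():
        try:
            have = version(dist)
        except PackageNotFoundError:
            have = None  # distribution is not installed
        report[dist] = (want, have, have == want)
    return report


if __name__ == "__main__":
    print("Python", platform.python_version(), "(card: 3.10.12)")
    for dist, (want, have, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{dist:25} card={want:10} installed={have or 'missing':10} {status}")
```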