Commit 21b65d4 by hebashakeel (verified) · 1 parent: 9129f2e

Add BERTopic model

README.md ADDED
@@ -0,0 +1,155 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # BERTopic_topics_agriculture
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("hebashakeel/BERTopic_topics_agriculture")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 86
+ * Number of training documents: 58144
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | - - - - | 162 | -1____ |
+ | 0 | sauce - you - recipe - add - with | 1 | 0_sauce_you_recipe_add |
+ | 1 | crops - crop - ha - wheat - spring | 3467 | 1_crops_crop_ha_wheat |
+ | 2 | tractor - deere - the - with - it | 3097 | 2_tractor_deere_the_with |
+ | 3 | cows - milk - is - cattle - grazing | 2373 | 3_cows_milk_is_cattle |
+ | 4 | prices - wheat - corn - year - million | 2297 | 4_prices_wheat_corn_year |
+ | 5 | soil - plants - plant - fruit - is | 2325 | 5_soil_plants_plant_fruit |
+ | 6 | canada - ontario - agriculture - canadian - and | 2274 | 6_canada_ontario_agriculture_canadian |
+ | 7 | we - that - to - the - of | 2362 | 7_we_that_to_the |
+ | 8 | market - analysis - report - global - forecast | 4450 | 8_market_analysis_report_global |
+ | 9 | usda - program - and - agriculture - programs | 1103 | 9_usda_program_and_agriculture |
+ | 10 | uk - trade - workers - government - the | 1697 | 10_uk_trade_workers_government |
+ | 11 | corn - percent - moisture - crop - planting | 1406 | 11_corn_percent_moisture_crop |
+ | 12 | poultry - birds - chickens - feed - eggs | 909 | 12_poultry_birds_chickens_feed |
+ | 13 | pig - pigs - npa - producers - said | 719 | 13_pig_pigs_npa_producers |
+ | 14 | mental - health - people - farmers - charity | 559 | 14_mental_health_people_farmers |
+ | 15 | police - crime - dog - rural - said | 595 | 15_police_crime_dog_rural |
+ | 16 | organic - farming - crops - soil - certification | 654 | 16_organic_farming_crops_soil |
+ | 17 | candle - you - your - amazon - candles | 649 | 17_candle_you_your_amazon |
+ | 18 | fish - pond - water - catfish - ponds | 732 | 18_fish_pond_water_catfish |
+ | 19 | kg - lamb - lambs - cattle - beef | 514 | 19_kg_lamb_lambs_cattle |
+ | 20 | scheme - defra - farmers - sfi - land | 601 | 20_scheme_defra_farmers_sfi |
+ | 21 | based - plant - foods - meat - company | 792 | 21_based_plant_foods_meat |
+ | 22 | gene - plant - plants - genetic - research | 731 | 22_gene_plant_plants_genetic |
+ | 23 | we - our - quarter - that - think | 742 | 23_we_our_quarter_that |
+ | 24 | carbon - emissions - farmers - to - and | 538 | 24_carbon_emissions_farmers_to |
+ | 25 | deforestation - forest - forests - eu - brazil | 1191 | 25_deforestation_forest_forests_eu |
+ | 26 | laws - farmers - fence - government - minister | 560 | 26_laws_farmers_fence_government |
+ | 27 | pig - pigs - sows - sow - farrowing | 417 | 27_pig_pigs_sows_sow |
+ | 28 | safety - was - fire - farm - the | 481 | 28_safety_was_fire_farm |
+ | 29 | fish - ocean - marine - the - seafood | 537 | 29_fish_ocean_marine_the |
+ | 30 | land - agricultural - property - title - registration | 436 | 30_land_agricultural_property_title |
+ | 31 | land - farmland - acre - acres - values | 452 | 31_land_farmland_acre_acres |
+ | 32 | snail - snails - read - also - you | 511 | 32_snail_snails_read_also |
+ | 33 | de - que - la - en - el | 302 | 33_de_que_la_en |
+ | 34 | antibiotics - antibiotic - pigs - animal - animals | 375 | 34_antibiotics_antibiotic_pigs_animal |
+ | 35 | blood - vitamin - health - cancer - broccoli | 402 | 35_blood_vitamin_health_cancer |
+ | 36 | disney - rss - websites - turning - url | 449 | 36_disney_rss_websites_turning |
+ | 37 | exports - milk - dairy - beef - year | 330 | 37_exports_milk_dairy_beef |
+ | 38 | we - sheep - ewes - lambs - have | 552 | 38_we_sheep_ewes_lambs |
+ | 39 | your - you - diet - soy - body | 500 | 39_your_you_diet_soy |
+ | 40 | milking - mastitis - teat - cows - cow | 630 | 40_milking_mastitis_teat_cows |
+ | 41 | farming - nfu - scotland - the - will | 290 | 41_farming_nfu_scotland_the |
+ | 42 | my - was - his - he - it | 747 | 42_my_was_his_he |
+ | 43 | ethanol - rail - fuel - e15 - biofuels | 956 | 43_ethanol_rail_fuel_e15 |
+ | 44 | bees - species - bee - study - of | 348 | 44_bees_species_bee_study |
+ | 45 | you - your - to - that - it | 366 | 45_you_your_to_that |
+ | 46 | birds - avian - poultry - influenza - flu | 476 | 46_birds_avian_poultry_influenza |
+ | 47 | swine - asf - disease - virus - fever | 274 | 47_swine_asf_disease_virus |
+ | 48 | scheme - payments - welsh - bps - payment | 282 | 48_scheme_payments_welsh_bps |
+ | 49 | tb - bovine - test - cattle - badger | 459 | 49_tb_bovine_test_cattle |
+ | 50 | tax - business - be - insurance - or | 264 | 50_tax_business_be_insurance |
+ | 51 | agriculture - agricultural - state - of - the | 541 | 51_agriculture_agricultural_state_of |
+ | 52 | tenant - tenants - landlords - tenancy - scheme | 595 | 52_tenant_tenants_landlords_tenancy |
+ | 53 | soil - carbon - water - crop - and | 302 | 53_soil_carbon_water_crop |
+ | 54 | court - epa - rule - law - plaintiffs | 853 | 54_court_epa_rule_law |
+ | 55 | litre - milk - price - dairy - arla | 408 | 55_litre_milk_price_dairy |
+ | 56 | feeder - pounds - steers - cattle - week | 259 | 56_feeder_pounds_steers_cattle |
+ | 57 | you - they - chicken - that - them | 202 | 57_you_they_chicken_that |
+ | 58 | autonomous - robots - robot - technology - the | 417 | 58_autonomous_robots_robot_technology |
+ | 59 | wool - micron - merino - cargill - strike | 300 | 59_wool_micron_merino_cargill |
+ | 60 | agree - cookies - website - privacy - analytics | 201 | 60_agree_cookies_website_privacy |
+ | 61 | farm - you - her - your - to | 231 | 61_farm_you_her_your |
+ | 62 | statements - forward - looking - company - uncertainties | 534 | 62_statements_forward_looking_company |
+ | 63 | woodland - trees - forestry - carbon - planting | 221 | 63_woodland_trees_forestry_carbon |
+ | 64 | protein - oz - powder - ends - powders | 285 | 64_protein_oz_powder_ends |
+ | 65 | closed - at - hogs - down - cents | 203 | 65_closed_at_hogs_down |
+ | 66 | link - place - related - services - agric4profits | 226 | 66_link_place_related_services |
+ | 67 | levy - ahdb - payers - vote - growers | 344 | 67_levy_ahdb_payers_vote |
+ | 68 | of - the - in - were - was | 186 | 68_of_the_in_were |
+ | 69 | sugar - beet - growers - yellows - british | 182 | 69_sugar_beet_growers_yellows |
+ | 70 | ukraine - food - fertiliser - prices - cf | 172 | 70_ukraine_food_fertiliser_prices |
+ | 71 | head - slaughter - average - cattle - volumes | 363 | 71_head_slaughter_average_cattle |
+ | 72 | urban - agriculture - food - gardens - community | 176 | 72_urban_agriculture_food_gardens |
+ | 73 | insects - insect - larvae - fly - honey | 249 | 73_insects_insect_larvae_fly |
+ | 74 | pork - the - covid - survey - of | 176 | 74_pork_the_covid_survey |
+ | 75 | meat - protein - based - plant - food | 399 | 75_meat_protein_based_plant |
+ | 76 | campaign - wild - boar - pigs - dairy | 287 | 76_campaign_wild_boar_pigs |
+ | 77 | party - bill - minister - election - liberal | 183 | 77_party_bill_minister_election |
+ | 78 | silage - hay - grass - forage - feed | 194 | 78_silage_hay_grass_forage |
+ | 79 | school - food - schools - meals - snap | 285 | 79_school_food_schools_meals |
+ | 80 | meal - creditor - collateral - debtor - creditors | 231 | 80_meal_creditor_collateral_debtor |
+ | 81 | organic - black - farmers - veterans - program | 142 | 81_organic_black_farmers_veterans |
+ | 82 | rabbit - rabbits - deere - uaw - they | 191 | 82_rabbit_rabbits_deere_uaw |
+ | 83 | avian - influenza - birds - poultry - flocks | 108 | 83_avian_influenza_birds_poultry |
+ | 84 | egg - eggs - free - range - cage | 162 | 84_egg_eggs_free_range |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.2.2
+ * Sentence-transformers: 3.4.1
+ * Transformers: 4.51.1
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.11.11
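In the README's topic overview table, every Label appears to be the topic ID followed by the first four of that topic's keywords, joined with underscores; the outlier topic -1 has no keywords, which would explain the label `-1____`. The sketch below checks that reading against a few rows copied verbatim from the table. The helpers `parse_row` and `make_label` are hypothetical, not part of the BERTopic API, and the pattern is inferred from the table rather than taken from BERTopic's source.

```python
def parse_row(row: str) -> tuple[int, list[str], int, str]:
    """Split one overview-table row into (topic_id, keywords, frequency, label)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    raw_keywords = cells[1]
    # The outlier row shows only placeholder dashes ("- - - -"), meaning no keywords.
    if set(raw_keywords) <= {"-", " "}:
        keywords = []
    else:
        keywords = [word.strip() for word in raw_keywords.split(" - ")]
    return int(cells[0]), keywords, int(cells[2]), cells[3]


def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Build '<id>_' followed by the first n_words keywords joined with '_' (inferred pattern)."""
    words = keywords[:n_words]
    # Pad with empty strings so a topic without keywords still has n_words slots.
    words += [""] * (n_words - len(words))
    return f"{topic_id}_" + "_".join(words)


# A few rows copied verbatim from the topic overview table.
ROWS = [
    "| -1 | - - - - | 162 | -1____ |",
    "| 0 | sauce - you - recipe - add - with | 1 | 0_sauce_you_recipe_add |",
    "| 43 | ethanol - rail - fuel - e15 - biofuels | 956 | 43_ethanol_rail_fuel_e15 |",
    "| 66 | link - place - related - services - agric4profits | 226 | 66_link_place_related_services |",
]

for row in ROWS:
    topic_id, keywords, frequency, label = parse_row(row)
    # Prints the topic ID, its document count, and whether the rebuilt label matches.
    print(topic_id, frequency, make_label(topic_id, keywords) == label)
```

The padding step is what makes the outlier row fit the same rule as every other topic: four empty keyword slots joined by underscores give three underscores, plus the one after the ID.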
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "calculate_probabilities": false,
+ "language": null,
+ "low_memory": false,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 1
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": true,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null
+ }
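config.json holds the same hyperparameters the README lists under "Training hyperparameters", with one difference in representation: JSON has no tuple type, so `n_gram_range` is stored as the list `[1, 1]` where the README shows `(1, 1)`. A minimal sketch of reading the file and restoring the tuple; the function name `load_saved_config` is hypothetical, the JSON text is inlined so the example runs without the repository, and BERTopic's own loading code may handle this differently.

```python
import json

# Contents of config.json from this commit, inlined for a self-contained example.
CONFIG_TEXT = """{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}"""


def load_saved_config(text: str) -> dict:
    """Parse saved hyperparameters and turn list-encoded pairs back into tuples."""
    config = json.loads(text)  # JSON null -> None, true/false -> True/False
    # JSON has no tuple type, so the (min_n, max_n) pair round-trips as a list.
    if isinstance(config.get("n_gram_range"), list):
        config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


config = load_saved_config(CONFIG_TEXT)
print(config["n_gram_range"], config["min_topic_size"], config["language"])
```

To read the real file instead, pass the result of `open("config.json").read()` to the same function.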
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b79b8ce6694d3caa18d5f3d2e6931c8e69fb5d63b1cf1d89b58d87acc22e4b86
+ size 9704604
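ctfidf.safetensors is committed as a Git LFS pointer rather than as tensor data: three `key value` lines giving the LFS spec version, the object ID (hash algorithm and digest), and the size in bytes of the real file, which is stored separately. A minimal parser for that pointer; the function name `parse_lfs_pointer` is illustrative, not taken from the git-lfs tooling.

```python
# The ctfidf.safetensors pointer text from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b79b8ce6694d3caa18d5f3d2e6931c8e69fb5d63b1cf1d89b58d87acc22e4b86
size 9704604
"""


def parse_lfs_pointer(text: str) -> dict:
    """Return the version, hash algorithm, digest and byte size from an LFS pointer."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], info["size"])
```

The size field is the full file's length (about 9.7 MB here), not the length of the pointer itself.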
ctfidf_config.json ADDED
The diff for this file is too large to render.
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98e1fd193b9822710c70592086215060e7466c97401b12d6b8c549e6345029f0
+ size 132184
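In an LFS pointer the `oid` is the SHA-256 digest of the full file contents, so a downloaded topic_embeddings.safetensors can be checked against the pointer for that file by comparing its length (132184 bytes) and hash. A minimal sketch using only `hashlib`; `matches_pointer` is a hypothetical helper, the local file path in the comment is an assumption, and the runnable demo uses a small in-memory payload instead of the real file.

```python
import hashlib

# Values from the topic_embeddings.safetensors pointer in this commit.
EXPECTED_OID = "98e1fd193b9822710c70592086215060e7466c97401b12d6b8c549e6345029f0"
EXPECTED_SIZE = 132184


def matches_pointer(data: bytes, oid: str, size: int) -> bool:
    """True if data has exactly the pointer's size and SHA-256 digest."""
    # Compare the cheap length first, then hash the whole payload.
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid


# Checking the real file (path is an assumption about where it was downloaded):
# with open("topic_embeddings.safetensors", "rb") as f:
#     print(matches_pointer(f.read(), EXPECTED_OID, EXPECTED_SIZE))

demo = b"hello"
demo_oid = hashlib.sha256(demo).hexdigest()
print(matches_pointer(demo, demo_oid, len(demo)))
print(matches_pointer(demo, demo_oid, EXPECTED_SIZE))
```

Checking the size before hashing lets a truncated or wrong download fail without reading a digest over the whole payload.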
topics.json ADDED
The diff for this file is too large to render.