Add BERTopic model
- README.md +155 -0
- config.json +16 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,155 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# BERTopic_topics_agriculture

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("hebashakeel/BERTopic_topics_agriculture")

topic_model.get_topic_info()
```
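
Individual topics can also be inspected by ID. The following is a minimal sketch, assuming `get_topic` returns a list of `(word, score)` pairs for a known topic and a falsy value for an unknown one. `top_keywords` is a hypothetical helper name, not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for topic_id, or [] if the topic is unknown.

    Hypothetical helper: it only assumes the model exposes
    get_topic(topic_id) returning (word, score) pairs.
    """
    pairs = topic_model.get_topic(topic_id)
    if not pairs:  # assumed: unknown topics yield a falsy value
        return []
    return [word for word, _score in pairs[:n]]
```

With the model loaded as above, `top_keywords(topic_model, 1)` would list the leading keywords of topic 1.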

## Topic overview

* Number of topics: 86
* Number of training documents: 58144

<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | -1 | - - - - | 162 | -1____ |
  | 0 | sauce - you - recipe - add - with | 1 | 0_sauce_you_recipe_add |
  | 1 | crops - crop - ha - wheat - spring | 3467 | 1_crops_crop_ha_wheat |
  | 2 | tractor - deere - the - with - it | 3097 | 2_tractor_deere_the_with |
  | 3 | cows - milk - is - cattle - grazing | 2373 | 3_cows_milk_is_cattle |
  | 4 | prices - wheat - corn - year - million | 2297 | 4_prices_wheat_corn_year |
  | 5 | soil - plants - plant - fruit - is | 2325 | 5_soil_plants_plant_fruit |
  | 6 | canada - ontario - agriculture - canadian - and | 2274 | 6_canada_ontario_agriculture_canadian |
  | 7 | we - that - to - the - of | 2362 | 7_we_that_to_the |
  | 8 | market - analysis - report - global - forecast | 4450 | 8_market_analysis_report_global |
  | 9 | usda - program - and - agriculture - programs | 1103 | 9_usda_program_and_agriculture |
  | 10 | uk - trade - workers - government - the | 1697 | 10_uk_trade_workers_government |
  | 11 | corn - percent - moisture - crop - planting | 1406 | 11_corn_percent_moisture_crop |
  | 12 | poultry - birds - chickens - feed - eggs | 909 | 12_poultry_birds_chickens_feed |
  | 13 | pig - pigs - npa - producers - said | 719 | 13_pig_pigs_npa_producers |
  | 14 | mental - health - people - farmers - charity | 559 | 14_mental_health_people_farmers |
  | 15 | police - crime - dog - rural - said | 595 | 15_police_crime_dog_rural |
  | 16 | organic - farming - crops - soil - certification | 654 | 16_organic_farming_crops_soil |
  | 17 | candle - you - your - amazon - candles | 649 | 17_candle_you_your_amazon |
  | 18 | fish - pond - water - catfish - ponds | 732 | 18_fish_pond_water_catfish |
  | 19 | kg - lamb - lambs - cattle - beef | 514 | 19_kg_lamb_lambs_cattle |
  | 20 | scheme - defra - farmers - sfi - land | 601 | 20_scheme_defra_farmers_sfi |
  | 21 | based - plant - foods - meat - company | 792 | 21_based_plant_foods_meat |
  | 22 | gene - plant - plants - genetic - research | 731 | 22_gene_plant_plants_genetic |
  | 23 | we - our - quarter - that - think | 742 | 23_we_our_quarter_that |
  | 24 | carbon - emissions - farmers - to - and | 538 | 24_carbon_emissions_farmers_to |
  | 25 | deforestation - forest - forests - eu - brazil | 1191 | 25_deforestation_forest_forests_eu |
  | 26 | laws - farmers - fence - government - minister | 560 | 26_laws_farmers_fence_government |
  | 27 | pig - pigs - sows - sow - farrowing | 417 | 27_pig_pigs_sows_sow |
  | 28 | safety - was - fire - farm - the | 481 | 28_safety_was_fire_farm |
  | 29 | fish - ocean - marine - the - seafood | 537 | 29_fish_ocean_marine_the |
  | 30 | land - agricultural - property - title - registration | 436 | 30_land_agricultural_property_title |
  | 31 | land - farmland - acre - acres - values | 452 | 31_land_farmland_acre_acres |
  | 32 | snail - snails - read - also - you | 511 | 32_snail_snails_read_also |
  | 33 | de - que - la - en - el | 302 | 33_de_que_la_en |
  | 34 | antibiotics - antibiotic - pigs - animal - animals | 375 | 34_antibiotics_antibiotic_pigs_animal |
  | 35 | blood - vitamin - health - cancer - broccoli | 402 | 35_blood_vitamin_health_cancer |
  | 36 | disney - rss - websites - turning - url | 449 | 36_disney_rss_websites_turning |
  | 37 | exports - milk - dairy - beef - year | 330 | 37_exports_milk_dairy_beef |
  | 38 | we - sheep - ewes - lambs - have | 552 | 38_we_sheep_ewes_lambs |
  | 39 | your - you - diet - soy - body | 500 | 39_your_you_diet_soy |
  | 40 | milking - mastitis - teat - cows - cow | 630 | 40_milking_mastitis_teat_cows |
  | 41 | farming - nfu - scotland - the - will | 290 | 41_farming_nfu_scotland_the |
  | 42 | my - was - his - he - it | 747 | 42_my_was_his_he |
  | 43 | ethanol - rail - fuel - e15 - biofuels | 956 | 43_ethanol_rail_fuel_e15 |
  | 44 | bees - species - bee - study - of | 348 | 44_bees_species_bee_study |
  | 45 | you - your - to - that - it | 366 | 45_you_your_to_that |
  | 46 | birds - avian - poultry - influenza - flu | 476 | 46_birds_avian_poultry_influenza |
  | 47 | swine - asf - disease - virus - fever | 274 | 47_swine_asf_disease_virus |
  | 48 | scheme - payments - welsh - bps - payment | 282 | 48_scheme_payments_welsh_bps |
  | 49 | tb - bovine - test - cattle - badger | 459 | 49_tb_bovine_test_cattle |
  | 50 | tax - business - be - insurance - or | 264 | 50_tax_business_be_insurance |
  | 51 | agriculture - agricultural - state - of - the | 541 | 51_agriculture_agricultural_state_of |
  | 52 | tenant - tenants - landlords - tenancy - scheme | 595 | 52_tenant_tenants_landlords_tenancy |
  | 53 | soil - carbon - water - crop - and | 302 | 53_soil_carbon_water_crop |
  | 54 | court - epa - rule - law - plaintiffs | 853 | 54_court_epa_rule_law |
  | 55 | litre - milk - price - dairy - arla | 408 | 55_litre_milk_price_dairy |
  | 56 | feeder - pounds - steers - cattle - week | 259 | 56_feeder_pounds_steers_cattle |
  | 57 | you - they - chicken - that - them | 202 | 57_you_they_chicken_that |
  | 58 | autonomous - robots - robot - technology - the | 417 | 58_autonomous_robots_robot_technology |
  | 59 | wool - micron - merino - cargill - strike | 300 | 59_wool_micron_merino_cargill |
  | 60 | agree - cookies - website - privacy - analytics | 201 | 60_agree_cookies_website_privacy |
  | 61 | farm - you - her - your - to | 231 | 61_farm_you_her_your |
  | 62 | statements - forward - looking - company - uncertainties | 534 | 62_statements_forward_looking_company |
  | 63 | woodland - trees - forestry - carbon - planting | 221 | 63_woodland_trees_forestry_carbon |
  | 64 | protein - oz - powder - ends - powders | 285 | 64_protein_oz_powder_ends |
  | 65 | closed - at - hogs - down - cents | 203 | 65_closed_at_hogs_down |
  | 66 | link - place - related - services - agric4profits | 226 | 66_link_place_related_services |
  | 67 | levy - ahdb - payers - vote - growers | 344 | 67_levy_ahdb_payers_vote |
  | 68 | of - the - in - were - was | 186 | 68_of_the_in_were |
  | 69 | sugar - beet - growers - yellows - british | 182 | 69_sugar_beet_growers_yellows |
  | 70 | ukraine - food - fertiliser - prices - cf | 172 | 70_ukraine_food_fertiliser_prices |
  | 71 | head - slaughter - average - cattle - volumes | 363 | 71_head_slaughter_average_cattle |
  | 72 | urban - agriculture - food - gardens - community | 176 | 72_urban_agriculture_food_gardens |
  | 73 | insects - insect - larvae - fly - honey | 249 | 73_insects_insect_larvae_fly |
  | 74 | pork - the - covid - survey - of | 176 | 74_pork_the_covid_survey |
  | 75 | meat - protein - based - plant - food | 399 | 75_meat_protein_based_plant |
  | 76 | campaign - wild - boar - pigs - dairy | 287 | 76_campaign_wild_boar_pigs |
  | 77 | party - bill - minister - election - liberal | 183 | 77_party_bill_minister_election |
  | 78 | silage - hay - grass - forage - feed | 194 | 78_silage_hay_grass_forage |
  | 79 | school - food - schools - meals - snap | 285 | 79_school_food_schools_meals |
  | 80 | meal - creditor - collateral - debtor - creditors | 231 | 80_meal_creditor_collateral_debtor |
  | 81 | organic - black - farmers - veterans - program | 142 | 81_organic_black_farmers_veterans |
  | 82 | rabbit - rabbits - deere - uaw - they | 191 | 82_rabbit_rabbits_deere_uaw |
  | 83 | avian - influenza - birds - poultry - flocks | 108 | 83_avian_influenza_birds_poultry |
  | 84 | egg - eggs - free - range - cage | 162 | 84_egg_eggs_free_range |

</details>
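
The Label column in the table appears to follow the pattern `<topic id>_<keyword>_<keyword>_...`, with the outlier topic written as `-1____`. As an illustration only, a label can be split back into its ID and keywords. `parse_label` is a hypothetical helper and assumes that keywords contain no underscores.

```python
def parse_label(label):
    """Split a label such as '1_crops_crop_ha_wheat' into (1, ['crops', ...])."""
    topic_id, _, rest = label.partition("_")
    # Empty fragments (as in the outlier label '-1____') are dropped.
    keywords = [word for word in rest.split("_") if word]
    return int(topic_id), keywords


print(parse_label("1_crops_crop_ha_wheat"))  # (1, ['crops', 'crop', 'ha', 'wheat'])
print(parse_label("-1____"))                 # (-1, [])
```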

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 3.4.1
* Transformers: 4.51.1
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.11.11
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
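
The keys in `config.json` mirror the bullet list under "Training hyperparameters" in the README. A minimal sketch of reading such a file into Python keyword arguments follows. It assumes the keys correspond to `BERTopic` constructor parameters of the same name, and `load_hyperparameters` is a hypothetical helper. JSON has no tuple type, so `n_gram_range` is converted back from a list.

```python
import json


def load_hyperparameters(text):
    """Parse config.json text into a dict of keyword arguments."""
    params = json.loads(text)
    # JSON stores the (1, 1) tuple as a list; convert it back.
    if isinstance(params.get("n_gram_range"), list):
        params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


# Shortened example input; the real file carries more keys.
example = '{"min_topic_size": 10, "n_gram_range": [1, 1], "nr_topics": null}'
params = load_hyperparameters(example)
print(params)
```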
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b79b8ce6694d3caa18d5f3d2e6931c8e69fb5d63b1cf1d89b58d87acc22e4b86
size 9704604
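
These three lines are a Git LFS pointer rather than the tensor data itself: each line holds a key, a space, and a value. A short sketch of reading such a pointer is shown below. `parse_lfs_pointer` is a hypothetical helper, and it attempts no validation beyond converting `size` to an integer.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b79b8ce6694d3caa18d5f3d2e6931c8e69fb5d63b1cf1d89b58d87acc22e4b86
size 9704604
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 9704604
```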
ctfidf_config.json
ADDED
The diff for this file is too large to render.
See raw diff
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:98e1fd193b9822710c70592086215060e7466c97401b12d6b8c549e6345029f0
size 132184
topics.json
ADDED
The diff for this file is too large to render.
See raw diff