Commit 99e3382 by Thang203 (verified) · 1 parent: c4cc8aa

Add BERTopic model

Files changed (6):
  1. README.md +234 -0
  2. config.json +17 -0
  3. ctfidf.bin +3 -0
  4. ctfidf_config.json +0 -0
  5. topic_embeddings.bin +3 -0
  6. topics.json +0 -0
README.md ADDED
@@ -0,0 +1,234 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # general_nlp_research_paper
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("Thang203/general_nlp_research_paper")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 165
+ * Number of training documents: 11000
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | language - models - model - data - translation | 10 | -1_language_models_model_data |
+ | 0 | question - answer - questions - answering - question answering | 3488 | 0_question_answer_questions_answering |
+ | 1 | speech - speech recognition - acoustic - recognition - asr | 513 | 1_speech_speech recognition_acoustic_recognition |
+ | 2 | summarization - summaries - abstractive - summary - extractive | 345 | 2_summarization_summaries_abstractive_summary |
+ | 3 | clinical - medical - biomedical - extraction - notes | 337 | 3_clinical_medical_biomedical_extraction |
+ | 4 | translation - machine translation - parallel - machine - nmt | 258 | 4_translation_machine translation_parallel_machine |
+ | 5 | emotion - emotions - emotional - emotion recognition - affective | 211 | 5_emotion_emotions_emotional_emotion recognition |
+ | 6 | word - embeddings - word embeddings - similarity - vector | 164 | 6_word_embeddings_word embeddings_similarity |
+ | 7 | bert - probing - tasks - pretraining - pretrained | 145 | 7_bert_probing_tasks_pretraining |
+ | 8 | relation - relation extraction - extraction - relations - distant | 138 | 8_relation_relation extraction_extraction_relations |
+ | 9 | hate - hate speech - offensive - detection - speech | 134 | 9_hate_hate speech_offensive_detection |
+ | 10 | arabic - sanskrit - kurdish - transliteration - rules | 118 | 10_arabic_sanskrit_kurdish_transliteration |
+ | 11 | aspect - sentiment - sentiment analysis - aspectbased sentiment - aspectbased | 118 | 11_aspect_sentiment_sentiment analysis_aspectbased sentiment |
+ | 12 | morphological - inflection - languages - morphology - morphological analysis | 112 | 12_morphological_inflection_languages_morphology |
+ | 13 | ner - named entity - named - entity recognition - named entity recognition | 107 | 13_ner_named entity_named_entity recognition |
+ | 14 | multimodal - image - visual - captions - images | 101 | 14_multimodal_image_visual_captions |
+ | 15 | discourse - discourse relation - discourse parsing - implicit discourse - discourse relations | 98 | 15_discourse_discourse relation_discourse parsing_implicit discourse |
+ | 16 | chinese - segmentation - word segmentation - chinese word - chinese word segmentation | 89 | 16_chinese_segmentation_word segmentation_chinese word |
+ | 17 | crosslingual - bilingual - embeddings - crosslingual word - word embeddings | 84 | 17_crosslingual_bilingual_embeddings_crosslingual word |
+ | 18 | entropy - law - languages - script - frequency | 79 | 18_entropy_law_languages_script |
+ | 19 | argument - argumentation - arguments - argumentative - mining | 77 | 19_argument_argumentation_arguments_argumentative |
+ | 20 | nmt - neural machine - neural machine translation - translation - machine translation | 77 | 20_nmt_neural machine_neural machine translation_translation |
+ | 21 | parsing - dependency - dependency parsing - parser - transitionbased | 76 | 21_parsing_dependency_dependency parsing_parser |
+ | 22 | syntactic - rnns - grammatical - language models - agreement | 71 | 22_syntactic_rnns_grammatical_language models |
+ | 23 | generation - datatotext - text generation - datatotext generation - text | 71 | 23_generation_datatotext_text generation_datatotext generation |
+ | 24 | topic - topics - topic models - topic modeling - lda | 71 | 24_topic_topics_topic models_topic modeling |
+ | 25 | knowledge - knowledge graph - entities - relation - graph | 68 | 25_knowledge_knowledge graph_entities_relation |
+ | 26 | gender - bias - gender bias - biases - embeddings | 66 | 26_gender_bias_gender bias_biases |
+ | 27 | story - stories - story generation - narrative - plot | 65 | 27_story_stories_story generation_narrative |
+ | 28 | dialogue - dialog - user - taskoriented - agent | 65 | 28_dialogue_dialog_user_taskoriented |
+ | 29 | transformer - attention - selfattention - heads - layers | 65 | 29_transformer_attention_selfattention_heads |
+ | 30 | srl - semantic role - role labeling - semantic role labeling - role | 64 | 30_srl_semantic role_role labeling_semantic role labeling |
+ | 31 | change - semantic change - diachronic - lexical semantic - semantic | 64 | 31_change_semantic change_diachronic_lexical semantic |
+ | 32 | sense - wsd - disambiguation - word sense - sense disambiguation | 64 | 32_sense_wsd_disambiguation_word sense |
+ | 33 | paraphrase - paraphrases - paraphrase generation - paraphrasing - paraphrase identification | 63 | 33_paraphrase_paraphrases_paraphrase generation_paraphrasing |
+ | 34 | linking - entity linking - entity - el - entities | 62 | 34_linking_entity linking_entity_el |
+ | 35 | authorship - attribution - authorship attribution - authors - stylistic | 60 | 35_authorship_attribution_authorship attribution_authors |
+ | 36 | tracking - state tracking - dialogue state - state - dialogue | 54 | 36_tracking_state tracking_dialogue state_state |
+ | 37 | nli - natural language inference - language inference - inference - natural language | 54 | 37_nli_natural language inference_language inference_inference |
+ | 38 | act - dialogue act - dialogue - dialog act - dialog | 51 | 38_act_dialogue act_dialogue_dialog act |
+ | 39 | commonsense - reasoning - commonsense reasoning - knowledge - commonsense knowledge | 49 | 39_commonsense_reasoning_commonsense reasoning_knowledge |
+ | 40 | crosslingual - multilingual - transfer - crosslingual transfer - mbert | 49 | 40_crosslingual_multilingual_transfer_crosslingual transfer |
+ | 41 | coreference - resolution - coreference resolution - mention - pronoun | 49 | 41_coreference_resolution_coreference resolution_mention |
+ | 42 | legal - patent - court - case - legal domain | 48 | 42_legal_patent_court_case |
+ | 43 | dialect - identification - language identification - dialect identification - arabic | 47 | 43_dialect_identification_language identification_dialect identification |
+ | 44 | amr - amr parsing - parsing - meaning representation - meaning | 46 | 44_amr_amr parsing_parsing_meaning representation |
+ | 45 | adversarial - adversarial examples - attacks - attack - examples | 46 | 45_adversarial_adversarial examples_attacks_attack |
+ | 46 | health - mental - mental health - social media - media | 45 | 46_health_mental_mental health_social media |
+ | 47 | offensive - offensive language - subtask - offensive language identification - hostile | 45 | 47_offensive_offensive language_subtask_offensive language identification |
+ | 48 | semantic parsing - parsing - semantic - compositional generalization - logical | 44 | 48_semantic parsing_parsing_semantic_compositional generalization |
+ | 49 | recurrent - language modeling - rnn - lstm - modeling | 44 | 49_recurrent_language modeling_rnn_lstm |
+ | 50 | sql - texttosql - database - queries - query | 44 | 50_sql_texttosql_database_queries |
+ | 51 | indian - smt - translation - machine translation - machine | 43 | 51_indian_smt_translation_machine translation |
+ | 52 | style - style transfer - transfer - text style - text style transfer | 43 | 52_style_style transfer_transfer_text style |
+ | 53 | poetry - poems - lyrics - music - verse | 43 | 53_poetry_poems_lyrics_music |
+ | 54 | codeswitching - cs - codeswitched - codemixed - monolingual | 43 | 54_codeswitching_cs_codeswitched_codemixed |
+ | 55 | sentiment - polarity - sentiment analysis - analysis - prior polarity | 41 | 55_sentiment_polarity_sentiment analysis_analysis |
+ | 56 | sarcasm - sarcasm detection - sarcastic - detection - irony | 41 | 56_sarcasm_sarcasm detection_sarcastic_detection |
+ | 57 | gec - grammatical error - grammatical error correction - error correction - correction | 40 | 57_gec_grammatical error_grammatical error correction_error correction |
+ | 58 | intent - intent detection - slot - slot filling - filling | 40 | 58_intent_intent detection_slot_slot filling |
+ | 59 | temporal - events - temporal relations - expressions - temporal relation | 39 | 59_temporal_events_temporal relations_expressions |
+ | 60 | adaptation - domain - domain adaptation - indomain - translation | 37 | 60_adaptation_domain_domain adaptation_indomain |
+ | 61 | stance - stance detection - detection - tweets - veracity | 37 | 61_stance_stance detection_detection_tweets |
+ | 62 | codemixed - sentiment - sentiment analysis - analysis - semeval2020 | 36 | 62_codemixed_sentiment_sentiment analysis_analysis |
+ | 63 | keyphrase - keyphrases - keyphrase extraction - keyphrase generation - extraction | 35 | 63_keyphrase_keyphrases_keyphrase extraction_keyphrase generation |
+ | 64 | nmt - subword - translation - vocabulary - neural machine translation | 35 | 64_nmt_subword_translation_vocabulary |
+ | 65 | calculus - logic - semantics - proof - typelogical | 35 | 65_calculus_logic_semantics_proof |
+ | 66 | simplification - text simplification - sentence simplification - sentence - ts | 35 | 66_simplification_text simplification_sentence simplification_sentence |
+ | 67 | annotation - xml - formats - tei - standards | 35 | 67_annotation_xml_formats_tei |
+ | 68 | correction - spelling - ocr - spelling correction - errors | 33 | 68_correction_spelling_ocr_spelling correction |
+ | 69 | sentiment - sentiment classification - sentiment analysis - classification - analysis | 33 | 69_sentiment_sentiment classification_sentiment analysis_classification |
+ | 70 | complexity - readability - lexical complexity - assessment - readability assessment | 31 | 70_complexity_readability_lexical complexity_assessment |
+ | 71 | postediting - ape - automatic postediting - mt - translation | 30 | 71_postediting_ape_automatic postediting_mt |
+ | 72 | gender - gender bias - bias - translation - pronouns | 30 | 72_gender_gender bias_bias_translation |
+ | 73 | tagger - tagging - taggers - pos - partofspeech | 30 | 73_tagger_tagging_taggers_pos |
+ | 74 | meeting - summarization - podcast - abstractive - summaries | 30 | 74_meeting_summarization_podcast_abstractive |
+ | 75 | domain - domain adaptation - adaptation - domains - target domain | 30 | 75_domain_domain adaptation_adaptation_domains |
+ | 76 | documentlevel - context - translation - nmt - neural machine | 29 | 76_documentlevel_context_translation_nmt |
+ | 77 | text classification - classification - convolutional - networks - convolutional neural | 29 | 77_text classification_classification_convolutional_networks |
+ | 78 | news - fake - fake news - clickbait - satirical | 29 | 78_news_fake_fake news_clickbait |
+ | 79 | grammars - grammar - stochastic - contextfree - contextfree grammars | 29 | 79_grammars_grammar_stochastic_contextfree |
+ | 80 | ontology - rogets - thesaurus - wordnet - concepts | 29 | 80_ontology_rogets_thesaurus_wordnet |
+ | 81 | vietnamese - ner - named entity recognition - entity recognition - named entity | 28 | 81_vietnamese_ner_named entity recognition_entity recognition |
+ | 82 | claim - verification - evidence - claims - fever | 27 | 82_claim_verification_evidence_claims |
+ | 83 | metrics - nlg - language generation - evaluation - natural language generation | 27 | 83_metrics_nlg_language generation_evaluation |
+ | 84 | responses - response - response generation - adversarial - generation | 27 | 84_responses_response_response generation_adversarial |
+ | 85 | robustness - nmt - translation - neural machine - neural machine translation | 27 | 85_robustness_nmt_translation_neural machine |
+ | 86 | revision - editing - seq2seq - revisions - rewriting | 27 | 86_revision_editing_seq2seq_revisions |
+ | 87 | phonological - phonology - finitestate - reduplication - prosody | 26 | 87_phonological_phonology_finitestate_reduplication |
+ | 88 | geolocation - location - geographic - twitter - names | 26 | 88_geolocation_location_geographic_twitter |
+ | 89 | event - event extraction - extraction - event types - argument | 26 | 89_event_event extraction_extraction_event types |
+ | 90 | mt - human - translation - evaluation - parity | 25 | 90_mt_human_translation_evaluation |
+ | 91 | arabic - sentiment - sentiment analysis - arabic sentiment - arabic sentiment analysis | 25 | 91_arabic_sentiment_sentiment analysis_arabic sentiment |
+ | 92 | emoji - emojis - emoji prediction - emoticons - sentiment | 25 | 92_emoji_emojis_emoji prediction_emoticons |
+ | 93 | constituency - latent tree - parsing - constituency parsing - tree learning | 25 | 93_constituency_latent tree_parsing_constituency parsing |
+ | 94 | spatial - instructions - 3d - environment - robot | 24 | 94_spatial_instructions_3d_environment |
+ | 95 | persona - responses - personality - traits - consistency | 23 | 95_persona_responses_personality_traits |
+ | 96 | matching - response - retrievalbased - chatbots - multiturn | 23 | 96_matching_response_retrievalbased_chatbots |
+ | 97 | entity - entity typing - typing - finegrained entity - type | 22 | 97_entity_entity typing_typing_finegrained entity |
+ | 98 | math - word problems - math word - word problem - problems | 21 | 98_math_word problems_math word_word problem |
+ | 99 | bert - multilingual - multilingual bert - bert model - multilingual models | 21 | 99_bert_multilingual_multilingual bert_bert model |
+ | 100 | financial - stock - market - news - price | 21 | 100_financial_stock_market_news |
+ | 101 | video - multimodal - sceneaware - dialog - visual | 21 | 101_video_multimodal_sceneaware_dialog |
+ | 102 | sense - multisense - senses - word sense - word | 21 | 102_sense_multisense_senses_word sense |
+ | 103 | game - games - agents - communication - pragmatic | 21 | 103_game_games_agents_communication |
+ | 104 | graph - amrtotext - amrtotext generation - amr - graphs | 20 | 104_graph_amrtotext_amrtotext generation_amr |
+ | 105 | nmt - translation - neural machine translation - neural machine - machine translation | 20 | 105_nmt_translation_neural machine translation_neural machine |
+ | 106 | normalization - text normalization - normalizing - text - historical | 20 | 106_normalization_text normalization_normalizing_text |
+ | 107 | privacy - policies - anonymization - deidentification - vague | 20 | 107_privacy_policies_anonymization_deidentification |
+ | 108 | beam - beam search - search - decoding - constraints | 20 | 108_beam_beam search_search_decoding |
+ | 109 | hypernymy - distributional - pathbased - hypernymy detection - hypernyms | 19 | 109_hypernymy_distributional_pathbased_hypernymy detection |
+ | 110 | political - bias - articles - news - ideology | 19 | 110_political_bias_articles_news |
+ | 111 | generative adversarial - gans - gan - generative - generative adversarial networks | 18 | 111_generative adversarial_gans_gan_generative |
+ | 112 | pos - tagger - tagging - pos tagging - codemixed | 17 | 112_pos_tagger_tagging_pos tagging |
+ | 113 | humor - humorous - headlines - funny - puns | 17 | 113_humor_humorous_headlines_funny |
+ | 114 | metaphor - metaphors - metaphoric - metaphorical - literal | 17 | 114_metaphor_metaphors_metaphoric_metaphorical |
+ | 115 | codeswitching - cs - asr - speech - speech recognition | 17 | 115_codeswitching_cs_asr_speech |
+ | 116 | event coreference - event - coreference - coreference resolution - resolution | 17 | 116_event coreference_event_coreference_coreference resolution |
+ | 117 | reviews - review - helpfulness - opinion - online reviews | 17 | 117_reviews_review_helpfulness_opinion |
+ | 118 | covid19 - tweets - wnut2020 - twitter - informative | 17 | 118_covid19_tweets_wnut2020_twitter |
+ | 119 | anaphora - resolution - pronouns - pronoun - anaphora resolution | 17 | 119_anaphora_resolution_pronouns_pronoun |
+ | 120 | bilingual - dictionary - comparability - termhood - comparable corpora | 17 | 120_bilingual_dictionary_comparability_termhood |
+ | 121 | discourse - translation - pronouns - dp - discourse phenomena | 17 | 121_discourse_translation_pronouns_dp |
+ | 122 | color - colour - naming - colors - character embeddings | 16 | 122_color_colour_naming_colors |
+ | 123 | nonautoregressive - autoregressive - nat - nonautoregressive neural - decoding | 16 | 123_nonautoregressive_autoregressive_nat_nonautoregressive neural |
+ | 124 | nlg - natural language generation - language generation - spoken dialogue - generation | 16 | 124_nlg_natural language generation_language generation_spoken dialogue |
+ | 125 | crowdsourcing - workers - examples - protocols - data collection | 16 | 125_crowdsourcing_workers_examples_protocols |
+ | 126 | african - revolution - african languages - technology - african language | 16 | 126_african_revolution_african languages_technology |
+ | 127 | grading - scoring - essay - short answer - essay scoring | 16 | 127_grading_scoring_essay_short answer |
+ | 128 | treebanks - treebank - parsing - crosslingual - dependency | 16 | 128_treebanks_treebank_parsing_crosslingual |
+ | 129 | reviews - summarization - review - product - summaries | 16 | 129_reviews_summarization_review_product |
+ | 130 | gaze - reading - eyetracking - eye - behaviour | 16 | 130_gaze_reading_eyetracking_eye |
+ | 131 | nlp - natural - natural language - nlg - language | 15 | 131_nlp_natural_natural language_nlg |
+ | 132 | news translation - news translation task - translation task - news - submission | 14 | 132_news translation_news translation task_translation task_news |
+ | 133 | eat - meaning - semantics - formal - theory | 14 | 133_eat_meaning_semantics_formal |
+ | 134 | sign - sign language - sl - asl - deaf | 14 | 134_sign_sign language_sl_asl |
+ | 135 | multitask - labels - mtl - sequence - multitask learning | 14 | 135_multitask_labels_mtl_sequence |
+ | 136 | phylogenetic - cognate - indoeuropean - historical linguistics - indoeuropean language | 14 | 136_phylogenetic_cognate_indoeuropean_historical linguistics |
+ | 137 | syntax - translation - neural machine translation - neural machine - nmt | 14 | 137_syntax_translation_neural machine translation_neural machine |
+ | 138 | explanations - explanation - explainers - nl explanations - faithful | 14 | 138_explanations_explanation_explainers_nl explanations |
+ | 139 | slot - slot filling - filling - slots - nlu | 13 | 139_slot_slot filling_filling_slots |
+ | 140 | personality - traits - profiling - author profiling - author | 13 | 140_personality_traits_profiling_author profiling |
+ | 141 | preposition - prepositions - supersenses - prepositional - supersense | 13 | 141_preposition_prepositions_supersenses_prepositional |
+ | 142 | scientific - application areas - application - areas - literature | 13 | 142_scientific_application areas_application_areas |
+ | 143 | russian - similarity - semantic similarity - similarity task - semantic similarity task | 13 | 143_russian_similarity_semantic similarity_similarity task |
+ | 144 | code - source code - documentation - code generation - programming | 13 | 144_code_source code_documentation_code generation |
+ | 145 | semantic web - translation - machinetranslation - machine translation - technologies | 12 | 145_semantic web_translation_machinetranslation_machine translation |
+ | 146 | knowledge - knowledgegrounded - response - dialogue generation - dialogue | 12 | 146_knowledge_knowledgegrounded_response_dialogue generation |
+ | 147 | sentence - sentence representations - sentence embeddings - transfer - tasks | 12 | 147_sentence_sentence representations_sentence embeddings_transfer |
+ | 148 | distributional - distributional semantics - semantics - functional distributional - functional distributional semantics | 12 | 148_distributional_distributional semantics_semantics_functional distributional |
+ | 149 | compositionality - sc - distributional - sememe knowledge - phrase | 12 | 149_compositionality_sc_distributional_sememe knowledge |
+ | 150 | ud - annotation - treebank - treebanks - universal dependencies | 12 | 150_ud_annotation_treebank_treebanks |
+ | 151 | acronym - abbreviation - acronyms - abbreviations - disambiguation | 12 | 151_acronym_abbreviation_acronyms_abbreviations |
+ | 152 | propaganda - task 11 - 11 - propaganda detection - semeval2020 task | 12 | 152_propaganda_task 11_11_propaganda detection |
+ | 153 | open - open information extraction - open information - information extraction - tuples | 12 | 153_open_open information extraction_open information_information extraction |
+ | 154 | hebrew - bible - intertextuality - restoration - homographs | 11 | 154_hebrew_bible_intertextuality_restoration |
+ | 155 | typological - typology - typological features - languages - linguistic typology | 11 | 155_typological_typology_typological features_languages |
+ | 156 | label - text classification - multilabel - labels - classification | 11 | 156_label_text classification_multilabel_labels |
+ | 157 | variational - latent - variational autoencoders - variational autoencoder - autoencoders | 11 | 157_variational_latent_variational autoencoders_variational autoencoder |
+ | 158 | crisis - messages - disasters - disaster - emergency | 11 | 158_crisis_messages_disasters_disaster |
+ | 159 | adversarial - rc - rc models - robustness - comprehension | 11 | 159_adversarial_rc_rc models_robustness |
+ | 160 | tree - treelstm - trees - tree structures - syntactic | 11 | 160_tree_treelstm_trees_tree structures |
+ | 161 | headline - headlines - news - headline generation - synthetic news | 11 | 161_headline_headlines_news_headline generation |
+ | 162 | reasoning - kg - paths - kgs - multihop | 11 | 162_reasoning_kg_paths_kgs |
+ | 163 | text classification - classification - runtime - fasttext - text | 10 | 163_text classification_classification_runtime_fasttext |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: english
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.25.2
+ * HDBSCAN: 0.8.33
+ * UMAP: 0.5.6
+ * Pandas: 2.0.3
+ * Scikit-Learn: 1.2.2
+ * Sentence-transformers: 2.6.1
+ * Transformers: 4.38.2
+ * Numba: 0.58.1
+ * Plotly: 5.15.0
+ * Python: 3.10.12
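In the README's topic table, the Label column appears to follow a simple pattern: the topic ID, then the first four entries of the Topic Keywords column, all joined with underscores (multi-word keywords such as `speech recognition` keep their internal space). The stdlib sketch below rebuilds a label from a Markdown table row under that assumption, which is inferred from the table itself. `label_from_row` and `label_matches` are hypothetical helpers, not part of the BERTopic API; once the model is loaded, `topic_model.get_topic_info()` remains the authoritative source for topic names.

```python
def label_from_row(row: str) -> str:
    """Rebuild the Label cell from a '| id | keywords | freq | label |' table row.

    Assumption (inferred from the table): a label is the topic ID followed by the
    first four ' - '-separated keywords, joined with underscores.
    """
    # Split the Markdown row into cells, dropping the outer pipes.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords = cells[0], cells[1]
    top_four = [kw.strip() for kw in keywords.split(" - ")][:4]
    return "_".join([topic_id] + top_four)


def label_matches(row: str) -> bool:
    """Check whether the row's Label cell agrees with the rebuilt label."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return label_from_row(row) == cells[3]


if __name__ == "__main__":
    row = "| 1 | speech - speech recognition - acoustic - recognition - asr | 513 | 1_speech_speech recognition_acoustic_recognition |"
    print(label_from_row(row))
    print(label_matches(row))
```

Running it prints `1_speech_speech recognition_acoustic_recognition` followed by `True`, since the rebuilt string equals the Label cell for topic 1. The same check can be looped over every row to spot labels that were edited by hand.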
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "calculate_probabilities": false,
+   "language": "english",
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": true,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null,
+   "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
+ }
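config.json holds the same hyperparameters as the README's Training hyperparameters list, plus the name of the embedding model. JSON has no tuple type, so `n_gram_range` is stored as a list, and Python's `None`/`True`/`False` appear as `null`/`true`/`false`. The sketch below reads the file with the standard `json` module and converts `n_gram_range` back to the tuple form shown in the README. The helper name `load_bertopic_config` and the assumption that `config.json` sits in the working directory are illustrative only. The keys appear to correspond to `BERTopic` constructor arguments, but check the BERTopic documentation for your installed version before passing them through.

```python
import json
from pathlib import Path
from typing import Any


def load_bertopic_config(text: str) -> dict[str, Any]:
    """Parse config.json text and restore Python-friendly types.

    Hypothetical helper: JSON has no tuple type, so n_gram_range is stored as a
    list; it is converted back to the (min_n, max_n) tuple used in the README.
    """
    config = json.loads(text)
    if config.get("n_gram_range") is not None:
        config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


if __name__ == "__main__":
    # Assumes config.json has been downloaded next to this script.
    path = Path("config.json")
    if path.exists():
        config = load_bertopic_config(path.read_text(encoding="utf-8"))
        for key, value in sorted(config.items()):
            print(f"{key}: {value}")
```

Printed this way, each line matches the corresponding README bullet, for example `n_gram_range: (1, 1)` and `nr_topics: None`, with one extra line for `embedding_model`.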
ctfidf.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6170503c784d41556bc2c88b8bbd736e18ac6857e55450edafa3b1309bfacae7
+ size 8094339
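ctfidf.bin is committed as a Git LFS pointer: the repository stores only the three lines above, while the actual 8,094,339-byte file lives in LFS storage and is identified by its SHA-256 `oid`. After downloading the real file, a minimal stdlib check is to compare its byte count and SHA-256 digest with the pointer. The sketch below assumes the pointer text is available as a string and the downloaded file is at `ctfidf.bin`; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer (one 'key value' pair per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; skip hashing if it already fails.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


CTFIDF_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6170503c784d41556bc2c88b8bbd736e18ac6857e55450edafa3b1309bfacae7
size 8094339
"""

if __name__ == "__main__":
    # Assumes the real (non-pointer) ctfidf.bin was downloaded to the working directory.
    target = Path("ctfidf.bin")
    if target.exists():
        print(matches_pointer(target, CTFIDF_POINTER))
```

The same function works for topic_embeddings.bin by swapping in that file's pointer text.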
ctfidf_config.json ADDED
The diff for this file is too large to render.
topic_embeddings.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3536e3c34439f78af23d7137749e159fd9932b20d61df388e2cac85abc9444a2
+ size 254729
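The topic_embeddings.bin pointer records a size of 254,729 bytes. As a rough sanity check, assume (this is not stated in the commit) that the file stores one float32 vector per topic, including the -1 outlier topic, at the 384 dimensions produced by the `sentence-transformers/all-MiniLM-L6-v2` model named in config.json. The raw payload is then 165 × 384 × 4 = 253,440 bytes, which leaves 1,289 bytes for serialization overhead. The sketch below only performs that arithmetic; `embedding_payload_bytes` is a hypothetical helper.

```python
def embedding_payload_bytes(n_topics: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for an n_topics x dim matrix of fixed-width values (float32 by default)."""
    return n_topics * dim * bytes_per_value


if __name__ == "__main__":
    # Assumptions: 165 topics (README), 384-dim all-MiniLM-L6-v2 vectors, float32 storage.
    payload = embedding_payload_bytes(n_topics=165, dim=384)
    lfs_size = 254_729  # size recorded in the LFS pointer
    print(payload, lfs_size - payload)  # raw payload vs. leftover for file-format overhead
```

It prints `253440 1289`. A leftover of roughly 1.3 KB is plausible for a serialization header, but the exact on-disk layout depends on how BERTopic saved the model, so treat this only as a consistency check.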
topics.json ADDED
The diff for this file is too large to render.