ancatmara committed 1802ac2 (verified; parent 8d273c5): Create README.md (+49 -0)
 
---
license: cc
language:
- ga
- sga
- mga
- ghc
pipeline_tag: feature-extraction
---

### Training Data

**Historical Irish FastText models** were trained on Old, Middle, Early Modern, Classical Modern, and pre-reform Modern Irish texts from the St. Gall and Würzburg Glosses, [CELT](https://celt.ucc.ie/publishd.html), and the book subcorpus of the [Historical Irish Corpus](http://corpas.ria.ie/index.php?fsg_function=1). The training data spans ca. 550 to 1926 and covers a wide range of genres, including bardic poetry, native Irish stories, translations and adaptations of continental epic and romance, annals, genealogies, grammatical and medical tracts, diaries, and religious writing. Because some texts code-switch into Latin, the vocabulary also contains some Latin words.

### Available Models

There are three models in this family:

- **Cased**: `historical_irish_cased_ft_100_5_2.txt`
- **Lowercase**: `historical_irish_lower_ft_100_5_2.txt`
- **Lowercase with initial mutations removed**: `historical_irish_lower_demutated_ft_100_5_2.txt`

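This README does not say how initial mutations were removed for the third model, so the sketch below is only a rough illustration of the idea. The function name `demutate` and every rule in it are assumptions: it applies a few modern-Irish-style rules (hyphenated `n-`/`t-`/`h-` before a vowel, eclipsis, `t` before `s`, and lenition written with `h`) and is not the procedure used to build the model. It will also miss many historical spellings, where lenition may be marked with a punctum delens or left unmarked. Whatever normalization you choose, apply it to query words before looking them up, since the lowercase and demutated vocabularies contain only lowercased forms.

```python
import re

# Hypothetical, modern-Irish-style rules for illustration only; these are
# NOT the rules used to build historical_irish_lower_demutated_ft_100_5_2.txt.
VOWELS = "aeiouáéíóú"

# Eclipsis prefixes mapped to the radical (unmutated) consonant.
# "bhf" comes first so it is tried before the lenition rule sees "bh".
ECLIPSIS = {"bhf": "f", "mb": "b", "gc": "c", "nd": "d", "ng": "g", "bp": "p", "dt": "t"}

# Consonants whose lenition is written by adding "h" after the initial letter.
LENITABLE = set("bcdfgmpst")

# n-, t-, h- written with a hyphen before a vowel, e.g. "n-athair".
_HYPHEN_PREFIX = re.compile(rf"^[nth]-(?=[{VOWELS}])")


def demutate(word: str) -> str:
    """Lowercase a word and strip at most one initial mutation (approximate)."""
    w = word.lower()

    # Hyphenated prefix before a vowel: "n-athair" -> "athair".
    m = _HYPHEN_PREFIX.match(w)
    if m:
        return w[m.end():]

    # Eclipsis: "gcat" -> "cat", "bhfear" -> "fear".
    for prefix, radical in ECLIPSIS.items():
        if w.startswith(prefix) and len(w) > len(prefix):
            return radical + w[len(prefix):]

    # t before s: "tsráid" -> "sráid".
    if w.startswith("ts") and len(w) > 2:
        return w[1:]

    # Lenition written with h: "mháthair" -> "máthair".
    # An unhyphenated h before a vowel ("héireann") is left alone, because it
    # cannot be told apart from words that really begin with h.
    if len(w) > 2 and w[0] in LENITABLE and w[1] == "h":
        return w[0] + w[2:]

    return w


print([demutate(w) for w in ["gCat", "bhFear", "n-athair", "tSráid", "mháthair", "coíca"]])
```

Words with no recognised mutation, such as `coíca` in the usage example, only get lowercased.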
All models were trained with the same hyperparameters (`emb_size=100, window=5, min_count=2, n_epochs=100`) and are saved in the plain-text word2vec format, which Gensim loads as `KeyedVectors` (see the [Gensim documentation](https://radimrehurek.com/gensim/models/keyedvectors.html)).

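`KeyedVectors.load_word2vec_format(..., binary=False)` reads a simple layout: a header line with the vocabulary size and vector dimension, then one line per word holding the word followed by its vector components, separated by spaces. The stdlib-only reader below is a sketch meant to show what the `.txt` files contain; the helper name `read_word2vec_text` is hypothetical, and Gensim's own loader remains the practical choice. This format stores whole-word vectors only, so the FastText subword (character n-gram) vectors are not available once the files are loaded this way, and looking up a word outside the vocabulary raises `KeyError`.

```python
import io
from typing import Dict, List, TextIO


def read_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec text format: a "<vocab_size> <dim>" header line,
    then one "<word> <x1> ... <x_dim>" line per word."""
    vocab_size, dim = (int(n) for n in stream.readline().split())
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        # The last `dim` fields are the vector; whatever precedes them is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    if len(vectors) != vocab_size:
        raise ValueError(f"header says {vocab_size} words, parsed {len(vectors)}")
    return vectors


# Toy two-word "file" with made-up numbers, only to exercise the parser.
toy = read_word2vec_text(io.StringIO("2 3\ncoíca 0.1 0.2 0.3\ntrícha 0.0 0.5 -0.1\n"))
print(toy)
```

For a file downloaded with `hf_hub_download`, open it with `encoding="utf-8"` and pass the file object in. Because every vector ends up in a Python list, this is much slower and more memory-hungry than Gensim's NumPy-backed loader.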
### Usage

```python
from gensim.models import KeyedVectors
from huggingface_hub import hf_hub_download

# Download one of the three vector files from the Hugging Face Hub
model_path = hf_hub_download(repo_id="ancatmara/historical-irish-ft-vectors", filename="historical_irish_lower_demutated_ft_100_5_2.txt")

# The vectors are stored in the plain-text word2vec format
model = KeyedVectors.load_word2vec_format(model_path, binary=False)

# Ten nearest neighbours of 'coíca' by cosine similarity (topn=10 by default)
model.similar_by_word('coíca')
```

Out:
```python
[('coícat', 0.6620370149612427),
 ('coícait', 0.6584151983261108),
 ('coíctu', 0.550497829914093),
 ('trícha', 0.537602424621582),
 ('cóeca', 0.531631350517273),
 ('cóecta', 0.5148215889930725),
 ('cóecait', 0.5108019113540649),
 ('tríchad', 0.5059043765068054),
 ('tríchaid', 0.5049244165420532),
 ('cóecat', 0.5042815804481506)]
```
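The scores above are cosine similarities. `similar_by_word` ranks every other word in the vocabulary by the cosine of the angle between its vector and the query word's vector, leaves out the query word itself, and returns the top `topn` results (10 by default). The brute-force sketch below shows that ranking using only the standard library. The function names and the three-dimensional toy vectors are hypothetical and exist only to illustrate the computation, so they do not reproduce the scores shown.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbours(vectors: Dict[str, List[float]], word: str,
                       topn: int = 10) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`, highest first."""
    query = vectors[word]  # KeyError if `word` is not in the vocabulary
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-d toy vectors; the real models use 100 dimensions.
toy = {
    "coíca": [1.0, 0.0, 0.0],
    "cóeca": [0.9, 0.1, 0.0],
    "trícha": [0.6, 0.8, 0.0],
    "tríchad": [0.0, 0.0, 1.0],
}
print(nearest_neighbours(toy, "coíca", topn=2))
```

Gensim computes the same ranking much faster by comparing unit-normalized NumPy vectors.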