slone/fastText-LID-323

cointegrated committed on Sep 20, 2022

Commit 0f716f8 · 1 Parent(s): 209ad11

Create README.md

Files changed (1)

README.md +25 -0

README.md ADDED

@@ -0,0 +1,25 @@

+---
+library_name: fasttext
+tags:
+- language-identification
+---
+This is a fastText-based language classification model from the paper "The first neural machine translation system for the Erzya language".
+It supports 323 languages used in Wikipedia (as of July 2022) and has extended support for the Erzya (`myv`) and Moksha (`mdf`) languages.
+Example usage:
+```Python
+import fasttext
+import urllib.request
+import os
+model_path = 'lid.323.ftz'
+url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'
+if not os.path.exists(model_path):
+    urllib.request.urlretrieve(url, model_path)  # or just download it manually
+model = fasttext.load_model(model_path)
+languages, scores = model.predict("эрзянь кель", k=3)  # k is the number of returned hypotheses
+```
+The model was trained on article texts randomly sampled from Wikipedia. It works better on sentences and longer texts than on single words, and it may be sensitive to noise.
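
Note on using the README example: the snippet in the diff returns raw fastText output, and the note about noise suggests some light input cleanup may help. The sketch below is not part of the commit. It assumes the labels carry fastText's default `__label__` prefix (not confirmed by the README for this model) and that collapsing whitespace is an acceptable normalization; fastText's Python `predict` is also known to reject strings containing newlines. The helper names `normalize_text`, `clean_predictions`, and `identify` are hypothetical.

```python
import re

# fastText's default label prefix; assumed (not confirmed) to apply to this model.
LABEL_PREFIX = "__label__"


def normalize_text(text: str) -> str:
    """Collapse every whitespace run (including newlines) into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def clean_predictions(labels, scores):
    """Pair each label, with the prefix stripped if present, with its score as a float."""
    return [
        (label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label, float(score))
        for label, score in zip(labels, scores)
    ]


def identify(model, text: str, k: int = 3):
    """Hypothetical wrapper: normalize the text, predict, and tidy the output.

    `model` is anything with a fastText-style `predict(text, k=...)` method
    that returns a (labels, scores) pair.
    """
    labels, scores = model.predict(normalize_text(text), k=k)
    return clean_predictions(labels, scores)
```

With the `model` loaded as in the README, `identify(model, "эрзянь кель")` would return a list of `(language, score)` pairs. Stripping the prefix only when it is present keeps the helper harmless if the model uses a different label format.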