Constantin Orasan committed
Commit · a514775
1 Parent(s): 538dc0d

Updated the app and the models

Browse files:
- app.py (+16 -3)
- bpe-ECB.model (+0 -0)
- bpe-EMEA.model (+0 -0)
app.py
CHANGED

@@ -5,7 +5,11 @@ examples = [
     "Hello, world!",
     "European Central bank has announced cuts.",
     "This document is a summary of the European Public Assessment Report (EPAR).",
-    "En el presente documento se resume el Informe Público Europeo de Evaluación (EPAR)."
+    "En el presente documento se resume el Informe Público Europeo de Evaluación (EPAR).",
+    "Solution for injection",
+    "How is Abilify used?",
+    "¿Para qué se utiliza Abilify?",
+    "Tratado de la Unión Europea y Tratado de Funcionamiento de la Unión Europea"]
 
 
 def greet(sentence):
@@ -25,9 +29,18 @@ def greet(sentence):
            "</div>")
 
 
+description = """
+Demo for SentencePiece. The model is trained on ECB and EMEA datasets in order to see the differences in tokenization.
+The ECB dataset contains financial news articles, while the EMEA dataset contains medical articles.
+The texts included in the training are in English and Spanish, for this reason the tokenisation will work best for these languages.
+You can try some other languages and see how the tokenisation works. However, make sure you use only Latin characters.
+The model did not see any non-Latin characters during training, so the results for languages that do not use Latin characters will be unpredictable.
+Both variants are trained with 5000 vocab size.
+"""
+
 demo = gr.Interface(fn=greet, inputs="text", outputs="html",
-                    examples=examples, title="SentencePiece
-                    description=
+                    examples=examples, title="SentencePiece",
+                    description=description,
                     cache_examples="lazy",
                     concurrency_limit=30,
                     css=".output {font-size: 150%;}")
bpe-ECB.model
CHANGED
Binary files a/bpe-ECB.model and b/bpe-ECB.model differ

bpe-EMEA.model
CHANGED
Binary files a/bpe-EMEA.model and b/bpe-EMEA.model differ
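The updated description notes that the two BPE models were trained on different domains (ECB financial news vs. EMEA medical texts) precisely so the demo can show how tokenisation differs by training corpus. As a rough, self-contained illustration of why the training domain matters, here is a minimal pure-Python sketch of the BPE merge-learning loop. Note that `learn_bpe` and `tokenize` are hypothetical helpers written for this illustration only; the Space itself loads the pretrained `bpe-ECB.model` / `bpe-EMEA.model` files with the SentencePiece library rather than implementing BPE by hand.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merge rules from a list of words (toy sketch, no word-boundary markers)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair merged into one symbol.
        merged = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges


def tokenize(word, merges):
    """Segment a word by replaying the learned merges in order (one greedy pass each)."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    # A toy "medical" corpus merges frequent morphemes into single tokens...
    medical = ["injection", "injections", "inject", "injected"] * 5
    med_merges = learn_bpe(medical, 10)
    print(tokenize("injection", med_merges))

    # ...while a corpus from another domain learns different merges,
    # so the same input string segments differently.
    financial = ["inflation", "bank", "banking", "interest"] * 5
    fin_merges = learn_bpe(financial, 10)
    print(tokenize("injection", fin_merges))
```

This is the same reason the real demo ships two separate model files: the merge table is a product of the training corpus, so a string like "injection" is one token for a model trained on medical text but falls apart into smaller pieces under a financial-domain model.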