Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors.
PleIAs
company
AI & ML interests: Open Science LLMs
Organization Card
PleIAs is a private French AI lab training the next generation of language models for document processing.
PleIAs is committed to open science and has coordinated the release of some of the largest open corpora for pre-training.
For more information, visit our website: https://pleias.fr/
Contact us: [email protected]
models (15)
PleIAs/Pleias-369000 • 4
PleIAs/Pleias-checkpoint
PleIAs/Pleias-export • 22
PleIAs/journaux-lm-v1 • 35
PleIAs/OCRonos-Vintage-CT2 • 6
PleIAs/celadon • Text Classification • 78 • 8
PleIAs/Cassandre-RAG • 178 • 6
PleIAs/Segmentext • Token Classification • 130 • 12
PleIAs/Florence-PDF • 15 • 2
PleIAs/OCRonos-Vintage • Text Generation • 1.35k • 66
datasets (42)
PleIAs/post-ocr • 618k • 2.12k • 4
PleIAs/new-tokenized-annealing • 184
PleIAs/statistics_compiled • 809M • 2
PleIAs/ToxicCommons • 1.96M • 26 • 4
PleIAs/Persian-PD • 1.38k • 21
PleIAs/Arabic-PD • 1.82k • 12
PleIAs/Bengali-PD • 3.23k • 21
PleIAs/Urdu-PD • 2.28k • 19
PleIAs/Sanskrit-PD • 3.91k • 17
PleIAs/Catalan-PD • 16