stop building parser pipelines
there's a new document parser that is small, fast, Apache 2.0 licensed and is better than all the other ones!
echo840/MonkeyOCR is a 3B model that can parse everything (charts, formulas, tables, etc.) in a document
> the authors show in the paper that document parsing pipelines often have errors that propagate from one stage to the next
> single end-to-end models do better, but they're too heavy to use
this model addresses both: it's lighter, faster, stronger