Donkey Small

DonkeySmall

AI & ML interests

None yet

Organizations

OCR-Datasets, LiteRT Community (FKA TFLite)

DonkeySmall's activity

reacted to AdinaY's post with 🔥 16 days ago
reacted to merve's post with ❤️ 16 days ago
emerging trend: models that can understand image + text and generate image + text

don't miss out ⤵️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱

I keep track of all any input → any output models here https://huggingface.co/collections/merve/any-to-any-models-6822042ee8eb7fb5e38f9b62
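
If you want to pull that list programmatically, here is a minimal sketch using huggingface_hub's get_collection; the collection slug is taken from the URL above, and the library is assumed to be installed:

# Minimal sketch: enumerate the repos in the any-to-any collection.
from huggingface_hub import get_collection

collection = get_collection("merve/any-to-any-models-6822042ee8eb7fb5e38f9b62")
for item in collection.items:
    # Each entry exposes its repo id and type (model, dataset, space, paper).
    print(item.item_type, item.item_id)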
reacted to fdaudens's post with 🔥 4 months ago
Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?

Open source olmOCR just dropped and the results are impressive.

Tested the free demo with various documents, including a handwritten Claes Oldenburg letter. The speed is impressive: 3,000 tokens/second on your own GPU, which works out to about $190 per million pages, roughly 1/32 the cost of GPT-4o. Game-changer for content extraction and digital archives.

To achieve this, Ai2 trained a 7B vision-language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.

Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.

👉 Try the demo: https://olmocr.allenai.org

Going right into the AI toolkit: JournalistsonHF/ai-toolkit
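
For a rough idea of what "document anchoring" looks like in code, here is a minimal illustrative sketch (not Ai2's actual pipeline) that pairs a rendered page image with the PDF's own text layer; pypdf and pdf2image are assumed dependencies, and the file name is hypothetical:

# Illustrative only: combine a page image with text pulled from the PDF itself,
# then hand both to a vision-language model. olmOCR's real pipeline differs.
from pypdf import PdfReader               # pip install pypdf
from pdf2image import convert_from_path   # pip install pdf2image (needs poppler)

pdf_path = "sample.pdf"  # hypothetical input file

# 1) Render the first page as an image (what the vision encoder sees).
page_image = convert_from_path(pdf_path, first_page=1, last_page=1)[0]

# 2) Extract the embedded text layer to use as the "anchor".
anchor_text = PdfReader(pdf_path).pages[0].extract_text() or ""

# 3) Build a prompt that presents both sources to the model.
prompt = (
    "Transcribe this page faithfully. The PDF's own text layer is included "
    "below as a hint; prefer the image where they disagree.\n\n" + anchor_text
)
# page_image and prompt would then be passed to the VLM for transcription.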
reacted to schuler's post with ❤️ 4 months ago
🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":
┌───────────────────────┐
│     Input Layer       │
├───────────────────────┤
│ Token & Positional    │
│     Embedding         │
├───────────────────────┤
│   12x Transformer     │
│      Blocks           │
│  - 12 heads           │
│  - 768 hidden dims    │
│  - 3072 intermediate  │
├───────────────────────┤
│   Output Layer        │
└───────────────────────┘

Clean Pascal Implementation
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;
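
For comparison, the same GPT-3 Small shape can be sketched with stock PyTorch layers. This is purely illustrative of the dimensions above, not the CAI/Pascal API, and a real GPT block would additionally use causal self-attention:

# Illustrative PyTorch sketch of the dimensions above:
# 12 blocks, 12 heads, 768 hidden, 3072 intermediate (4 * 768).
import torch.nn as nn

blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(
        d_model=768,           # hidden size
        nhead=12,              # attention heads
        dim_feedforward=3072,  # intermediate size
        batch_first=True,
    )
    for _ in range(12)         # 12 transformer blocks
])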

reacted to lorraine2's post with 👍 6 months ago
New NVIDIA paper: ⚡ Multi-student Diffusion Distillation for Better One-step Generators ⚡

Do you want to make your diffusion models (a) run in a single step, (b) run with a smaller model, and (c) have improved quality, all at the same time? Check out our multi-student distillation (MSD) method, which is simple and applicable to most diffusion models! The only catch is that we now have to distill (and store) a mixture of expert student generators.

Explore the MSD project page to learn more: https://research.nvidia.com/labs/toronto-ai/MSD/

Work led by Yanke Song, along with Weili Nie, Karsten Kreis, and James Lucas.

Check out more work from the Toronto AI Lab here: https://research.nvidia.com/labs/toronto-ai/
reacted to rwightman's post with 🚀 7 months ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they rank as listed below; the differences are small, but the Adopt 180-epoch run lands closer to the AdamW 250-epoch run than to the AdamW 180-epoch run.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and with the Adopt vs AdamW optimizer.
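
Loading any of these checkpoints is a one-liner with timm; a minimal sketch using the 250-epoch AdamW variant named above (assumes timm and torch are installed):

# Minimal sketch: load one of the MobileNetV4 variants listed above and run
# a dummy forward pass at the 384x384 training resolution.
import timm
import torch

model = timm.create_model(
    "mobilenetv4_conv_medium.e250_r384_in12k",  # swap in either other variant
    pretrained=True,
)
model.eval()

x = torch.randn(1, 3, 384, 384)
with torch.no_grad():
    logits = model(x)  # ImageNet-12k class logits
print(logits.shape)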

reacted to gokaygokay's post with 👍 11 months ago