Jordan Legg PRO

takarajordan

AI & ML interests

Chief AI Officer @takara.ai. Diffusion, Inference optimisation and all things MultiModal.

Recent Activity

published a dataset 13 days ago
takarajordan/personas-instruction-4-1
updated a dataset 13 days ago
takarajordan/personas-instruction-4-1
updated a dataset 13 days ago
takarajordan/distilabel-example

Organizations

Social Post Explorers, Cohere Labs Community, takara.ai, Hugging Face Discord Community, Intelligent Estate, open/ acc, Donut Earthers 🍩

takarajordan's activity

posted an update 19 days ago
Cool to see the new model lightonai/Reason-ModernColBERT

Made with late interaction. I'd love to recreate the dataset to see a proper Apache 2.0 version!
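
For readers unfamiliar with the term, late interaction (as popularised by ColBERT-style retrievers) scores a document by comparing every query token embedding with every document token embedding, rather than comparing one pooled vector per text. The sketch below is a minimal illustration of that MaxSim idea in plain Python. The function names and toy embeddings are made up for this example and are not taken from the model or its training code.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, keep its best-matching
    document token (max cosine similarity), then sum over query tokens."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Toy 2-dimensional token embeddings (hypothetical values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [3.0, 0.0]]              # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` scores higher than `doc_b` because each query token finds a close match among its tokens.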

reacted to clem's post with ❤️ about 1 month ago
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub but still feels early & a lot more to build. What would be useful to you?
replied to clem's post about 1 month ago
replied to their post 2 months ago

@ThomasTheMaker it's just the raw attention and transformer architecture in Go, designed for serverless, so performance will definitely be lower than ggml and llama.cpp since it isn't GPU-accelerated. But if you're into CPU-only edge AI, this is the first, only and best way to compute attention.

Quantization can definitely be supported as it's just a math model!

posted an update 2 months ago
🎌 Two months in, https://github.com/takara-ai/go-attention has passed 429 stars on GitHub.

We built this library at takara.ai to bring attention mechanisms and transformer layers to Go, in a form that's lightweight, clean, and dependency-free.

We're proud to say that every part of this project reflects what we set out to do.

- Pure Go: no external dependencies, built entirely on the Go standard library
- Core support for DotProductAttention and MultiHeadAttention
- Full transformer layers with LayerNorm, feed-forward networks, and residual connections
- Designed for edge, embedded, and real-time environments where simplicity and performance matter

Thank you to everyone who has supported this so far. The stars, forks, and feedback mean a lot.
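
For context on what the dot-product attention mentioned in the list above computes, here is a minimal dependency-free sketch of scaled dot-product attention for a single query. It is written in Python purely for illustration. It is not the go-attention API, and the function names and toy vectors are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns (output, weights): weights are softmax(q.k / sqrt(d)) over the keys,
    and output is the weighted sum of the value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy inputs (hypothetical): the query points in the same direction as the first key.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

output, weights = dot_product_attention(query, keys, values)
```

The weights sum to 1 and the first key receives the larger share, so the output leans towards the first value vector. Multi-head attention repeats this computation over several learned projections of the same inputs.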
posted an update 2 months ago
AI research over coffee ☕️
No abstracts, just bullet points.
Start your day here: https://tldr.takara.ai
replied to samchain's post 3 months ago

This is a pretty big update for sure. The models have improved significantly which is great for everyone involved, especially the end user. Those datasets look very promising as well!

replied to wassemgtk's post 3 months ago

Sounds interesting, I'll check it out!

replied to etemiz's post 3 months ago

This is a really interesting post. I've been looking at the DeepSeek models for sure. This shows a pretty nice improvement. I'd love to see some example changes!

replied to chansung's post 3 months ago
posted an update 3 months ago
Takara takes 3rd place in the {tech:munich} AI hackathon with Fudeno!

A little over 2 weeks ago, @aldigobbler and I set out to create the largest MultiModal SVG dataset ever created. We succeeded, and when I was in Munich, Germany, I took it one step further and made an entire app with it!

We fine-tuned Mistral Small, made a Next.js application and blew some minds, taking 3rd place out of over 100 hackers. So cool!

If you want to see the dataset, please see below.

takara-ai/fudeno-instruct-4M
replied to their post 5 months ago

Sir, basically I want to create a generative AI university helpdesk chatbot, and for this, I have created datasets myself and also fine-tuned models, but I am not getting satisfactory results. Sir, if you have time, could you please check my datasets in my profile and help me understand how I can improve my dataset and work on it so that my task gets completed? I would be very grateful to you.

I would enhance your dataset to use multi-turn conversations if you can. For Llama 2, you could do something like this:

<s>[INST] Is the BS Physics program a part-time or full-time course? [/INST] The BS Physics program is a full-time undergraduate program that requires regular on-campus attendance. </s><s>[INST] How many units per semester? [/INST] A typical semester load consists of 15-18 units. </s>
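
If the dataset is stored as question and answer pairs, strings in the shape shown above can be generated with a few lines of Python. This is only a sketch: `format_llama2_chat` is a hypothetical helper name, and whether the literal `<s>` and `</s>` markers belong in the text depends on how your tokenizer handles special tokens, so check that before training.

```python
def format_llama2_chat(turns):
    """Join (user, assistant) pairs into one Llama 2 style multi-turn string.

    Each turn becomes: <s>[INST] user [/INST] assistant </s>"""
    return "".join(
        f"<s>[INST] {user.strip()} [/INST] {assistant.strip()} </s>"
        for user, assistant in turns
    )

# The two turns from the example above.
turns = [
    (
        "Is the BS Physics program a part-time or full-time course?",
        "The BS Physics program is a full-time undergraduate program "
        "that requires regular on-campus attendance.",
    ),
    (
        "How many units per semester?",
        "A typical semester load consists of 15-18 units.",
    ),
]

text = format_llama2_chat(turns)
```

Grouping related follow-up questions into one conversation like this gives the model examples of using earlier turns as context, which single question and answer rows cannot show.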

Hope this helps! Again, please reach out to me on Discord: takarajordan_82155

replied to s3nh's post 6 months ago
reacted to s3nh's post with ❤️ 6 months ago
Welcome back,

Small Language Models Enthusiasts and GPU Poor oss enjoyers, let's connect.
Just created an organization whose main goal is to have fun with smaller models tunable on consumer-range GPUs. Feel free to join and let's have some fun, much love ;3

SmolTuners
replied to merve's post 6 months ago
reacted to merve's post with 🚀 6 months ago
Aya by Cohere For AI can now see! 👀

C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️

The authors extend the LLaVA dataset using Aya's translation capabilities with 558k examples!
Try it here kkr5155/maya_demo

Dataset maya-multimodal/pretrain

Model maya-multimodal/maya 👏
kudos @nahidalam and team
reacted to merve's post with 🚀 6 months ago
Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B outperforms most 30B models 🧶

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with an Apache 2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️

Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
They evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find that design decisions that work for small models also hold when the model and dataset are scaled up 📈, while scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies, find that FPS sampling is better than uniform sampling, and find 8-32 tokens per frame optimal
> They also compare image encoders, trying a range of models from shape-optimized SigLIP to DINOv2,
and find google/siglip-so400m-patch14-384 to be the most powerful 🔥
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best yield

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms most 30B models 🔥
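
To make the frame sampling comparison in the post above concrete, here is a small sketch of the two strategies as they are commonly understood: uniform sampling takes a fixed number of frames spread over the whole clip, while FPS sampling takes frames at a fixed rate per second of video. The function names and parameters are hypothetical and this is not code from the Apollo release.

```python
def uniform_sample(duration_s, video_fps, num_frames):
    """Pick num_frames frame indices spread evenly over the whole clip."""
    total = int(duration_s * video_fps)
    step = total / num_frames
    return [int(i * step) for i in range(num_frames)]

def fps_sample(duration_s, video_fps, sample_fps):
    """Pick frame indices at a fixed rate of sample_fps frames per second."""
    total = int(duration_s * video_fps)
    step = video_fps / sample_fps
    count = int(duration_s * sample_fps)
    return [min(int(i * step), total - 1) for i in range(count)]

# A 10 second clip and a 60 second clip, both recorded at 30 frames per second.
short_uniform = uniform_sample(10, 30, 8)
long_uniform = uniform_sample(60, 30, 8)
short_fps = fps_sample(10, 30, 2)
long_fps = fps_sample(60, 30, 2)
```

With uniform sampling both clips yield 8 frames, so the gap between sampled frames grows with clip length. With FPS sampling the gap stays fixed at 15 source frames (half a second) and the longer clip simply yields more frames, which keeps the apparent speed of motion consistent across videos.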
replied to sayakpaul's post 6 months ago
reacted to sayakpaul's post with 🚀 6 months ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.