
Mahwiz Khalil

mahwizzzz

AI & ML interests

Low-Resource NLP

Recent Activity

published a dataset 1 day ago
mahwizzzz/tiny-llama-urdu-ds
updated a model 1 day ago
mahwizzzz/CAT
reacted to Kseniase's post with 👍 1 day ago
12 Foundational AI Model Types

Organizations

Proxima AI Company · 2damn · Social Post Explorers · Hugging Face Party @ PyTorch Conference · Hugging Face MCP Course

mahwizzzz's activity

reacted to Kseniase's post with 👍 1 day ago
12 Foundational AI Model Types

Let’s refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):

1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196)
+ history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
LLMs are trained on massive text datasets to understand and generate human language. They are mostly built on the Transformer architecture, predicting the next token. LLMs scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.).
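As a concrete illustration of next-token prediction, here is a minimal sketch using the Hugging Face transformers library; GPT-2 is just an illustrative model choice, not one named in the post:

```python
# Minimal next-token prediction sketch (GPT-2 as an illustrative model)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (batch, seq_len, vocab_size)
next_id = logits[0, -1].argmax().item()    # greedy pick over the vocabulary
print(tokenizer.decode(next_id))           # prints the single most likely next token
```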

2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) A Survey of Small Language Models (2410.20011)
A lightweight LM optimized for efficiency, low memory use, fast inference, and edge deployment. SLMs work on the same principles as LLMs.
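Running an SLM looks just like running an LLM, only cheaper; a sketch assuming the HuggingFaceTB/SmolLM-135M checkpoint (any small causal LM would do):

```python
# Small-model inference sketch; SmolLM-135M (~135M params) is an assumed example
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM-135M")
out = generator("Edge devices benefit from small models because", max_new_tokens=20)
print(out[0]["generated_text"])
```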

3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247)
Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both.
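A minimal sketch of the shared embedding space idea, assuming the openai/clip-vit-base-patch32 checkpoint and a hypothetical local image file:

```python
# Image-text similarity in CLIP's shared embedding space
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                     # hypothetical local image
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)  # similarity as probabilities
print(dict(zip(texts, probs[0].tolist())))
```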

4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549)
A large-scale model that can understand and process multiple types of data (modalities) — usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.
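The "modality adapter" idea can be shown with a toy projection: vision features are mapped into the LLM's embedding space and treated as extra input tokens. The dimensions below are hypothetical stand-ins, not from any particular model:

```python
# Toy modality-adapter sketch: project vision features into the LLM's embedding space
import torch
import torch.nn as nn

vision_dim, llm_dim = 768, 4096            # hypothetical encoder / LLM hidden sizes
adapter = nn.Linear(vision_dim, llm_dim)   # the adapter: a learned projection

patch_features = torch.randn(1, 196, vision_dim)  # stand-in for a vision encoder's patch embeddings
soft_tokens = adapter(patch_features)             # now shaped like LLM input embeddings
# In a real MLLM these soft tokens are prepended to the text embeddings before the LLM forward pass.
print(soft_tokens.shape)                          # torch.Size([1, 196, 4096])
```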

5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047)
Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.
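A toy sketch of what "action tokens" mean in practice: continuous actions are binned into discrete ids, in the spirit of RT-2's action discretization (the value range and bin count here are assumptions):

```python
# Toy action tokenization: discretize continuous actions into token ids
import numpy as np

def actions_to_tokens(actions, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous action dimension to an integer token id via uniform binning."""
    clipped = np.clip(actions, low, high)
    return np.floor((clipped - low) / (high - low) * (n_bins - 1)).astype(int)

action = np.array([0.12, -0.5, 0.98])  # e.g. end-effector dx, dy, dz
print(actions_to_tokens(action))        # token ids an action model would predict
```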

Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
published a model 3 days ago
published a model 22 days ago
published a model about 1 month ago
published a model about 2 months ago
updated a model about 2 months ago
published a model about 2 months ago