Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post 4 days ago
9 Multimodal Chain-of-Thought methods How Chain-of-Thought (CoT) prompting can unlock models' full potential across images, video, audio and more? Finding special multimodal CoT techniques is the answer. Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source: 1. KAM-CoT -> https://huggingface.co/papers/2401.12863 This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy 2. Multimodal Visualization-of-Thought (MVoT) -> https://huggingface.co/papers/2501.07542 Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality 3. Compositional CoT (CCoT) -> https://huggingface.co/papers/2311.17076 Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks 4. URSA -> https://huggingface.co/papers/2501.04686 Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification 5. MM-Verify -> https://huggingface.co/papers/2502.13383 Introduces a verification mechanism with MM-Verifier and MM-Reasoner that implements synthesized high-quality CoT data for multimodal reasoning 6. Duty-Distinct CoT (DDCoT) -> https://huggingface.co/papers/2310.16436 Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process 7. Multimodal-CoT from Amazon Web Services -> https://huggingface.co/papers/2302.00923 A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs 8. Graph-of-Thought (GoT) -> https://huggingface.co/papers/2305.16582 This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks More in the comments👇
posted an update 4 days ago
9 Multimodal Chain-of-Thought methods How Chain-of-Thought (CoT) prompting can unlock models' full potential across images, video, audio and more? Finding special multimodal CoT techniques is the answer. Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source: 1. KAM-CoT -> https://huggingface.co/papers/2401.12863 This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy 2. Multimodal Visualization-of-Thought (MVoT) -> https://huggingface.co/papers/2501.07542 Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality 3. Compositional CoT (CCoT) -> https://huggingface.co/papers/2311.17076 Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks 4. URSA -> https://huggingface.co/papers/2501.04686 Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification 5. MM-Verify -> https://huggingface.co/papers/2502.13383 Introduces a verification mechanism with MM-Verifier and MM-Reasoner that implements synthesized high-quality CoT data for multimodal reasoning 6. Duty-Distinct CoT (DDCoT) -> https://huggingface.co/papers/2310.16436 Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process 7. Multimodal-CoT from Amazon Web Services -> https://huggingface.co/papers/2302.00923 A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs 8. Graph-of-Thought (GoT) -> https://huggingface.co/papers/2305.16582 This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks More in the comments👇
View all activity

Organizations

Turing Post's profile picture Journalists on Hugging Face's profile picture Social Post Explorers's profile picture Hugging Face Discord Community's profile picture Sandbox's profile picture

Kseniase's activity

replied to their post 4 days ago
posted an update 4 days ago
view post
Post
1727
9 Multimodal Chain-of-Thought methods

How Chain-of-Thought (CoT) prompting can unlock models' full potential across images, video, audio and more? Finding special multimodal CoT techniques is the answer.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism with MM-Verifier and MM-Reasoner that implements synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks

More in the comments👇
  • 1 reply
·
reacted to their post with 🚀❤️👀 9 days ago
view post
Post
4975
8 types of RoPE

As we always use Transformers, it's helpful to understand RoPE—Rotary Position Embedding. Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows how to interpret which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target! length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
An another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix​, improving stability on long sequences.
  • 1 reply
·
replied to their post 11 days ago
posted an update 11 days ago
view post
Post
4975
8 types of RoPE

As we always use Transformers, it's helpful to understand RoPE—Rotary Position Embedding. Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows how to interpret which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target! length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
An another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix​, improving stability on long sequences.
  • 1 reply
·
replied to their post 18 days ago
view reply
  1. Adaptive attention -> https://huggingface.co/papers/1612.01887
    Dynamically adjusts its attention behavior – when or whether to use attention, or how broad the attention should be.

  2. Scaled Dot-Product attention -> https://huggingface.co/papers/2404.16629
    Attention scores are computed by the dot product between a query vector and a key vector, and then divided by the square root of the key dimension before applying softmax.

  3. Additive attention -> https://huggingface.co/papers/1409.0473
    Computes attention scores using a small feed-forward that combines the query and key vectors.

  4. Global attention -> https://huggingface.co/papers/1508.04025
    Is a form of soft attention that considers all possible positions in the input sequence.

  5. Local attention -> https://huggingface.co/papers/1508.04025
    It's a compromise between hard and soft attention. The model only attends to a restricted subset of inputs at a given step.

  6. Sparse attention -> https://huggingface.co/papers/1602.02068
    Applies patterns that limit what each word can focus on.

  7. Hierarchical attention -> https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf
    Model first applies attention at the word level and produces a sentence representation. Then it applies another attention at the sentence level to determine which sentences are important for the document representation.

  8. Temporal attention -> https://huggingface.co/papers/1502.08029
    Deals with time-series or sequential data, allowing a model to focus on particular time steps or time segments.

posted an update 18 days ago
view post
Post
7714
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel.​ The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
  • 1 reply
·
reacted to their post with 🔥🧠 24 days ago
view post
Post
4050
5 New implementations of Diffusion Models

Diffusion models are widely used for image and video generation but remain underexplored in text generation, where autoregressive models (ARMs) dominate. Unlike ARMs, which produce tokens sequentially, diffusion models iteratively refine noise through denoising steps, offering greater flexibility and speed.
Recent advancements show a shift toward using diffusion models in place of, or alongside, ARMs. Researchers also combine strengths from both methods and integrate autoregressive concepts into diffusion.

Here are 5 new implementations of diffusion models:

1. Mercury family of diffusion LLMs (dLLMs) by Inception Labs -> https://www.inceptionlabs.ai/news
It applies diffusion to text and code data, enabling sequence generation 10x faster than today's top LLMs. Now available Mercury Coder can run at over 1,000 tokens/sec on NVIDIA H100s.

2. Diffusion of Thoughts (DoT) -> Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (2402.07754)
Integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse gradually over time. This flexibility enables balancing between reasoning quality and computational cost.

3. LLaDA -> Large Language Diffusion Models (2502.09992)
Shows diffusion models' potential in replacing ARMs. Trained with pre-training and SFT, LLaDA masks tokens, predicts them via a Transformer, and optimizes a likelihood bound. LLaDA matches key LLM skills, and surpasses GPT-4o in reversal poetry.

4. LanDiff -> The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (2503.04606)
This hybrid text-to-video model combines autoregressive and diffusion paradigms, introducing a semantic tokenizer, an LM for token generation, and a streaming diffusion model. LanDiff outperforms models like Sora.

5. General Interpolating Discrete Diffusion (GIDD) -> Generalized Interpolating Discrete Diffusion (2503.04482)
A flexible noising process with a novel diffusion ELBO enables combining masking and uniform noise, allowing diffusion models to correct mistakes, where ARMs struggle.
  • 3 replies
·
reacted to clem's post with đź‘Ť 25 days ago
view post
Post
7264
I was chatting with @peakji , one of the cofounders of Manu AI, who told me he was on Hugging Face (very cool!).

He shared an interesting insight which is that agentic capabilities might be more of an alignment problem rather than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.

As a thank you to the community, he shared 100 invite code first-come first serve, just use “HUGGINGFACE” to get access!
·
replied to their post 25 days ago
posted an update 25 days ago
view post
Post
4050
5 New implementations of Diffusion Models

Diffusion models are widely used for image and video generation but remain underexplored in text generation, where autoregressive models (ARMs) dominate. Unlike ARMs, which produce tokens sequentially, diffusion models iteratively refine noise through denoising steps, offering greater flexibility and speed.
Recent advancements show a shift toward using diffusion models in place of, or alongside, ARMs. Researchers also combine strengths from both methods and integrate autoregressive concepts into diffusion.

Here are 5 new implementations of diffusion models:

1. Mercury family of diffusion LLMs (dLLMs) by Inception Labs -> https://www.inceptionlabs.ai/news
It applies diffusion to text and code data, enabling sequence generation 10x faster than today's top LLMs. Now available Mercury Coder can run at over 1,000 tokens/sec on NVIDIA H100s.

2. Diffusion of Thoughts (DoT) -> Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (2402.07754)
Integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse gradually over time. This flexibility enables balancing between reasoning quality and computational cost.

3. LLaDA -> Large Language Diffusion Models (2502.09992)
Shows diffusion models' potential in replacing ARMs. Trained with pre-training and SFT, LLaDA masks tokens, predicts them via a Transformer, and optimizes a likelihood bound. LLaDA matches key LLM skills, and surpasses GPT-4o in reversal poetry.

4. LanDiff -> The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (2503.04606)
This hybrid text-to-video model combines autoregressive and diffusion paradigms, introducing a semantic tokenizer, an LM for token generation, and a streaming diffusion model. LanDiff outperforms models like Sora.

5. General Interpolating Discrete Diffusion (GIDD) -> Generalized Interpolating Discrete Diffusion (2503.04482)
A flexible noising process with a novel diffusion ELBO enables combining masking and uniform noise, allowing diffusion models to correct mistakes, where ARMs struggle.
  • 3 replies
·
reacted to their post with 🚀🔥 about 1 month ago
view post
Post
6167
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues proving its effectiveness, especially in top-performing reasoning models. However, there are other similar methods, that expand CoT and can be used for different purposes. Here are 9 of them:

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps model decide when to keep thinking, double-check their work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
It helps model generate short but meaningful reasoning steps, cutting costs and making processing faster

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: Worker agents process text parts in a structured chain, and manager agent summarizes the results

4. Chain-of-RAG ->https://huggingface.co/papers/2501.14342
Creates retrieval chains, instead of retrieving all info at once. It can dynamically adjust its search process and its parameters like step number

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick frames crucial for understanding a video, using a binary video summary and video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a diagnostic chain

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhance LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
posted an update about 1 month ago
view post
Post
6167
9 types of "Chain-of-..." approaches:

Chain-of-Thought (CoT) prompting enhances reasoning in AI models by breaking down complex problems into step-by-step logical sequences. It continues proving its effectiveness, especially in top-performing reasoning models. However, there are other similar methods, that expand CoT and can be used for different purposes. Here are 9 of them:

1. Chain-of-Action-Thought (COAT) -> Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2502.02508)
Helps model decide when to keep thinking, double-check their work, or try a different approach, using special guiding tokens.

2. Chain of Draft (CoD) -> Chain of Draft: Thinking Faster by Writing Less (2502.18600)
It helps model generate short but meaningful reasoning steps, cutting costs and making processing faster

3. Chain-of-Agents -> Chain of Agents: Large Language Models Collaborating on Long-Context Tasks (2406.02818)
Uses multi-agent collaboration: Worker agents process text parts in a structured chain, and manager agent summarizes the results

4. Chain-of-RAG ->https://huggingface.co/papers/2501.14342
Creates retrieval chains, instead of retrieving all info at once. It can dynamically adjust its search process and its parameters like step number

5. Chain-of-Shot Prompting (CoS) -> CoS: Chain-of-Shot Prompting for Long Video Understanding (2502.06428)
Helps models pick frames crucial for understanding a video, using a binary video summary and video co-reasoning module.

6. Chain of Hindsight (CoH) -> Chain of Hindsight Aligns Language Models with Feedback (2302.02676)
Converts all feedback into sequences to fine-tune the model and refine outputs

7. Chain-of-Note (CoN) -> Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models (2311.09210)
Generates sequential reading notes for each retrieved document to assess relevance before integrating info into the final answer

8. Chain of Diagnosis (CoD) -> CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis (2407.13301)
Transforms the diagnostic process into a diagnostic chain

9. Chain(s)-of-Knowledge -> https://www.turingpost.com/p/cok
Enhance LLMs by dynamically pulling in external knowledge to improve accuracy and reduce errors
reacted to their post with ❤️🔥➕ about 1 month ago
view post
Post
9656
8 Free Sources about AI Agents:

Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:

1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents
Covers agents, their functions, tool use and how they differ from models

2. "Agents in the Long Game of AI. Computational Cognitive Modeling for Trustworthy, Hybrid AI" book by Marjorie McShane, Sergei Nirenburg, and Jesse English -> https://direct.mit.edu/books/oa-monograph/5833/Agents-in-the-Long-Game-of-AIComputational
Explores building AI agents, using Hybrid AI, that combines ML with knowledge-based reasoning

3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw
Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more

4. AI Agents Course from Hugging Face -> https://huggingface.co/learn/agents-course/en/unit0/introduction
Agents' theory and practice to learn how to build them using top libraries and tools

5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html
Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty

6. "Intelligent Agents: Theory and Practice" book by Michael Wooldridge -> https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/ker95-html.html
A fascinating option to dive into how agents were seen in 1995 and explore their theory, architectures and agent languages

7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase
We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge

8. Our collection "8 Free Sources to Master Building AI Agents" -> https://www.turingpost.com/p/building-ai-agents-sources
·