Yose Marthin Giyay

yohn-maistre

AI & ML interests

Indigenous language preservation with LLMs

Recent Activity

liked a model 8 days ago
konnik/DiffusionPen
liked a model 8 days ago
fofr/flux-handwriting
liked a Space 8 days ago
wushuang98/Direct3D-S2-v1.0-demo

Organizations

Sailor2

yohn-maistre's activity

reacted to KnutJaegersberg's post with 👀 about 1 month ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). The key points are summarized below:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems, allowing models to discard outdated experiences while retaining knowledge that transfers to new tasks.
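The curiosity-based objective in point 4 can be sketched in a few lines: an intrinsic reward is measured as the prediction error of a small learned forward model, so transitions the model already predicts well yield a shrinking bonus. All names, shapes, and the linear model below are illustrative assumptions, not details from the paper.

```python
import numpy as np

class ForwardModel:
    """Toy linear dynamics model whose prediction error is the curiosity bonus."""
    def __init__(self, state_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim))
        self.lr = lr

    def intrinsic_reward(self, state, next_state):
        pred = self.W @ state
        error = next_state - pred
        self.W += self.lr * np.outer(error, state)  # one SGD step on the model
        return float(np.linalg.norm(error))

rng = np.random.default_rng(0)
state = rng.normal(size=4)
state /= np.linalg.norm(state)   # unit norm gives a clean geometric decay
next_state = 0.9 * state         # a perfectly predictable transition

model = ForwardModel(state_dim=4)
beta = 0.5                       # weight of curiosity vs. task reward

# Revisiting the same predictable transition shrinks the curiosity bonus,
# steering exploration toward transitions the model cannot yet predict.
bonuses = [model.intrinsic_reward(state, next_state) for _ in range(20)]
total_reward = 1.0 + beta * bonuses[-1]  # extrinsic + weighted intrinsic
print(round(bonuses[0], 3), round(bonuses[-1], 3))
```

Combining `total_reward` rather than the bonus alone keeps exploration efficient without abandoning the performance objective, as the summary describes.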

https://arxiv.org/abs/2101.04283
reacted to KnutJaegersberg's post with 👀 about 1 month ago
Mimicking Consciousness in LLMs: Ascending the Dimensions of Thought with Recurrent Processing

This blog post explores how **recurrent processing** can transform Large Language Models (LLMs) to mimic aspects of human thought by engaging in iterative feedback loops. Inspired by string theory, the post describes how LLMs can "ascend dimensions" of cognition, progressing through foundational cognitive loops—such as basic cognition, executive functions, and meta-cognition—before advancing into **world simulation**. In this stage, LLMs explore higher dimensions, perceiving non-linear time, simulating branching possibilities, and integrating multiple realities. The interaction between the **Generator** and **Reflective Compass** allows AI systems to refine their outputs iteratively, moving toward a **point attractor** where ideas become coherent and polished. While this process doesn't bestow true consciousness, it offers a compelling imitation of reflective and adaptive thinking, leading to smarter dialogue, enhanced creativity, and more robust problem-solving.
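The Generator / Reflective Compass interaction described above is essentially an iterative refinement loop: draft, critique, revise, and stop once outputs settle (the "point attractor"). A minimal sketch, where `generate()` and `critique()` are placeholder stubs standing in for LLM calls, not APIs from the post:

```python
def generate(prompt, feedback=None):
    # Stand-in for an LLM call; folds feedback into the next draft.
    draft = prompt.upper()
    return draft + (" [revised: " + feedback + "]" if feedback else "")

def critique(draft):
    # Stand-in for the reflective pass: returns (score, feedback).
    score = min(1.0, len(draft) / 40)
    return score, "add detail" if score < 1.0 else ""

def refine(prompt, max_rounds=5, threshold=0.95):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        score, feedback = critique(draft)
        if score >= threshold:   # converged to the point attractor
            break
    return draft, score

draft, score = refine("explain recurrence")
print(score)
```

The loop terminates either at the score threshold or after a fixed budget of rounds, which is the usual guard against feedback loops that never settle.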

https://huggingface.co/blog/KnutJaegersberg/oscillatory-recurrence-for-llms
reacted to BlinkDL's post with 🔥 about 1 month ago
RWKV-7 "Goose" 0.4B trained w/ ctx4k automatically extrapolates to ctx32k+ and perfectly solves NIAH at ctx16k 🤯 100% RNN and attention-free. Trained only on the Pile, with no finetuning and replicable training runs. Tested by our community: https://github.com/Jellyfish042/LongMamba
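What "100% RNN and attention-free" buys you can be illustrated with a generic linear recurrence in the spirit of RWKV-style models: a fixed-size state is updated per token, so cost and memory are O(1) per step regardless of context length. This is an illustrative sketch only, not RWKV-7's actual "Goose" update rule.

```python
import numpy as np

def linear_recurrence(keys, values, queries, decay=0.95):
    d_k, d_v = keys.shape[1], values.shape[1]
    state = np.zeros((d_k, d_v))  # fixed-size memory, independent of seq length
    outputs = []
    for k, v, q in zip(keys, values, queries):
        state = decay * state + np.outer(k, v)  # fade old info, write new
        outputs.append(q @ state)               # read memory with the query
    return np.array(outputs)

T, d = 6, 4
rng = np.random.default_rng(1)
out = linear_recurrence(rng.normal(size=(T, d)),
                        rng.normal(size=(T, d)),
                        rng.normal(size=(T, d)))
print(out.shape)
```

Because nothing in the loop depends on the sequence length, the same weights can in principle be rolled forward past the training context, which is the extrapolation behavior the post reports.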
reacted to KnutJaegersberg's post with 👀 about 1 month ago
Mining LLM Pretraining Data: Topics, Skills, and Cognitive Patterns

Summary
The technical blog post details an analysis of pretraining data from various Large Language Models (LLMs) like GPT-2, Falcon, and Gemma2. Using text mining techniques including embeddings, clustering, and LLM-based annotation on datasets like OpenWebText, The Pile, and C4, the study identified key patterns.

Findings show the data is dominated by topics like Technology, Politics, Health, Business, and Culture, originating from diverse sources including web scrapes, academic papers, code repositories, and news media. The data reflects the work of professionals primarily in Journalism/Media, Content Creation, Analysis/Research, Academia, and Tech/Engineering. Consequently, LLMs learn corresponding skills (e.g., Research, Critical Thinking, Communication, Domain Expertise) and task representations (e.g., Analysis, Content Creation, Compliance).

The analysis also uncovered distinct writing styles, underlying cognitive frameworks (beliefs, frames, schemas, memes), and common cognitive biases (like Confirmation Bias) embedded in the data. LLM capability progression appears linked to data scale and task frequency, following a power law. The study concludes that LLMs are powerful data-driven simulators whose capabilities and limitations are shaped by the composition and inherent biases of their pretraining corpora, highlighting the importance of data understanding and curation.
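The embed-then-cluster step of the pipeline above can be sketched with toy bag-of-words vectors and plain k-means; the real study used learned embeddings and LLM-based annotation on corpora like OpenWebText, so every document, vocabulary, and parameter here is illustrative.

```python
import numpy as np
from collections import Counter

docs = [
    "senate passes new budget bill",
    "parliament debates election law",
    "gpu release accelerates model training",
    "open source compiler release announced",
]
vocab = sorted({w for d in docs for w in d.split()})

def embed(doc):
    # Toy bag-of-words embedding; a real pipeline would use a sentence encoder.
    counts = Counter(doc.split())
    return np.array([counts[w] for w in vocab], dtype=float)

X = np.stack([embed(d) for d in docs])

def kmeans(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each doc to its nearest center, then recompute centers.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=2)
print(labels)
```

Each resulting cluster would then be annotated (in the study, by an LLM) with a topic label such as Politics or Technology.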



https://huggingface.co/blog/KnutJaegersberg/mining-llm-pretraining-data
reacted to Kseniase's post with 🔥 about 1 month ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize the other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
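Types 1 and 3 above can be shown in one small NumPy sketch: scaled dot-product self-attention, where every position attends to every other and the soft attention weights in each row sum to 1.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V, weights                    # weighted sum of values

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

Swapping the source of `Q` versus `K`/`V` to a second sequence turns this into cross-attention (type 4), and running several such functions with independent projections and concatenating the outputs gives multi-head attention (type 5).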