John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

published a model about 1 hour ago
John6666/alchemix-illustrious-v25-sdxl
published a model about 1 hour ago
John6666/catpony-real-v31-sdxl

Organizations

open/ acc, Solving Real World Problems, FashionStash Group meeting, No More Copyright

John6666's activity

reacted to Kseniase's post with 👀 about 2 hours ago
8 types of RoPE

As we always use Transformers, it's helpful to understand RoPE—Rotary Position Embedding. Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows how to interpret which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
Another RoPE variant for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
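
To make the rotation idea from item 1 concrete, here is a minimal pure-Python sketch of the original RoPE. It is not taken from any of the papers' code: the function names, the toy vectors, and the choice of pairing adjacent dimensions with a base of 10000 are assumptions made for illustration. It shows that the attention score between a rotated query and a rotated key depends only on their relative offset.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d).

    Illustrative helper (hypothetical name); assumes an even dimension d.
    """
    d = len(x)
    assert d % 2 == 0, "embedding dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair (a, b) by theta.
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    """Plain dot product, standing in for one attention score."""
    return sum(a * b for a, b in zip(u, v))


# Toy query/key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# Both pairs of positions are 2 tokens apart, at different absolute positions.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 10), rope(k, 8))
print(score_near, score_far)
```

The two printed scores agree up to floating-point error, which is the "relative positional info" property described above. The variants in items 2 to 8 mostly change how the angles or position indices are chosen, not this basic rotation.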
  • 1 reply
·
reacted to hanzla's post with 🤗 about 2 hours ago
👋 Hi all!

For any AI agent, internet search 🔎 is an important tool. However, with APIs like Tavily and Exa, it becomes really difficult to keep up with the cost. In some cases, these Internet APIs cost more than the LLM.

To solve this, I am building a Playwright wrapper API on top of publicly available SearXNG instances. This will enable agent applications to fetch internet results for free.

Currently, I have set up a basic GitHub repo, and I will continue developing advanced search features, such as image search 🖼️

Github: https://github.com/HanzlaJavaid/Free-Search/tree/main

🚀 Try the deployed version: https://freesearch.replit.app/docs

If you find this useful, consider starring ⭐️ the GitHub repository to support further development!
reacted to louisbrulenaudet's post with 👀 about 2 hours ago
I’ve just released logfire-callback on PyPI, designed to facilitate monitoring of Hugging Face Transformer training loops using Pydantic Logfire 🤗

The callback will automatically log training start with configuration parameters, periodic metrics and training completion ⏱️

Install the package using pip:
pip install logfire-callback

First, ensure you have a Logfire API token and set it as an environment variable:
export LOGFIRE_TOKEN=your_logfire_token

Then use the callback in your training code:
from transformers import Trainer, TrainingArguments
from logfire_callback import LogfireCallback

# Initialize your model, dataset, etc.

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    # ... other training arguments
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogfireCallback()]  # Add the Logfire callback here
)

trainer.train()

If you have any feedback, please reach out at @louisbrulenaudet
reacted to Jaward's post with 🔥 about 2 hours ago
reacted to openfree's post with 🔥 about 11 hours ago
Korean Exam Leaderboard: LLMs vs Civil Service and Professional Qualification Exams 📝

openfree/Korean-Exam-Leaderboard

## 📊 What is this leaderboard?
This leaderboard evaluates the performance of various AI models on 22 Korean civil service and professional qualification exams. All scores are converted to a 100-point scale to show how well different LLMs can solve actual Korean civil service and professional qualification tests!
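
As a rough illustration of the 100-point conversion, here is a minimal sketch. The post does not describe the exact procedure, so the linear rescaling, the function name, and the raw scores below are all assumptions and not actual leaderboard data.

```python
def to_100_point_scale(raw_score, max_score):
    """Linearly rescale a raw exam score to a 0-100 range (assumed method)."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return round(100.0 * raw_score / max_score, 2)


# Hypothetical raw results as (points earned, points available); not real data.
raw_results = {
    "exam_a": (63, 150),
    "exam_b": (33, 40),
}

scaled = {
    name: to_100_point_scale(earned, available)
    for name, (earned, available) in raw_results.items()
}
print(scaled)
```

Putting every exam on the same scale is what lets scores from tests with different maximum marks sit side by side in one table.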

## 🏆 Current Top Performers
- **OpenAI/GPT-o1**: Bar Exam 52.5 points 🥇
- **OpenAI/GPT-4.5**: Bar Exam 49.33 points 🥈
- **OpenAI/GPT-4o**: Bar Exam 49.11 points 🥉
- **deepseek-ai/DeepSeek-R1**: Bar Exam 47.33 points

## 📋 Exams Being Evaluated
The leaderboard includes various Korean civil service and professional qualification exams:
- Korean Bar Exam
- Senior Civil Service Grade 5
- Judicial Service Grade 5
- National Assembly Grade 5
- Judicial Scrivener
- Police Executive Candidate
- And more exams!

## 🤖 Models Being Evaluated
We are testing a variety of models:
- OpenAI: GPT-o1, GPT-o3-mini, GPT-4.5, GPT-4o
- Anthropic: Claude 3.7 Sonnet
- Google: Gemini 2.0 Flash/PRO/Flash Thinking
- Meta: Llama 3.3 70B Instruct, Llama 3.2 90B Vision
- DeepSeek: DeepSeek-R1
- Qwen: QwQ-32B, Qwen2.5 Coder
- Mistral: Mistral-Small-3.1-24B
- NVIDIA models: NVIDIA Nemotron variant models
- And many more!

## 🔍 Why This Matters
Korean civil service exams are known for their high difficulty and comprehensive knowledge assessment. These exams test deep knowledge across legal, administrative, and public service domains. Success in these exams demonstrates not just language understanding but also domain expertise and reasoning ability.

## 🧪 Evaluation Methodology

## 🔜 Future Plans
We are continuously expanding our test coverage across all 22 exam categories. We will keep updating the scores marked "TBD" so please stay tuned!
  • 2 replies
·
replied to OFT's post 1 day ago
The Serverless Inference API used to allow 1000 free requests per day and 20000 Pro requests per day...
Well, in general, the total amount of shared resources doesn't increase that much, so I guess it's inevitable that the amount available to each user decreases as the number of good users and malicious attackers increases, but it has been cut by a factor of several to several tens all at once...
The appeal for users who want to use the Inference must have decreased dramatically.
I guess there was some kind of background that was quite difficult to deal with...

I hope that the other Pro benefits will increase to make up for it...
Prices of things vary greatly from country to country and region to region...

reacted to stefan-french's post with 🔥🔥 1 day ago
reacted to onekq's post with 🚀🔥 1 day ago
Folks, let's get ready.🥳 We will be busy soon. 😅🤗https://github.com/huggingface/transformers/pull/36878
reacted to OFT's post with 😔 1 day ago
Today I decided to cancel my PRO subscription for Hugging Face. I had a lot of fun with it, but with the current changes to the API and the allowed limits, I don't think it is worth it anymore. So I just turned everything off and cancelled my subscription. It feels like one of those movie scenes where you see an old computer lab and someone putting big white sheets over it and closing the door behind him. I am not going, I am not gone, but watching through the glass window of the door that I just closed.
·
reacted to csabakecskemeti's post with 👍 1 day ago
Managed to get my hands on a 5090FE, it's beefy

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |
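
To put the two rows in wall-clock terms, here is a small sketch. It assumes the usual llama-bench reading: pp512 is processing a 512-token prompt, tg128 is generating 128 tokens, and the last column is tokens per second. The function name is illustrative, and only the mean values are used.

```python
def seconds_for(tokens, tokens_per_second):
    """Time needed to handle `tokens` at a steady throughput."""
    return tokens / tokens_per_second


# Mean throughputs copied from the two rows above (the ± spread is ignored).
PP512_TOKENS_PER_SECOND = 12207.44
TG128_TOKENS_PER_SECOND = 143.18

prompt_seconds = seconds_for(512, PP512_TOKENS_PER_SECOND)
generation_seconds = seconds_for(128, TG128_TOKENS_PER_SECOND)

print(f"prompt: {prompt_seconds * 1000:.1f} ms")
print(f"generation: {generation_seconds:.2f} s")
```

Under those assumptions, the 512-token prompt takes roughly 42 ms and the 128 generated tokens take a little under 0.9 s, so generation dominates the response time.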

Comparison with other GPUs
http://devquasar.com/gpu-gguf-inference-comparison/
reacted to merve's post with 🤗 1 day ago
So many open releases at Hugging Face this past week 🤯 recapping all here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B, a new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
reacted to eaddario's post with 🔥 1 day ago
Squeezing Tensor Bits: the quest for smaller LLMs

An area of personal interest is finding ways to optimize the inference performance of LLMs when deployed in resource-constrained environments like commodity hardware, desktops, laptops, mobiles, edge devices, etc.

The method that I'm using to produce these experimental versions, for example eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF, is explained in https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca

At a high level it involves using a custom version of the llama-quantize tool to selectively quantize different tensors at different levels. On average a 10% or more reduction with little loss of quality is possible.
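
To show why per-tensor choices change the file size, here is a back-of-the-envelope sketch. It does not reproduce the modified llama-quantize tool: the tensor groups, parameter counts, and bits-per-weight values are all hypothetical and chosen only to make the arithmetic visible.

```python
def model_size_gib(param_counts, bits_per_weight):
    """Estimate size in GiB from per-group parameter counts and bit widths."""
    total_bits = sum(
        count * bits_per_weight[name] for name, count in param_counts.items()
    )
    return total_bits / 8 / 2**30


# Hypothetical tensor groups and parameter counts (not from a real model).
param_counts = {
    "attention": 2_000_000_000,
    "feed_forward": 5_000_000_000,
    "embeddings": 1_000_000_000,
}

# Baseline: every group at the same (illustrative) bits per weight.
uniform = {name: 4.5 for name in param_counts}

# Selective: only the largest group is pushed to a lower (illustrative) width.
mixed = {"attention": 4.5, "feed_forward": 3.5, "embeddings": 4.5}

uniform_size = model_size_gib(param_counts, uniform)
mixed_size = model_size_gib(param_counts, mixed)
reduction = 1 - mixed_size / uniform_size

print(f"uniform: {uniform_size:.2f} GiB, mixed: {mixed_size:.2f} GiB")
print(f"reduction: {reduction:.1%}")
```

The point of the sketch is only that lowering the width of the largest tensor groups moves the total most. Which tensors tolerate lower precision without hurting quality is the empirical question the linked article addresses.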

There are two PRs to merge these changes back into the core project, but until then the modified version will be available on GitHub: https://github.com/EAddario/llama.cpp/tree/quantize

Would love to hear if you can achieve smaller sizes at higher quality!
  • 2 replies
·
reacted to kpadpa's post with 👀 2 days ago
What does this mean and how can I fix it?

"This authentication method does not have sufficient permissions to call Inference Providers on behalf of user..."
  • 2 replies
·
replied to kpadpa's post 2 days ago
reacted to LocalFaceSwap's post with 👀 2 days ago
Multilingual FaceFusion One-Click Starter Pack
FaceFusion is an advanced open-source platform focused on facial manipulation technology. It provides a series of powerful tools that allow you to easily implement face editing effects in videos and images.

Core Features:

• Face Swapping: Supports up to 6 dedicated models for high-quality face swapping effects
• Face Enhancement: Offers 10 professional models (such as clear_reality_x4 and ultra_sharp_x4) for clearer and more natural images
• Age Modification: Intelligently adjusts character age effects
• Expression Restoration: Maintains natural expressions, enhancing realism
• Lip Sync Technology: Makes speech in videos more natural and fluid
• Video Processing: Includes frame colorization and frame enhancement functions, greatly improving video editing quality and effects.

FaceFusion Interface
Multilingual Support
• English (English)
• Chinese (中文)
• Hindi (हिन्दी)
• Spanish (Español)
• Arabic (العربية)
• French (Français)
• Portuguese (Português)
• German (Deutsch)
• Japanese (日本語)
• Russian (Русский)
• Turkish (Türkçe)
• Italian (Italiano)
• Korean (한국어)
One-Click Starter Pack Advantages
| Advantage | Description |
| --- | --- |
| ✅ Ready to Use | Download, extract, and run directly without complex installation |
| ✅ Zero Dependencies | No need to install any additional software or dependencies |
| ✅ Completely Offline | All functions run locally, protecting your privacy |
| ✅ Built-in Environment | Already includes Python environment, face swapping models, and CUDA environment |
System Requirements:

• Operating System: Windows 10, Windows 11
• Minimum Graphics Card: NVIDIA GPU (1050 4G or higher)
User Guide
1. Download the Starter Pack
Get the FaceFusion One-Click Starter Pack and start using it immediately

https://aifaceswap.top/



  • 1 reply
·
reacted to nicolay-r's post with 👀 2 days ago
The concept behind xLSTM has recently turned into the xLSTM-7B model, which shows performance in the same category as the similar-scale Gemma 7B, Llama 2 7B, and FalconMamba 7B, but with a higher-performing inference kernel.

Model: NX-AI/xLSTM-7b
Paper: https://arxiv.org/abs/2503.13427

  • 1 reply
·
reacted to fdaudens's post with 👍 2 days ago
🎥 Just tested Stability AI's Stable Virtual Camera - it turns a single photo into dynamic video with AI-powered camera movements! From static meeting room to cinematic sweeps. 🚀

Try it out: stabilityai/stable-virtual-camera
reacted to burtenshaw's post with 😎 2 days ago
The Hugging Face Agents Course now includes three major agent frameworks!

🔗 https://huggingface.co/agents-course

This includes LlamaIndex, LangChain, and our very own smolagents. We've worked to integrate the three frameworks in distinctive ways so that learners can reflect on when and where to use each.

This also means that you can follow the course if you're already familiar with one of these frameworks, and soak up some of the fundamental knowledge in earlier units.

Hopefully, this makes the agents course open to as many people as possible.
  • 2 replies
·