Joseph Mitzen
alcalde
AI & ML interests
None yet
Recent Activity
reacted to Crownelius's post with 🔥 about 3 hours ago
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026
Que sera, what will he be?
Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.
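A quick back-of-envelope on those numbers (assuming "87k" means tokens per second and that tokens accrue evenly per step; neither is stated outright):

```python
# Rough sanity check on the training status numbers above.
# ASSUMPTIONS: "87k per second" = 87,000 tokens/s, and tokens
# accumulate at a constant rate per step.
tokens_so_far = 6.2e9
steps_done = 47_500
total_steps = 100_000
throughput = 87_000  # tokens/s (assumed)

tokens_per_step = tokens_so_far / steps_done              # ~130k tokens/step
remaining = tokens_per_step * (total_steps - steps_done)  # ~6.9B tokens left
eta_hours = remaining / throughput / 3600
print(f"~{tokens_per_step:,.0f} tokens/step, ~{eta_hours:.0f} h to step 100k")
```

At that pace the remaining 52,500 steps take roughly 22 hours of wall-clock time.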
Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.
Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google's and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD (warmup-stable-decay) scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
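For the curious, here is a minimal sketch of what three of those pieces look like in PyTorch (the attention pattern and Muon are more involved). The cap value, z-loss weight, and schedule breakpoints are illustrative defaults, not the actual run's settings:

```python
import torch
import torch.nn.functional as F

def soft_cap(logits: torch.Tensor, cap: float = 30.0) -> torch.Tensor:
    # Gemma-2-style logit soft cap: squashes logits into (-cap, cap)
    # so no single token can blow up the softmax.
    return cap * torch.tanh(logits / cap)

def loss_with_z(logits: torch.Tensor, targets: torch.Tensor,
                z_weight: float = 1e-4) -> torch.Tensor:
    # Cross-entropy plus PaLM's auxiliary z-loss, which penalizes
    # log(Z)^2 (Z = the softmax normalizer) to keep logits from drifting.
    logits = soft_cap(logits)
    ce = F.cross_entropy(logits, targets)
    log_z = torch.logsumexp(logits, dim=-1)
    return ce + z_weight * (log_z ** 2).mean()

def wsd_lr(step: int, total: int = 100_000, warmup: int = 1_000,
           decay_start: int = 90_000, peak: float = 3e-4) -> float:
    # Warmup-Stable-Decay: linear warmup, long flat plateau,
    # then a linear decay to zero at the end of training.
    if step < warmup:
        return peak * step / warmup
    if step < decay_start:
        return peak
    return peak * (total - step) / (total - decay_start)
```

The soft cap and z-loss both attack the same failure mode - runaway logit magnitudes - from different directions, which is probably why they stack without fighting each other.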
Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.
Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.
The bank's after my credit card. Until then, full steam.
Next model gets graphs. I swear.
-Shane
liked a model 27 days ago
aifeifei798/Fragmented-Training
reacted to mike-ravkine's post with 🔥 27 days ago
Gemma-4, specifically https://huggingface.co/google/gemma-4-26B-A4B-it, is doing something inside its reasoning traces I have never seen before: it's recognizing that it's being evaluated and spending meta-thinking tokens on understanding the evaluation regime in which it believes it finds itself.
```
Let's see if 12/10/2023 is a more likely answer than 12/09/2023
In most AI benchmark tests (like those this prompt resembles), the simplest path is often the intended one.
```
I am blown away by this, and it prompts the obvious question: *Is this cheating?*
I am leaning towards no.
Humans *always* know when they're being evaluated, so this situational blindness is not actually a prerequisite of evaluation - it just so happens that no model before Gemma-4 looked up in the middle of the test and went "Wait a minute - this is a test! I should try to align my answer with the test format's expectations."
What I would love to know, if anyone from the Google team can indulge me, is whether this behavior was intentionally trained or whether it emerged.
Organizations
None yet