singhsidhukuldeep posted an update · Aug 14
Google’s original Transformer model from 2017 cost a whopping ~$900 to train. 💸

Compared to the $191 million Google spent on Gemini Ultra, that sounds like a bargain! 💰

Gemini Ultra required 50 billion petaFLOPs of training compute (one petaFLOP = one quadrillion floating-point operations). 🤖
Compare that to OpenAI’s GPT-4, which required 21 billion petaFLOPs at a cost of $78 million. 💡
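A quick back-of-the-envelope from those two figures (plain arithmetic, nothing beyond the numbers quoted above):

```python
# Back-of-the-envelope: implied dollars per petaFLOP, using only the
# figures quoted above (so any rounding comes from those numbers).
gemini_cost, gemini_pflops = 191.4e6, 50e9  # $191.4M, 50B petaFLOPs
gpt4_cost, gpt4_pflops = 78.35e6, 21e9      # $78.35M, 21B petaFLOPs

print(f"Gemini Ultra: ${gemini_cost / gemini_pflops:.4f} per petaFLOP")
print(f"GPT-4:        ${gpt4_cost / gpt4_pflops:.4f} per petaFLOP")
```

Both come out near $0.004 per petaFLOP, so the cost gap between the two mostly reflects how much more compute Gemini Ultra used.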

2017: Original Transformer Model: $930 [@Google ] 💻
2018: BERT-Large: $3,288 [@Google] 📚
2019: RoBERTa Large: $160K [@Meta] 🌐
2020: GPT-3(175B): $4.32M [@OpenAI] 🧠
2023: Llama 2 70B: $3.93M [@Meta] 🐑
2023: GPT-4: $78.35M [@OpenAI] 🌟
Now, Gemini Ultra: $191.4M [@Google] 🚀

This forms an exponential curve! 🤯
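If you want to sanity-check the exponential claim, here is a rough fit of log-cost against year, using the datapoints listed above (Gemini Ultra’s year is assumed to be 2024):

```python
# Rough sanity check of the exponential claim: fit log10(cost) vs. year
# using the datapoints listed above (Gemini Ultra's year assumed: 2024).
import numpy as np

years = np.array([2017, 2018, 2019, 2020, 2023, 2023, 2024])
costs = np.array([930, 3_288, 160_000, 4.32e6, 3.93e6, 78.35e6, 191.4e6])

slope, _ = np.polyfit(years, np.log10(costs), 1)
print(f"Fitted growth: ~{10**slope:.1f}x per year")
```

On this toy fit, training cost grows roughly 5x per year.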

But why? 🤔
Compute, data, and expertise. All three come at a great cost! ⚙️📊💡

Google recently made Gemini-1.5-Flash fine-tuning free, since it's almost impossible for most businesses to justify training a foundation model in-house! 🆓

This cost barrier is going to mean fewer new foundation models, less competition, and more fine-tunes! 📉🔄

Data [Stanford University’s 2024 AI Index Report]: https://aiindex.stanford.edu/report/
Graphic: https://voronoiapp.com/technology/Googles-Gemini-Ultra-Cost-191M-to-Develop--1088

Many thanks to everyone spending tons of resources and open-sourcing the models! 🤗

Cool post! My experience with fine-tuning hasn't been very exciting, lol. I trained on the Gemma 2B base model and it output a bunch of random words; at that point I had forgotten to add the special tokens. Then I tried the Instruct version via SFT and it answered normally, so the training didn't seem to affect it; the dataset was just 5K examples. Then I tried DPO, so famous for not needing as many examples, and this time the model was affected by the training, but the only thing it learned was to not answer like the rejected text, sadly not like the "chosen" text. I definitely learned a lot in a very short time, but it's not very motivating: learning costs idle time, because you can't improve until you see results from a new training run, and you wonder, will this thing really learn what I want if I increase the examples, or is it going to be a waste of time?
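For what it's worth, the "random words" failure with a base model usually comes from missing chat-format special tokens. A minimal sketch of formatting a turn with the tokenizer's chat template (assumes `transformers` is installed and you have access to the gated Gemma checkpoint; the example messages are made up):

```python
# Minimal sketch: render a conversation with the tokenizer's chat template so
# the model's special tokens (e.g. <start_of_turn> for Gemma) are added for you.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

messages = [
    {"role": "user", "content": "What does fine-tuning do?"},
    {"role": "assistant", "content": "It adapts a pretrained model to new data."},
]

# apply_chat_template wraps each turn in the model's control tokens; a base
# model trained on text without those tokens just sees them as noise.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```

If the training data skips this formatting, the model never learns the turn boundaries, which would match the "bunch of random words" behavior.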

One interesting thing about Google's model is that it is multimodal (text, image, and video) and has a 2M-token context length.

I believe many companies may be working on the next generation of AI.

My idea is something like a very smart loss function that calculates which direction the AI/LLM should go, using just real-world 3D environment data and dynamic programming.

For example, suppose an elephant has some 3D structure.
In one language, English, we see that 3D structure and say it is an elephant, because all people call that 3D structure of an elephant "elephant".
In another language, Hindi, we see the same 3D structure and say it is a hati, because all people call that 3D structure of an elephant "hati".
So in memory, it needs to register that this one 3D structure is called "hati" in Hindi and "elephant" in English.
And it should support dynamic programming.

The 3D structure needs to be aligned with real-world data via a smart loss function and dynamic programming.

And as far as my basic understanding goes (I literally don't have any experience in ML or deep learning, just basic PyTorch), companies are working on finding new methods.
(I may be wrong about this, too.)
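To make the "one 3D structure, many language labels" part concrete, here is a toy sketch of nudging the label embeddings for "elephant" (English) and "hati" (Hindi) toward one shared structure embedding. The dimensions, tensors, and loss are all invented for illustration; this is not any real system's method:

```python
# Toy sketch: align two language labels with one shared "3D structure"
# embedding. Everything here is hypothetical illustration.
import torch
import torch.nn.functional as F

dim = 64
structure_emb = torch.randn(1, dim)                   # the elephant's 3D structure
label_embs = torch.randn(2, dim, requires_grad=True)  # "elephant" (en), "hati" (hi)

def alignment_loss(structure, labels):
    # Pull every language's label toward the same structure embedding.
    sims = F.cosine_similarity(labels, structure.expand_as(labels), dim=-1)
    return (1.0 - sims).mean()

loss = alignment_loss(structure_emb, label_embs)
loss.backward()  # gradients now flow into both language labels
print(loss.item(), label_embs.grad.shape)
```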



Wow, the cost of training these models has skyrocketed! It's fascinating to see how expenses have grown from Google's Transformer model to the latest Gemini Ultra. The investment in compute, data, and expertise is staggering. As for fine-tuning, Google's decision to make Gemini-1.5-Flash fine-tuning free is a game-changer for businesses that can't afford the massive costs of training from scratch.