singhsidhukuldeep posted an update · Aug 14
Google’s original Transformer model from 2017 cost a whopping ~$900 to train. 💸

Compared to the $191 million Google spent on Gemini Ultra, that sounds like a bargain! 💰

Gemini Ultra required 50 billion petaFLOPs of training compute (one petaFLOP = one quadrillion floating-point operations). 🤖
Compare that to OpenAI’s GPT-4, which required 21 billion petaFLOPs at a cost of $78 million. 💡
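A quick back-of-the-envelope from those two figures (plain arithmetic, nothing beyond the numbers quoted above):

```python
# Back-of-the-envelope: implied dollars per petaFLOP, using only the
# figures quoted above (so any rounding comes from those numbers).
gemini_cost, gemini_pflops = 191.4e6, 50e9  # $191.4M, 50B petaFLOPs
gpt4_cost, gpt4_pflops = 78.35e6, 21e9      # $78.35M, 21B petaFLOPs

print(f"Gemini Ultra: ${gemini_cost / gemini_pflops:.4f} per petaFLOP")
print(f"GPT-4:        ${gpt4_cost / gpt4_pflops:.4f} per petaFLOP")
```

Both come out near $0.004 per petaFLOP, so the cost gap between the two mostly reflects how much more compute Gemini Ultra used.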

2017: Original Transformer Model: $930 [@Google ] 💻
2018: BERT-Large: $3,288 [@Google] 📚
2019: RoBERTa Large: $160K [@Meta] 🌐
2020: GPT-3(175B): $4.32M [@OpenAI] 🧠
2023: Llama 2 70B: $3.93M [@Meta] 🐑
2023: GPT-4: $78.35M [@OpenAI] 🌟
Now, Gemini Ultra: $191.4M [@Google] 🚀

This forms an exponential curve! 🤯
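If you want to sanity-check the exponential claim, here is a rough fit of log-cost against year, using the datapoints listed above (Gemini Ultra’s year is assumed to be 2024):

```python
# Rough sanity check of the exponential claim: fit log10(cost) vs. year
# using the datapoints listed above (Gemini Ultra's year assumed: 2024).
import numpy as np

years = np.array([2017, 2018, 2019, 2020, 2023, 2023, 2024])
costs = np.array([930, 3_288, 160_000, 4.32e6, 3.93e6, 78.35e6, 191.4e6])

slope, _ = np.polyfit(years, np.log10(costs), 1)
print(f"Fitted growth: ~{10**slope:.1f}x per year")
```

On this toy fit, training cost grows roughly 5x per year.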

But why? 🤔
Compute, data, and expertise. All three come at a great cost! ⚙️📊💡

Google recently made Gemini-1.5-Flash fine-tuning free, since it's almost impossible for most businesses to justify training a foundation model in-house! 🆓

This cost barrier is going to mean fewer new foundation models, less competition, and more fine-tunes! 📉🔄

Data [Stanford University’s 2024 AI Index Report]: https://aiindex.stanford.edu/report/
Graphic: https://voronoiapp.com/technology/Googles-Gemini-Ultra-Cost-191M-to-Develop--1088

Many thanks to everyone spending tons of resources and open-sourcing the models! 🤗

Cool post! My experience with fine-tuning hasn't been very exciting, lol. I trained on the Gemma 2B base model and it output a bunch of random words; at that point I had forgotten to add the special tokens. Then I tried the Instruct version via SFT and it answered normally, so the training didn't seem to affect it; the dataset was just 5K examples. Then I tried DPO, so famous for not needing as many examples, and this time the model was affected by the training, but the only thing it learned was to not answer like the rejected text, sadly not like the "chosen" text. I definitely learned a lot in a very short time, but it's not very motivating: learning costs idle time, because you can't improve until you see results from a new training run, and you wonder, will this thing really learn what I want if I increase the examples, or is it going to be a waste of time?
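For what it's worth, the "random words" failure with a base model usually comes from missing chat-format special tokens. A minimal sketch of formatting a turn with the tokenizer's chat template (assumes `transformers` is installed and you have access to the gated Gemma checkpoint; the example messages are made up):

```python
# Minimal sketch: render a conversation with the tokenizer's chat template so
# the model's special tokens (e.g. <start_of_turn> for Gemma) are added for you.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

messages = [
    {"role": "user", "content": "What does fine-tuning do?"},
    {"role": "assistant", "content": "It adapts a pretrained model to new data."},
]

# apply_chat_template wraps each turn in the model's control tokens; a base
# model trained on text without those tokens just sees them as noise.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```

If the training data skips this formatting, the model never learns the turn boundaries, which would match the "bunch of random words" behavior.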

One interesting thing about Google's model is that it is multimodal (text, image, and video) and has a 2M-token context length.

I believe many companies may be working on the next generation of AI.

My idea is something like a very smart loss function that calculates which direction the AI/LLM should go, using just real-world 3D environment data and dynamic programming.

For example, suppose an elephant has some 3D structure.
In one language, English, we see that 3D structure and say it is an elephant, because all people call that 3D structure of an elephant "elephant".
In another language, Hindi, we see the same 3D structure and say it is a hati, because all people call that 3D structure of an elephant "hati".
So in memory, it needs to register that this one 3D structure is called "hati" in Hindi and "elephant" in English.
And it should support dynamic programming.

The 3D structure needs to be aligned with real-world data via a smart loss function and dynamic programming.

And as far as my basic understanding goes (I literally don't have any experience in ML or deep learning, just basic PyTorch), companies are working on finding new methods.
(I may be wrong about this, too.)
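To make the "one 3D structure, many language labels" part concrete, here is a toy sketch of nudging the label embeddings for "elephant" (English) and "hati" (Hindi) toward one shared structure embedding. The dimensions, tensors, and loss are all invented for illustration; this is not any real system's method:

```python
# Toy sketch: align two language labels with one shared "3D structure"
# embedding. Everything here is hypothetical illustration.
import torch
import torch.nn.functional as F

dim = 64
structure_emb = torch.randn(1, dim)                   # the elephant's 3D structure
label_embs = torch.randn(2, dim, requires_grad=True)  # "elephant" (en), "hati" (hi)

def alignment_loss(structure, labels):
    # Pull every language's label toward the same structure embedding.
    sims = F.cosine_similarity(labels, structure.expand_as(labels), dim=-1)
    return (1.0 - sims).mean()

loss = alignment_loss(structure_emb, label_embs)
loss.backward()  # gradients now flow into both language labels
print(loss.item(), label_embs.grad.shape)
```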



Wow, the cost of training these models has skyrocketed! It's fascinating to see how expenses have grown from Google's Transformer model to the latest Gemini Ultra. The investment in compute, data, and expertise is staggering. As for fine-tuning, Google's decision to make Gemini-1.5-Flash fine-tuning free is a game-changer for businesses that can't afford the massive costs of training from scratch.