vincentg64 posted an update Apr 1
The Rise of Specialized LLMs for Enterprise - https://mltblog.com/3QXXE4I

In this article, I discuss the main problems with standard LLMs (OpenAI and the like), and how the new generation of LLMs addresses these issues. The focus is on enterprise LLMs.

LLMs with Billions of Parameters: Most LLMs still fall into this category. The first ones (ChatGPT) appeared around 2022, though BERT is an early precursor. Most recent books discussing LLMs still define them as transformer architectures with deep neural networks (DNNs), costly training, and reliance on GPUs. The training is optimized to predict the next token or missing tokens, as in the sketch below. However, this task is only remotely relevant to what modern LLMs now deliver to the user. Yet it requires time and intensive compute resources. Indeed, this type of architecture works best with billions or trillions of tokens. In the end, most of these tokens are noise, requiring smart distillation to improve performance.
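For readers who want to see the objective concretely, here is a minimal sketch of next-token prediction: the model scores every vocabulary entry at each position, and training minimizes cross-entropy against the token that actually comes next. The toy embedding-plus-linear model and all sizes are illustrative placeholders, not any vendor's production architecture.

```python
# Minimal sketch of the next-token prediction objective standard LLMs
# are trained on. The "model" (embedding + linear head) is a toy
# stand-in for a transformer decoder; sizes are arbitrary assumptions.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8

embed = nn.Embedding(vocab_size, d_model)   # token -> vector
head = nn.Linear(d_model, vocab_size)       # vector -> next-token scores

# Random token ids standing in for a training corpus.
tokens = torch.randint(0, vocab_size, (batch, seq_len))

hidden = embed(tokens)          # (batch, seq_len, d_model)
logits = head(hidden)           # (batch, seq_len, vocab_size)

# Shift by one position: the prediction at position t is scored
# against the actual token at position t+1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients feed the (costly, GPU-bound) training loop
print(f"next-token loss: {loss.item():.3f}")
```

At real scale, this same loop runs over billions or trillions of tokens, which is where the GPU time and cost discussed below come from.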

The main issues are:

➡️ Performance: Requires GPUs and large corpora as input data. Re-training is expensive. Hallucinations are still a problem. Fine-tuning is delicate (black box). You need prompt engineering to get the best results. Mixture-of-experts architectures (multiple sub-LLMs, as in DeepSeek) are one step towards improving accuracy; see the sketch after this list.

➡️ Cost: Besides the GPU costs, the pay-per-token pricing model incentivizes vendors to rely on models that process billions of tokens.
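To make the mixture-of-experts idea from the Performance item concrete, the sketch below routes each token to the single highest-scoring expert sub-network via a learned gate. The top-1 routing rule, the number of experts, and all dimensions are assumptions for illustration only, not DeepSeek's actual configuration.

```python
# Hedged sketch of mixture-of-experts routing: a gating network picks
# one expert per token, so only a fraction of parameters is active per
# token. Top-1 routing and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

d_model, n_experts, n_tokens = 64, 4, 10

experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token

x = torch.randn(n_tokens, d_model)
scores = gate(x).softmax(dim=-1)      # (n_tokens, n_experts)
top_expert = scores.argmax(dim=-1)    # top-1 routing decision per token

out = torch.zeros_like(x)
for e, expert in enumerate(experts):
    mask = top_expert == e
    if mask.any():
        # Each routed token's output is weighted by its gate score.
        out[mask] = scores[mask, e].unsqueeze(1) * expert(x[mask])

print(out.shape)  # torch.Size([10, 64])
```

Because each token activates only one expert here, compute per token stays close to that of a single small model even as total parameter count grows with the number of experts.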

Read the full article, describing more issues and how LLM 2.0 addresses them, at https://mltblog.com/3QXXE4I

More links:

- To receive the latest updates: https://mltblog.com/4iTvQec
- About LLM 2.0: https://mltblog.com/4g2sKTv
- PowerPoint presentation: https://mltblog.com/43DYviE
- Our company website: https://mlt
