See my GitHub repository, at https://github.com/VincentGranville/Large-Language-Models
Vincent Granville PRO
vincentg64's activity
And in our case (see https://mltblog.com/4fPuvTb), with no training and zero parameters! By zero parameters, I mean no neural network parameters (the typical 40B you see in many LLMs stands for 40 billion parameters, also called weights). We do have a few intuitive parameters that you can fine-tune in real time.
Tips to make your system hallucination-free:
- We use sub-LLMs specific to each topic (part of a large corpus), thus mixing unrelated items is much less likely to happen.
- In the base version, the output returned is unaltered rather than reworded. The latter can cause hallucinations.
- It shows a high-level structured summary first, with category, tags, and agents attached to each item; the user can click on the items of most interest based on the summary, reducing the risk of a mismatch.
- The user can specify agents, tags or categories in the UI; it is much more than a prompt box. The user can also include negative keywords, specify joint keywords that must appear together in the corpus, put a higher weight on the first keyword in the prompt, or favor the most recent material in the results.
- Python libraries can cause hallucinations. For instance, project and projected have the same stem. We use these libraries, but with workarounds to avoid the issues that can lead to hallucinations.
- We assign a relevancy score, ranging from 0 to 10, to each item in the prompt results. If we cannot find highly relevant information in your augmented corpus, despite using a synonyms dictionary, the score will be low, telling you that the system knows that this particular item is not great. You can choose not to show items with a low score, though sometimes they contain unexpectedly interesting information (the reason to keep them).
- We show links and references, all coming from reliable sources. The user can double-check in case of doubt.
- We suggest alternate keywords (related concepts) to use in your next prompts.
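The stemming issue mentioned in the list above can be sketched as follows. This is only a minimal illustration, not the workaround used in our system: `naive_stem` is a toy suffix stripper standing in for a library stemmer, and `PROTECTED` is a hypothetical exception list of words that are kept unstemmed.

```python
# Toy suffix-stripping stemmer (hypothetical; stands in for a library stemmer).
SUFFIXES = ("ed", "ing", "s")

# Hypothetical exception list: words whose stem would collide with an
# unrelated keyword, so they are indexed as-is.
PROTECTED = {"projected"}

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def safe_stem(word: str) -> str:
    """Stem a word unless it is on the exception list."""
    word = word.lower()
    return word if word in PROTECTED else naive_stem(word)

# Without the exception list, both words collapse to the same stem.
print(naive_stem("project"), naive_stem("projected"))
# With it, "projected" stays distinct from "project".
print(safe_stem("project"), safe_stem("projected"))
```

An exception list is only one possible design; storing the original word next to its stem, so that the match can be checked afterwards, would be another.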
In my most recent articles and books, I discussed our radically different approach to building enterprise LLMs from scratch, without training, hallucinations, prompt engineering or GPU, while delivering higher accuracy at a much lower cost, safely, at scale and at lightning speed (in-memory). It is also far easier to adapt to specific corpuses and business needs, to fine-tune, and modify, giving you full control over all the components, based on a small number of intuitive parameters and explainable AI.
Now, I assembled everything into a well-structured 9-page document (+ 20 pages of code) with one-click links to the sources including our internal library, deep retrieval PDF parser, real-life input corpus, backend tables, and so on. Access to all this is offered only to those acquiring the paper. Our technology is so different from standard LLMs that we call it LLM 2.0.
This technical paper is much more than a compact version of past documentation. It highlights new features such as un-stemming to boost exhaustivity, multi-index, relevancy score vectors, multi-level chunking, various multi-token types (some originating from the knowledge graph) and how they are leveraged, as well as pre-assigned multimodal agents. I also discuss the advanced UI — far more than a prompt box — with unaltered concise structured output, suggested keywords for deeper dive, agent or category selection to increase focus, and relevancy scores. Of special interest: simplified, improved architecture, and upgrade to process word associations in large chunks (embeddings) even faster.
➡️ See how to get a free copy, at https://mltblog.com/4fPuvTb
In this article, I share my latest Gen AI and LLM advances, featuring innovative approaches radically different from both standard AI and classical ML/NLP. The focus is on doing better with less, using efficient architectures, new algorithms and evaluation metrics. It originates from research that I started long ago. It gained significant momentum in the last two years. See background and history at https://mltblog.com/4g2sKTv.
OpenAI, Perplexity, Anthropic, Llama and others typically follow the trend and implement solutions very similar to mine within 3 to 6 months after I publish new milestones. Examples include multi-tokens, knowledge graph tokens, multi-indexes, real-time fine-tuning, mixtures of experts, LLM routers, small enterprise sub-LLMs, prompt distillation, relevancy scoring engine, deep contextual retrieval, optimum agentic chunking, and modern UI instead of the basic prompt box. I keep adding new features all the time, staying ahead of the competition.
➡️ Read full article with links to GitHub, at https://mltblog.com/3DsyZSq
Here I illustrate my two most recent interactions with AI-powered GPT. Both were awful failures, a lot worse than before GenAI. Indeed, I had to go back to old Google search to get help. This is typical of what hundreds of millions of users now experience every day.
➡️ First example:
I get payments from Stripe. I asked how I can pay someone, as opposed to getting paid, as I had a contact asking me to pay him with Stripe. After 30 minutes of prompts to AI support, I got nowhere. I could not figure out how to get a meaningful answer: see featured image. In the end, I decided to pay my contact using a different platform.
➡️ Second example:
A VC guy I started to interact with sent me a few messages, but I never received any of them. I tried to contact my email provider, but was faced with a GenAI bot to answer the following precise question: his email address is xyz, mine is abc, his messages do not even show up in my spam box, and I did not block his domain name; how do I fix this? After receiving irrelevant answers, I asked point blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrased my question. In the end I told my contact to send messages to an alternate email address.
➡️ Read the article explaining causes, offering solutions, at https://mltblog.com/41BcGDY
LLM 2.0 has been brewing for a long time. Now it is becoming mainstream and replacing LLM 1.0, for its ability to deliver better ROI to enterprise customers, at a much lower cost. Much of the past resistance to its adoption lay in one question: how can you possibly do better with no training, no GPU, and zero parameters? It is as if everyone believed that multi-billion parameter models are mandatory, due to a long tradition.
However, this machinery is used to train models on tasks irrelevant to the purpose, relying on self-reinforcing evaluation metrics that fail to capture desirable qualities such as depth, conciseness or exhaustivity. Not that standard LLMs are bad: I use OpenAI and Perplexity a lot for code generation, writing my investor deck, and even to answer advanced number theory questions. But their strength comes from all the sub-systems they rely upon, not from the central deep neural network. Remove or simplify that part, then you get a product far easier to maintain and upgrade, costing far less in development, and if done right, delivering more accurate results without hallucination, without prompt engineering and without the need to double-check the answers. Many times, errors are quite subtle and can be overlooked.
A good LLM 1.0 still saves a lot of time but requires significant vigilance. There is plenty of room for improvement, but more parameters and black-box DNNs have shown their limitations.
➡️ To read full article and learn how LLM 2.0 changes the game, see https://mltblog.com/4g2sKTv
I get many questions about the radically different LLM technology that I started to develop 2 years ago. Initially designed to retrieve information that I could no longer find on the Internet, whether with search, OpenAI, Gemini, Perplexity or any other platform, it evolved to become the ideal solution for professional enterprise users. It is now agentic and multimodal, automating business tasks at scale with lightning speed, consistently delivering real ROI, and bypassing the costs associated with training and GPU thanks to zero weight and explainable AI. It was tested and developed for a Fortune 100 company.
So, what is behind the scenes, how different is it compared to LLM 1.0 (GPT and the likes), how can it be hallucination-free, what makes it a game changer, how did it eliminate prompt engineering, how does it handle knowledge graphs without neural networks, and what are the other benefits?
In a nutshell, the performance is due to building a robust architecture from the ground up and at every step, offering far more than a prompt box, relying on home-made technology rather than faulty Python libraries, and designed by enterprise and tech visionaries for enterprise users.
Contextual smart crawling to retrieve underlying taxonomies, augmented taxonomies, long contextual multi-tokens, real-time fine-tuning, increased security, LLM router with specialized sub-LLMs, an in-memory database architecture of its own to efficiently handle sparsity in keyword associations, contextual backend tables, agents built on the backend, mapping between prompt and corpus keywords, customized PMI rather than cosine similarity, variable-length embeddings, and the scoring engine (the new “PageRank” of LLMs) returning results along with the relevancy scores, are but a few of the differentiators.
➡️ Read the full article, at https://mltblog.com/49ksOLL
The technology described here boosts exhaustivity and structuredness in LLM prompt results, efficiently exploiting the knowledge graph and contextual structure present in any professional or enterprise corpus. The case study deals with public financial reports from Nvidia, available as PDF documents.
In this article, I discuss the preprocessing steps used to turn a PDF repository into input suitable for LLMs. It includes contextual chunking, indexing text entities with hierarchical multi-index system, and retrieving contextual elements including lists, sub-lists, fonts (type, color, and size), images and tables – some not detected by standard Python libraries. I also discuss how to build additional contextual information such as agents, categories, or tags, to add to text entities to further improve any LLM architecture, and prompt results.
What I mean here is that traditional LLMs are trained on tasks irrelevant to what they will do for the user. It’s like training a plane to efficiently operate on the runway, but not to fly. In short, it is almost impossible to train an LLM, and evaluating is just as challenging. Besides, training is not even necessary. In this article, I dive into all these topics.
➡️ Training LLMs for the wrong tasks
Since the beginnings with BERT, training an LLM typically consists of predicting the next tokens in a sentence, or removing some tokens and then having your algorithm fill in the blanks. You optimize the underlying deep neural networks to perform these supervised learning tasks as well as possible. Typically, it involves growing the list of tokens in the training set to billions or trillions, increasing the cost and time to train. However, recently, there is a tendency to work with smaller datasets, by distilling the input sources and token lists. After all, out of one trillion tokens, 99% are noise and do not contribute to improving the results for the end-user; they may even contribute to hallucinations. Keep in mind that human beings have a vocabulary of about 30,000 keywords, and that the number of potential standardized prompts on a specialized corpus (and thus the number of potential answers) is less than a million.
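To make the two training tasks above concrete, here is a minimal sketch of how the training examples are built. It works on whitespace-separated words for readability; real systems use subword tokenizers, and the `MASK` placeholder and function names are assumptions made for illustration.

```python
MASK = "[MASK]"  # placeholder symbol, chosen for illustration

def next_token_pairs(tokens):
    """(context, target) pairs for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, positions):
    """Hide the tokens at the given positions; return masked input and answers."""
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

tokens = "the model predicts the next token".split()

# Task 1: predict the next token given everything before it.
for context, target in next_token_pairs(tokens)[:2]:
    print(context, "->", target)

# Task 2: remove some tokens and ask the model to fill in the blanks.
print(masked_example(tokens, {2, 5}))
```

Either way, the model is optimized to recover the hidden tokens, which is a different task from returning relevant, exhaustive answers to a user prompt.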
➡️ Read the full article at https://mltblog.com/3CEJ9Pt, also featuring issues with evaluation metrics and the benefits of untrained LLMs.
Read full article at https://mltblog.com/4ftTko9
In this article, you will find my PowerPoint presentation describing the most recent features of xLLM, a CPU-based, full context, secure multi-LLM with real-time fine-tuning & explainable AI. It includes several new diagrams describing the innovative architecture, upcoming developments, new features and different use cases.
Content
➡️Enterprise use case: corporate corpus of a Fortune 100 company.
➡️Original version dealing with large websites such as Wolfram and Wikipedia. Comparison with OpenAI.
➡️xLLM for clustering and predictive analytics. Use case: unstructured text (articles) from a media company.
➡️Integration of our game-changing NoGAN tabular data synthesizer, and state-of-the-art model evaluation technology.
➡️Integration of external tools, for instance to solve math problems.
➡️Upcoming version for auto-indexing and cataloging large repositories.
➡️Demo: enterprise xLLM in action, featuring the modern user interface (full web API, not just a prompt box) with command menu and numerous options not found in other LLMs, including debugging, suggested prompts, choice of agents, and fine-tuning in real time.
➡️Relevancy score displayed to the user, for each returned item. I call it the new PageRank for RAG/LLM, using a technology radically different from Google search. See picture.
New startup coming soon!
We will soon (January) launch a new startup focusing on GenAI at scale for enterprises; xLLM will be part of the offer with exclusive features. We are looking for early adopters to partner with us on the journey. The co-founder and CEO, to be announced soon, is Senior Director of GenAI at a Fortune 100 company, where the first version of Enterprise xLLM was implemented. More to come!
Read more, and access the PPT, at https://mltblog.com/4ftTko9
This book features new advances in game-changing AI and LLM technologies built by GenAItechLab.com. Written in simple English, it is best suited for engineers, developers, data scientists, analysts, consultants and anyone with an analytic background interested in starting a career in AI. The emphasis is on scalable enterprise solutions, easy to implement, yet outperforming vendors both in terms of speed and quality, by several orders of magnitude.
Each topic comes with GitHub links, full Python code, datasets, illustrations, and real-life case studies, including from a Fortune 100 company. Some of the material is presented as enterprise projects with solutions, to help you build robust applications and boost your career. You don’t need expensive GPUs and cloud bandwidth to implement them: a standard laptop works.
➡️ Part 1: Hallucination-Free LLM with Real-Time Fine-Tuning
➡️ Part 2: Outperforming Neural Nets and Classic AI
➡️ Part 3: Innovations in Statistical AI
About the author
Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at ML Techniques and GenAI Techlab, former VC-funded executive, author (Elsevier) and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET.
➡️ See content and get your copy, at https://mltblog.com/404F1BZ
Read full article at https://mltblog.com/4gT62y9
In this document, you will learn how to build a system that decides, among dozens of candidate paragraphs selected from the corpus to answer a prompt, which ones to show in the results, and in what order. The goal is to maximize relevancy while not overwhelming the user with a long, cluttered answer. Think of it as the new PageRank for RAG/LLM, although the algorithm is radically different, and much simpler. The approach is generic and works for all RAG/LLM systems whether based on neural networks or not. It is implemented in xLLM.
The article includes Python code (with links to GitHub) and a case study featuring the anonymized augmented corpus of a Fortune 100 company, as well as future LLM developments (auto-indexing and LLM for glossary generation).
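To give a rough idea of the selection problem described above, here is a minimal sketch. It is not the scoring algorithm implemented in xLLM: the score used here (share of prompt keywords found in the paragraph, scaled to 0 to 10) and the parameters `min_score` and `max_items` are assumptions made for illustration.

```python
def relevancy_score(prompt_keywords, paragraph):
    """Score from 0 to 10: share of prompt keywords present in the paragraph."""
    if not prompt_keywords:
        return 0.0
    words = set(paragraph.lower().split())
    hits = sum(1 for kw in prompt_keywords if kw.lower() in words)
    return round(10 * hits / len(prompt_keywords), 1)

def rank(prompt_keywords, candidates, min_score=3.0, max_items=3):
    """Keep candidates scoring at least min_score, best first, at most max_items."""
    scored = [(relevancy_score(prompt_keywords, c), c) for c in candidates]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:max_items]

candidates = [
    "The board met in March",
    "Revenue from gaming declined",
    "GPU revenue growth accelerated this quarter",
]
for score, paragraph in rank(["gpu", "revenue", "growth"], candidates):
    print(score, paragraph)
```

Lowering `min_score` shows more items, including low-score ones that may still contain unexpectedly useful information; raising it keeps the answer short.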
Full doc at https://mltblog.com/47DisG5
Have you tried the xLLM web API? It allows you to fine-tune and debug an agentic multi-LLM in real time. The input data is part of the anonymized corporate corpus of a Fortune 100 company, dealing with AI policies, documentation, integration, best practices, references, onboarding, and so on. It features one sub-LLM. The full corpus is broken down into 15 sub-LLMs.
One of the goals is to return concise but exhaustive results, using acronyms (a specific table for each sub-LLM) to map multi-tokens found in prompts but not in the corpus to multi-tokens in the corpus. Exhaustivity is the most overlooked metric when evaluating LLMs designed for search / retrieval. Using xLLM in combination with another LLM is one of the best approaches, and each can be used to evaluate the other. Yet, thanks to fast in-memory processing, no weight, and no training, the xLLM web API is one of a kind, with capabilities not found in any competing product, free or not.
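The acronym mapping mentioned above can be sketched as a simple table lookup. This is an illustration under assumptions, not the xLLM code: the table `ACRONYMS`, the set `CORPUS_MULTITOKENS`, and their contents are hypothetical.

```python
# Hypothetical acronym table for one sub-LLM: prompt multi-token -> corpus multi-token.
ACRONYMS = {
    "ml": "machine learning",
    "kg": "knowledge graph",
}

# Hypothetical set of multi-tokens found in the corpus.
CORPUS_MULTITOKENS = {"machine learning", "knowledge graph", "data governance"}

def map_prompt_tokens(prompt_tokens):
    """Map each prompt multi-token to a corpus multi-token when possible."""
    mapped, unmatched = [], []
    for token in prompt_tokens:
        token = token.lower()
        if token in CORPUS_MULTITOKENS:
            mapped.append(token)              # already in the corpus
        elif ACRONYMS.get(token) in CORPUS_MULTITOKENS:
            mapped.append(ACRONYMS[token])    # found via the acronym table
        else:
            unmatched.append(token)           # nothing to retrieve for this one
    return mapped, unmatched

print(map_prompt_tokens(["ML", "data governance", "llmops"]))
```

Returning the unmatched tokens separately lets the interface tell the user which parts of the prompt had no counterpart in the corpus.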
Read more at https://mltblog.com/47DisG5
You are welcome Stephen!
New additions to this ground-breaking system include multi-token distillation when processing prompts, agents to meet user intent, more NLP, and a command prompt menu accepting both standard prompts and various actions.
I also added several illustrations, featuring xLLM in action with a full session and sample commands to fine-tune in real time. All the code, input sources (anonymized corporate corpus from a Fortune 100 company), and contextual backend tables including embeddings are on GitHub. My system has zero weight, no transformer, and no neural network. It relies on explainable AI, does not require training, is fully reproducible, and fits in memory. Yet your prompts can retrieve relevant full text entities from the corpus with no latency (including URLs, categories, titles, email addresses, and so on) thanks to a well-designed architecture.
Read more, get the code, paper and everything for free, at https://mltblog.com/4dNPSnB
Many are ground-breaking innovations that make LLMs much faster and not prone to hallucinations. They reduce the cost, latency, and amount of computer resources (GPU, training) by several orders of magnitude. Some of them improve security, making your LLM more attractive to corporate clients. I introduced a few of these features in my previous article "New Trends in LLM Architecture". Now I offer a comprehensive list, based on the most recent developments.
Read full article, learn about agentic LLMs, LLM routers, contextual tables, fast search, and more, at https://mltblog.com/3Aq9iAb
See my tests: Python badly fails the congruential equidistribution test, among others, whatever generator is implemented in Python 3.10.
The one-line formula is this:
(5^n >> n) % (2^n)
It can be executed very efficiently. Each new n gives you n new bits independent from the previous ones. That's one of many sequences proposed in my paper. Running n from 1 to 10^6, you get a total of about 5 x 10^11 bits.
As for non-reproducibility: with everything I tested, if you run the code twice, you get two different results. You have to set a seed for all sources of randomness, not just some of them as set_seed does.
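For readers who want to try the formula, here is a direct transcription into Python, as a minimal sketch. Note that `^` in the formula means exponentiation; in Python `^` is bitwise XOR, so the code uses `**`. The function names are mine, chosen for illustration.

```python
def new_bits(n: int) -> int:
    """Return (5^n >> n) mod 2^n, that is, bits n to 2n-1 of 5^n (bit 0 = lowest)."""
    return (5**n >> n) % (2**n)

def bit_string(n: int) -> str:
    """The n bits produced for a given n, zero-padded to length n."""
    return format(new_bits(n), f"0{n}b")

def total_bits(n_max: int) -> int:
    """Total number of bits produced when n runs from 1 to n_max."""
    return n_max * (n_max + 1) // 2

for n in range(1, 6):
    print(n, bit_string(n))

print(total_bits(10**6))  # 500000500000, about 5 x 10^11
```

Python integers have arbitrary precision, so the expression works for large n without overflow, although computing 5**n from scratch for each n is not the fastest way to do it.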
Finally, I have 40 years of experience designing random generators of increasing quality and a PhD in computational stats (postdoc at the Statslab, Cambridge University).
Your Dieharder battery of tests is a joke designed by amateurs who know basic stuff in stats and nothing in number theory. The fact that everyone uses it does not make my statement less true. It does not even test "strong randomness", a concept defined in one of my books.
Actually, you only need one single test to check strong randomness: the full multivariate Kolmogorov-Smirnov distance. As far as I know, I am the only one to have implemented it in any dimension: https://pypi.org/project/genai-evaluation/
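To illustrate what a multivariate Kolmogorov-Smirnov distance measures, here is a minimal sketch in pure Python. It is not the implementation in the genai-evaluation library: it compares the two empirical distribution functions only at the pooled sample points, which is a simplification that can underestimate the true maximum gap in dimension 2 or more.

```python
def ecdf(sample, point):
    """Multivariate empirical CDF: share of observations <= point in every coordinate."""
    count = sum(1 for obs in sample if all(o <= p for o, p in zip(obs, point)))
    return count / len(sample)

def ks_distance(sample_a, sample_b):
    """Largest gap between the two ECDFs, evaluated only at the pooled sample points."""
    points = list(sample_a) + list(sample_b)
    return max(abs(ecdf(sample_a, pt) - ecdf(sample_b, pt)) for pt in points)

a = [(0.1, 0.2), (0.4, 0.9), (0.7, 0.3)]
b = [(0.2, 0.1), (0.5, 0.8), (0.9, 0.9)]
print(ks_distance(a, a))  # identical samples give 0.0
print(ks_distance(a, b))
```

The same functions work in any dimension, since `ecdf` compares all coordinates; the cost grows with the product of the sample sizes.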
All you have to do is allow the user to specify the seeds of the random number generators involved. First, you need a good random generator you have full control over. Better than numpy.random. See ours, with infinite period and one line of code, faster and better than what's in Python and elsewhere. Here is the link: https://mltblog.com/4fGDLu0
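As a minimal sketch of what "seeding every source of randomness" looks like, the example below gives each random component its own generator, all derived from one user-specified seed. It uses Python's standard `random` module for simplicity rather than the generator linked above, and the function name and seed offsets are assumptions for illustration. Any other library in the pipeline (for instance NumPy, via `numpy.random.default_rng(seed)`) would need to be seeded in the same way.

```python
import random

def simulate(seed: int, size: int = 5):
    """Draw from two separate sources of randomness, both derived from one seed."""
    noise_rng = random.Random(seed)        # dedicated generator for the values
    shuffle_rng = random.Random(seed + 1)  # dedicated generator for the ordering
    values = [noise_rng.random() for _ in range(size)]
    order = list(range(size))
    shuffle_rng.shuffle(order)
    return values, order

# Same seed, same results, however many times the code is run.
print(simulate(42) == simulate(42))
```

Passing explicit generator objects around, instead of relying on a hidden global state, makes it easy to check that no component was left unseeded.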
The GenAItechLab Fellowship program allows participants to work on state-of-the-art, enterprise-grade projects, entirely for free, at their own pace, at home or in their workplace. The goal is to help you test, enhance, and further implement applications that outperform solutions offered by AI startups or organizations such as Google or OpenAI.
You will learn how to quickly build faster and lighter systems that deliver better results based on sound evaluation metrics, with a focus on case studies and best practices. Not least, you will learn modern methods that are here to stay, designed by a world-class expert and investor, Dr. Vincent Granville, founder of GenAItechLab.com.
This article features an application of xLLM to extract information from a corporate corpus, using prompts referred to as “queries”. The goal is to serve the business user — typically an employee of the company or someone allowed access — with condensed, relevant pieces of information including links, examples, PDFs, tables, charts, definitions and so on, to professional queries.
My custom sub-LLM designed from scratch does not rely on any Python library or API, and performs better than search tools available on the market, in terms of speed and results relevancy. It offers the user the ability to fine-tune parameters in real time, and can detect user intent to deliver appropriate output. The good performance comes from the quality of the well-structured input sources, combined with smart crawling to retrieve the embedded knowledge graph and integrate it into the backend tables. Traditional tools rely mostly on tokens, embeddings, billions of parameters and frontend tricks such as prompt engineering to fix backend issues.
By contrast, my approach focuses on building a solid backend foundational architecture from the ground up. Tokens and embeddings are not the most important components, by a long shot. Cosine similarity and dot products are replaced by pointwise mutual information. There is no neural network, no training, and a small number of explainable parameters, easy to fine-tune.
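For readers unfamiliar with pointwise mutual information (PMI), here is a minimal sketch of the standard definition, PMI(a, b) = log2( P(a, b) / (P(a) P(b)) ), with probabilities estimated as the share of text chunks containing the keywords. This is the textbook version, not the customized PMI used in xLLM, and the toy chunks are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(chunks):
    """PMI for each co-occurring keyword pair, using chunk frequencies as probabilities."""
    n = len(chunks)
    word_count = Counter()
    pair_count = Counter()
    for chunk in chunks:
        words = sorted(set(chunk))                 # count each keyword once per chunk
        word_count.update(words)
        pair_count.update(combinations(words, 2))  # pairs in alphabetical order
    return {
        (a, b): math.log2((c / n) / ((word_count[a] / n) * (word_count[b] / n)))
        for (a, b), c in pair_count.items()
    }

chunks = [
    ["gpu", "revenue"],
    ["gpu", "revenue"],
    ["gpu", "policy"],
    ["policy", "audit"],
]
for pair, value in sorted(pmi_table(chunks).items()):
    print(pair, round(value, 3))
```

Only pairs that actually co-occur are stored, so the table stays sparse; a positive value means the two keywords appear together more often than expected if they were independent.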
Read more, access the code and data, at https://mltblog.com/3WcTS9C