AI & ML interests

Large Action Model to automate browser interaction

lavague-ai's activity

dhuynh95 
posted an update 8 months ago
💪Build an information retrieval agent that can beat Gemini and OpenAI using an open-source Large Action Model framework!

In this video, we ask different proprietary conversational AIs the question:
“What is the most trendy recent paper on Llava models on Hugging Face papers? Provide the date and a summary of the paper”, and the results are interesting!
❌Gemini: found a paper from Jan 29, 2024
❌OpenAI: found a paper from October 2023
❌You.com: found a paper from Jan 29 2024
✅LaVague: found the latest paper (ConvLlaVA which is dope by the way https://arxiv.org/abs/2405.15738)!

The best? Our solution fits in a few lines of code with our open-source framework! I will share how we built that agent during our webinar on AI Web Agents, this Thursday 30th May at 9 am PST (https://lu.ma/m8fzmb3q) so don’t miss it 😉

You can also start playing with our framework: https://github.com/lavague-ai/LaVague
JoFrost 
updated a Space 10 months ago
dhuynh95 
posted an update 10 months ago
🌊LaVague can compile Action Plans into actionable code to browse the internet!

In this example, you can see how an action plan with natural language instructions can be “compiled” into executable Selenium code!

🤖This shows the potential of #LAM (Large Action Models) to perform actions for us and automate mechanical tasks.
This example leverages a local embedding model and OpenAI GPT-3.5, but we support many options, including local ones with Gemma!
You can try this in our docs: https://docs.lavague.ai/en/latest/

LaVague is an open-source Large Action Model framework to automate automation. If you are interested in helping us on our mission to democratize automation tooling for devs, don’t hesitate to visit our GitHub (https://github.com/lavague-ai/LaVague) or Discord (https://discord.gg/SDxn9KpqX9)!
dhuynh95 
posted an update 10 months ago
Hello World! This post is written by the Large Action Model framework LaVague! Find out more on https://github.com/mithril-security/LaVague

Edit: Here is the video of 🌊LaVague posting this. This is quite meta
dhuynh95 
posted an update 11 months ago
🌊 Released #LaVague, a fully open-source AI pipeline to turn natural language into browser actions!

In less than 150 lines of code (RAG with local embedding + Zephyr-7b-Gemma locally or Mixtral on HF Inference API), it generates #Selenium code from a user query. In this GIF you can see it follow user instructions to command a browser to browse the HF website!
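To make the RAG step concrete, here is a minimal sketch of the retrieval idea: rank chunks of the page's HTML against the user query, then put the best ones into the prompt sent to the LLM. This is not LaVague's actual code. The names `embed`, `retrieve` and `build_prompt` are hypothetical, and the bag-of-words vector is only a stand-in for a real local embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k HTML chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the prompt for the code-generating LLM (hypothetical layout)."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"HTML context:\n{context}\n\nInstruction: {query}\nSelenium code:"


# Hypothetical page fragments, for illustration only.
html_chunks = [
    '<a href="/models">Models</a>',
    '<input id="search" placeholder="Search models">',
    "<footer>Terms of service</footer>",
]
prompt = build_prompt("Click on the Models link", html_chunks, k=1)
```

Keeping only the top chunks matters because a full page's HTML rarely fits in the context window of a 7B model.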

Try it on Colab: colab.research.google.com/github/dhuynh95/LaVague/blob/main/LaVague.ipynb
GitHub: github.com/dhuynh95/LaVague

Pretty exciting how it becomes possible to create an AI assistant that could perform actions for us, such as logging into government accounts, filling in forms, or pulling personal information!

It was quite fun to hack on this over the weekend using open-source tools, from @huggingface local embeddings with transformers for local inference or the HF Inference API, to RAG with @llama_index, through the @MistralAI Mixtral model!

Some challenges: to make it run on Colab for the #GPU Poors, I first resorted to the @huggingface Inference API with Mixtral as it was the only model good enough (gemma-7b did not make it and refused to produce code). But after some experimentation, I managed to make it work with a local Zephyr-7b-Gemma so that people could run this assistant fully locally!

Because I used an off-the-shelf model, I had to improve performance with few-shot learning and Chain of Thought, after which it managed to generate appropriate code!
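As a rough illustration of what few-shot plus Chain of Thought prompting can look like here, the sketch below builds a prompt in which each example shows an instruction, a short reasoning step, and the resulting Selenium call. The examples, the prompt layout and the `build_few_shot_prompt` helper are all assumptions made for illustration, not the prompt actually used in the project.

```python
# Hypothetical worked examples: instruction -> reasoning -> Selenium code.
FEW_SHOT_EXAMPLES = [
    {
        "instruction": "Click on the 'Models' link",
        "reasoning": "The target is an <a> element with visible text 'Models', "
                     "so it can be located by its link text.",
        "code": "driver.find_element(By.LINK_TEXT, 'Models').click()",
    },
    {
        "instruction": "Type 'mixtral' in the search bar",
        "reasoning": "The search bar is an <input> with id 'search', "
                     "so it can be located by id and filled with send_keys.",
        "code": "driver.find_element(By.ID, 'search').send_keys('mixtral')",
    },
]


def build_few_shot_prompt(instruction, html_context, examples=FEW_SHOT_EXAMPLES):
    """Build a prompt that ends on 'Thought:' so the model reasons before coding."""
    parts = ["You write Python Selenium code. Think step by step, then give the code."]
    for ex in examples:
        parts.append(
            f"Instruction: {ex['instruction']}\n"
            f"Thought: {ex['reasoning']}\n"
            f"Code: {ex['code']}"
        )
    # The final block is left open: the model completes the thought, then the code.
    parts.append(f"HTML: {html_context}\nInstruction: {instruction}\nThought:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt("Type 'llava' in the search bar", "<input id='search'>")
```

Ending the prompt on `Thought:` nudges a base or lightly tuned model to write its reasoning first, which is the Chain of Thought part; the worked examples are the few-shot part.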

I hope this project will herald a new dawn where transparent, private and local AI assistants help automate menial but critical tasks, such as helping us file taxes, book accommodation, or research information.
dhuynh95 
posted an update 11 months ago
✨ In-context learning is all you need!

This super interesting paper shows that fine-tuning with #SFT or #RLHF only helps with form but does not improve knowledge or reasoning abilities, and in some cases actually decreases performance!

They tested it with Mistral base vs. fine-tuned Mistral, as well as Llama 2 70b base vs. fine-tuned, and the results are consistent.

Providing the right prompt to the base model actually makes the model better and has 0 training cost!

Paper: https://arxiv.org/abs/2312.01552
dhuynh95 
posted an update 12 months ago
Fascinating paper by RAND shows that there is no statistically significant difference between using LLMs and using the regular internet to craft operational plans for bioweapons!

This is the first paper that actually studies the impact of AI on bioweapons from an operational perspective and looks at the big question: is AI any better than just using public data on the Internet?

As most of the data is most likely out there, an LLM would just be a more efficient tool to come up with the relevant information, but it seems that its impact is limited.

https://www.rand.org/pubs/research_reports/RRA2977-2.html
dhuynh95 
posted an update 12 months ago
✅New paper to ensure valid LLM output with SOTA LLMs like GPT4 by pairing them with OSS LLMs

Paper: arxiv.org/abs/2401.09967

Great paper showing how a strong proprietary AI like #GPT4 can be paired with an #OSS LLM to ensure LLM output validity, e.g. valid JSON.

Many devs complain that #LLMs cannot be reliably used in production if the output is not valid. For instance, if one wants to use LLMs to generate SQL queries or JSON, it is crucial that the output is valid.
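To make "valid" concrete for the JSON case, here is a minimal sketch of the kind of check a production pipeline might run on raw model output before using it. The `is_valid_json` helper and the `required_keys` parameter are hypothetical and only illustrate the failure mode; they are not taken from the paper.

```python
import json


def is_valid_json(text, required_keys=()):
    """Return True if text parses as a JSON object containing all required keys."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Typical LLM failures: truncated output, trailing commas, stray prose.
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)


good = '{"subject": "Paris", "relation": "capital_of", "object": "France"}'
truncated = '{"subject": "Paris", "relation": "capital_of"'
```

A check like this only detects invalid output after the fact, which is why constraining generation itself is attractive.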

Frameworks like outlines (https://github.com/outlines-dev/outlines) have emerged to constrain LLM outputs, but they assume access to logits.

This makes them incompatible with proprietary LLMs like GPT4 that don’t share logits, so one can only use open-source LLMs that are much less performant.

This paper shows how one can use powerful proprietary LLMs like GPT4 to create a first unconstrained sketch, then refine it with an OSS model like Llama 2, where logits are accessible, to rewrite the sketch so that it follows specific constraints.
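The two-stage idea can be illustrated with a deliberately tiny toy: a free-form sketch (hardcoded here, standing in for the proprietary model's answer) is mapped onto a closed set of allowed outputs by a scorer. This is not the paper's algorithm. A string-similarity score stands in for the open model's logits, and `ALLOWED_RELATIONS`, `toy_score` and `constrained_refine` are hypothetical names.

```python
from difflib import SequenceMatcher

# Assumption: the constraint is a closed set of valid relation labels.
ALLOWED_RELATIONS = ["place_of_birth", "date_of_birth", "country_of_citizenship"]


def toy_score(sketch, candidate):
    """Stand-in for white-box model scores: similarity between sketch and candidate."""
    return SequenceMatcher(None, sketch.lower(), candidate.replace("_", " ")).ratio()


def constrained_refine(sketch, allowed=ALLOWED_RELATIONS):
    """Pick the best-scoring output among those the constraint allows.

    Anything outside `allowed` can never be returned, which is the point of
    constrained decoding: validity holds by construction.
    """
    return max(allowed, key=lambda candidate: toy_score(sketch, candidate))


# Stage 1 (assumed): an unconstrained sketch from the proprietary model.
sketch = "Place of Birth"
# Stage 2: the constrained rewrite.
relation = constrained_refine(sketch)
```

In the real setting the second stage works token by token on the open model's logits, masking out tokens that would violate the constraint; the toy only keeps the property that the final output is always valid.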

They show that GPT4 precision on information extraction on Wiki-NRE can be increased by 14 percentage points (from 43% to 57%) by boosting it with constrained output!
dhuynh95 
posted an update about 1 year ago
🪟32k-context BERT for embedding and RAG on long corpus

Monarch Mixer is a new architecture that enables long-context BERT for large corpora and can be fine-tuned for long-context retrieval.

Quite interesting and important, as BERT is still the most used LLM in production for "old school" tasks like classification, NER, and embeddings, and is also a key component of RAG.

Paper: https://arxiv.org/abs/2310.12109
Blog: https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval
GitHub: https://github.com/HazyResearch/m2