Open-sourcing the Plain English to SQL Pipeline
There are no more "AI companies." Every company is just doing AI now. Two great examples are Snowflake and Uber.
Both have either already shipped or are teasing products built around plain-English-to-SQL pipelines.
I wanted to see how hard it would be to build something similar using open source tools like the Hugging Face smolagents library. Turns out, I was able to build a pretty decent MVP in the London Luton departures terminal.
The Genesis of the Tool
The goal was simple: enable users to interact with a structured SQL database using natural language queries. Instead of writing complex SQL statements, users can type their questions in plain English, and the tool will generate and execute an appropriate SQL query to retrieve relevant data.
The application integrates several open-source components:
- Gradio: Provides a simple and intuitive UI for interacting with the tool.
- SQLAlchemy: A powerful toolkit for handling SQL database interactions.
- smolagents and Hugging Face Transformers: Facilitate the conversion of natural language into SQL queries.
- Pandas: Ensures efficient data handling and display.
How It Works
- User Input: The user enters a natural language question.
- Query Generation: A language model (Qwen2.5-Coder-32B-Instruct) converts the input into a valid SQL query based on a predefined database schema.
- Query Execution: The generated SQL is executed against the database using SQLAlchemy.
- Results Display: The output is formatted and presented back to the user.
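The four steps above can be sketched end to end in a few lines. This is a minimal illustration, not the app's actual code: it assumes the standard-library sqlite3 module in place of SQLAlchemy so it runs without dependencies, and it replaces the Qwen2.5-Coder-32B-Instruct call with a hard-coded stub. The table, the prompt wording, and the names describe_schema, build_prompt, answer, and fake_model are all hypothetical.

```python
import sqlite3
from typing import Callable


def describe_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(question: str, schema: str) -> str:
    """Combine the schema and the user's question (hypothetical wording)."""
    return (
        "Given this database schema:\n"
        f"{schema}\n\n"
        "Write one SQL SELECT statement that answers the question below. "
        "Return only the SQL.\n"
        f"Question: {question}"
    )


def answer(question: str, conn: sqlite3.Connection,
           generate_sql: Callable[[str], str]):
    """Question -> prompt -> SQL -> executed result (column names and rows)."""
    prompt = build_prompt(question, describe_schema(conn))
    sql = generate_sql(prompt)                 # step 2: query generation
    cursor = conn.execute(sql)                 # step 3: query execution
    columns = [col[0] for col in cursor.description]
    return sql, columns, cursor.fetchall()     # step 4: data for display


def fake_model(prompt: str) -> str:
    """Stand-in for the LLM call; a real model would read the prompt."""
    return ("SELECT customer, amount FROM receipts "
            "WHERE amount > 20 ORDER BY amount DESC")


# Hypothetical demo table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO receipts (customer, amount) VALUES (?, ?)",
    [("Alan", 12.5), ("Margaret", 53.4), ("Woodrow", 23.9)],
)

sql, columns, rows = answer("Which customers spent more than 20?", conn, fake_model)
print(sql)
print(columns)
for row in rows:
    print(row)
```

Passing the generator in as a plain callable keeps the model swappable: a hosted model, a local one, or a stub for testing all fit the same slot.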
To maintain security, the system restricts queries to SELECT statements only, preventing unintended modifications to the database. These features could be extended while keeping the same security posture by using an on-prem deployment of a dedicated LLM.
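One way such a SELECT-only check might look is sketched below. This is an assumption about the approach, not the app's actual implementation: the function name is_read_only and the keyword list are hypothetical, and the check is deliberately conservative, so it also rejects some harmless queries (for example, CTEs that start with WITH, or string literals that contain a semicolon or a blocked keyword).

```python
import re

# Keywords that indicate a write or schema change (illustrative, not exhaustive).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|replace|"
    r"attach|pragma|grant|into)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        return False
    if ";" in statement:
        # A remaining semicolon suggests more than one statement.
        return False
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT customer FROM receipts;"))
print(is_read_only("SELECT 1; DROP TABLE receipts"))
```

A string check like this is best treated as a first line of defence. Connecting with a database user that only has read permissions gives a stronger guarantee, because the database itself refuses writes.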
The Power of Open Source
Accessibility & Innovation
One of the most significant advantages of open-source AI is accessibility. Proprietary AI solutions often come with steep licensing fees and restricted access, making them impractical for many developers, researchers, and businesses. Open-source alternatives provide a level playing field, allowing anyone to innovate and build upon existing models.
Who cares if your model can do 8D maths? I need it to write SQL.
Transparency & Trust
With open-source software, the underlying code is publicly available for review. This transparency makes independent security review possible, helps expose hidden biases, and allows the community to scrutinize and improve upon existing frameworks. In AI, where ethical concerns around bias and fairness are prevalent, transparency is essential for building trustworthy systems.
There is no magic bullet here. This is AI 101: leveraging publicly available knowledge of common language systems to the advantage of you or your business.
Adaptability & Customization
Proprietary solutions often impose limitations on customization and integration. Open-source AI allows developers to modify and tailor models to suit specific use cases. In this SQL query interface, the ability to tweak and optimize the Hugging Face model for SQL generation was invaluable in refining accuracy and performance.
This project has already inspired continuations.
Conclusion
The AI-powered SQL query interface developed here is only an MVP, but it is a testament to the power of open-source AI libraries. By combining multiple open-source technologies, I've created an intuitive and intelligent tool that simplifies database interactions.
As AI continues to evolve, the need for transparency, collaboration, and accessibility will only grow. Open-source AI ensures that innovation remains in the hands of the many, not the few, fostering a future where technology serves everyone, not just those who can afford it.
Have some fun using SqlAgent here and offer suggestions for improvements or collaboration: https://huggingface.co/spaces/ZennyKenny/sqlAgent