metadata

base_model: ybelkada/falcon-7b-sharded-bf16
tags:
  - generated_from_trainer
  - falcon
model-index:
  - name: results
    results: []
datasets:
  - Clinton/Text-to-sql-v1

AI2sql

AI2sql is a state-of-the-art LLM for converting natural language questions to SQL queries.

Model description

AI2SQL is a specialized LLM fine-tuned from Falcon-7b-instruct with PEFT- LoRA technology, tailored for interpreting natural language and generating corresponding SQL queries.

Intended uses & limitations

AI2SQL is designed for data analysts, business intelligence professionals, and developers to facilitate the conversion of natural language questions into SQL queries. This tool aids those who are not proficient in SQL, enabling easier database querying. AI2SQL's performance is inherently tied to the characteristics of its training data. While it has been trained on a diverse and substantial dataset, it may not account for all possible SQL dialects or database structures. Careful review of the generated SQL queries is recommended.

Training and evaluation data

Trained on a comprehensive dataset comprising 262,000 rows of paired natural language questions and SQL queries sourced from Text-to-SQL Dataset, covering a wide array of domains and question complexities.

Training procedure

Overview

AI2SQL was trained in a multi-stage process, starting with a pre-trained Falcon-7b-instruct model, a large transformer-based language model. This base model was then fine-tuned using a Parameter Efficient Fine-Tuning (PEFT) approach with Locally Reweighted Approximations (LoRA) specifically for the task of translating natural language to SQL queries.

Data Preparation

The training dataset, sourced from the Text-to-SQL Dataset, included 262,000 rows of paired natural language questions and SQL queries. Each pair consists of a natural language question and its corresponding SQL query, covering a diverse range of domains and query complexities.

Fine-Tuning Process

Data Preprocessing: The dataset was preprocessed to normalize text and SQL queries, ensuring consistency in formatting and syntax.
Model Adaptation: The Falcon-7b-instruct model was adapted using PEFT- LoRA, a technique that allows for efficient and targeted updates to the model's weights without extensive retraining. This approach is particularly beneficial for adapting large-scale models to specific tasks with limited computational resources.
Training Strategy: The model was trained in a supervised learning setup, where it learned to map natural language inputs to their corresponding SQL queries. Special attention was given to the model's ability to understand the semantics of the natural language questions and accurately reflect them in SQL syntax.
Validation and Testing: Throughout the training process, the model was periodically evaluated on a held-out validation set to monitor its performance and prevent overfitting. The final model was tested on an independent test set to assess its generalization capabilities.

Model Evaluation

The model's performance was evaluated based on its accuracy in generating correct SQL queries corresponding to the input natural language questions. Metrics such as precision, recall, and F1-score were used to quantify the model's effectiveness.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant
lr_scheduler_warmup_ratio: 0.03
training_steps: 500
mixed_precision_training: Native AMP

Training results

Performance Metrics

AI2SQL's performance was rigorously evaluated post-training. The key metrics used to assess the model were:

Accuracy: The percentage of queries where the model-generated SQL matched the expected SQL.
Precision: The proportion of correctly generated SQL queries out of all queries generated by the model.
Recall: The ability of the model to generate all relevant SQL queries corresponding to the input natural language questions.
F1-Score: The harmonic mean of precision and recall, providing a balance between the two.

Results:

Accuracy: TBD
Precision: TBD
Recall: TBD
F1-Score: TBD

Insights and Observations

Handling Complex Queries: AI2SQL demonstrated a high proficiency in handling complex queries involving multiple SQL clauses and parameters.
Contextual Understanding: The model showed a notable capability in understanding context and generating SQL queries that accurately reflect nuanced natural language instructions.
Performance on Diverse Data: AI2SQL maintained consistent performance across various domains present in the training dataset, indicating its robustness and general applicability.

Limitations Observed

Handling Ambiguous Questions: The model sometimes struggled with ambiguous natural language inputs where the intent was not clear.
Query Specificity: In cases of highly specific queries, the model occasionally generated SQL that was syntactically correct but did not completely align with the nuanced requirements of the question.

Future Improvements

Based on the training results and observed limitations, future improvements could include:

Enhanced training on ambiguous natural language inputs to improve the model's interpretative capabilities.
Further fine-tuning with a broader range of specific and complex SQL queries to enhance the model's accuracy in generating nuanced SQL statements.

Framework versions

Transformers 4.35.2
Pytorch 2.1.0+cu118
Datasets 2.15.0
Tokenizers 0.15.0