---
license: mit
base_model: Deepseek-R1
tags:
  - text-generation
  - sql
  - lora
  - unsloth
  - Deepseek
---

# SQLNova - LoRA Fine-Tuned DeepSeek 8B for Text-to-SQL Generation

**SQLNova** is a lightweight LoRA adapter fine-tuned with Unsloth on top of DeepSeek R1 Distill Llama 8B. It is designed to convert natural language instructions into valid SQL queries with minimal compute overhead, making it well suited for integration into data-driven applications or chat interfaces.

The model was trained on over **100,000 natural language-to-SQL pairs** spanning diverse domains, including education, technology, healthcare, and more.

---

## Model Dependencies

- **Python version**: `3.10`
- **Libraries**: `unsloth`
- **Installation**: `pip install unsloth`
  
## Model Highlights

- **Base model**: `DeepSeek R1 Distill Llama 8B`
- **Tokenizer**: Compatible with `DeepSeek R1 Distill Llama 8B`
- **Fine-tuned for**: Text-to-SQL conversion
- **Accuracy**: > 85%
- **Language**: English
- **Format**: `safetensors`

### General Information
- **Model type:** Text Generation
- **Language:** English  
- **License:** MIT
- **Base model:** DeepSeek R1 Distill Llama 8B

### Model Repository

- **Hugging Face Model Card:** [https://huggingface.co/mervp/SQLNova](https://huggingface.co/mervp/SQLNova)

---

## 💡 Intended Uses

### Applications

- Generating SQL queries from natural language prompts
- Powering AI assistants for databases
- Enhancing SQL query builders or no-code data tools
- Automating analytics workflows

---

## Limitations

While **SQLNova** performs well in many real-world scenarios, it is built on a reasoning model and has some limitations:

- It may occasionally produce **invalid SQL** for unusual or malformed inputs.
- It assumes a **generic SQL dialect** resembling MySQL/PostgreSQL syntax.

### Recommendations for Use

- Always **validate generated SQL** before executing it in production, as sketched below.
- Include **schema context** in prompts to improve accuracy.
- Use **human-in-the-loop** review for critical applications.
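
One way to act on the first recommendation is a quick dry run of each candidate query against an in-memory SQLite database built from the same schema context used in the prompt. The sketch below is illustrative rather than part of SQLNova: `validate_sql` is a hypothetical helper name, and SQLite will not catch every MySQL/PostgreSQL-specific construct.

```python
import sqlite3

def validate_sql(schema_sql: str, candidate_sql: str) -> bool:
    """Dry-run a generated query against an in-memory SQLite copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)   # build tables from the schema context in the prompt
        conn.execute(candidate_sql)      # raises sqlite3.Error if the SQL is invalid
        return True
    except sqlite3.Error as err:
        print(f"Rejected generated SQL: {err}")
        return False
    finally:
        conn.rollback()                  # discard any side effects of the dry run
        conn.close()
```

A query that passes this check can then be shown to a human reviewer or run against the real database.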


Thanks for visiting and downloading this model!
If this model helped you, please consider leaving a like. Your support helps the model reach more developers and encourages further improvements.

---

## How to Use the Model

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and tokenizer from the Hugging Face Hub.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mervp/SQLNova",
    max_seq_length=2048,
    dtype=None,  # auto-detect a suitable dtype for the available hardware
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode before generating

prompt = """ You are an text to SQL query translator.
             Users will ask you questions in English
             and you will generate a SQL query based on their question
             SQL has to be simple, The schema context has been provided to you.


### User Question:
{}

### Sql Context:
{}

### Sql Query:
{}
"""

question = "List the names of customers who have an account balance greater than 6000."
schema = """
CREATE TABLE socially_responsible_lending (
    customer_id INT,
    name VARCHAR(50),
    account_balance DECIMAL(10, 2)
);

INSERT INTO socially_responsible_lending VALUES
    (1, 'james Chad', 5000),
    (2, 'Jane Rajesh', 7000),
    (3, 'Alia Kapoor', 6000),
    (4, 'Fatima Patil', 8000);
"""

# Tokenize the filled prompt; the final slot (the SQL answer) is left empty.
inputs = tokenizer(
    [prompt.format(question, schema, "")],
    return_tensors="pt",
    padding=True,
    truncation=True
).to("cuda")

# Generate with a low temperature so the output SQL stays close to deterministic.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.2,
    top_p=0.9,
    top_k=50,
    do_sample=True
)

decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)

if "### Sql Query:" in decoded_output:
    sql_query = decoded_output.split("### Sql Query:")[-1].strip()
else:
    sql_query = decoded_output.strip()

print(sql_query)
```
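
For integrating SQLNova into an application such as a database assistant or a query builder, the generation steps above can be wrapped into a small helper. The sketch below reuses the `model`, `tokenizer`, and `prompt` objects defined above; the function name `generate_sql` is illustrative and not part of the released model.

```python
def generate_sql(question: str, schema: str, max_new_tokens: int = 256) -> str:
    """Generate a SQL query for a natural-language question and a schema context."""
    inputs = tokenizer(
        [prompt.format(question, schema, "")],
        return_tensors="pt",
        padding=True,
        truncation=True,
    ).to("cuda")

    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        top_p=0.9,
        top_k=50,
        do_sample=True,
    )

    decoded = tokenizer.decode(output[0], skip_special_tokens=True)
    # Return only the text after the answer marker, falling back to the full output.
    return decoded.split("### Sql Query:")[-1].strip()

# Example usage with the question and schema from above:
# print(generate_sql(question, schema))
```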