---
license: llama3.1
datasets:
- RUCKBReasoning/TableLLM-SFT
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- table
- QA
- Code
---
# TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[Github](https://github.com/RUCKBReasoning/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |
We present **TableLLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently, whether the tables are embedded in spreadsheets or documents, meeting the demands of real office scenarios. TableLLM is fine-tuned from [Llama3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
TableLLM generates either a code solution or a direct text answer for tabular data manipulation tasks, depending on the scenario. Code generation is used for spreadsheet-embedded tabular data, which often involves insert, delete, update, query, merge, and plot operations on tables. Text generation is used for document-embedded tabular data, which often involves query operations on short tables.
## Evaluation Results
We evaluate the code-solution generation ability of TableLLM on three benchmarks: WikiSQL, Spider, and a self-created table-operation benchmark. The text-answer generation ability is tested on three benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, and FeTaQA. The evaluation results are shown below:
| Model | WikiTQ | TAT-QA | FeTaQA | WikiSQL | Spider | Self-created | Average |
| :------------------- | :----: | :----: | :----: | :-----: | :----: | :----------: | :-----: |
| TaPEX | 38.6 | – | – | 83.9 | 15.0 | / | 45.8 |
| TaPas | 31.6 | – | – | 74.2 | 23.1 | / | 43.0 |
| TableLlama           | 24.0   | 22.3   | 20.5   | 43.7    | –      | /            | 23.4    |
| TableGPT2 (7B)       | 77.3   | 88.1   | 75.6   | 63.0    | 77.34  | 74.42        | 76.0    |
| Llama3.1 (8B) | 71.9 | 74.3 | 83.4 | 40.6 | 18.8 | 43.2 | 55.3 |
| GPT3.5 | 58.5 | 72.1 | 71.2 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4o |**91.5**|**91.5**|**94.4**|<ins>84.0</ins>| 69.5 |<ins>77.8</ins>|<ins>84.8</ins>|
| CodeLlama (13B) | 43.4 | 47.3 | 57.2 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5) | 52.5 | 27.5 | 11.8 | 67.8 |**84.8**| / | 43.1 |
| Binder (GPT3.5) | 61.6 | 12.8 | 6.9 | 78.6 | 52.6 | / | 36.3 |
| DATER (GPT3.5) | 53.4 | 28.5 | 18.3 | 58.2 | 26.5 | / | 33.0 |
| TableLLM-8B (Ours) |<ins>89.1</ins>|<ins>89.5</ins>|<ins>93.4</ins>|**89.6**|<ins>81.1</ins>|<ins>77.8</ins>|**86.7**|
## Prompt Template
The prompts we used for generating code solutions and text answers are introduced below.
### Code Solution
The prompt template for the insert, delete, update, query, and plot operations on a single table.
```
[INST]Below are the first few lines of a CSV file. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file:
{csv_data}
Question: {question}[/INST]
```
The prompt template for the merge operation on two tables.
```
[INST]Below are the first few lines of two CSV files. You need to write a Python program to solve the provided question.
Header and first few lines of CSV file 1:
{csv_data1}
Header and first few lines of CSV file 2:
{csv_data2}
Question: {question}[/INST]
```
The `csv_data` field is filled with the header and first few rows of your provided table file. Below is an example:
```
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
```
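As a sketch, the single-table code-solution prompt can be assembled from raw CSV text like so. The helper function and the number of rows kept are illustrative assumptions, not part of the released TableLLM code:

```python
# Illustrative helper (not part of the released TableLLM code):
# build the single-table code-solution prompt from raw CSV text.
CODE_PROMPT = (
    "[INST]Below are the first few lines of a CSV file. You need to write a "
    "Python program to solve the provided question.\n"
    "Header and first few lines of CSV file:\n"
    "{csv_data}\n"
    "Question: {question}[/INST]"
)

def build_code_prompt(csv_text: str, question: str, n_rows: int = 5) -> str:
    """Keep the header line plus the first n_rows data rows."""
    lines = csv_text.strip().splitlines()
    snippet = "\n".join(lines[: n_rows + 1])
    return CODE_PROMPT.format(csv_data=snippet, question=question)

csv_text = (
    "Sex,Length,Diameter,Height,Whole weight,Shucked weight,"
    "Viscera weight,Shell weight,Rings\n"
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9"
)
prompt = build_code_prompt(csv_text, "What is the average of the Rings column?")
```

The resulting string can then be passed to the model (for example via a `transformers` text-generation pipeline) to obtain a Python program as the answer.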
### Text Answer
The prompt template for direct text answer generation on short tables.
````
[INST]Offer a thorough and accurate solution that directly addresses the Question outlined in the [Question].
### [Table Text]
{table_descriptions}
### [Table]
```
{table_in_csv}
```
### [Question]
{question}
### [Solution][/INST]
````
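A minimal sketch of filling this template in Python (the helper is illustrative, not part of the released code; it assumes the closing tag is `[/INST]`, consistent with the code-solution templates, and wraps the table in a fenced block inside the prompt):

````python
# Illustrative helper (not part of the released TableLLM code):
# fill the direct-text-answer template for a short table.
TEXT_PROMPT = (
    "[INST]Offer a thorough and accurate solution that directly addresses "
    "the Question outlined in the [Question].\n"
    "### [Table Text]\n{table_descriptions}\n"
    "### [Table]\n```\n{table_in_csv}\n```\n"
    "### [Question]\n{question}\n"
    "### [Solution][/INST]"
)

def build_text_prompt(table_descriptions: str, table_in_csv: str, question: str) -> str:
    return TEXT_PROMPT.format(
        table_descriptions=table_descriptions,
        table_in_csv=table_in_csv,
        question=question,
    )

prompt = build_text_prompt(
    "Physical measurements of abalone specimens.",
    "Sex,Rings\nM,15\nF,9",
    "How many rings does the female specimen have?",
)
````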
For more details about how to use TableLLM, please refer to our GitHub page: <https://github.com/RUCKBReasoning/TableLLM>