am5uc commited on
Commit
263c7ca
·
verified ·
1 Parent(s): 322c9f8

Update README.md

Files changed (1)
  1. README.md +144 -1
README.md CHANGED
@@ -3,10 +3,153 @@ library_name: transformers
3
  tags: []
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
  <!-- Provide a quick summary of what the model is/does. -->
9
 
10
 
11
 
12
  ## Model Details
 
3
  tags: []
4
  ---
5
 
6
+ # ServiceNow Environment Assistant
7
 
8
  <!-- Provide a quick summary of what the model is/does. -->
9
 
10
+ ## Introduction
11
+ ServiceNow is a platform that helps businesses automate their processes and workflows, with solutions such as ITSM. Today, ServiceNow users generally have to apply filters or build dashboards to inspect data in tables such as incidents and problems. Building dashboards and reports often requires help from developers, which is a hassle when all you want is a quick answer. Dashboards are useful for visual representation, but it would also be useful to simply ask a chatbot questions about the data. I didn't know how feasible it would be to integrate a live ServiceNow instance with an LLM, so I instead imported example tables as CSV files and used those as the data. The task is to build a custom LLM chatbot that answers user queries in natural language using data from ServiceNow tables such as incident, change, and problem. A general-purpose LLM like ChatGPT would probably handle this table data reasonably well, but the cost of using it would likely be higher.
12
+
13
+ ## Training Data
14
+
15
+ For this project, the training data was structured around the ServiceNow ITSM Incident, Change, and Problem tables, using a subset of fields from each. For example, Problem tables have problem ID, priority, status, root cause, and resolved-at fields. Since I can't use real data from in-use ServiceNow instances, which contain private information, I generated a synthetic dataset with custom code. I then structured that data in SQA format, the format expected by the model I was using, TAPAS, which meant saving each table as a CSV file. Each record in the final dataset contains an id, a question, a table_file, answer_coordinates (if the answer is in the table itself), the answer text, and a float answer (if the answer is a numeric value not present in the data, such as a count). There is also an aggregation_label field, which I set right before training but after the train/test split. I used scikit-learn's train_test_split() twice with a seed of 42 to obtain the training, validation, and test sets (roughly 81%, 9%, and 10% of the data):
16
+
17
+ ```python
18
+ train_val_data, test_data = train_test_split(data, test_size=0.1, random_state=42)
19
+ # Then split train+validation into train and validation
20
+ train_data, val_data = train_test_split(train_val_data, test_size=0.1, random_state=42)
21
+ ```
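To make the record layout described in this section concrete, here is a minimal sketch of how one lookup record and one count record might be built. It assumes the field names listed above, a hypothetical `change_table.csv` file name, and that count questions carry empty `answer_coordinates` with the number stored in `float_answer`; the helper names are illustrative and not the code used to generate the actual dataset.

```python
import csv
import io

# Hypothetical Change table; column names follow the usage example in this README.
HEADER = ["change_id", "category", "status"]
ROWS = [
    ["CHG3000", "Security Patch", "Rejected"],
    ["CHG3001", "Software Update", "In Progress"],
    ["CHG3002", "Hardware Upgrade", "In Progress"],
    ["CHG3003", "Software Update", "Completed"],
]


def table_to_csv(header, rows):
    """Serialize a table to CSV text (a real pipeline would write this to a file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def lookup_record(record_id, question, table_file, row, column):
    """Build a record whose answer is a single cell of the table."""
    col = HEADER.index(column)
    return {
        "id": record_id,
        "question": question,
        "table_file": table_file,
        # Zero-based (row, column) indices; the header row is not counted.
        "answer_coordinates": [(row, col)],
        "answer": ROWS[row][col],
        "float_answer": None,
    }


def count_record(record_id, question, table_file, column, value):
    """Build a record whose answer is a count, which is not itself a table cell."""
    col = HEADER.index(column)
    count = sum(1 for r in ROWS if r[col] == value)
    return {
        "id": record_id,
        "question": question,
        "table_file": table_file,
        "answer_coordinates": [],  # assumption: no cell holds the answer
        "answer": str(count),
        "float_answer": float(count),
    }


lookup = lookup_record("q1", "What is the status of CHG3003?", "change_table.csv", 3, "status")
count = count_record("q2", "How many changes are In Progress?", "change_table.csv", "status", "In Progress")
print(lookup)
print(count)
```

Keeping the count in `float_answer` rather than in `answer_coordinates` mirrors the distinction drawn above between answers found in the table and numeric answers derived from it.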
22
+
23
+ ## Training Method
24
+ I used full fine-tuning. The model's primary purpose is to take ServiceNow tables and answer queries about them, so it does not need strong generalization abilities; keeping some would be nice but isn't necessary. PEFT could also work and would help guard against catastrophic forgetting, but since generalization is not a priority, I did not use it. The drawback I expected was some loss of generalization, but the WTQ and TabFact scores were unchanged after fine-tuning.
25
+
26
+ These are the training arguments and hyperparameters I used. I tried training for more epochs, but that usually gave worse results:
27
+ ```python
28
+ num_train_epochs=1, # Number of training epochs
29
+ per_device_train_batch_size=32, # Batch size per device during training
30
+ per_device_eval_batch_size=64, # Batch size per device during evaluation
31
+ learning_rate=0.00001,
32
+ warmup_steps=100, # Number of warmup steps for learning rate scheduler
33
+ weight_decay=0.01, # Strength of weight decay
34
+ evaluation_strategy="steps", # Evaluate every 'eval_steps'
35
+ eval_steps=50, # Evaluation frequency in steps
36
+ logging_steps=50, # Log every eval_steps
37
+ save_steps=150, # Save model every 150 steps
38
+ save_total_limit=2,
39
+ load_best_model_at_end=True, # Load the best model when finished training
40
+ metric_for_best_model="eval_loss", # Metric to use for best model selection
41
+ ```
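To see how `learning_rate` and `warmup_steps` interact over a run, here is a minimal sketch of linear warmup followed by linear decay. It assumes the arguments above are passed to Hugging Face `TrainingArguments` with the default linear scheduler. The function name `linear_warmup_lr` and the run length of 400 steps are assumptions for illustration only; the real number of steps depends on the dataset size, which is not stated here.

```python
def linear_warmup_lr(step, total_steps, base_lr=1e-5, warmup_steps=100):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length; not taken from the actual training run.
TOTAL_STEPS = 400

for step in (0, 50, 100, 250, 400):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

With a single epoch, the 100 warmup steps can cover a large share of training on a small dataset, so the peak rate of 0.00001 may only apply briefly.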
42
+
43
+ ## Evaluation
44
+ I used three benchmarks: the WikiTableQuestions (WTQ) validation set, the TabFact dataset, and the synthetic validation set. Fine-tuning did not harm results on WTQ or TabFact: accuracy was 0.3405 and 0.5005, respectively, for both the pre-trained and the fine-tuned model. On the synthetic data, accuracy improved after training. On the validation set it rose from 0.4000 to 0.4222, and on the test set there was a much larger jump, from 0.2033 to 0.4667.
45
+
46
+ | Model | Benchmark 1 (WTQ Validation Set) | Benchmark 2 (TabFact) | Benchmark 3 (Synthetic Validation Set) | Test Set of Synthetic Dataset |
47
+ |------------------------------------------------------|----------------------------------|-----------------------|----------------------------------------|-------------------------------|
48
+ | google/tapas-base-finetuned-wtq (before Fine-tuning) | 0.3405 | 0.5005 | 0.4000 | 0.2933 |
49
+ | google/tapas-base-finetuned-wtq (Fine-tuned) | 0.3405 | 0.5005 | 0.4222 | 0.4667 |
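For readers who want to reproduce an accuracy figure like those in the table, here is a minimal sketch of exact-match accuracy. The normalization rules (case-insensitive text, numeric answers compared by value) and the function names are assumptions; the scoring script actually used for these results is not shown in this README.

```python
def normalize(answer):
    """Trim and lowercase text; render numeric answers in one canonical form."""
    text = str(answer).strip().lower()
    try:
        return f"{float(text):g}"
    except ValueError:
        return text


def accuracy(predictions, references):
    """Fraction of predictions that match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical predictions and gold answers, for illustration only.
preds = ["2", "Completed", "CHG3001"]
golds = [2.0, "completed", "CHG3002"]
print(accuracy(preds, golds))
```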
50
+
51
+
52
+ ## Usage
53
+ The prompt for the TAPAS model is a natural language question paired with a structured table passed in as a pandas DataFrame. The prompt should look like this:
54
+
55
+ ```python
56
+ question = "How many Hardware Upgrade changes are still pending?"
57
+ table_df = pd.DataFrame({
58
+ "change_id": [
59
+ "CHG3000",
60
+ "CHG3001",
61
+ "CHG3002",
62
+ "CHG3003"
63
+ ],
64
+ "category": [
65
+ "Security Patch",
66
+ "Software Update",
67
+ "Hardware Upgrade",
68
+ "Software Update"
69
+ ],
70
+ "status": [
71
+ "Rejected",
72
+ "In Progress",
73
+ "In Progress",
74
+ "Completed"
75
+ ],
76
+ "approved_by": [
77
+ "",
78
+ "Manager2",
79
+ "",
80
+ "Admin1"
81
+ ],
82
+ "implementation_date": [
83
+ "",
84
+ "",
85
+ "",
86
+ "2023-05-30"
87
+ ]
88
+ })
89
+ ```
90
+
91
+ Alternatively, you can define the table in JSON format and convert it with table = pd.DataFrame(table) before passing it to the tokenizer.
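As a minimal sketch of the JSON route, the snippet below parses a JSON object of columns and coerces every cell to a string, since the TAPAS tokenizer expects all table cells to be text. The helper name `load_table_columns` and the `risk_score` column are hypothetical and only serve to show the numeric-to-string conversion.

```python
import json

# Hypothetical JSON table: one key per column, one list entry per row.
TABLE_JSON = """
{
  "change_id": ["CHG3000", "CHG3001"],
  "category": ["Security Patch", "Software Update"],
  "risk_score": [3, 1]
}
"""


def load_table_columns(text):
    """Parse a JSON object of column -> values and convert every cell to str."""
    raw = json.loads(text)
    lengths = {len(values) for values in raw.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return {
        column: ["" if value is None else str(value) for value in values]
        for column, values in raw.items()
    }


table = load_table_columns(TABLE_JSON)
print(table["risk_score"])  # ['3', '1']
```

The resulting dictionary can then be wrapped with `pd.DataFrame(table)` as described above.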
92
+
93
+ ## Expected Output Format
94
+ Tokenize the inputs, then call the helper function below, which returns three values: the aggregation operation, the answer, and the predicted cells. If you only need the predicted answer, take the middle value.
95
+ ```python
96
+ # Tokenize the question and the table together (TAPAS expects every table cell to be a string)
97
+ inputs = tokenizer(table=table_df, queries=[question], padding='max_length', return_tensors='pt')
98
+
99
+ # Assumes `model` and `tokenizer` are already loaded (e.g. TapasForQuestionAnswering and TapasTokenizer)
100
+ # --- Helper function ---
101
+
102
+ def get_final_answer(model, tokenizer, inputs, table_df):
103
+     outputs = model(**inputs)
104
+
105
+     logits = outputs.logits
106
+     logits_agg = outputs.logits_aggregation
107
+
108
+     predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
109
+         inputs,
110
+         logits.detach(),
111
+         logits_agg=logits_agg.detach()
112
+     )
113
+
114
+     aggregation_operators = ["NONE", "SUM", "AVERAGE", "COUNT"]
115
+
116
+     agg_op_idx = predicted_aggregation_indices[0] if predicted_aggregation_indices else 0
117
+     agg_op = aggregation_operators[agg_op_idx]
118
+
119
+     predicted_cells = []
120
+     for coord in predicted_answer_coordinates[0]:
121
+         cell_value = table_df.iat[coord[0], coord[1]]
122
+         predicted_cells.append(cell_value)
123
+
124
+     if agg_op == "COUNT":
125
+         answer = len(predicted_cells)
126
+     elif agg_op == "SUM":
127
+         try:
128
+             answer = sum(float(cell) for cell in predicted_cells)
129
+         except ValueError:
130
+             answer = "Could not SUM non-numeric cells"
131
+     elif agg_op == "AVERAGE":
132
+         try:
133
+             answer = sum(float(cell) for cell in predicted_cells) / len(predicted_cells)
134
+         except (ValueError, ZeroDivisionError):
135
+             answer = "Could not AVERAGE non-numeric or empty cells"
136
+     else:  # NONE
137
+         answer = predicted_cells
138
+
139
+     return agg_op, answer, predicted_cells
140
+
141
+ _, answer, _ = get_final_answer(model, tokenizer, inputs, table_df)
142
+ print(answer)
143
+ ```
144
+
145
+ ## Limitations
146
+ The model still does not come close to 100% accuracy; a larger model might help. It also seems able to take in only tables of limited size, far smaller than the tables of a whole system, and again a larger model might help. Finally, it needs a question plus a table in DataFrame format, so more preprocessing is necessary than with a regular text prompt.
147
+
148
+ ## Prompt Format
149
+
150
+
151
+
152
+
153
 
154
 
155
  ## Model Details