am5uc committed · Commit 183485e · verified · 1 Parent(s): 79f4083

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ ServiceNow is a platform that helps businesses automate their processes and work
 For this project, the training data was structured around ServiceNow ITSM tables, specifically the Incident, Change, and Problem tables. I used a subset of the fields from each of those tables; for example, Problem tables have a problem id, priority, status, root cause, and resolved_at field. Since I can’t use official data from in-use ServiceNow instances, which contain private information, I generated a synthetic dataset with custom code. I then had to structure that data in the SQA format, which is the best format for the model I was using, TAPAS. For this, I had to save each table in a CSV file. The final refined dataset that I pass in contains an id, a question, a table_file, answer_coordinates if the answer is in the table itself, the actual answer, and a float_answer if the answer is a numeric value not in the data, such as a count. I also have an aggregation_label field, which I set right before the training process, but after the train/validation/test split. I used the method train_test_split() to obtain the training, validation, and test data, with a seed of 42:

- Example CSV with training data:
+ Example of how the training data appears:

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/67885e8302ab11c0b0ed0853/-8piWOY40wzTk3qU1tmRS.png)
 
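Below is a minimal, hypothetical sketch of the SQA-style layout and split described in the paragraph above, not the author's actual generation code. The column names and the seed of 42 follow the README; the file name problem_table.csv, the example rows and questions, the 4/1/1 split sizes, and the use of scikit-learn's train_test_split are all assumptions for illustration (the README does not say which library supplies train_test_split()).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical synthetic Problem table saved to CSV (file name and values invented).
problem_table = pd.DataFrame({
    "problem_id": ["PRB0001", "PRB0002", "PRB0003"],
    "priority": ["High", "Low", "Medium"],
    "status": ["Open", "Resolved", "Resolved"],
    "root_cause": ["Disk failure", "Config drift", "Bad patch"],
    "resolved_at": ["", "2024-01-05", "2024-02-11"],
})
problem_table.to_csv("problem_table.csv", index=False)

# SQA-style question rows with the fields the README lists.
sqa_rows = pd.DataFrame({
    "id": [f"q{i}" for i in range(6)],
    "question": [
        "What is the status of PRB0001?",
        "How many problems are resolved?",
        "What is the root cause of PRB0003?",
        "Which problem has High priority?",
        "How many problems are open?",
        "When was PRB0002 resolved?",
    ],
    "table_file": ["problem_table.csv"] * 6,
    # (row, column) cells when the answer text is in the table; empty otherwise.
    "answer_coordinates": [[(0, 2)], [], [(2, 3)], [(0, 0)], [], [(1, 4)]],
    "answer": [["Open"], ["2"], ["Bad patch"], ["PRB0001"], ["1"], ["2024-01-05"]],
    # float_answer carries numeric answers not present in the table, e.g. a count.
    "float_answer": [float("nan"), 2.0, float("nan"), float("nan"), 1.0, float("nan")],
})

# Split into train / validation / test with a seed of 42, as in the README
# (the 4/1/1 sizes here are an assumption).
train, temp = train_test_split(sqa_rows, test_size=2, random_state=42)
valid, test = train_test_split(temp, test_size=1, random_state=42)
```

Per the README, an aggregation_label column would then be added to these frames right before training, after the split.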