am5uc committed · verified
Commit 79f4083 · Parent(s): 42b6922

Update README.md

Files changed (1): README.md (+4 -0)
README.md CHANGED
@@ -14,6 +14,10 @@ ServiceNow is a platform that helps businesses automate their processes and work
For this project, the training data was structured around ServiceNow ITSM tables, specifically the Incident, Change, and Problem tables. I used a subset of fields from each; for example, the Problem table has a problem ID, priority, status, root cause, and resolved-at field. Since I can't use official data from in-use ServiceNow instances, which contain private information, I generated a synthetic dataset with custom code. I then structured that data in SQA format, which is the format best suited to the model I was using, TAPAS. For this, I saved each table in its own CSV file. The final refined dataset contains an id, a question, a table_file, answer_coordinates if the answer is in the table itself, the actual answer, and a float_answer if the answer is a numeric value not in the data, such as a count. I also have an aggregation_label field, which I set right before the training process but after the train/validation/test split. I used the method train_test_split() to obtain the training, validation, and test data, with a fixed seed of 42, as shown in the snippet at the end of this section:
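
To make the record format concrete first, here is a rough sketch of how a synthetic Problem table and two SQA-style records might be put together. This is illustrative only, not the project's actual generation code: the column names follow the description above, while the values, questions, file name, and coordinates are invented.

```python
import pandas as pd

# 1) Synthesize a tiny Problem table and save it as its own CSV,
#    since each table is handed to TAPAS from a file.
problem_table = pd.DataFrame({
    "problem_id": ["PRB0001", "PRB0002", "PRB0003"],
    "priority": ["High", "Medium", "Low"],
    "status": ["Open", "Resolved", "Open"],
    "root_cause": ["Disk failure", "Config drift", "Memory leak"],
    "resolved_at": [None, "2024-03-01 10:15:00", None],
})
problem_table.to_csv("problem.csv", index=False)

# 2) One SQA-style record per question. answer_coordinates is set only when
#    the answer is a cell in the table; float_answer is set when the answer
#    is a numeric value not present in the data, such as a count.
sqa_rows = [
    {
        "id": "q-0001",
        "question": "What is the root cause of PRB0001?",
        "table_file": "problem.csv",
        "answer_coordinates": [(0, 3)],  # (row, column) of the answering cell
        "answer": "Disk failure",
        "float_answer": None,
    },
    {
        "id": "q-0002",
        "question": "How many problems are open?",
        "table_file": "problem.csv",
        "answer_coordinates": None,      # computed answer, no source cell
        "answer": "2",
        "float_answer": 2.0,
    },
]
data = pd.DataFrame(sqa_rows)
```

A frame like `data` is what the train_test_split() call below operates on.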
 
+ Example CSV with training data:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/67885e8302ab11c0b0ed0853/-8piWOY40wzTk3qU1tmRS.png)
+
```python
from sklearn.model_selection import train_test_split

# Hold out 10% as the test set; random_state=42 makes the split reproducible
train_val_data, test_data = train_test_split(data, test_size=0.1, random_state=42)
# Then split train+validation into train and validation