***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*

## Metric Description

This metric evaluates how good a generated log (file) is, given a reference log.

The metric measures two different aspects:

1. It evaluates whether the predicted log has the correct number of timestamps, whether the timestamps are monotonically increasing, and whether the timestamps are consistent in their format.
2. For measuring the similarity in content (without timestamps), this metric uses sacrebleu.
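The two timestamp checks above could be sketched roughly as follows. This is only an illustration under assumed formats, not the metric's actual implementation: the regex, the `timestamp_checks` helper, and the list of format strings are all hypothetical, modeled on the dates used in the examples below.

```
import re
from datetime import datetime

# Hypothetical pattern: "YYYY-MM-DD" optionally followed by " HH:MM".
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?")

def timestamp_checks(pred: str, ref: str) -> dict:
    """Check timestamp count, monotonicity, and format consistency."""
    pred_ts = TS_PATTERN.findall(pred)
    ref_ts = TS_PATTERN.findall(ref)

    def parse(ts):
        # Try each assumed format; return None if nothing matches.
        for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
            try:
                return datetime.strptime(ts, fmt)
            except ValueError:
                pass
        return None

    parsed = [parse(t) for t in pred_ts]
    return {
        # Same number of timestamps as the reference?
        "correct_count": len(pred_ts) == len(ref_ts),
        # Each timestamp is >= the previous one?
        "monotonic": all(a is not None and b is not None and a <= b
                         for a, b in zip(parsed, parsed[1:])),
        # All predicted timestamps share the same shape (here: same length)?
        "consistent": len({len(t) for t in pred_ts}) <= 1,
    }
```

A real implementation would likely fold these booleans into the single `timestamp_score` shown in the usage examples.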
## How to Use

The metric can be used by simply passing the predicted log and the reference log as strings.

Example with timestamps that are correct in number, consistent, and monotonically increasing (-> timestamp score of 1.0):

```
>>> import evaluate
>>> predictions = ["2024-01-12 11:23 hello, nice to meet you \n 2024-01-12 11:24 So we see each other again"]
>>> references = ["2024-02-14 This is a hello to you \n 2024-02-15 Another hello"]
>>> logmetric = evaluate.load("svenwey/logscoremetric")
>>> results = logmetric.compute(predictions=predictions,
...                             references=references)
>>> print(results["timestamp_score"])
1.0
```

Example with a timestamp missing from the prediction:

```
>>> import evaluate
>>> predictions = ["hello, nice to meet you"]
>>> references = ["2024-02-14 This is a hello to you"]
>>> logmetric = evaluate.load("svenwey/logscoremetric")
>>> results = logmetric.compute(predictions=predictions,
...                             references=references)
>>> print(results["timestamp_score"])
0.0
```
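For the content aspect, the timestamps have to be removed before the remaining text is compared with sacrebleu. A minimal sketch of such stripping, assuming the same hypothetical date pattern as in the examples above (`strip_timestamps` and the regex are illustrative, not the metric's actual code):

```
import re

# Hypothetical pattern: "YYYY-MM-DD" optionally followed by " HH:MM",
# plus any trailing whitespace, so only the message content remains.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2})?\s*")

def strip_timestamps(log: str) -> str:
    """Remove all matching timestamps from a log string."""
    return TS_PATTERN.sub("", log).strip()

stripped = strip_timestamps("2024-02-14 This is a hello to you")
# stripped == "This is a hello to you"
```

The stripped predictions and references could then be scored with sacrebleu (e.g. via `evaluate.load("sacrebleu")`) to obtain the content part of the metric.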
### Inputs
*List all input arguments in the format below*
- **predictions** *(list of strings): the logs as predicted/generated by the ML model. **Important: every logfile is only one string, even if it contains multiple lines!***
- **references** *(list of strings): the reference logs (ground truth)*
### Output Values

*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*