Update app.py
app.py
CHANGED
@@ -34,17 +34,17 @@ client = OpenAI(api_key=api_key)
 
 title = "QUESTION ANSWERING WITH RAG TECHNIQUES"
 description = """
-This Space uses
+This Space uses:\n
 FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model
 ChatGPT 3.5-turbo
-to get insight from the Enron Mail dataset
-The question asnswer model multi-qa-mpnet-base-dot-v1 has been been used to create embeddings of a small subset of the dataset and, then, trained over a sampled dataset of 500k istances of the original Enron dataset
-Embedded content is used to retrieve context that will be used by the
+to get insight from the Enron Mail dataset, in particular:\n
+The question-answering model multi-qa-mpnet-base-dot-v1 has been used to create embeddings of a small subset of the dataset and then trained over a sampled dataset of approx. 500k instances of the original Enron dataset.\n
+Embedded content is used to retrieve context for downstream processing via similarity analysis (metric = dot product). \n
 The chunk size of 500 chars used in the text splitters is probably too small to capture sentences effectively; nevertheless it has been kept. \n
 
-Further tests would have been required with 1000k and 1.5k chars.
-To answer to quesitons from Enron dataset, both mnodels are using the context genrated by RAG technique. \n
+To answer questions from the Enron dataset, both models use the context generated with the RAG technique. \n
+
+REMARK: due to the limited storage capacity, the context can be generated only over a limited number of mails. The GPT 3.5-turbo model has been instructed not to make up answers when the context is unclear.
 """
 
 examples=[
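The description sketches the retrieval pipeline: split mails into 500-char chunks, embed them, rank chunks against the query embedding by dot product, and pass the top hits to GPT 3.5-turbo as context. A minimal sketch of that flow, using a toy bag-of-words embedder as a stand-in for the fine-tuned multi-qa-mpnet-base-dot-v1 model (all function names and sample texts below are illustrative, not the Space's actual code):

```python
def chunk_text(text, size=500):
    """Split text into fixed-size character chunks (the Space uses 500 chars)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_vocab(texts):
    """Assign each token an index; a real run gets dense vectors from the model."""
    vocab = {}
    for t in texts:
        for tok in t.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text, vocab):
    """Toy term-count vector standing in for the transformer embedding."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def retrieve(query, chunks, vocab, k=1):
    """Rank chunks by dot product, the similarity metric named above."""
    q = embed(query, vocab)
    scored = sorted(
        chunks,
        key=lambda c: sum(a * b for a, b in zip(q, embed(c, vocab))),
        reverse=True,
    )
    return scored[:k]

# Two made-up mails in place of the Enron subset.
mails = [
    "Enron traded electricity and natural gas across North America.",
    "The quarterly report noted travel expenses for the Houston office.",
]
chunks = [c for m in mails for c in chunk_text(m)]
vocab = build_vocab(chunks)
context = retrieve("Which commodities did Enron trade?", chunks, vocab)
# `context` would then be prepended to the GPT 3.5-turbo prompt.
```

In the actual Space, `embed` would be replaced by the fine-tuned sentence-transformer and the chunk vectors precomputed once; the dot-product ranking step stays the same.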