FlavioBF committed
Commit 1f20fd7 · verified · 1 Parent(s): 148a5e5

Update app.py

Files changed (1)
  1. app.py +9 -8
app.py CHANGED
@@ -35,16 +35,17 @@ client = OpenAI(api_key=api_key)
  title = "QUESTION ANSWERIGN WITH RAG TECHNIQUES"
  description = """
  This Space uses:\n
- FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model
- ChatGPT 3.5-turbo
- to get insight from the Enron Mail dataset, in particular:\n
- The question asnswer model multi-qa-mpnet-base-dot-v1 has been been used to create embeddings of a small subset of the dataset and, then, trained over a sampled dataset of aapx. 500k istances of the original Enron dataset
- Embedded content is used to retrieve context that will be used by the downnstream processing using similarity analysis (metric = dot product). \n
- The chunk size of 500 chars used in the text splitters, is probably too small to capture properly sentences in an effective way, neverthless it has been kept. \n

- To answer to questions from Enron dataset, both mnodels are using the context generated using RAG technique. \n

- REMARK: due to the limited storage capacity the context can be generated only over a limited number of mails. The GPT 3.5 turbo model has been instructed to avoid to make up answers in case contecxt is not clear
  """

  examples=[
  title = "QUESTION ANSWERIGN WITH RAG TECHNIQUES"
  description = """
  This Space uses:\n
+ - FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model \n
+ - ChatGPT 3.5-turbo \n
+ to get insights from the Enron Mail dataset. In particular:\n
+ The Question Answering model "multi-qa-mpnet-base-dot-v1" has been used to create embeddings of a small subset of the dataset and then fine-tuned over a sampled dataset of approx. 500k instances of the original Enron dataset. \n
+ Embedded content (stored in Chroma DB) is used to retrieve context for the downstream processing via similarity analysis (similarity metric = dot product). \n
+ NOTE: the chunk size of 500 chars used in the text splitters is probably too small to capture sentences effectively; nevertheless it has been kept. \n

+ Finally, to answer questions about the Enron dataset, both models use the context generated by the RAG retriever. \n

+ REMARK: due to the limited storage capacity, the context can be generated only over a limited number of mails.\n
+ The GPT 3.5-turbo model has been instructed not to make up answers when the context is unclear.
  """

  examples=[
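The retrieval step the updated description outlines (500-char chunking of the mails, then dot-product similarity ranking of embedded chunks) can be sketched as below. This is an illustrative, self-contained sketch, not the Space's actual `app.py`: `embed` here is a toy letter-frequency stand-in for the multi-qa-mpnet-base-dot-v1 encoder, and the ranking is done in memory rather than through a Chroma DB collection.

```python
def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Naive character-based splitter (the 500-char chunking the description notes)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str) -> list[float]:
    """Toy embedding: a 26-dim letter-frequency vector.
    Stand-in for the real sentence-embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def dot(a: list[float], b: list[float]) -> float:
    """Dot product, the similarity metric named in the description."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by dot-product similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: dot(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Hypothetical usage: chunk a mail corpus, then pull context for a question.
mails = "Please review the gas contract before Friday. " * 20
chunks = split_into_chunks(mails, chunk_size=500)
context = retrieve("gas contract review", chunks, top_k=1)
```

In the actual Space, the retrieved chunks would presumably be placed in the GPT-3.5-turbo prompt as context, together with the instruction the description mentions: do not make up an answer when the context is unclear.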