FlavioBF committed
Commit 148a5e5 · verified · 1 Parent(s): acb9a74

Update app.py

Files changed (1)
  1. app.py +7 -7
app.py CHANGED
@@ -34,17 +34,17 @@ client = OpenAI(api_key=api_key)
 
 title = "QUESTION ANSWERING WITH RAG TECHNIQUES"
 description = """
-This Space uses:
+This Space uses:\n
 FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model
 ChatGPT 3.5-turbo
-to get insight from the Enron Mail dataset. In particular:\n
-The question asnswer model multi-qa-mpnet-base-dot-v1 has been been used to create embeddings of a small subset of the dataset and, then, trained over a sampled dataset of 500k istances of the original Enron dataset
-Embedded content is used to retrieve context that will be used by the downsrteam processing using similarity analysis (metric = dot product). \n
+to get insights from the Enron Mail dataset. In particular:\n
+The question-answering model multi-qa-mpnet-base-dot-v1 has been used to create embeddings of a small subset of the dataset and was then trained on a sample of approx. 500k instances of the original Enron dataset.
+Embedded content is used to retrieve context for the downstream processing via similarity analysis (metric = dot product). \n
 The chunk size of 500 chars used in the text splitters is probably too small to capture sentences effectively; nevertheless it has been kept. \n
-Further tests would have been required with 1000k and 1.5k chars.
-To answer to quesitons from Enron dataset, both mnodels are using the context genrated by RAG technique. \n
 
-REMARK: due to the limited storage capacity the context can be generated only over a limited number of mails. The GPT 3.5 turbo model has been instructed to avoi to make up an answer base on its own trauined data
+To answer questions about the Enron dataset, both models use the context generated by the RAG technique. \n
+
+REMARK: due to the limited storage capacity, the context can be generated only over a limited number of mails. The GPT-3.5-turbo model has been instructed to avoid making up answers when the context is not clear.
 """
 
 examples=[
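
For readers unfamiliar with the flow the description refers to, the following is a minimal sketch of this kind of RAG loop: embed the mail chunks with the fine-tuned multi-qa-mpnet-base-dot-v1 model, retrieve the best-matching chunks by dot-product score, and pass them as context to GPT-3.5-turbo with an instruction not to make up answers. This is not the actual app.py code; the chunks list, the top_k value, and the prompt wording are assumptions for illustration.

# Minimal sketch of the RAG flow described above -- NOT the actual app.py code.
# Assumptions: `chunks` (a list of ~500-char mail excerpts) and `api_key` are
# defined elsewhere; prompt wording and top_k are illustrative only.
import torch
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model")
client = OpenAI(api_key=api_key)

# Pre-compute embeddings for the (limited) set of mail chunks.
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

def answer(question: str, top_k: int = 3) -> str:
    # Retrieve the most similar chunks using the dot-product metric.
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    scores = util.dot_score(query_embedding, chunk_embeddings)[0]
    best = torch.topk(scores, k=min(top_k, len(chunks))).indices
    context = "\n".join(chunks[int(i)] for i in best)

    # Ask GPT-3.5-turbo, instructing it to stick to the retrieved context.
    messages = [
        {"role": "system",
         "content": "Answer only from the provided context. "
                    "If the context is not clear, say you don't know."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return response.choices[0].message.content

The dot-product metric used in the sketch mirrors the description's choice (metric = dot product), which is the similarity the multi-qa-mpnet-base-dot-v1 base model is intended for.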