FlavioBF committed
Commit 1f20fd7 · verified · 1 Parent(s): 148a5e5

Update app.py

Files changed (1)
  1. app.py +9 -8
app.py CHANGED
@@ -35,16 +35,17 @@ client = OpenAI(api_key=api_key)
  title = "QUESTION ANSWERIGN WITH RAG TECHNIQUES"
  description = """
  This Space uses:\n
- FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model
- ChatGPT 3.5-turbo
- to get insight from the Enron Mail dataset, in particular:\n
- The question asnswer model multi-qa-mpnet-base-dot-v1 has been been used to create embeddings of a small subset of the dataset and, then, trained over a sampled dataset of aapx. 500k istances of the original Enron dataset
- Embedded content is used to retrieve context that will be used by the downnstream processing using similarity analysis (metric = dot product). \n
- The chunk size of 500 chars used in the text splitters, is probably too small to capture properly sentences in an effective way, neverthless it has been kept. \n

- To answer to questions from Enron dataset, both mnodels are using the context generated using RAG technique. \n

- REMARK: due to the limited storage capacity the context can be generated only over a limited number of mails. The GPT 3.5 turbo model has been instructed to avoid to make up answers in case contecxt is not clear
  """

  examples=[
  title = "QUESTION ANSWERIGN WITH RAG TECHNIQUES"
  description = """
  This Space uses:\n
+ - FlavioBF/multi-qa-mpnet-base-dot-v1_fine_tuned_model \n
+ - ChatGPT 3.5-turbo \n
+ to get insights from the Enron Mail dataset. In particular:\n
+ The Question Answering model "multi-qa-mpnet-base-dot-v1" has been used to create embeddings of a small subset of the dataset and then fine-tuned over a sampled dataset of approx. 500k instances of the original Enron dataset. \n
+ Embedded content (stored in Chroma DB) is used to retrieve context for the downstream processing via similarity analysis (similarity metric = dot product). \n
+ NOTE: the chunk size of 500 chars used in the text splitters is probably too small to capture sentences effectively; nevertheless it has been kept. \n

+ Finally, to answer questions about the Enron dataset, both models use the context generated by the RAG retriever. \n

+ REMARK: due to the limited storage capacity, the context can be generated only over a limited number of mails.\n
+ The GPT 3.5-turbo model has been instructed not to make up answers when the context is unclear.
  """

  examples=[
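The retrieval step the updated description outlines (500-char chunking of the mails, then dot-product similarity ranking of embedded chunks) can be sketched as below. This is an illustrative, self-contained sketch, not the Space's actual `app.py`: `embed` here is a toy letter-frequency stand-in for the multi-qa-mpnet-base-dot-v1 encoder, and the ranking is done in memory rather than through a Chroma DB collection.

```python
def split_into_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Naive character-based splitter (the 500-char chunking the description notes)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str) -> list[float]:
    """Toy embedding: a 26-dim letter-frequency vector.
    Stand-in for the real sentence-embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def dot(a: list[float], b: list[float]) -> float:
    """Dot product, the similarity metric named in the description."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by dot-product similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: dot(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Hypothetical usage: chunk a mail corpus, then pull context for a question.
mails = "Please review the gas contract before Friday. " * 20
chunks = split_into_chunks(mails, chunk_size=500)
context = retrieve("gas contract review", chunks, top_k=1)
```

In the actual Space, the retrieved chunks would presumably be placed in the GPT-3.5-turbo prompt as context, together with the instruction the description mentions: do not make up an answer when the context is unclear.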