epalvarez committed on
Commit 637e2cd · verified · 1 Parent(s): a5e0b85

Change the embedding model to gte-large to match the model used to build the tesla_db database (1024 embedding dimensions): embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')

Files changed (1)
  1. app.py +13 -1
app.py CHANGED
@@ -65,10 +65,22 @@ client = OpenAI(
   )
  #---------------------------------------------------------------------
 
- embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+ # embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+ # gte-small belongs to the GTE (General Text Embeddings) family published under the 'thenlper' namespace; it is not an OpenAI model. It has 384 embedding dimensions.
+ # The smaller dimensionality keeps the model compact and efficient for retrieval tasks while still capturing semantic information.
+
+ # However, the tesla_db vector database was encoded with 'gte-large', which has 1024 embedding dimensions, so gte-large must be used here as well to match the database.
+ # Otherwise we get the following runtime error: "chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 1024"
+ embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
+
 
  tesla_10k_collection = 'tesla-10k-2019-to-2023'
 
+ # Note: the collection's dimensionality (1024 here) comes from the embeddings already stored in it,
+ # so the embedding model passed to Chroma() below must produce vectors of that same size.
+
+
+
  # vector database constructor Chroma()
  vectorstore_persisted = Chroma(
  collection_name=tesla_10k_collection,
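
The mismatch described in the comments above can also be caught before the first query. The following is a minimal sketch, not part of this commit: `check_embedding_dimension` and `FakeEmbeddings` are hypothetical names introduced for illustration, and the only assumption about the real embedding object is that it exposes an `embed_query(text)` method returning a list of floats, as LangChain embedding classes do.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    """Anything with an embed_query(text) -> list-of-floats method."""

    def embed_query(self, text: str) -> List[float]: ...


def check_embedding_dimension(embedder: SupportsEmbedQuery, expected_dim: int) -> int:
    """Embed a short probe string and compare its length to expected_dim.

    Hypothetical helper: raises ValueError early instead of letting the
    vector store fail on the first similarity search.
    """
    actual_dim = len(embedder.embed_query("dimension probe"))
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding dimension {actual_dim} does not match "
            f"expected collection dimensionality {expected_dim}"
        )
    return actual_dim


class FakeEmbeddings:
    """Stand-in embedder so the sketch runs without downloading a model."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        # Returns a zero vector of the configured size; a real model
        # would return learned values of its own fixed dimensionality.
        return [0.0] * self.dim


if __name__ == "__main__":
    # 1024 is the collection dimensionality reported in the error message.
    check_embedding_dimension(FakeEmbeddings(1024), expected_dim=1024)
    try:
        check_embedding_dimension(FakeEmbeddings(384), expected_dim=1024)
    except ValueError as err:
        print(err)
```

In app.py the same check could presumably be run against `embedding_model` right after it is constructed, with 1024 as the expected value, so a wrong model name fails at startup rather than at query time.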