Change the embedding model to gte-large to match the tesla_db vector database (1024 embedding dimensions)... embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
app.py
CHANGED
@@ -65,10 +65,22 @@ client = OpenAI(
 )
 #---------------------------------------------------------------------
 
-embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+# embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+# gte-small is part of the GTE (General Text Embeddings) family released by Alibaba DAMO Academy; it produces 384-dimensional embeddings.
+# That dimensionality captures semantic information while keeping the model small and efficient for retrieval tasks.
+
+# However, the tesla_db vector database was encoded with gte-large, which produces 1024-dimensional embeddings, so gte-large must be used here to match.
+# Otherwise we get the following runtime error: "chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 1024"
+embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
+
 
 tesla_10k_collection = 'tesla-10k-2019-to-2023'
 
+# The collection's dimensionality (1024) was fixed when tesla_db was encoded with gte-large,
+# so the query-time embedding model has to produce vectors of the same size.
+
+
+
 # vector database constructor Chroma()
 vectorstore_persisted = Chroma(
     collection_name=tesla_10k_collection,
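Review note: the dimension mismatch fixed in this change only shows up at query time. A small startup check could catch it earlier. The sketch below is one possible way to do that, not part of app.py: `check_embedding_dimension`, `make_fake_embedder`, and `GTE_DIMENSIONS` are hypothetical names introduced for illustration, and the fake embedder stands in for a real model so the example runs without downloading anything.

```python
from typing import Callable, Dict, Sequence

# Output sizes of the two GTE checkpoints discussed in this change.
GTE_DIMENSIONS: Dict[str, int] = {
    "thenlper/gte-small": 384,
    "thenlper/gte-large": 1024,
}


def check_embedding_dimension(
    embed_query: Callable[[str], Sequence[float]],
    expected_dim: int,
    probe_text: str = "dimension probe",
) -> int:
    """Embed one probe string and verify the length of the returned vector.

    Raises ValueError when the vector length differs from the dimensionality
    the collection was built with.
    """
    actual_dim = len(embed_query(probe_text))
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding dimension {actual_dim} does not match "
            f"collection dimensionality {expected_dim}"
        )
    return actual_dim


def make_fake_embedder(dim: int) -> Callable[[str], Sequence[float]]:
    """Hypothetical stand-in for a real model: returns a zero vector of length dim."""
    return lambda text: [0.0] * dim


if __name__ == "__main__":
    collection_dim = GTE_DIMENSIONS["thenlper/gte-large"]

    # A model with matching output size passes the check.
    large = make_fake_embedder(GTE_DIMENSIONS["thenlper/gte-large"])
    print("gte-large:", check_embedding_dimension(large, collection_dim))

    # A model with a smaller output size is rejected before any query runs.
    small = make_fake_embedder(GTE_DIMENSIONS["thenlper/gte-small"])
    try:
        check_embedding_dimension(small, collection_dim)
    except ValueError as err:
        print("gte-small:", err)
```

In app.py, the same check could presumably be run once after the model is created by passing `embedding_model.embed_query` in place of the fake embedder, assuming the embeddings object exposes that method as LangChain embedding classes normally do.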