changing the embedding model to gte-large to match that of the tesla_db database (1024 embedding dimensions)... embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
app.py
CHANGED
@@ -65,10 +65,22 @@ client = OpenAI(
 )
 #---------------------------------------------------------------------
 
-embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+# embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-small')
+# The gte-small model, from the GTE (General Text Embeddings) family of models designed for retrieval tasks, produces 384-dimensional embeddings.
+# This dimensionality captures semantic information effectively while keeping the model small and efficient for retrieval tasks.
+
+# However, the tesla_db vector database was encoded with the 'gte-large' model, which produces 1024-dimensional embeddings, so we need to use gte-large here to match the embedding dimensions of the collection;
+# otherwise we get the following runtime error: "chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 1024"
+embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')
+
 
 tesla_10k_collection = 'tesla-10k-2019-to-2023'
 
+# Note: a Chroma collection's dimensionality is fixed by the first embeddings inserted into it,
+# so this 1024-dimensional collection can only be queried with 1024-dimensional embeddings.
+
+
+
 # vector database constructor Chroma()
 vectorstore_persisted = Chroma(
     collection_name=tesla_10k_collection,
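The mismatch behavior this commit works around can be illustrated with a minimal sketch of the check Chroma effectively performs at query time. This is a simplified, hypothetical helper for illustration, not Chroma's actual API:

```python
def check_dimension(embedding, collection_dim):
    """Raise if a query embedding's dimension doesn't match the collection's.

    Simplified stand-in for the validation behind Chroma's
    InvalidDimensionException; collection_dim is fixed when the
    collection is first populated.
    """
    if len(embedding) != collection_dim:
        raise ValueError(
            f"Embedding dimension {len(embedding)} does not match "
            f"collection dimensionality {collection_dim}"
        )


# gte-large produces 1024-dim vectors, matching the tesla_db collection
check_dimension([0.0] * 1024, 1024)  # passes silently

# gte-small produces 384-dim vectors, which the 1024-dim collection rejects:
# check_dimension([0.0] * 384, 1024)  # would raise ValueError
```

This is why swapping the query-side model to gte-large (rather than touching the stored collection) resolves the error: the stored vectors stay 1024-dimensional either way.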