---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: cc-by-nc-4.0
---

# FinLang/finance-embeddings-investopedia

This is the Investopedia embedding model for finance applications by the FinLang team. The model is trained on our open-sourced finance dataset: https://huggingface.co/datasets/FinLang/investopedia-embedding-dataset

It is a fine-tuned embedding model on top of BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search in RAG applications.

This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses.

## Plans

* The research paper will be published soon.
* We are working on a v2 of the model that expands the financial training corpus and uses improved techniques for training embeddings.

## Usage (LlamaIndex)

Simply specify the FinLang embedding during the indexing procedure for your financial RAG applications.
```python
# Note: in llama-index >= 0.10 the import path is
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="FinLang/investopedia_embedding")
```

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed (see https://huggingface.co/sentence-transformers):

```bash
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("FinLang/investopedia_embedding")
embeddings = model.encode(sentences)
print(embeddings)
```

Example of scoring a related query/answer pair:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("FinLang/investopedia_embedding")

query_1 = "What is a potential concern with allowing someone else to store your cryptocurrency keys, and is it possible to decrypt a private key?"
query_2 = "A potential concern is that the entity holding your keys has control over your cryptocurrency in a custodial relationship. While it is theoretically possible to decrypt a private key, with current technology, it would take centuries or millennia for the 115 quattuorvigintillion possibilities. Most hacks and thefts occur in wallets, where private keys are stored."

embedding_1 = model.encode(query_1)
embedding_2 = model.encode(query_2)

# Dot product of the two embeddings (equal to cosine similarity
# when the embeddings are normalized)
scores = (embedding_1 * embedding_2).sum()
print(scores)  # 0.862
```

## Evaluation Results

We evaluate our model on unseen pairs of sentences for similarity and on unseen shuffled pairs of sentences for dissimilarity. Our evaluation suite contains sentence pairs from Investopedia (to test proficiency on finance) and from GooAQ, MS MARCO, stackexchange_duplicate_questions_title_title, and yahoo_answers_title_answer (to evaluate the model's ability to avoid forgetting after fine-tuning).
## License

Since non-commercial datasets are used for fine-tuning, we release this model under cc-by-nc-4.0.

## Citation

[Coming Soon]