# BM25SRetriever for the wiki2021 corpus
The corpus was created by the Atlas project and the index was built using the FlexRAG library.
| Corpus Attribute | Value |
| --- | --- |
| Language | English |
| Domain | Wikipedia |
| Size | 37.5M passages (33.1M text, 4.3M infobox) |
| Dump Date | Dec 2021 |
| Provider | Atlas |
| License | CC-BY-SA 3.0 |
| Index Attribute | Value |
| --- | --- |
| Index Type | BM25S |
| Index Method | Lucene |
| Preprocessing | `LengthFilter(min_char=10, max_char=4096)` |
| Provider | FlexRAG |
| License | CC-BY-SA 3.0 |
## Installation
You can install the FlexRAG library with `pip`:

```bash
pip install flexrag
```
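If you want to verify the installation, a quick, library-agnostic check is to print the installed package metadata:

```bash
pip show flexrag
```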
## Loading a FlexRAG retriever
You can use this retriever for information retrieval tasks. Here is an example:
```python
from flexrag.retriever import LocalRetriever

# Load the retriever from the HuggingFace Hub
retriever = LocalRetriever.load_from_hub("FlexRAG/wiki2021_atlas_bm25s")

# You can retrieve now
results = retriever.search("Who is Bruce Wayne?")
```
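The structure of the returned results may vary between FlexRAG versions, so the simplest way to see which fields (for example title, text, or score) are available is to print an entry; the snippet below makes no assumptions about FlexRAG-specific attributes:

```python
# Print the first element of the results to inspect how the
# retrieved contexts are structured in your FlexRAG version.
print(results[0])
```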
## Running the RAG application with the retriever
You can launch the GUI of the RAG assistant with this retriever. Here is an example:
```bash
python -m flexrag.entrypoints.run_interactive \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.response_type=original \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False
```
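The command above reads your OpenAI API key from the `OPENAI_KEY` environment variable, so export it first (placeholder shown, substitute your own key):

```bash
export OPENAI_KEY=<your_openai_key>
```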
You can also run FlexRAG's RAG evaluation pipeline with this retriever. Here is an example that evaluates the ModularAssistant with this retriever on the Natural Questions test split:
```bash
OUTPUT_PATH=<path_to_output>
DB_PATH=<path_to_database>
OPENAI_KEY=<your_openai_key>

python -m flexrag.entrypoints.run_assistant \
    name=nq \
    split=test \
    output_path=${OUTPUT_PATH} \
    assistant_type=modular \
    modular_config.used_fields=[title,text] \
    modular_config.retriever_type="FlexRAG/wiki2021_atlas_bm25s" \
    modular_config.generator_type=openai \
    modular_config.openai_config.model_name='gpt-4o-mini' \
    modular_config.openai_config.api_key=$OPENAI_KEY \
    modular_config.do_sample=False \
    eval_config.metrics_type=[retrieval_success_rate,generation_f1,generation_em] \
    eval_config.retrieval_success_rate_config.context_preprocess.processor_type=[simplify_answer] \
    eval_config.retrieval_success_rate_config.eval_field=text \
    eval_config.response_preprocess.processor_type=[simplify_answer]
```
## License

Because the corpus is distributed under the CC-BY-SA 3.0 license, the retriever is released under the same license.
## Related Links

FlexRAG related links:

- Documentation
- GitHub Repository