Kaguya-19 committed on
Commit f24b538 · 1 Parent(s): f3e7ca3

update readme

Files changed (2)
  1. README.md +4 -4
  2. code/requirements.txt +8 -0
README.md CHANGED
@@ -58,7 +58,7 @@ Download [MiniCPM4-Survey](https://huggingface.co/openbmb/MiniCPM4-Survey) from
  We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, which can be downloaded from Hugging Face and placed in `model/MiniCPM-Embedding-Light`.
  ### Prepare the environment
 
- You can download the [paper data](https://www.kaggle.com/datasets/Cornell-University/arxiv) from Kaggle, then extract it. You can run `python dataset_process.py` to process the data and generate the retrieval database. Then you can run `python build_index.py` to build the retrieval database.
 
  ```
  cd ./code
@@ -66,7 +66,7 @@ curl -L -o ~/Downloads/arxiv.zip\
  https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
  unzip ~/Downloads/arxiv.zip -d .
  mkdir data
- python ./src/preprocess/dataset_process.py
  mkdir index
  python ./src/preprocess/build_index.py
  ```
@@ -151,14 +151,14 @@ MiniCPM4-Survey是由[THUNLP](https://nlp.csai.tsinghua.edu.cn)、中国人民
  We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, placed in model/MiniCPM-Embedding-Light.
 
  ### Prepare the environment
- Download the paper data from Kaggle, then extract it. Run `python dataset_process.py` to process the data and generate the retrieval database. Then run `python build_index.py` to build the retrieval database.
  ``` bash
  cd ./code
  curl -L -o ~/Downloads/arxiv.zip \
  https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
  unzip ~/Downloads/arxiv.zip -d .
  mkdir data
- python ./src/preprocess/dataset_process.py
  mkdir index
  python ./src/preprocess/build_index.py
  ```
 
  We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, which can be downloaded from Hugging Face and placed in `model/MiniCPM-Embedding-Light`.
  ### Prepare the environment
 
+ You can download the [paper data](https://www.kaggle.com/datasets/Cornell-University/arxiv) from Kaggle, then extract it. You can run `python data_process.py` to process the data and generate the retrieval database. Then you can run `python build_index.py` to build the retrieval database.
 
  ```
  cd ./code
 
  https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
  unzip ~/Downloads/arxiv.zip -d .
  mkdir data
+ python ./src/preprocess/data_process.py
  mkdir index
  python ./src/preprocess/build_index.py
  ```
 
  We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, placed in model/MiniCPM-Embedding-Light.
 
  ### Prepare the environment
+ Download the paper data from Kaggle, then extract it. Run `python data_process.py` to process the data and generate the retrieval database. Then run `python build_index.py` to build the retrieval database.
  ``` bash
  cd ./code
  curl -L -o ~/Downloads/arxiv.zip \
  https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
  unzip ~/Downloads/arxiv.zip -d .
  mkdir data
+ python ./src/preprocess/data_process.py
  mkdir index
  python ./src/preprocess/build_index.py
  ```
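
The preparation steps in this hunk can be wrapped so that they are safe to re-run. The following is a minimal sketch and not part of the commit: the helper names `prepare_dirs` and `run_preprocess` are hypothetical, and it assumes the two scripts only need `data` and `index` to exist under the working directory, as the `mkdir` calls in the README suggest.

```shell
# Create the directories that the README creates with plain `mkdir`.
# `mkdir -p` also succeeds when a directory already exists, so a
# second run does not stop at this step.
prepare_dirs() {
  mkdir -p "$1/data" "$1/index"
}

# Run the two preprocessing scripts in the order the README gives,
# stopping if the first one fails. Paths are relative to ./code.
run_preprocess() {
  python ./src/preprocess/data_process.py &&
    python ./src/preprocess/build_index.py
}

# Usage, from ./code after the arXiv archive has been extracted:
#   prepare_dirs . && run_preprocess
```

Creating both directories up front differs slightly from the README, which creates `index` only after `data_process.py` has run; this assumes neither script depends on `index` being absent.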
code/requirements.txt ADDED
@@ -0,0 +1,8 @@
+ openai
+ vllm
+ jsonlines
+ faiss-cpu
+ # faiss-gpu
+ fastapi
+ uvicorn
+ yarl
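
Because `faiss-gpu` is commented out in the new file, installing from it pulls in the CPU build of faiss only. As a hedged sketch (the helper name `active_requirements` is hypothetical and not part of the repository), the entries pip would actually install can be listed before installing:

```shell
# Print the entries of a requirements file that pip would act on,
# skipping blank lines and comment lines such as "# faiss-gpu".
active_requirements() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage, assuming the working directory is ./code:
#   active_requirements requirements.txt
#   python -m pip install -r requirements.txt
```

To use the GPU build instead, one would presumably comment out `faiss-cpu` and uncomment `faiss-gpu` before installing; the commit itself does not say which configurations were tested.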