update readme
- README.md +4 -4
- code/requirements.txt +8 -0
README.md
CHANGED
@@ -58,7 +58,7 @@ Download [MiniCPM4-Survey](https://huggingface.co/openbmb/MiniCPM4-Survey) from
 We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, which can be downloaded from Hugging Face and placed in `model/MiniCPM-Embedding-Light`.
 ### Prepare the environment
 
-You can download the [paper data](https://www.kaggle.com/datasets/Cornell-University/arxiv) from Kaggle, then extract it. You can run `python
+You can download the [paper data](https://www.kaggle.com/datasets/Cornell-University/arxiv) from Kaggle, then extract it. You can run `python data_process.py` to process the data and generate the retrieval database. Then you can run `python build_index.py` to build the retrieval index.
 
 ```
 cd ./code
@@ -66,7 +66,7 @@ curl -L -o ~/Downloads/arxiv.zip\
 https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
 unzip ~/Downloads/arxiv.zip -d .
 mkdir data
-python ./src/preprocess/
+python ./src/preprocess/data_process.py
 mkdir index
 python ./src/preprocess/build_index.py
 ```
@@ -151,14 +151,14 @@ MiniCPM4-Survey是由[THUNLP](https://nlp.csai.tsinghua.edu.cn)、中国人民
 We recommend using [MiniCPM-Embedding-Light](https://huggingface.co/openbmb/MiniCPM-Embedding-Light) as the embedding model, placed in model/MiniCPM-Embedding-Light.
 
 ### Prepare the environment
-Download the paper data from Kaggle, then extract it. Run `python
+Download the paper data from Kaggle, then extract it. Run `python data_process.py` to process the data and generate the retrieval database. Then run `python build_index.py` to build the retrieval index.
 ``` bash
 cd ./code
 curl -L -o ~/Downloads/arxiv.zip\
 https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
 unzip ~/Downloads/arxiv.zip -d .
 mkdir data
-python ./src/preprocess/
+python ./src/preprocess/data_process.py
 mkdir index
 python ./src/preprocess/build_index.py
 ```
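The README steps in this change leave several directories behind, and `build_index.py` presumably depends on what the earlier steps produced. As a minimal sketch, the check below reports which expected paths are still missing before the next step is run. The helper name `missing_paths` and the `EXPECTED` list are hypothetical and not part of the repository. The layout is an assumption read off the README commands (`cd ./code`, `mkdir data`, `mkdir index`) and the stated model location, so adjust it to the actual checkout.

```python
from pathlib import Path


def missing_paths(root, expected):
    """Return the entries of `expected` that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


# Assumed layout relative to the repository root (not confirmed by the source).
EXPECTED = [
    "model/MiniCPM-Embedding-Light",  # embedding model location named in the README
    "code/data",   # created by `mkdir data` after `cd ./code`
    "code/index",  # created by `mkdir index` after `cd ./code`
]

if __name__ == "__main__":
    # Print one line per missing path; print nothing when the layout is complete.
    for rel in missing_paths(".", EXPECTED):
        print(f"missing: {rel}")
```

Run from the repository root, this prints one `missing:` line per absent directory and nothing once every step has completed.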
code/requirements.txt
ADDED
@@ -0,0 +1,8 @@
+openai
+vllm
+jsonlines
+faiss-cpu
+# faiss-gpu
+fastapi
+uvicorn
+yarl
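After installing from the new `code/requirements.txt` (typically `pip install -r requirements.txt` from the `code` directory), it can help to confirm that each dependency is importable before launching the preprocessing scripts. The sketch below is illustrative only. `missing_modules` and `REQUIRED_MODULES` are hypothetical names, not part of the repository. The one name mapping it relies on is that the `faiss-cpu` package is imported as `faiss`; the other packages are assumed to import under their own names.

```python
from importlib.util import find_spec

# Import names for the packages listed in code/requirements.txt.
# `faiss-cpu` (or the commented-out `faiss-gpu`) is imported as `faiss`.
REQUIRED_MODULES = ["openai", "vllm", "jsonlines", "faiss", "fastapi", "uvicorn", "yarl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `vllm`.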