cecilia-uu committed on
Commit 8227469 · 1 Parent(s): 460dec2

added api documentation and added more tests (#1194)

### What problem does this PR solve?

This PR adds ragflow_api.md and more tests for the dataset API.

### Type of change

- [x] Documentation Update
- [x] Other (please describe): tests

docs/references/ragflow_api.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ sidebar_position: 1
+ slug: /api
+ ---
+
+ # API reference
+
+ RAGFlow offers RESTful APIs for you to integrate its capabilities into third-party applications.
+
+ ## Base URL
+ ```
+ https://demo.ragflow.io/api/v1/
+ ```
+
+ ## Authorization
+
+ All of RAGFlow's RESTful APIs use an API key for authorization. Keep your key safe and do not expose it to the front end.
+ Put your API key in the request header:
+
+ ```
+ Authorization: Bearer {API_KEY}
+ ```
+
+ To get your API key:
+
+ 1. In RAGFlow, click the **Chat** tab at the top middle of the page.
+ 2. Hover over the corresponding dialogue **>** **Chat Bot API** to show the chatbot API configuration page.
+ 3. Click **Api Key** **>** **Create new key** to create your API key.
+ 4. Copy your API key and keep it safe.
+
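The authorization scheme above can be sketched with Python's standard library. This is a minimal illustration, not part of RAGFlow's SDK: `build_request` is a hypothetical helper that only constructs a `urllib.request.Request` carrying the documented `Authorization: Bearer {API_KEY}` header against the documented base URL. It does not send anything.

```python
from typing import Optional
import urllib.request

# Base URL from the "Base URL" section; the trailing slash is trimmed below
# so that paths such as "/dataset" join without a double slash.
BASE_URL = "https://demo.ragflow.io/api/v1/"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an authorized request."""
    req = urllib.request.Request(BASE_URL.rstrip("/") + path, data=body, method=method)
    # Documented header format: "Authorization: Bearer {API_KEY}".
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


if __name__ == "__main__":
    req = build_request("GET", "/dataset", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

To actually send the request, pass it to `urllib.request.urlopen`. Build it on a server rather than in browser code so the key is never exposed to the front end.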
+ ## Create dataset
+
+ This method creates a dataset for a specific user.
+
+ ### Request
+
+ #### Request URI
+
+ | Method | Request URI |
+ |--------|-------------|
+ | POST | `/dataset` |
+
+ :::note
+ Save the `data.dataset_name` value returned in the response data. It is the name under which the dataset was actually created, which may differ from the name you requested.
+ :::
+
+ #### Request parameter
+
+ | Name | Type | Required | Description |
+ |----------------|--------|----------|-------------|
+ | `dataset_name` | string | Yes | The name of the dataset to create. `dataset_name` must be shorter than 2<sup>10</sup> (1,024) characters and cannot be empty. The following character sets are supported: <br />- 26 lowercase English letters (a-z)<br />- 26 uppercase English letters (A-Z)<br />- 10 digits (0-9)<br />- "_", "-", "." |
+
+ ### Response
+
+ ```json
+ {
+   "code": 0,
+   "data": {
+     "dataset_name": "kb1"
+   },
+   "message": "success"
+ }
+ ```
+
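A request to this endpoint can be sketched as follows. Two assumptions are not stated in the reference: the parameter is sent as a JSON body, and `is_documented_dataset_name` (a hypothetical helper) encodes only the rules listed in the parameter table. The tests in this commit show the server is more permissive (for example, names containing spaces are accepted), so treat the check as a conservative client-side filter rather than the server's rule.

```python
import json
import re
import urllib.request

BASE_URL = "https://demo.ragflow.io/api/v1/"

# Character sets listed in the "Request parameter" table: a-z, A-Z, 0-9, "_", "-", ".".
_ALLOWED = re.compile(r"[A-Za-z0-9_.\-]+")


def is_documented_dataset_name(name: str) -> bool:
    """Non-empty, shorter than 2**10 characters, documented characters only."""
    return 0 < len(name) < 2 ** 10 and _ALLOWED.fullmatch(name) is not None


def create_dataset_request(api_key: str, dataset_name: str) -> urllib.request.Request:
    """Build (but do not send) POST /dataset; the JSON body shape is an assumption."""
    body = json.dumps({"dataset_name": dataset_name}).encode("utf-8")
    req = urllib.request.Request(BASE_URL.rstrip("/") + "/dataset", data=body, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    return req
```

On success, read the name the server actually used from `data.dataset_name` in the response rather than reusing the name you sent.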
+ ## Get dataset list
+
+ This method lists the created datasets for a specific user.
+
+ ### Request
+
+ #### Request URI
+
+ | Method | Request URI |
+ |----------|-------------|
+ | GET | `/dataset` |
+
+ ### Response
+
+ #### Response parameter
+
+ ```python
+ (200,
+  {
+      "code": 102,
+      "data": [
+          {
+              "avatar": None,
+              "chunk_num": 0,
+              "create_date": "Mon, 17 Jun 2024 16:00:05 GMT",
+              "create_time": 1718611205876,
+              "created_by": "b48110a0286411ef994a3043d7ee537e",
+              "description": None,
+              "doc_num": 0,
+              "embd_id": "BAAI/bge-large-zh-v1.5",
+              "id": "9bd6424a2c7f11ef81b83043d7ee537e",
+              "language": "Chinese",
+              "name": "dataset3(23)",
+              "parser_config": {
+                  "pages": [
+                      [
+                          1,
+                          1000000
+                      ]
+                  ]
+              },
+              "parser_id": "naive",
+              "permission": "me",
+              "similarity_threshold": 0.2,
+              "status": "1",
+              "tenant_id": "b48110a0286411ef994a3043d7ee537e",
+              "token_num": 0,
+              "update_date": "Mon, 17 Jun 2024 16:00:05 GMT",
+              "update_time": 1718611205876,
+              "vector_similarity_weight": 0.3
+          },
+          # ... additional datasets ...
+      ],
+      "message": "attempt to list datasets"
+  }
+ )
+ ```
+
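The example above is a `(status, body)` tuple. A small helper can turn it into a name-to-ID mapping, which is what **Delete dataset** needs. `dataset_ids_by_name` is hypothetical, and it assumes every entry in `data` carries `name` and `id` keys, as in the example.

```python
from typing import Any, Dict, Tuple


def dataset_ids_by_name(response: Tuple[int, Dict[str, Any]]) -> Dict[str, str]:
    """Map each listed dataset's name to its ID."""
    status, body = response
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}: {body.get('message')}")
    # A missing "data" key is treated as an empty listing.
    return {item["name"]: item["id"] for item in body.get("data", [])}


if __name__ == "__main__":
    sample = (200, {
        "code": 102,
        "data": [{"name": "dataset3(23)", "id": "9bd6424a2c7f11ef81b83043d7ee537e"}],
        "message": "attempt to list datasets",
    })
    print(dataset_ids_by_name(sample))
```

A dictionary is used because lookups by name are the common case. If two entries ever shared a name, the later one would silently win.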
+ ## Delete dataset
+
+ This method deletes a dataset for a specific user.
+
+ ### Request
+
+ #### Request URI
+
+ | Method | Request URI |
+ |--------|-------------------------|
+ | DELETE | `/dataset/{dataset_id}` |
+
+ #### Request parameter
+
+ | Name | Type | Required | Description |
+ |--------------|--------|----------|-------------|
+ | `dataset_id` | string | Yes | The ID of the dataset. Call [`GET /dataset`](#get-dataset-list) to retrieve the ID. |
+
+ ### Response
+
+ ```json
+ {
+   "success": true,
+   "message": "Dataset deleted successfully!"
+ }
+ ```
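A matching sketch for `DELETE /dataset/{dataset_id}`. `delete_dataset_request` and `deleted_ok` are hypothetical helpers. The ID is percent-encoded so it stays a single path segment, and success is read from the `success` field shown in the response example.

```python
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "https://demo.ragflow.io/api/v1/"


def delete_dataset_request(api_key: str, dataset_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /dataset/{dataset_id}."""
    # safe="" also encodes "/", so the ID cannot introduce extra path segments.
    path = "/dataset/" + urllib.parse.quote(dataset_id, safe="")
    req = urllib.request.Request(BASE_URL.rstrip("/") + path, method="DELETE")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


def deleted_ok(body: Dict[str, Any]) -> bool:
    """True only when the response explicitly reports success."""
    return body.get("success") is True
```

Checking `is True` rather than truthiness mirrors the SDK tests in this commit, which assert `result["success"] is True`.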
sdk/python/test/test_dataset.py CHANGED
@@ -2,27 +2,134 @@ from test_sdkbase import TestSdk
2
  from ragflow import RAGFlow
3
  import pytest
4
  from common import API_KEY, HOST_ADDRESS
5
-
6
 
7
 
8
  class TestDataset(TestSdk):
9
-
10
- def test_create_dataset(self):
11
- '''
12
  1. create a kb
13
  2. list the kb
14
  3. get the detail info according to the kb id
15
  4. update the kb
16
  5. delete the kb
17
- '''
18
-
19
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
20
  # create a kb
21
  res = ragflow.create_dataset("kb1")
22
  assert res['code'] == 0 and res['message'] == 'success'
23
- dataset_name = res['data']['dataset_name']
24
 
25
  def test_list_dataset_success(self):
26
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
27
  # Call the list_datasets method
28
  response = ragflow.list_dataset()
@@ -32,6 +139,9 @@ class TestDataset(TestSdk):
32
  assert code == 200
33
 
34
  def test_list_dataset_with_checking_size_and_name(self):
35
  datasets_to_create = ["dataset1", "dataset2", "dataset3"]
36
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
37
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
@@ -51,6 +161,9 @@ class TestDataset(TestSdk):
51
  assert len(listed_data) == len(datasets_to_create)
52
 
53
  def test_list_dataset_with_getting_empty_result(self):
54
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
55
  datasets_to_create = []
56
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
@@ -70,6 +183,9 @@ class TestDataset(TestSdk):
70
  assert len(listed_data) == 0
71
 
72
  def test_list_dataset_with_creating_100_knowledge_bases(self):
73
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
74
  datasets_to_create = ["dataset1"] * 100
75
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
@@ -89,6 +205,9 @@ class TestDataset(TestSdk):
89
  assert len(listed_data) == 100
90
 
91
  def test_list_dataset_with_showing_one_dataset(self):
92
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
93
  response = ragflow.list_dataset(0, 1)
94
  code, response = response
@@ -96,26 +215,145 @@ class TestDataset(TestSdk):
96
  assert len(datasets) == 1
97
 
98
  def test_list_dataset_failure(self):
99
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
100
  response = ragflow.list_dataset(-1, -1)
101
  _, res = response
102
  assert "IndexError" in res['message']
103
 
104
  def test_delete_one_dataset_with_success(self):
105
  # get the real name of the created dataset
106
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
107
  res = ragflow.create_dataset("kb0")
108
  real_dataset_name = res['data']['dataset_name']
109
- print("name", real_dataset_name)
110
  # delete this dataset
111
  result = ragflow.delete_dataset(real_dataset_name)
112
- print(result)
113
  assert result["success"] is True
114
 
115
  def test_delete_dataset_with_not_existing_dataset(self):
116
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
117
  res = ragflow.delete_dataset("weird_dataset")
118
  assert res["success"] is False
119
 
120
 
121
 
2
  from ragflow import RAGFlow
3
  import pytest
4
  from common import API_KEY, HOST_ADDRESS
5
+ from api.contants import NAME_LENGTH_LIMIT
6
 
7
 
8
  class TestDataset(TestSdk):
9
+ """
10
+ This class contains a suite of tests for the dataset management functionality within the RAGFlow system.
11
+ It ensures that the following functionalities work as expected:
12
  1. create a kb
13
  2. list the kb
14
  3. get the detail info according to the kb id
15
  4. update the kb
16
  5. delete the kb
17
+ """
18
+ # -----------------------create_dataset---------------------------------
19
+ def test_create_dataset_with_success(self):
20
+ """
21
+ Test the creation of a new dataset with success.
22
+ """
23
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
24
  # create a kb
25
  res = ragflow.create_dataset("kb1")
26
  assert res['code'] == 0 and res['message'] == 'success'
27
 
28
+ def test_create_dataset_with_empty_name(self):
29
+ """
30
+ Test the creation of a new dataset with an empty name.
31
+ """
32
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
33
+ res = ragflow.create_dataset("")
34
+ assert res['message'] == 'Empty dataset name' and res['code'] == 102
35
+
36
+ def test_create_dataset_with_name_exceeding_limit(self):
37
+ """
38
+ Test the creation of a new dataset whose name is longer than the limit.
39
+ """
40
+ name = "k" * NAME_LENGTH_LIMIT + "b"
41
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
42
+ res = ragflow.create_dataset(name)
43
+ assert (res['message'] == f"Dataset name: {name} with length {len(name)} exceeds {NAME_LENGTH_LIMIT}!"
44
+ and res['code'] == 102)
45
+
46
+ def test_create_dataset_name_with_space_in_the_middle(self):
47
+ """
48
+ Test the creation of a new dataset whose name has a space in the middle.
49
+ """
50
+ name = "k b"
51
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
52
+ res = ragflow.create_dataset(name)
53
+ assert (res['code'] == 0 and res['message'] == 'success')
54
+
55
+ def test_create_dataset_name_with_space_in_the_head(self):
56
+ """
57
+ Test the creation of a new dataset whose name has a leading space.
58
+ """
59
+ name = " kb"
60
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
61
+ res = ragflow.create_dataset(name)
62
+ assert (res['code'] == 0 and res['message'] == 'success')
63
+
64
+ def test_create_dataset_name_with_space_in_the_tail(self):
65
+ """
66
+ Test the creation of a new dataset whose name has a trailing space.
67
+ """
68
+ name = "kb "
69
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
70
+ res = ragflow.create_dataset(name)
71
+ assert (res['code'] == 0 and res['message'] == 'success')
72
+
73
+ def test_create_dataset_name_with_space_in_the_head_and_tail_and_length_exceed_limit(self):
74
+ """
75
+ Test the creation of a new dataset whose name has leading and trailing spaces,
76
+ and the length of the name exceeds the limit.
77
+ """
78
+ name = " " + "k" * NAME_LENGTH_LIMIT + " "
79
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
80
+ res = ragflow.create_dataset(name)
81
+ assert (res['code'] == 0 and res['message'] == 'success')
82
+
83
+ def test_create_dataset_with_two_same_name(self):
84
+ """
85
+ Test the creation of two new datasets with the same name.
86
+ """
87
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
88
+ res = ragflow.create_dataset("kb")
89
+ assert (res['code'] == 0 and res['message'] == 'success')
90
+ res = ragflow.create_dataset("kb")
91
+ assert (res['code'] == 0 and res['message'] == 'success')
92
+
93
+ def test_create_dataset_with_only_space_in_the_name(self):
94
+ """
95
+ Test the creation of a dataset whose name consists of a single space.
96
+ """
97
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
98
+ res = ragflow.create_dataset(" ")
99
+ assert (res['code'] == 0 and res['message'] == 'success')
100
+
101
+ def test_create_dataset_with_space_number_exceeding_limit(self):
102
+ """
103
+ Test the creation of a dataset whose name consists of NAME_LENGTH_LIMIT spaces.
104
+ """
105
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
106
+ name = " " * NAME_LENGTH_LIMIT
107
+ res = ragflow.create_dataset(name)
108
+ assert (res['code'] == 0 and res['message'] == 'success')
109
+
110
+ def test_create_dataset_with_name_having_return(self):
111
+ """
112
+ Test the creation of a dataset whose name ends with a newline character.
113
+ """
114
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
115
+ name = "kb\n"
116
+ res = ragflow.create_dataset(name)
117
+ assert (res['code'] == 0 and res['message'] == 'success')
118
+
119
+ def test_create_dataset_with_name_having_the_null_character(self):
120
+ """
121
+ Test the creation of a dataset whose name ends with the null character.
122
+ """
123
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
124
+ name = "kb\0"
125
+ res = ragflow.create_dataset(name)
126
+ assert (res['code'] == 0 and res['message'] == 'success')
127
+
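Taken together, the create-dataset expectations above pin down a validation order. The function below is one hypothetical ordering consistent with every one of them; it is not RAGFlow's server code, and `NAME_LENGTH_LIMIT = 1024` is a placeholder for the value the tests import from `api.contants`. The order is: reject only the literal empty string, strip surrounding whitespace, then compare the stripped length against the limit.

```python
from typing import Tuple

NAME_LENGTH_LIMIT = 1024  # placeholder; the tests import the real value from api.contants


def check_dataset_name(name: str) -> Tuple[int, str]:
    """Return the (code, message) pair the create-dataset tests expect."""
    if name == "":
        return 102, "Empty dataset name"
    stripped = name.strip()  # removes spaces and "\n", but not "\0"
    if len(stripped) > NAME_LENGTH_LIMIT:
        return 102, f"Dataset name: {name} with length {len(name)} exceeds {NAME_LENGTH_LIMIT}!"
    return 0, "success"
```

Under this ordering a name made only of spaces passes the empty check before being stripped, which is exactly what `test_create_dataset_with_only_space_in_the_name` asserts. It is also why the delete tests expect `" kb"` and `"kb "` to fail: the stored name would be `"kb"`.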
128
+ # -----------------------list_dataset---------------------------------
129
  def test_list_dataset_success(self):
130
+ """
131
+ Test listing datasets with a successful outcome.
132
+ """
133
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
134
  # Call the list_datasets method
135
  response = ragflow.list_dataset()
 
139
  assert code == 200
140
 
141
  def test_list_dataset_with_checking_size_and_name(self):
142
+ """
143
+ Test listing datasets and verify the size and names of the datasets.
144
+ """
145
  datasets_to_create = ["dataset1", "dataset2", "dataset3"]
146
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
147
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
 
161
  assert len(listed_data) == len(datasets_to_create)
162
 
163
  def test_list_dataset_with_getting_empty_result(self):
164
+ """
165
+ Test listing datasets that should be empty.
166
+ """
167
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
168
  datasets_to_create = []
169
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
 
183
  assert len(listed_data) == 0
184
 
185
  def test_list_dataset_with_creating_100_knowledge_bases(self):
186
+ """
187
+ Test listing 100 datasets and verify the size and names of these datasets.
188
+ """
189
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
190
  datasets_to_create = ["dataset1"] * 100
191
  created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
 
205
  assert len(listed_data) == 100
206
 
207
  def test_list_dataset_with_showing_one_dataset(self):
208
+ """
209
+ Test listing one dataset and verify the size of the dataset.
210
+ """
211
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
212
  response = ragflow.list_dataset(0, 1)
213
  code, response = response
 
215
  assert len(datasets) == 1
216
 
217
  def test_list_dataset_failure(self):
218
+ """
219
+ Test that listing datasets with negative arguments reports an IndexError.
220
+ """
221
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
222
  response = ragflow.list_dataset(-1, -1)
223
  _, res = response
224
  assert "IndexError" in res['message']
225
 
226
+ def test_list_dataset_for_empty_datasets(self):
227
+ """
228
+ Test listing datasets when there are none.
229
+ """
230
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
231
+ response = ragflow.list_dataset()
232
+ code, response = response
233
+ datasets = response['data']
234
+ assert len(datasets) == 0
235
+
236
+ # TODO: set a limit on the number of datasets
237
+
238
+ # -----------------------delete_dataset---------------------------------
239
  def test_delete_one_dataset_with_success(self):
240
+ """
241
+ Test deleting a dataset with success.
242
+ """
243
  # get the real name of the created dataset
244
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
245
  res = ragflow.create_dataset("kb0")
246
  real_dataset_name = res['data']['dataset_name']
247
  # delete this dataset
248
  result = ragflow.delete_dataset(real_dataset_name)
249
  assert result["success"] is True
250
 
251
  def test_delete_dataset_with_not_existing_dataset(self):
252
+ """
253
+ Test that deleting a nonexistent dataset fails.
254
+ """
255
  ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
256
  res = ragflow.delete_dataset("weird_dataset")
257
  assert res["success"] is False
258
 
259
+ def test_delete_dataset_with_creating_100_datasets_and_deleting_100_datasets(self):
260
+ """
261
+ Test creating 100 datasets and then deleting all of them.
262
+ """
263
+ # create 100 datasets
264
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
265
+ datasets_to_create = ["dataset1"] * 100
266
+ created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
267
+
268
+ real_name_to_create = set()
269
+ for response in created_response:
270
+ assert 'data' in response, "Response is missing 'data' key"
271
+ dataset_name = response['data']['dataset_name']
272
+ real_name_to_create.add(dataset_name)
273
+
274
+ for name in real_name_to_create:
275
+ res = ragflow.delete_dataset(name)
276
+ assert res["success"] is True
277
+
278
+ def test_delete_dataset_with_space_in_the_middle_of_the_name(self):
279
+ """
280
+ Test deleting a dataset whose name has a space in the middle.
281
+ """
282
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
283
+ res = ragflow.delete_dataset("k b")
284
+ print(res)
285
+ assert res["success"] is True
286
+
287
+ def test_delete_dataset_with_space_in_the_head_of_the_name(self):
288
+ """
289
+ Test deleting a dataset whose name has a leading space.
290
+ """
291
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
292
+ res = ragflow.delete_dataset(" kb")
293
+ assert res["success"] is False
294
+
295
+ def test_delete_dataset_with_space_in_the_tail_of_the_name(self):
296
+ """
297
+ Test deleting a dataset whose name has a trailing space.
298
+ """
299
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
300
+ res = ragflow.delete_dataset("kb ")
301
+ assert res["success"] is False
302
+
303
+ def test_delete_dataset_with_only_space_in_the_name(self):
304
+ """
305
+ Test deleting a dataset whose name consists of a single space.
306
+ """
307
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
308
+ res = ragflow.delete_dataset(" ")
309
+ assert res["success"] is False
310
+
311
+ def test_delete_dataset_with_only_exceeding_limit_space_in_the_name(self):
312
+ """
313
+ Test deleting a dataset whose name consists only of spaces and whose length exceeds the limit.
314
+ """
315
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
316
+ name = " " * (NAME_LENGTH_LIMIT + 1)
317
+ res = ragflow.delete_dataset(name)
318
+ assert res["success"] is False
319
+
320
+ def test_delete_dataset_with_name_with_space_in_the_head_and_tail_and_length_exceed_limit(self):
321
+ """
322
+ Test deleting a dataset whose name has leading and trailing spaces,
323
+ and the length of the name exceeds the limit.
324
+ """
325
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
326
+ name = " " + "k" * NAME_LENGTH_LIMIT + " "
327
+ res = ragflow.delete_dataset(name)
328
+ assert res["success"] is False
329
+
330
+ # ---------------------------------mix the different methods--------------------
331
+ def test_create_and_delete_dataset_together(self):
332
+ """
333
+ Test creating 1 dataset, and then deleting 1 dataset.
334
+ Test creating 10 datasets, and then deleting 10 datasets.
335
+ """
336
+ # create 1 dataset
337
+ ragflow = RAGFlow(API_KEY, HOST_ADDRESS)
338
+ res = ragflow.create_dataset("ddd")
339
+ assert res['code'] == 0 and res['message'] == 'success'
340
+
341
+ # delete 1 dataset
342
+ res = ragflow.delete_dataset("ddd")
343
+ assert res["success"] is True
344
+
345
+ # create 10 datasets
346
+ datasets_to_create = ["dataset1"] * 10
347
+ created_response = [ragflow.create_dataset(name) for name in datasets_to_create]
348
+
349
+ real_name_to_create = set()
350
+ for response in created_response:
351
+ assert 'data' in response, "Response is missing 'data' key"
352
+ dataset_name = response['data']['dataset_name']
353
+ real_name_to_create.add(dataset_name)
354
 
355
+ # delete 10 datasets
356
+ for name in real_name_to_create:
357
+ res = ragflow.delete_dataset(name)
358
+ assert res["success"] is True
359
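The create-then-collect loop appears twice above, in the 100-dataset and 10-dataset tests. It could be factored into a helper such as the hypothetical `create_and_collect_names` below. To keep the sketch runnable without a server, `FakeRAGFlow` stands in for the SDK client. Its `name(n)` renaming of duplicates is an assumption suggested by names like `dataset3(23)` in the API reference, not confirmed server behavior.

```python
from typing import Any, Dict, Iterable, Set


class FakeRAGFlow:
    """Hypothetical in-memory stand-in for the SDK client used in the tests."""

    def __init__(self) -> None:
        self.names: Set[str] = set()

    def create_dataset(self, name: str) -> Dict[str, Any]:
        real, n = name, 0
        while real in self.names:  # assumed dedup scheme: "name(1)", "name(2)", ...
            n += 1
            real = f"{name}({n})"
        self.names.add(real)
        return {"code": 0, "message": "success", "data": {"dataset_name": real}}

    def delete_dataset(self, name: str) -> Dict[str, Any]:
        if name in self.names:
            self.names.remove(name)
            return {"success": True}
        return {"success": False}


def create_and_collect_names(client: Any, names: Iterable[str]) -> Set[str]:
    """Create one dataset per requested name; return the names actually assigned."""
    real_names: Set[str] = set()
    for name in names:
        response = client.create_dataset(name)
        assert "data" in response, "Response is missing 'data' key"
        real_names.add(response["data"]["dataset_name"])
    return real_names
```

With such a helper, the second half of `test_create_and_delete_dataset_together` reduces to iterating over `create_and_collect_names(ragflow, ["dataset1"] * 10)` and asserting that each `delete_dataset` call reports success.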