Fix bug in computing the 'not concating' feature (#896)
### What problem does this PR solve?
When the PDF parser calls the `_naive_vertical_merge` method, one of the "not concating" features is supposed to check whether `b` and `b_` have different `layoutno` values. The code actually compared `b` with `b` itself, so that feature could never be true. I think this is a bug, so this PR fixes it. Please check again.
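The effect can be shown with a minimal sketch. It assumes, as the description says, that the old check read `layoutno` from the same box `b` on both sides (the removed line is truncated in the diff view). The helper names `layout_changed_buggy` and `layout_changed_fixed` and the `layoutno` labels are hypothetical.

```python
from typing import Dict


def layout_changed_buggy(b: Dict[str, object], b_: Dict[str, object]) -> bool:
    # Assumed shape of the old check: both operands come from `b`,
    # so the result is False no matter what `b_` contains.
    return b.get("layoutno", 0) != b.get("layoutno", 0)


def layout_changed_fixed(b: Dict[str, object], b_: Dict[str, object]) -> bool:
    # Fixed check: compare the current box with the next box `b_`.
    return b.get("layoutno", 0) != b_.get("layoutno", 0)


# Hypothetical boxes from two different layout regions.
b = {"text": "Quarterly revenue", "layoutno": "table-0"}
b_ = {"text": "grew in every region", "layoutno": "text-1"}
print(layout_changed_buggy(b, b_), layout_changed_fixed(b, b_))
```

With the self-comparison, a layout boundary between two boxes is never detected, so this feature cannot stop the merge.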
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
`deepdoc/parser/pdf_parser.py` (changed)

```diff
@@ -396,7 +396,7 @@ class RAGFlowPdfParser:
 ]
 # features for not concating
 feats = [
-    b.get("layoutno", 0) !=
+    b.get("layoutno", 0) != b_.get("layoutno", 0),
     b["text"].strip()[-1] in "。?!?",
     self.is_english and b["text"].strip()[-1] in ".!?",
     b["page_number"] == b_["page_number"] and b_["top"] -
```
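The sketch below shows why one always-false entry matters. It is a simplified toy, not the `RAGFlowPdfParser` implementation. It assumes the `feats` entries are combined with `any()`, so a single true feature keeps two boxes apart. It keeps only the layout and sentence-end features from the diff and omits the truncated page/position feature. `not_concat_feats` and `naive_vertical_merge` are hypothetical names.

```python
from typing import Dict, List

Box = Dict[str, object]


def not_concat_feats(b: Box, b_: Box, is_english: bool) -> List[bool]:
    # Hypothetical subset of the "features for not concating" from the diff.
    # Assumes non-empty text, as the original `[-1]` indexing does.
    text = str(b["text"]).strip()
    return [
        b.get("layoutno", 0) != b_.get("layoutno", 0),  # different layout regions
        text[-1] in "。?!?",                             # CJK sentence end
        is_english and text[-1] in ".!?",                 # English sentence end
    ]


def naive_vertical_merge(boxes: List[Box], is_english: bool = True) -> List[Box]:
    # Fold each box into the previous one unless any "not concat" feature fires.
    if not boxes:
        return []
    merged: List[Box] = [dict(boxes[0])]
    for b_ in boxes[1:]:
        b = merged[-1]
        if any(not_concat_feats(b, b_, is_english)):
            merged.append(dict(b_))  # keep as a separate box
        else:
            sep = " " if is_english else ""
            b["text"] = str(b["text"]).strip() + sep + str(b_["text"]).strip()
    return merged


boxes = [
    {"text": "Quarterly revenue", "layoutno": "table-0"},
    {"text": "grew in every region", "layoutno": "text-1"},
    {"text": "and margins held", "layoutno": "text-1"},
]
print([m["text"] for m in naive_vertical_merge(boxes)])
```

With the corrected comparison, the table box stays separate from the paragraph that follows it. With the old self-comparison, the first entry would always be `False`, and the two regions would be joined whenever the other features were also false.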