Franker11 committed on
Commit 368b624 · 1 Parent(s): 19d0b0d

Fix bug in computing the 'not concatenating' feature (#896)

### What problem does this PR solve?

When the PDF parser calls the `_naive_vertical_merge` method, one of the
"features for not concatenating" is meant to compare the `layoutno` of `b`
with the `layoutno` of `b_`. The code actually compares `b` with `b`, so the
check never detects a layout change. I think this is a bug, so this PR fixes
it. Please review.
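
To make the effect of the typo concrete, here is a minimal standalone sketch. It is not the parser's code: the function names `layout_differs_buggy` and `layout_differs_fixed` and the sample `layoutno` values are hypothetical, and the boxes are reduced to plain dicts. Only the two comparison expressions are taken from the diff.

```python
def layout_differs_buggy(b: dict, b_: dict) -> bool:
    # Pre-fix expression: both sides read from `b`, so `b_` is ignored.
    return b.get("layoutno", 0) != b.get("layoutno", 0)


def layout_differs_fixed(b: dict, b_: dict) -> bool:
    # Post-fix expression: compares the current box with the next one.
    return b.get("layoutno", 0) != b_.get("layoutno", 0)


# Hypothetical boxes that sit in different layout regions.
upper = {"text": "First block", "layoutno": 1}
lower = {"text": "Second block", "layoutno": 2}

print(layout_differs_buggy(upper, lower))  # the layout change is missed
print(layout_differs_fixed(upper, lower))  # the layout change is detected
```

With the pre-fix expression, this feature cannot stop two boxes from being merged, whatever their layout numbers are. When neither box has a `layoutno` key, both versions fall back to the default `0` and report no difference.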

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Files changed (1)
  1. deepdoc/parser/pdf_parser.py +1 -1
deepdoc/parser/pdf_parser.py CHANGED
@@ -396,7 +396,7 @@ class RAGFlowPdfParser:
   ]
   # features for not concating
   feats = [
-      b.get("layoutno", 0) != b.get("layoutno", 0),
+      b.get("layoutno", 0) != b_.get("layoutno", 0),
       b["text"].strip()[-1] in "。?!?",
       self.is_english and b["text"].strip()[-1] in ".!?",
       b["page_number"] == b_["page_number"] and b_["top"] -