DREX-062225-exp

DREX-062225-exp (Document Retrieval and Extraction eXpert) is an experimental fine-tune of docscopeOCR-7B-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on the Qwen2.5-VL architecture and was trained on the Opendoc2-Analysis-Recognition dataset to improve document analysis and information extraction.

DREX: Document Retrieval and Extraction eXpert [ experimental ]

Key Enhancements

  • Advanced Document Retrieval: Specialized capabilities for locating and retrieving specific information from complex document structures and layouts.

  • Enhanced Content Extraction: Optimized for extracting structured data, key information, and relevant content from diverse document types including reports, forms, and technical documentation.

  • Superior Analysis Recognition: Fine-tuned recognition abilities for document analysis tasks, pattern identification, and contextual understanding of document hierarchies.

  • Inherited OCR Excellence: Retains the OCR capabilities of the base docscopeOCR model, including LaTeX formatting of mathematical content and multi-language support.

  • Document-Centric Understanding: Specialized training for understanding document relationships, cross-references, and contextual dependencies within complex document sets.


Markdown (.MD) - Inference

[Figures 1.png and 2.png: example screenshots of Markdown inference output]

Quick Start with Transformers

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model; dtype and device placement are chosen automatically.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/DREX-062225-exp", torch_dtype="auto", device_map="auto"
)

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained("prithivMLmods/DREX-062225-exp")

# One user turn: a document image followed by a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Extract and analyze the key information from this document."},
        ],
    }
]

# Render the chat template to a prompt string and collect the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to wherever device_map placed the model (not a hard-coded "cuda").
inputs = inputs.to(model.device)

# Generate; raise max_new_tokens for longer extractions.
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
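
The quick start uses a remote demo image. To run it on your own documents, the message list can be built by a small helper. The following is a minimal sketch, not part of the model's API: `build_messages` and `DEFAULT_PROMPT` are hypothetical names introduced here, and the sketch assumes that `qwen_vl_utils` accepts local images as `file://` followed by the file path, as the upstream Qwen2.5-VL examples suggest.

```python
from pathlib import Path

# Hypothetical default; this is the prompt used in the quick start.
DEFAULT_PROMPT = "Extract and analyze the key information from this document."


def build_messages(image, prompt=DEFAULT_PROMPT):
    """Return a single-turn message list for one document image.

    `image` may be an http(s) URL, a file:// reference, or a local path.
    Local paths are resolved to absolute paths and prefixed with file://
    (assumption: qwen_vl_utils strips that prefix and opens the rest).
    """
    image = str(image)
    if not image.startswith(("http://", "https://", "file://")):
        image = f"file://{Path(image).resolve()}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Example: a local scan with a task-specific instruction.
messages = build_messages("invoice.png", prompt="List every table in this document.")
```

The resulting `messages` can be passed to `processor.apply_chat_template` and `process_vision_info` exactly as in the quick start.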

Training Details

| Parameter | Value |
|---|---|
| Dataset | Opendoc2-Analysis-Recognition |
| Dataset Size | 6,910 samples |
| Base Model | docscopeOCR-7B-050425-exp |
| Model Architecture | Qwen2_5_VLForConditionalGeneration |
| Hardware | 2 × A40 (19 vCPUs) |
| Total Disk | 280,000 MB |
| Training Time | 3,407 seconds (~0.95 hours) |
| Warmup Steps | 250 |
| Precision | bfloat16 |

This model builds upon the robust foundation of docscopeOCR-7B-050425-exp with specialized training for document retrieval and extraction tasks.

Intended Use

This model is specifically designed for:

  • Document Retrieval: Efficiently locating specific information within large document collections and complex layouts.
  • Content Extraction: Precise extraction of structured data, tables, forms, and key information from various document types.
  • Analysis Recognition: Advanced recognition and analysis of document patterns, structures, and contextual relationships.
  • Enterprise Document Processing: Automated processing of business documents, reports, contracts, and administrative forms.
  • Research Document Analysis: Academic paper analysis, citation extraction, and research document comprehension.
  • Regulatory Compliance: Processing of compliance documents, regulatory filings, and standardized reporting formats.

Limitations

  • Inherits computational requirements from the base docscopeOCR model, requiring substantial resources for optimal performance.
  • Performance may vary on document types significantly different from the Opendoc2-Analysis-Recognition training dataset.
  • May show reduced accuracy on extremely specialized or domain-specific document formats not covered in training.
  • Long document processing requires adequate memory allocation and may not be suitable for real-time streaming applications.
  • Optimal performance depends on proper visual token configuration (for Qwen2.5-VL processors, the min_pixels and max_pixels bounds on image resolution) and on input preprocessing.
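
As an illustration of the visual token configuration mentioned in the last limitation, the sketch below converts a visual token budget into pixel bounds. It rests on assumptions taken from the upstream Qwen2.5-VL documentation rather than from this model card: that one visual token covers a 28 × 28 pixel area, and that `AutoProcessor.from_pretrained` accepts `min_pixels` and `max_pixels`. The helper `pixel_bounds` is a hypothetical name, and the 256 to 1280 token range is only an example, not a tuned recommendation for DREX.

```python
# Assumption: one visual token corresponds to a 28 x 28 pixel area in Qwen2.5-VL.
PIXELS_PER_TOKEN = 28 * 28


def pixel_bounds(min_tokens, max_tokens):
    """Convert a visual token budget into (min_pixels, max_pixels)."""
    if not 0 < min_tokens <= max_tokens:
        raise ValueError("expected 0 < min_tokens <= max_tokens")
    return min_tokens * PIXELS_PER_TOKEN, max_tokens * PIXELS_PER_TOKEN


# Illustrative budget of 256 to 1280 visual tokens per image.
min_pixels, max_pixels = pixel_bounds(256, 1280)
print(min_pixels, max_pixels)

# The bounds would then be passed when loading the processor, for example:
# processor = AutoProcessor.from_pretrained(
#     "prithivMLmods/DREX-062225-exp",
#     min_pixels=min_pixels,
#     max_pixels=max_pixels,
# )
```

A lower `max_pixels` reduces memory use and latency at the cost of fine detail, which matters for dense scans with small print, so the budget is worth checking against representative documents.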

Model size: 8.29B params (Safetensors, tensor type BF16)