alea-institute/charboundary-small-onnx

Tags: Text Classification, sentence-boundary-detection, paragraph-detection, text-segmentation, document-processing, optimized-inference

alea-institute committed on Apr 11

Commit 02035e3 · verified · 1 parent: c4ed0ce

Update README for small ONNX model

Files changed (1)

README.md +8 -19

@@ -51,32 +51,21 @@ a fast character-based sentence and paragraph boundary detection system optimize
 > **Security Advantage:** This ONNX model format provides enhanced security compared to SKOPS models, as it doesn't require bypassing security measures with `trust_model=True`. ONNX models are the recommended option for security-sensitive environments.
 ```python
-from huggingface_hub import hf_hub_download
-from charboundary import TextSegmenter
-from charboundary.onnx_support import enable_onnx
-# Enable ONNX support (make sure to install with: pip install charboundary[onnx])
-enable_onnx()
-# Download the compressed model
-model_path = hf_hub_download(repo_id="alea-institute/charboundary-small-onnx",
-                            filename="model.onnx.xz")
-# Load the model (handles .xz compression automatically)
-segmenter = TextSegmenter.load(model_path)
+from charboundary import get_small_onnx_segmenter
+# First load can be slow
+segmenter = get_small_onnx_segmenter()
 # Use the model
 text = "This is a test sentence. Here's another one!"
 sentences = segmenter.segment_to_sentences(text)
 print(sentences)
-# Segment to paragraphs
-paragraphs = segmenter.segment_to_paragraphs(text)
-print(paragraphs)
-# Get character-level spans
-sentence_spans = segmenter.segment_to_sentence_spans(text)
-print(sentence_spans)  # [(0, 24), (25, 42)]
+# Output: ['This is a test sentence.', " Here's another one!"]
+# Segment to spans
+sentence_spans = segmenter.get_sentence_spans(text)
+print(sentence_spans)
+# Output: [(0, 24), (24, 44)]
 ```
 ## Performance
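
Note on the new expected output: the two `# Output:` comments added in this commit can be cross-checked against each other without installing `charboundary`. The sketch below is not part of the README. It copies the `text` and span values from the `+` lines and assumes the spans are half-open `(start, end)` character offsets, which is what the documented values suggest. The helper names `slice_spans` and `spans_are_contiguous` are hypothetical and exist only for this illustration.

```python
# Values copied from the "+" lines of the diff; charboundary is not imported here.
text = "This is a test sentence. Here's another one!"
sentence_spans = [(0, 24), (24, 44)]


def slice_spans(text, spans):
    """Return the substring covered by each (start, end) span (end exclusive)."""
    return [text[start:end] for start, end in spans]


def spans_are_contiguous(spans, length):
    """Check that the spans start at 0, end at `length`, and leave no gaps or overlaps."""
    if not spans or spans[0][0] != 0 or spans[-1][1] != length:
        return False
    return all(prev_end == next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


sentences = slice_spans(text, sentence_spans)
print(sentences)
# ['This is a test sentence.', " Here's another one!"]
print(spans_are_contiguous(sentence_spans, len(text)))
# True
```

Under that assumption, slicing the text with the documented spans reproduces the documented sentence list, including the leading space on the second sentence, and the slices join back into the original string. The removed comment `[(0, 24), (25, 42)]` would not pass the same check, since it skips character 24 and stops before the end of the text.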