Transform Your Document Processing with tenbase2.com
Document Segmentor REST API
Turn complex documents into structured, analyzable data with our powerful document segmentation REST API. Whether you’re building an advanced document analysis system, creating a content management solution, or developing educational tools, our API delivers precise document breakdown at multiple granularity levels.
Features That Set Us Apart
Multi-Format Support
Process documents from various sources:
- PDF files
- HTML files (optimized for ABBYY FineReader OCR output)
- Plain text documents
Clean, Focused Segmentation
Our API automatically excludes non-essential elements to deliver clean, relevant content:
- Table of contents removed
- Page numbers stripped
- Footnotes excluded
Flexible Segmentation Options
Break down documents exactly how you need them:
- Page-level segmentation for preserving original document structure
- Paragraph-level parsing for logical content blocks
- Sentence-level analysis for fine-grained text processing
- Intelligent footnote detection and parsing (when using ABBYY FineReader OCR-generated HTML)
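In the example code on this page, the segmentation level is selected with the integer `iParseType` request parameter (0 = sentence, 1 = paragraph, 2 = page). A minimal sketch of that mapping follows. The names `ParseType` and `build_params` are hypothetical helpers introduced here for illustration and are not part of the API itself.

```python
from enum import IntEnum


class ParseType(IntEnum):
    """Segmentation levels, using the iParseType values from the example on this page."""
    SENTENCE = 0
    PARAGRAPH = 1
    PAGE = 2


def build_params(level: str) -> dict:
    """Translate a level name such as "paragraph" into request parameters.

    Hypothetical helper: only the iParseType name and its 0/1/2 values
    come from the page's own example.
    """
    try:
        parse_type = ParseType[level.strip().upper()]
    except KeyError:
        valid = ", ".join(p.name.lower() for p in ParseType)
        raise ValueError(f"unknown level {level!r}; expected one of: {valid}") from None
    return {"iParseType": int(parse_type)}
```

Calling `build_params("paragraph")` yields a dictionary that could be passed as the `params` argument of the POST request, which keeps the magic numbers out of calling code.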
Built for Developers
- RESTful API architecture for seamless integration
- Clear, consistent JSON responses
- Comprehensive API documentation
- No API key required
- Unlimited free usage
- Low latency processing
How It Works
- Submit Your Document: Send your document to our API endpoint using a simple POST request.
- Choose Your Segmentation Level: Specify whether you want page, paragraph, or sentence-level segmentation.
- Receive Structured Results: Get back clean, organized JSON containing your segmented document data.
Example Code Snippet
Here’s how you can use the API with a simple POST request in Python:
import requests

# Define the API endpoint
url = "https://tenbase2.ai/api/segmentor/seg"

# Open the file in binary mode
# Files are pdf, html, txt, or zip
with open("path/to/your/file.txt", "rb") as file:
    files = {"file": file}

    # Add the iParseType parameter (set to 0, 1, or 2)
    params = {
        "iParseType": 0  # 0=sentence, 1=paragraph, 2=page
    }

    # Make the POST request to upload the file with the parameter
    # (sent while the file is still open)
    response = requests.post(url, files=files, params=params)

# Check the response status
if response.status_code == 200:
    try:
        data = response.json()
        if isinstance(data, dict) and "items" in data:
            for item in data.get("items", []):
                print(item)
        else:
            print("Unexpected response format:", data)
    except ValueError:
        print("Failed to parse JSON:", response.text)
else:
    print("Failed to upload file:", response.status_code, response.text)
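For repeated use, the same request can be wrapped in a small function. The sketch below rests on what the example above shows: the endpoint URL, the `iParseType` parameter, the accepted file types, and a JSON object with an `items` key. The helper names (`check_supported`, `extract_items`, `segment_file`) and the 60-second timeout are assumptions made for illustration, and the structure of each entry in `items` is not documented here, so entries are returned untouched.

```python
from pathlib import Path

API_URL = "https://tenbase2.ai/api/segmentor/seg"  # endpoint from the example above
SUPPORTED_SUFFIXES = {".pdf", ".html", ".txt", ".zip"}  # file types named in the example


def check_supported(path: str) -> Path:
    """Reject files whose extension is not one the example lists."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    return p


def extract_items(payload) -> list:
    """Return the 'items' list, or raise if the response has another shape."""
    if not isinstance(payload, dict) or "items" not in payload:
        raise ValueError("unexpected response format")
    return list(payload["items"])


def segment_file(path: str, parse_type: int = 0, timeout: float = 60.0) -> list:
    """Upload one file and return its segments (0=sentence, 1=paragraph, 2=page)."""
    import requests  # third-party; imported lazily so the helpers above work without it

    p = check_supported(path)
    with p.open("rb") as fh:
        response = requests.post(
            API_URL,
            files={"file": fh},
            params={"iParseType": parse_type},
            timeout=timeout,  # assumed value; adjust for large documents
        )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return extract_items(response.json())
```

A call such as `segment_file("report.pdf", parse_type=1)` would then return the paragraph-level segments, raising an exception instead of printing when the upload or parsing fails.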