Source Locations
DocuDevs can return page-aware evidence for structured extraction results.
Use this when you want the extracted value and the location on the document page where it came from.
Typical examples:
- contract clauses with page numbers and bounding boxes
- insurance deductible options linked to the pricing table
- extracted fields that need human review in the jobs UI
What It Returns
Source locations are returned as a separate artifact so the normal extracted result stays backward compatible.
- GET /job/result/{guid} returns the extracted JSON
- GET /job/result/{guid}/source-locations returns the evidence manifest
Each entry is keyed by a JSON Pointer into the extracted result.
Example:
{
  "version": 1,
  "granularity": "block",
  "pages": [
    {
      "pageNumber": 1,
      "width": 8.5,
      "height": 11.0,
      "unit": "inch"
    }
  ],
  "locations": {
    "/basic_deductible_options": {
      "resolution": "block",
      "sourceRefs": ["/paragraphs/9", "/paragraphs/11"],
      "pageAnchors": [
        {
          "pageNumber": 1,
          "unit": "inch",
          "bbox": {
            "left": 1.58,
            "top": 3.91,
            "right": 5.39,
            "bottom": 5.06
          },
          "polygons": [
            [
              { "x": 1.58, "y": 3.91 },
              { "x": 5.39, "y": 3.91 },
              { "x": 5.39, "y": 5.06 },
              { "x": 1.58, "y": 5.06 }
            ]
          ]
        }
      ]
    }
  },
  "unresolved": []
}
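Because each key under locations is a JSON Pointer, a client can pair every evidence entry with the value it refers to. The following is a minimal sketch, assuming both artifacts have already been parsed into Python dictionaries. The helper names resolve_pointer and pair_values_with_evidence are illustrative and not part of any DocuDevs SDK, and the extracted values in the sample are invented for the demonstration.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer (RFC 6901) against parsed JSON data."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def pair_values_with_evidence(result: dict, manifest: dict) -> list:
    """Return (pointer, value, [(pageNumber, bbox), ...]) for each location."""
    pairs = []
    for pointer, entry in manifest.get("locations", {}).items():
        value = resolve_pointer(result, pointer)
        boxes = [(a["pageNumber"], a["bbox"]) for a in entry.get("pageAnchors", [])]
        pairs.append((pointer, value, boxes))
    return pairs


# Hypothetical extracted result; the manifest is trimmed from the example above.
result = {"basic_deductible_options": ["$500", "$1,000"]}
manifest = {
    "locations": {
        "/basic_deductible_options": {
            "resolution": "block",
            "pageAnchors": [
                {
                    "pageNumber": 1,
                    "unit": "inch",
                    "bbox": {"left": 1.58, "top": 3.91, "right": 5.39, "bottom": 5.06},
                }
            ],
        }
    }
}

for pointer, value, boxes in pair_values_with_evidence(result, manifest):
    print(pointer, value, boxes)
```

Keeping the lookup on the client side means the extracted JSON itself never has to carry geometry, which matches the backward-compatibility goal of the separate artifact.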
Supported Scope
Version 1 supports:
- structured extraction jobs
- simple extraction mode
- documents with stable page-aware text layout
- PDF and image inputs
Version 1 does not support:
- map-reduce extraction
- some OCR and spreadsheet workflows (not every one is covered)
- exact citation for free-form generative summaries
For synthesized outputs, treat source locations as supporting evidence, not as a single exact box for the final conclusion.
Request Parameters
Enable the feature with:
- sourceLocations=true
- optional sourceLocationGranularity
Allowed sourceLocationGranularity values:
- auto
- block
- word
Behavior:
- block: returns layout-block geometry
- word: narrows to matched words when possible
- auto: prefers word matches and falls back to block geometry
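When auto is requested, it can be useful to check how many fields actually reached word-level precision. The sketch below tallies the per-entry resolution field of a parsed manifest. It assumes that resolution reports the granularity actually achieved for each entry and that unresolved is a list of fields that could not be located; both readings are inferred from the example manifest rather than from a formal schema, and summarize_resolution is a hypothetical helper name.

```python
from collections import Counter


def summarize_resolution(manifest: dict) -> dict:
    """Count evidence entries by their reported resolution."""
    counts = Counter(
        entry.get("resolution", "unknown")
        for entry in manifest.get("locations", {}).values()
    )
    return {
        "requested": manifest.get("granularity"),
        "by_resolution": dict(counts),
        "unresolved_count": len(manifest.get("unresolved", [])),
    }


# Hypothetical manifest: auto was requested, one field fell back to block.
sample_manifest = {
    "granularity": "auto",
    "locations": {
        "/policy_number": {"resolution": "word"},
        "/basic_deductible_options": {"resolution": "block"},
    },
    "unresolved": [],
}

print(summarize_resolution(sample_manifest))
```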
Python SDK
from docudevs_client import DocuDevsClient
import json
import os
import asyncio

client = DocuDevsClient(
    api_url="https://api.docudevs.ai",
    token=os.getenv("API_KEY"),
)

# JSON Schema describing the structured result, passed as a string
schema = json.dumps({
    "type": "object",
    "properties": {
        "clauses": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["clauses"]
})

async def run():
    # Submit the document with source locations enabled
    with open("contract.pdf", "rb") as f:
        guid = await client.submit_and_process_document(
            document=f.read(),
            document_mime_type="application/pdf",
            prompt="Extract the main clauses from this contract.",
            schema=schema,
            source_locations=True,
            source_location_granularity="block",
        )
    # Wait for the job, then fetch the evidence manifest separately
    result = await client.wait_until_ready(guid)
    source_locations = await client.get_source_locations(guid)
    print(result)
    print(source_locations)

asyncio.run(run())
You can also fetch both together:
# Inside an async function such as run() above:
combined = await client.wait_until_ready_with_source_locations(guid)
print(combined.result)
print(combined.source_locations)
cURL
curl -X POST "https://api.docudevs.ai/document/upload" \
  -H "Authorization: YOUR_API_KEY" \
  -F "file=@contract.pdf" \
  -F 'command={
    "prompt":"Extract all clauses from this contract.",
    "schema":"{\"type\":\"object\",\"properties\":{\"clauses\":{\"type\":\"array\",\"items\":{\"type\":\"string\"}}},\"required\":[\"clauses\"]}",
    "sourceLocations":true,
    "sourceLocationGranularity":"block"
  }'
Fetch the evidence artifact:
curl "https://api.docudevs.ai/job/result/JOB_GUID/source-locations" \
-H "Authorization: YOUR_API_KEY"
Jobs UI
When a completed structured job includes source locations, the jobs result inspector can show:
- evidence cards per extracted field
- page selectors
- page thumbnails
- bounding-box overlays on the selected page
- raw coordinate values for auditing
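If you render similar overlays in your own viewer, the manifest coordinates have to be scaled to the rendered page image. The sketch below shows one way to do that. It assumes the origin is the top-left corner of the page with y increasing downward (consistent with top being smaller than bottom in the example manifest) and that the anchor uses the same unit as its page entry. It does not describe how the jobs UI itself is implemented, and bbox_to_pixels is a hypothetical helper name.

```python
def bbox_to_pixels(bbox: dict, page: dict, image_width_px: int, image_height_px: int) -> tuple:
    """Scale a page-unit bounding box to pixel coordinates of a rendered page image."""
    scale_x = image_width_px / page["width"]
    scale_y = image_height_px / page["height"]
    return (
        round(bbox["left"] * scale_x),
        round(bbox["top"] * scale_y),
        round(bbox["right"] * scale_x),
        round(bbox["bottom"] * scale_y),
    )


# Page and bbox values taken from the example manifest; the 850x1100 image
# size is an assumed render resolution (100 pixels per inch).
page = {"pageNumber": 1, "width": 8.5, "height": 11.0, "unit": "inch"}
bbox = {"left": 1.58, "top": 3.91, "right": 5.39, "bottom": 5.06}

print(bbox_to_pixels(bbox, page, 850, 1100))  # (158, 391, 539, 506)
```

The returned tuple is ordered left, top, right, bottom, which is the ordering most drawing libraries expect for rectangles.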
Document Support Notes
Source locations work best when DocuDevs can determine reliable page geometry for the input document.
Important caveats:
- PDF and image inputs are best supported
- some editable formats may not provide consistent page-aware evidence boxes for every extracted value
Recommended Usage
Prefer block granularity for longer fields such as clauses and summaries of a row or paragraph.
Prefer auto when you want better word-level precision but still want a successful fallback when exact word matching is not possible.