Batch Processing
Coordinate the processing of multiple homogeneous documents under a single parent job.
Batches let you upload several files, run one configuration across all of them, and retrieve an ordered list of per-document results.
When To Use
Choose batches when:
- You have dozens or hundreds of similar documents that share the same prompt, schema, or template.
- You want consolidated progress tracking and a single job identifier.
- You want to reprocess a collection without re-uploading each file.
Lifecycle Overview
1. Create an empty batch job to reserve a GUID.
2. Upload documents one at a time (each upload receives an index).
3. Process the batch by providing the extraction configuration.
4. Monitor progress and completion.
5. Fetch results as a list aligned with the upload order.
Step-by-Step Guide
The steps below are shown for the Python SDK, the Java SDK, and cURL.
Python SDK

```python
import asyncio
import json
import os

from docudevs.docudevs_client import DocuDevsClient


async def run_batch():
    client = DocuDevsClient(token=os.environ["API_KEY"])

    # 1. Create Batch
    batch_guid = await client.create_batch(max_concurrency=3)
    print(f"Created batch: {batch_guid}")

    # 2. Upload Documents
    files = ["invoice_jan.pdf", "invoice_feb.pdf", "invoice_mar.pdf"]
    for path in files:
        with open(path, "rb") as f:
            await client.upload_batch_document(
                batch_guid=batch_guid,
                document=f.read(),
                mime_type="application/pdf",
                file_name=os.path.basename(path),
            )
    print(f"Uploaded {len(files)} documents.")

    # 3. Process Batch
    schema = {
        "statements": [{
            "date": "date",
            "customer": "string",
            "total": "number"
        }]
    }
    await client.process_batch(
        batch_guid=batch_guid,
        mime_type="application/pdf",
        prompt="Extract statement details.",
        schema=json.dumps(schema),
    )
    print("Processing started...")

    # 4. Wait for Results
    results = await client.wait_until_ready(
        batch_guid,
        poll_interval=2,
        result_format="json",
    )

    # 5. Handle Results
    for i, result in enumerate(results):
        if isinstance(result, str):
            print(f"Doc {i} Error: {result}")
        elif result is None:
            print(f"Doc {i} Pending/Failed")
        else:
            print(f"Doc {i} Data: {result}")


if __name__ == "__main__":
    asyncio.run(run_batch())
```
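The result handling in step 5 distinguishes three shapes per document: a dict (success), a string (error message), and `None` (pending, or failed without a message). That logic can be factored into a small helper — a minimal sketch in plain Python, independent of the SDK:

```python
def summarize_results(results):
    """Partition a batch result list into successes, errors, and missing docs.

    Returns a dict keyed by category; each entry maps the document's
    upload index to its payload, so upload order is preserved.
    """
    summary = {"ok": {}, "error": {}, "missing": {}}
    for i, result in enumerate(results):
        if isinstance(result, str):   # error message for this document
            summary["error"][i] = result
        elif result is None:          # pending, or failed with no message
            summary["missing"][i] = None
        else:                         # parsed extraction payload
            summary["ok"][i] = result
    return summary


summary = summarize_results([{"total": 10}, "OCR failed", None])
print(summary["ok"])     # document 0 succeeded
print(summary["error"])  # document 1 carries an error string
```

Because indices are preserved, a failed document can be matched back to the file it came from and re-uploaded on its own.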
Java SDK

```java
import ai.docudevs.client.DocuDevsClient;
import ai.docudevs.client.ProcessOptions;
import ai.docudevs.client.UploadRequest;
import ai.docudevs.client.WaitOptions;
import com.fasterxml.jackson.databind.JsonNode;

import java.nio.file.Files;
import java.nio.file.Path;

public class BatchExample {
    public static void main(String[] args) throws Exception {
        DocuDevsClient client = DocuDevsClient.builder()
                .apiKey(System.getenv("API_KEY"))
                .build();

        // 1. Create Batch
        String batchGuid = client.createBatch(3);
        System.out.println("Created batch: " + batchGuid);

        // 2. Upload Documents
        for (String path : new String[]{"invoice_jan.pdf", "invoice_feb.pdf", "invoice_mar.pdf"}) {
            byte[] fileBytes = Files.readAllBytes(Path.of(path));
            UploadRequest upload = new UploadRequest(path, "application/pdf", fileBytes);
            client.uploadBatchDocument(batchGuid, upload);
            System.out.println("Uploaded " + path);
        }

        // 3. Process Batch
        String schema = """
                {
                  "type": "object",
                  "properties": {
                    "statements": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "date": {"type": "string"},
                          "customer": {"type": "string"},
                          "total": {"type": "number"}
                        }
                      }
                    }
                  }
                }
                """;
        ProcessOptions options = ProcessOptions.builder()
                .prompt("Extract statement details.")
                .schema(schema)
                .build();
        client.processBatch(batchGuid, options);
        System.out.println("Processing started...");

        // 4. Wait and Fetch Results
        JsonNode results = client.waitUntilReadyJson(batchGuid, WaitOptions.defaults());
        System.out.println(results);
    }
}
```
cURL

1. Create a Batch

```bash
curl -X POST "https://api.docudevs.ai/document/batch" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "mimeType": "application/pdf" }'
```

2. Upload Documents

Repeat for each file:

```bash
curl -X POST "https://api.docudevs.ai/document/batch/${BATCH_GUID}/upload" \
  -H "Authorization: Bearer $API_KEY" \
  -F "document=@invoice_jan.pdf"
```

3. Process the Batch

```bash
curl -X POST "https://api.docudevs.ai/document/batch/${BATCH_GUID}/process" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Extract statement details.",
    "schema": "...",
    "maxConcurrency": 4
  }'
```

4. Check Status

```bash
curl -X GET "https://api.docudevs.ai/job/status/${BATCH_GUID}" \
  -H "Authorization: Bearer $API_KEY"
```

5. Retrieve Results

```bash
curl -X GET "https://api.docudevs.ai/job/result/${BATCH_GUID}/json" \
  -H "Authorization: Bearer $API_KEY"
```
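When driving the REST endpoints directly, steps 4 and 5 amount to polling the status endpoint until the batch settles, then fetching the results. A minimal polling loop, sketched in Python with the HTTP call injected as a callable so it can be exercised offline; the `status` field name and the `COMPLETED`/`FAILED` terminal values are assumptions, not confirmed API contract:

```python
import time


def wait_for_batch(fetch_status, poll_interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the batch reaches a terminal state.

    fetch_status: callable returning the parsed JSON status payload,
    e.g. a wrapper around GET /job/status/{BATCH_GUID}.
    Terminal state names are assumed; adjust them to the actual API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("COMPLETED", "FAILED"):
            return status
        sleep(poll_interval)
    raise TimeoutError("batch did not finish within the timeout")


# Offline usage with a stubbed status sequence standing in for real HTTP calls:
states = iter([{"status": "PROCESSING"}, {"status": "COMPLETED"}])
final = wait_for_batch(lambda: next(states), poll_interval=0, sleep=lambda _: None)
print(final["status"])  # COMPLETED
```

Injecting the fetch and sleep functions keeps the loop testable; in production, pass a wrapper around your HTTP client and leave `sleep` at its default.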
Core Concepts
| Concept | Description |
|---|---|
| `isBatch` | Flag on the parent job that identifies batch processing. |
| `totalDocuments` | Count of documents currently attached to the batch. |
| `completedDocuments` | Number of documents that finished successfully. |
| `failedDocuments` | Number of documents that completed with errors. |
| `maxConcurrency` | Upper bound on the number of documents processed in parallel. |
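The counters in the table combine into a simple progress figure: a document is settled once it has either completed or failed, and the batch is done when the settled count reaches `totalDocuments`. A hedged sketch, assuming the status payload exposes the fields exactly as named above:

```python
def batch_progress(status):
    """Return (settled, total, percent_done) from a batch status payload.

    'Settled' counts documents that finished either successfully or
    with an error; field names follow the Core Concepts table.
    """
    total = status.get("totalDocuments", 0)
    settled = status.get("completedDocuments", 0) + status.get("failedDocuments", 0)
    percent = 100.0 * settled / total if total else 0.0
    return settled, total, percent


settled, total, pct = batch_progress(
    {"isBatch": True, "totalDocuments": 4, "completedDocuments": 2, "failedDocuments": 1}
)
print(f"{settled}/{total} documents settled ({pct:.0f}%)")  # 3/4 documents settled (75%)
```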
Best Practices
- Concurrency: Set `maxConcurrency` between 3 and 8 for optimal throughput; lower it for very large files.
- Validation: Ensure all documents in a batch are of the same type (e.g., all invoices) so they share the same schema.
- Error Handling: Batch jobs continue even if individual documents fail. Always check the results list for error strings.
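The concurrency advice can be encoded as a small heuristic. This is a sketch only: the 3–8 band comes from the guidance above, while the 20 MB "very large" threshold is an illustrative assumption, not a documented limit:

```python
def pick_concurrency(file_sizes_bytes, large_threshold=20 * 1024 * 1024):
    """Choose a maxConcurrency value in the recommended 3-8 band.

    Drops to the bottom of the band when any file exceeds the (assumed)
    'very large' threshold; otherwise scales with the batch size.
    """
    if any(size > large_threshold for size in file_sizes_bytes):
        return 3
    # Scale with the number of documents, clamped to the 3-8 band.
    return max(3, min(8, len(file_sizes_bytes)))


print(pick_concurrency([1_000_000] * 10))         # 8: many small files
print(pick_concurrency([50 * 1024 * 1024, 500]))  # 3: one very large file
```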