The kdx document command group provides tools for inspecting, searching, annotating, and extracting structured data from local Kodexa documents (KDDB files). These commands work offline; no connection to the Kodexa platform is required.

What are KDDB Files?

KDDB (Kodexa Document Database) files are SQLite-based document containers that store:
  • Document structure - Hierarchical content nodes with types and content
  • Metadata - Document properties like UUID, version, and custom fields
  • Tags - Annotations on content nodes for extraction workflows
  • Data objects - Structured extracted data with attributes
  • Native files - Embedded binary files (PDFs, images, etc.)
  • Audit trail - Full revision history of data changes
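Because KDDB files are SQLite-based, a quick sanity check before passing a file to kdx is to look for the SQLite file header. The following is a minimal sketch, not part of kdx: `is_sqlite_file` is a hypothetical helper, and it assumes KDDB files keep the standard, unmodified SQLite header (the ASCII text "SQLite format 3" followed by a NUL byte), which this page does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a kdx command): returns 0 if the file starts with
# the standard SQLite magic string, 1 otherwise.
# Assumption: KDDB files keep the plain SQLite header.
is_sqlite_file() {
  local file="$1"
  [ -r "$file" ] || return 1
  # The first 15 bytes of a SQLite database are the ASCII text below;
  # byte 16 is a NUL terminator, which is not read here.
  # tr strips any NUL bytes so non-SQLite binary files compare cleanly.
  [ "$(head -c 15 -- "$file" | tr -d '\0')" = "SQLite format 3" ]
}
```

Usage might look like `is_sqlite_file invoice.kddb && kdx document info invoice.kddb`. A passing check only shows that the file is a SQLite database, not that it is a valid KDDB document.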

Reading & Analysis

Commands for understanding document content without modifying anything.

Structure & Metadata

Commands for inspecting document structure, annotations, and spatial layout.

Data & Tagging

Commands that modify the document by adding tags, creating data objects, and setting attributes.

Agentic Workflows

Agentic CLI Use

How AI agents use kdx document commands to analyze and annotate documents programmatically. Includes end-to-end workflow examples and best practices.

Quick Start

Explore a Document

# Get overview
kdx document info invoice.kddb

# Check structure
kdx document print invoice.kddb --depth 3

# Read page content
kdx document text invoice.kddb --pages 1:3

Search and Annotate

# Find content (single quotes keep the shell from turning \$ into a bare $ anchor)
kdx document locate invoice.kddb --pattern '\$[\d,]+\.\d{2}' --type word

# Tag a node
kdx document tag invoice.kddb --node-id 245 --name "invoice/amount"

# Create structured data
kdx document data create invoice.kddb --path "INVOICE"
kdx document data set-attribute invoice.kddb \
  --object-id 1 --tag "total" --value "1234.56" --type CURRENCY

Scripting with JSON

# JSON Lines output pipes well with jq
kdx document grep "revenue" report.kddb | jq -r '.content'

# Get stats for scripting
PAGE_COUNT=$(kdx document info report.kddb -o json | jq '.statistics.pageCount')
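The same pattern extends to several files at once. The sketch below wraps the `kdx document info ... -o json` call and the `.statistics.pageCount` field shown above in a loop; `page_counts` is a hypothetical function name, and the error handling assumes kdx exits with a non-zero status when it cannot read a file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<file><TAB><page count>" for each KDDB file.
# Uses only the kdx call and JSON field shown in the example above.
page_counts() {
  local file json count
  for file in "$@"; do
    # Capture the JSON first so a kdx failure is not hidden by the pipe to jq.
    if json=$(kdx document info "$file" -o json) &&
       count=$(printf '%s\n' "$json" | jq '.statistics.pageCount'); then
      printf '%s\t%s\n' "$file" "$count"
    else
      # Report unreadable files on stderr and keep going.
      printf 'skipped: %s\n' "$file" >&2
    fi
  done
}
```

For example, `page_counts *.kddb | sort -t "$(printf '\t')" -k2,2n` would list documents from shortest to longest.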

Output Formats

All document commands support the global -o flag:

kdx document info doc.kddb -o json    # JSON output
kdx document info doc.kddb -o yaml    # YAML output
kdx document info doc.kddb -o table   # Table output (default)

Search commands (grep, find, locate) output JSON Lines (JSONL) by default for streaming. Use --pretty for readable output.
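Since the default search output is one JSON object per line, ordinary line-oriented tools can consume it without parsing JSON. The sketch below counts matches this way, reusing the `kdx document grep` invocation from the Quick Start; `count_matches` is a hypothetical helper name, and it assumes each match occupies exactly one line when --pretty is not used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts matches of a search term in one KDDB file.
# Relies on the default JSON Lines output, where each match is one line.
# Do not add --pretty here: readable output is presumably not one record per line.
count_matches() {
  local term="$1" file="$2"
  # grep -c . counts non-empty lines, so blank lines are not treated as records.
  # It exits non-zero when the count is 0, hence the trailing "|| true".
  kdx document grep "$term" "$file" | grep -c . || true
}
```

For example, `count_matches "revenue" report.kddb` prints the number of matching records as a single integer.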