Query Commands - Kodexa Developer Portal

The kdx document grep and kdx document lines commands provide Unix-style text search and line retrieval for KDDB files. Output is in JSON Lines format by default, which makes it easy to pipe through jq and other shell tools.

Grep Command

Search document content for lines matching a pattern, similar to Unix grep.

kdx document grep <pattern> <file.kddb> [flags]

Flags

Flag	Short	Description	Default
`--ignore-case`	`-i`	Case-insensitive matching	false
`--extended-regexp`	`-E`	Use extended regex (otherwise literal)	false
`--count`	`-c`	Only print count of matches	false
`--context`	`-C`	Print N lines before and after	0
`--before`	`-B`	Print N lines before each match	0
`--after`	`-A`	Print N lines after each match	0
`--max`		Maximum number of results	0 (unlimited)
`--page`		Search only on page N (1-based)	0 (all pages)
`--pretty`		Pretty-print JSON output	false

Basic Search

Find all lines containing a text pattern:

kdx document grep "risk factor" doc.kddb

{"uuid":"abc-123","content":"Risk factors include market volatility","page":5}
{"uuid":"def-456","content":"Additional risk factors discussed below","page":5}

Case-Insensitive Search

kdx document grep -i "revenue" doc.kddb

Regex Search

Use -E for extended regex patterns:

# Find section headers with numbers
kdx document grep -E "Section\s+\d+" doc.kddb

# Find amounts with dollar signs (single quotes stop the shell from consuming the backslash before $)
kdx document grep -E '\$[\d,]+\.\d{2}' doc.kddb

# Find dates in various formats
kdx document grep -E "\d{1,2}/\d{1,2}/\d{2,4}" doc.kddb
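Shell quoting determines which pattern kdx actually receives. Inside double quotes, the shell turns `\$` into a bare `$`, which a regex engine reads as an end-of-line anchor; single quotes pass the backslash through untouched. The sketch below does not call kdx. It uses a hypothetical helper, show_pattern, that only prints its argument as the shell hands it over:

```shell
# show_pattern is a hypothetical helper: it prints its argument exactly as
# the shell passes it, which is what kdx would receive as <pattern>.
show_pattern() {
  printf '%s\n' "$1"
}

show_pattern "\$[\d,]+\.\d{2}"   # prints: $[\d,]+\.\d{2}   (backslash before $ consumed)
show_pattern '\$[\d,]+\.\d{2}'   # prints: \$[\d,]+\.\d{2}  (pattern arrives unchanged)
```

When a pattern contains a literal dollar sign, the single-quoted form is the safer choice.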

Context Lines

Show surrounding lines for context:

# 2 lines before and after each match
kdx document grep -C 2 "total revenue" doc.kddb

# 3 lines before, 1 line after
kdx document grep -B 3 -A 1 "conclusion" doc.kddb

Count Matches

Get just the count of matching lines:

kdx document grep -c "error" doc.kddb

Limit Results

Stop after finding N matches:

kdx document grep --max 10 "warning" doc.kddb

Filter by Page

Search only on a specific page:

kdx document grep --page 5 "total" doc.kddb

Pretty Print

For debugging, use pretty-printed JSON:

kdx document grep --pretty "revenue" doc.kddb

{
  "uuid": "abc-123",
  "content": "Total revenue for Q4 was $1.2M",
  "page": 3
}

Lines Command

Retrieve specific lines by index range, page number, or UUID.

kdx document lines <file.kddb> [flags]

Flags

Flag	Description	Default
`--range`	Line index range (start:end, 0-based, exclusive end)
`--page`	Get all lines from page N (1-based)
`--uuid`	Get lines by UUID (comma-separated)
`--pretty`	Pretty-print JSON output	false

Lines by Index Range

Get lines by their 0-based index (exclusive end, like Python slicing):

# Get lines 10-19 (10 lines total)
kdx document lines --range 10:20 doc.kddb
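Because the end index is exclusive, consecutive windows can share a boundary without overlapping. A minimal sketch, assuming a hypothetical helper range_for_window (not part of kdx) that turns a 0-based window number and a window size into a --range argument:

```shell
# range_for_window is a hypothetical helper, not part of kdx.
# Usage: range_for_window <window-number (0-based)> <window-size>
range_for_window() {
  local start end
  start=$(( $1 * $2 ))
  end=$(( start + $2 ))      # exclusive end, so the next window starts here
  printf '%s:%s\n' "$start" "$end"
}

range_for_window 0 10   # prints: 0:10  (lines 0-9)
range_for_window 1 10   # prints: 10:20 (lines 10-19)
```

The result can be passed straight to the flag, for example `kdx document lines --range "$(range_for_window 1 10)" doc.kddb`.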

Lines by Page

Get all lines from a specific page:

# All lines from page 5
kdx document lines --page 5 doc.kddb

Lines by UUID

Get specific lines by their UUID:

# Single UUID
kdx document lines --uuid abc-123 doc.kddb

# Multiple UUIDs
kdx document lines --uuid abc-123,def-456,ghi-789 doc.kddb

All Lines

Without flags, returns all lines in the document:

kdx document lines doc.kddb

Output Format

Both commands output JSON Lines (JSONL) format by default - one JSON object per line:

{"uuid":"abc-123","content":"Line content here","page":3}
{"uuid":"def-456","content":"Another line of text","page":3}
{"uuid":"ghi-789","content":"Third line content","page":4}

Output Fields

Field	Description
`uuid`	Unique identifier for the line (from `spatial:uuid` feature)
`content`	Text content of the line
`page`	Page number (1-based)

Shell Pipeline Examples

Extract UUIDs from Search Results

kdx document grep -i "risk" doc.kddb | jq -r '.uuid'

Count Matches per Page

kdx document grep "revenue" doc.kddb | jq -s 'group_by(.page) | map({page: .[0].page, count: length})'

[
  {"page": 3, "count": 5},
  {"page": 7, "count": 2},
  {"page": 12, "count": 8}
]

Get Content from UUIDs

# Read UUIDs from a file (one per line) and fetch their content
UUIDS=$(paste -sd, uuids.txt)
kdx document lines --uuid "$UUIDS" doc.kddb | jq -r '.content'
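Blank lines in uuids.txt would leave empty entries in the comma-separated list. A minimal sketch, assuming a hypothetical helper join_uuids (not part of kdx) that reads one UUID per line on stdin, skips blank lines, and joins the rest with commas:

```shell
# join_uuids is a hypothetical helper, not part of kdx.
# Reads UUIDs from stdin, one per line, and prints them comma-separated.
join_uuids() {
  # Drop empty or whitespace-only lines, then join what remains with commas.
  grep -v '^[[:space:]]*$' | paste -sd, -
}
```

It could then be used as `kdx document lines --uuid "$(join_uuids < uuids.txt)" doc.kddb`.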

Extract Lines from Multiple Pages

for p in {10..15}; do
  kdx document lines --page $p doc.kddb
done | jq -s '.'

Filter and Transform

# Find all amounts and extract the values
kdx document grep -E '\$[\d,]+' doc.kddb | jq -r '.content'

# Get first 5 lines from each page
for p in $(seq 1 10); do
  kdx document lines --page "$p" doc.kddb | head -n 5
done

Build Context for LLM

# Extract specific content by UUID for LLM context
kdx document lines --uuid abc-123,def-456 doc.kddb | jq -r '.content'

Search and Retrieve Full Context

# Find lines mentioning "revenue", then get their page content
PAGES=$(kdx document grep "revenue" doc.kddb | jq -r '.page' | sort -un)
for p in $PAGES; do
  echo "=== Page $p ==="
  kdx document lines --page $p doc.kddb | jq -r '.content'
done

Use Cases

Document Search and Analysis

Quickly search across document content:

# Find all mentions of a term
kdx document grep "depreciation" financials.kddb

# Case-insensitive search for variations
kdx document grep -i "ebitda" report.kddb

# Find patterns like phone numbers
kdx document grep -E "\(\d{3}\)\s*\d{3}-\d{4}" contract.kddb

Building LLM Prompts

Extract relevant content for LLM context windows:

# Get all lines from executive summary (page 2)
CONTEXT=$(kdx document lines --page 2 doc.kddb | jq -r '.content')

# Search for relevant lines and build context
CONTEXT=$(kdx document grep -i "key finding" doc.kddb | jq -r '.content')

Data Pipeline Integration

Use in automated workflows:

#!/bin/bash
# Extract all lines mentioning amounts
kdx document grep -E '\$[\d,]+' "$1" | while IFS= read -r line; do
  # Pull each field out of the JSON object for this match
  UUID=$(jq -r '.uuid' <<<"$line")
  CONTENT=$(jq -r '.content' <<<"$line")
  PAGE=$(jq -r '.page' <<<"$line")

  # Process each match...
  echo "Found amount on page $PAGE: $CONTENT"
done

Quality Assurance

Validate document content:

# Check if required sections exist
if kdx document grep -c "Terms and Conditions" contract.kddb | grep -q "^0$"; then
  echo "ERROR: Missing Terms and Conditions section"
  exit 1
fi

# Count pages with content (-E so that "." matches any character rather than a literal period)
PAGES_WITH_CONTENT=$(kdx document grep -E "." doc.kddb | jq -r '.page' | sort -u | wc -l)
echo "Document has content on $PAGES_WITH_CONTENT pages"
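The single-section check above extends naturally to a list of required headings. A sketch, assuming (as the example above does) that -c prints a bare integer; require_sections is a hypothetical helper, not part of kdx:

```shell
# require_sections is a hypothetical helper, not part of kdx.
# Usage: require_sections <file.kddb> <section title>...
# Returns non-zero if any title has zero matches.
require_sections() {
  local file=$1 missing=0 title count
  shift
  for title in "$@"; do
    # Assumption: `kdx document grep -c` prints only the match count.
    count=$(kdx document grep -c "$title" "$file")
    if [ "$count" = "0" ]; then
      echo "ERROR: Missing section: $title" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `require_sections contract.kddb "Terms and Conditions" "Governing Law" || exit 1` reports every missing section before failing, rather than stopping at the first.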

Tips

Use -E (extended regex) when you need pattern matching. Without it, special characters like . and * are treated literally.

JSON Lines format streams well - you can process results as they arrive without loading everything into memory.

Combine grep with --max to quickly check if a pattern exists without processing the entire document.

The uuid field comes from the spatial:uuid feature on line nodes. Documents without this feature will have empty UUIDs.

Page numbers are 1-based in the output and flags, matching how users typically refer to pages.
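The --max tip can be wrapped in a small existence check. A sketch, assuming kdx prints nothing when there are no matches; pattern_exists is a hypothetical helper, not part of kdx:

```shell
# pattern_exists is a hypothetical helper, not part of kdx.
# Usage: pattern_exists <pattern> <file.kddb>
pattern_exists() {
  # --max 1 stops at the first match; any output means the pattern occurs.
  [ -n "$(kdx document grep --max 1 "$1" "$2")" ]
}
```

For example, `pattern_exists "revenue" doc.kddb && echo "found"`.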

Related

Document Structure - Print and select commands for structure inspection

Document Overview - All document commands

SDK: Selectors - Programmatic node selection
