The kdx document grep and kdx document lines commands provide Unix-style text search and line retrieval for KDDB files. Output is JSON Lines format by default, ideal for piping through jq and building shell pipelines.

Grep Command

Search document content for lines matching a pattern, similar to Unix grep.
kdx document grep <pattern> <file.kddb> [flags]

Flags

| Flag | Short | Description | Default |
|------|-------|-------------|---------|
| --ignore-case | -i | Case-insensitive matching | false |
| --extended-regexp | -E | Use extended regex (otherwise literal) | false |
| --count | -c | Only print count of matches | false |
| --context | -C | Print N lines before and after | 0 |
| --before | -B | Print N lines before each match | 0 |
| --after | -A | Print N lines after each match | 0 |
| --max | | Maximum number of results | 0 (unlimited) |
| --page | | Search only on page N (1-based) | 0 (all pages) |
| --pretty | | Pretty-print JSON output | false |

Find all lines containing a text pattern:
kdx document grep "risk factor" doc.kddb
{"uuid":"abc-123","content":"Risk factors include market volatility","page":5}
{"uuid":"def-456","content":"Additional risk factors discussed below","page":5}

Add -i to match regardless of case:
kdx document grep -i "revenue" doc.kddb
Use -E for extended regex patterns:
# Find section headers with numbers
kdx document grep -E "Section\s+\d+" doc.kddb

# Find amounts with dollar signs
kdx document grep -E '\$[\d,]+\.\d{2}' doc.kddb

# Find dates in various formats
kdx document grep -E "\d{1,2}/\d{1,2}/\d{2,4}" doc.kddb
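
Shell quoting decides which pattern string the command actually receives. The sketch below uses only printf (no kdx call, and the variable names are illustrative) to show that the backslash in front of a dollar sign is consumed inside double quotes but preserved inside single quotes, which matters for patterns that match a literal $.

```shell
# Inside double quotes the shell consumes the backslash in "\$",
# so the program would receive a bare "$" (a regex end-of-line anchor).
double_quoted="\$[\d,]+"

# Inside single quotes nothing is interpreted, so the backslash survives.
single_quoted='\$[\d,]+'

printf 'double: %s\n' "$double_quoted"   # double: $[\d,]+
printf 'single: %s\n' "$single_quoted"   # single: \$[\d,]+
```

Single-quoting regex patterns is therefore the safer default whenever they contain $ or backslashes.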

Context Lines

Show surrounding lines for context:
# 2 lines before and after each match
kdx document grep -C 2 "total revenue" doc.kddb

# 3 lines before, 1 line after
kdx document grep -B 3 -A 1 "conclusion" doc.kddb

Count Matches

Get just the count of matching lines:
kdx document grep -c "error" doc.kddb
12

Limit Results

Stop after finding N matches:
kdx document grep --max 10 "warning" doc.kddb

Filter by Page

Search only on a specific page:
kdx document grep --page 5 "total" doc.kddb
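
As documented, --page restricts the search to one page, so covering a span of pages takes a loop. A minimal sketch follows; grep_page_range is a hypothetical helper name, not part of kdx, and it assumes each call prints JSON Lines as described in this document.

```shell
# Search pages first..last (inclusive, 1-based) for a pattern.
# Hypothetical helper: runs one `kdx document grep --page N` per page
# and concatenates the JSON Lines output.
grep_page_range() {
  local pattern="$1" file="$2" page="$3" last="$4"
  while [ "$page" -le "$last" ]; do
    kdx document grep --page "$page" "$pattern" "$file"
    page=$((page + 1))
  done
}
```

For example, `grep_page_range "total" doc.kddb 5 8 | jq -r '.content'` would print the matching text from pages 5 through 8.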

Pretty Print

For debugging, use pretty-printed JSON:
kdx document grep --pretty "revenue" doc.kddb
{
  "uuid": "abc-123",
  "content": "Total revenue for Q4 was $1.2M",
  "page": 3
}

Lines Command

Retrieve specific lines by index range, page number, or UUID.
kdx document lines <file.kddb> [flags]

Flags

| Flag | Description | Default |
|------|-------------|---------|
| --range | Line index range (start:end, 0-based, exclusive end) | |
| --page | Get all lines from page N (1-based) | |
| --uuid | Get lines by UUID (comma-separated) | |
| --pretty | Pretty-print JSON output | false |

Lines by Index Range

Get lines by their 0-based index (exclusive end, like Python slicing):
# Get lines 10-19 (10 lines total)
kdx document lines --range 10:20 doc.kddb
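
Because the end index is exclusive, a range covering `count` lines from `start` is simply `start:start+count`. The helpers below are a small sketch of that arithmetic; range_for and chunk_range are hypothetical names introduced here for illustration.

```shell
# Print a --range value for `count` lines starting at 0-based index `start`.
# The end index is exclusive, so it is start + count.
range_for() {
  local start="$1" count="$2"
  printf '%s:%s\n' "$start" "$((start + count))"
}

# Print the --range value for the 0-based chunk number `chunk`
# when the document is read in chunks of `size` lines.
chunk_range() {
  local chunk="$1" size="$2"
  printf '%s:%s\n' "$((chunk * size))" "$(((chunk + 1) * size))"
}

range_for 10 10    # prints 10:20
chunk_range 2 50   # prints 100:150
```

The result can be passed straight to the flag, for example `kdx document lines --range "$(chunk_range 2 50)" doc.kddb`.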

Lines by Page

Get all lines from a specific page:
# All lines from page 5
kdx document lines --page 5 doc.kddb

Lines by UUID

Get specific lines by their UUID:
# Single UUID
kdx document lines --uuid abc-123 doc.kddb

# Multiple UUIDs
kdx document lines --uuid abc-123,def-456,ghi-789 doc.kddb

All Lines

Without flags, returns all lines in the document:
kdx document lines doc.kddb

Output Format

Both commands output JSON Lines (JSONL) format by default - one JSON object per line:
{"uuid":"abc-123","content":"Line content here","page":3}
{"uuid":"def-456","content":"Another line of text","page":3}
{"uuid":"ghi-789","content":"Third line content","page":4}

Output Fields

| Field | Description |
|-------|-------------|
| uuid | Unique identifier for the line (from spatial:uuid feature) |
| content | Text content of the line |
| page | Page number (1-based) |
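
Since every record carries the same three fields, the output converts easily to a tabular form for tools such as cut, sort, or awk. A minimal sketch using jq's @tsv filter; jsonl_to_tsv is a hypothetical helper name, and the column order is an arbitrary choice made here.

```shell
# Convert JSON Lines records (uuid, content, page) on stdin into
# tab-separated "page<TAB>uuid<TAB>content" rows on stdout.
jsonl_to_tsv() {
  jq -r '[.page, .uuid, .content] | @tsv'
}
```

Usage: `kdx document lines --page 3 doc.kddb | jsonl_to_tsv`. Note that @tsv escapes tabs, newlines, and backslashes inside field values, so each record stays on one row.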

Shell Pipeline Examples

Extract UUIDs from Search Results

kdx document grep -i "risk" doc.kddb | jq -r '.uuid'

Count Matches per Page

kdx document grep "revenue" doc.kddb | jq -s 'group_by(.page) | map({page: .[0].page, count: length})'
[
  {"page": 3, "count": 5},
  {"page": 7, "count": 2},
  {"page": 12, "count": 8}
]

Get Content from UUIDs

# Read UUIDs from a file and fetch their content
UUIDS=$(paste -sd ',' uuids.txt)
kdx document lines --uuid "$UUIDS" doc.kddb | jq -r '.content'
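
The --uuid flag expects a comma-separated list, and this document notes that lines without the spatial:uuid feature have empty UUIDs. The sketch below joins UUIDs while skipping blank entries; join_uuids is a hypothetical helper name, not a kdx feature.

```shell
# Read one UUID per line on stdin, drop blank lines, and join the rest
# with commas (no leading or trailing comma).
join_uuids() {
  grep -v '^[[:space:]]*$' | paste -sd ',' -
}
```

This allows a search-then-fetch round trip such as `kdx document lines --uuid "$(kdx document grep -i "risk" doc.kddb | jq -r '.uuid' | join_uuids)" doc.kddb`.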

Extract Lines from Multiple Pages

for p in {10..15}; do
  kdx document lines --page $p doc.kddb
done | jq -s '.'

Filter and Transform

# Find lines containing dollar amounts and print their text
kdx document grep -E '\$[\d,]+' doc.kddb | jq -r '.content'

# Get first 5 lines from each page
for p in $(seq 1 10); do
  kdx document lines --page $p doc.kddb | head -5
done

Build Context for LLM

# Extract specific content by UUID for LLM context
kdx document lines --uuid abc-123,def-456 doc.kddb | jq -r '.content'

Search and Retrieve Full Context

# Find lines mentioning "revenue", then get their page content
PAGES=$(kdx document grep "revenue" doc.kddb | jq -r '.page' | sort -u)
for p in $PAGES; do
  echo "=== Page $p ==="
  kdx document lines --page $p doc.kddb | jq -r '.content'
done

Use Cases

Document Search and Analysis

Quickly search across document content:
# Find all mentions of a term
kdx document grep "depreciation" financials.kddb

# Case-insensitive search for variations
kdx document grep -i "ebitda" report.kddb

# Find patterns like phone numbers
kdx document grep -E "\(\d{3}\)\s*\d{3}-\d{4}" contract.kddb

Building LLM Prompts

Extract relevant content for LLM context windows:
# Get all lines from executive summary (page 2)
CONTEXT=$(kdx document lines --page 2 doc.kddb | jq -r '.content')

# Search for relevant lines and build context
CONTEXT=$(kdx document grep -i "key finding" doc.kddb | jq -r '.content')
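
When the extracted text is pasted into a prompt, keeping the page number next to each line makes it easier to trace an answer back to the document. A minimal sketch using jq string interpolation; format_context is a hypothetical helper name and the "[pN]" prefix is just one possible convention.

```shell
# Turn JSON Lines records on stdin into "[p<page>] <content>" lines so
# each line of context keeps its page reference.
format_context() {
  jq -r '"[p\(.page)] \(.content)"'
}
```

Usage: `CONTEXT=$(kdx document grep -i "key finding" doc.kddb | format_context)`.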

Data Pipeline Integration

Use in automated workflows:
#!/bin/bash
# Extract all lines mentioning amounts
kdx document grep -E '\$[\d,]+' "$1" | while read -r line; do
  UUID=$(echo "$line" | jq -r '.uuid')
  CONTENT=$(echo "$line" | jq -r '.content')
  PAGE=$(echo "$line" | jq -r '.page')

  # Process each match...
  echo "Found amount on page $PAGE: $CONTENT"
done

Quality Assurance

Validate document content:
# Check if required sections exist
if kdx document grep -c "Terms and Conditions" contract.kddb | grep -q '^0$'; then
  echo "ERROR: Missing Terms and Conditions section"
  exit 1
fi

# Count pages with content (-E makes "." match any character instead of a literal period)
PAGES_WITH_CONTENT=$(kdx document grep -E "." doc.kddb | jq -r '.page' | sort -u | wc -l)
echo "Document has content on $PAGES_WITH_CONTENT pages"
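
The same count-based check extends to several required phrases at once. The sketch below assumes, as in the Count Matches example, that -c prints a bare integer; require_sections is a hypothetical helper name, and treating a failed kdx call as "missing" is a design choice made here, not documented behavior.

```shell
# Return 0 if every required phrase occurs at least once in the file;
# otherwise report each missing phrase on stderr and return 1.
require_sections() {
  local file="$1" missing=0 phrase count
  shift
  for phrase in "$@"; do
    # A failed call or empty output is counted as zero matches.
    count=$(kdx document grep -c "$phrase" "$file") || count=0
    if [ "${count:-0}" -eq 0 ]; then
      echo "ERROR: missing section: $phrase" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_sections contract.kddb "Terms and Conditions" "Governing Law" || exit 1`.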

Tips

Use -E (extended regex) when you need pattern matching. Without it, special characters like . and * are treated literally.
JSON Lines format streams well - you can process results as they arrive without loading everything into memory.
Combine grep with --max 1 to quickly check whether a pattern exists: the search stops at the first match instead of collecting every match in the document.
The uuid field comes from the spatial:uuid feature on line nodes. Documents without this feature will have empty UUIDs.
Page numbers are 1-based in the output and flags, matching how users typically refer to pages.
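
The --max tip can be wrapped in a small existence check. This is a sketch under one stated assumption: a search with no matches prints nothing. The exit status of kdx is deliberately not relied on because it is not described here, and pattern_exists is a hypothetical helper name.

```shell
# Return 0 if the pattern matches at least one line, 1 otherwise.
# --max 1 limits the search to the first hit.
# Assumption: no match means no output.
pattern_exists() {
  local pattern="$1" file="$2" first
  first=$(kdx document grep --max 1 "$pattern" "$file")
  [ -n "$first" ]
}
```

Usage: `if pattern_exists "Terms and Conditions" contract.kddb; then echo "found"; fi`.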