The kdx document grep and kdx document lines commands provide Unix-style text search and line retrieval for KDDB files. Output is in JSON Lines format by default, which suits piping through jq and building shell pipelines.
Grep Command
Search document content for lines matching a pattern, similar to Unix grep.
kdx document grep <pattern> <file.kddb> [flags]
Flags
| Flag | Short | Description | Default |
|---|---|---|---|
| --ignore-case | -i | Case-insensitive matching | false |
| --extended-regexp | -E | Use extended regex (otherwise literal) | false |
| --count | -c | Only print count of matches | false |
| --context | -C | Print N lines before and after | 0 |
| --before | -B | Print N lines before each match | 0 |
| --after | -A | Print N lines after each match | 0 |
| --max | | Maximum number of results | 0 (unlimited) |
| --page | | Search only on page N (1-based) | 0 (all pages) |
| --pretty | | Pretty-print JSON output | false |
Basic Search
Find all lines containing a text pattern:
kdx document grep "risk factor" doc.kddb
{"uuid":"abc-123","content":"Key risk factors include market volatility","page":5}
{"uuid":"def-456","content":"Additional risk factors discussed below","page":5}
Case-Insensitive Search
kdx document grep -i "revenue" doc.kddb
Regex Search
Use -E for extended regex patterns:
# Find section headers with numbers
kdx document grep -E "Section\s+\d+" doc.kddb
# Find dollar amounts (single quotes keep the shell from stripping the backslash in \$)
kdx document grep -E '\$[\d,]+\.\d{2}' doc.kddb
# Find dates in various formats
kdx document grep -E "\d{1,2}/\d{1,2}/\d{2,4}" doc.kddb
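Regex patterns pass through the shell before kdx sees them, so the quoting style changes the pattern that is actually searched for. Inside double quotes the shell turns \$ into a bare $, which regex engines generally read as an end-of-line anchor rather than a literal dollar sign; single quotes pass the pattern through untouched. The sketch below uses a hypothetical helper, show_pattern, that only prints its argument the way any command would receive it:

```shell
# show_pattern is a hypothetical helper: it prints its first argument
# exactly as a command such as kdx would receive it from the shell.
show_pattern() {
  printf '%s\n' "$1"
}

# Double quotes: the shell consumes the backslash in front of $.
show_pattern "\$[\d,]+\.\d{2}"   # prints $[\d,]+\.\d{2}

# Single quotes: nothing is interpreted, so the regex keeps its escaped \$.
show_pattern '\$[\d,]+\.\d{2}'   # prints \$[\d,]+\.\d{2}
```

When a pattern contains $ or backslashes, single quotes are usually the safer default.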
Context Lines
Show surrounding lines for context:
# 2 lines before and after each match
kdx document grep -C 2 "total revenue" doc.kddb
# 3 lines before, 1 line after
kdx document grep -B 3 -A 1 "conclusion" doc.kddb
Count Matches
Get just the count of matching lines:
kdx document grep -c "error" doc.kddb
Limit Results
Stop after finding N matches:
kdx document grep --max 10 "warning" doc.kddb
Filter by Page
Search only on a specific page:
kdx document grep --page 5 "total" doc.kddb
Pretty Print
For debugging, use pretty-printed JSON:
kdx document grep --pretty "revenue" doc.kddb
{
  "uuid": "abc-123",
  "content": "Total revenue for Q4 was $1.2M",
  "page": 3
}
Lines Command
Retrieve specific lines by index range, page number, or UUID.
kdx document lines <file.kddb> [flags]
Flags
| Flag | Description | Default |
|---|---|---|
| --range | Line index range (start:end, 0-based, exclusive end) | |
| --page | Get all lines from page N (1-based) | |
| --uuid | Get lines by UUID (comma-separated) | |
| --pretty | Pretty-print JSON output | false |
Lines by Index Range
Get lines by their 0-based index (exclusive end, like Python slicing):
# Get lines 10-19 (10 lines total)
kdx document lines --range 10:20 doc.kddb
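Because the range is 0-based with an exclusive end, off-by-one mistakes are easy when you think in terms of "the 11th through 20th line". A minimal sketch of a conversion helper follows; range_for_lines is a hypothetical name and not part of kdx:

```shell
# range_for_lines FIRST LAST
# Hypothetical helper: convert a 1-based, inclusive line span into the
# 0-based, exclusive-end "start:end" form that --range expects.
range_for_lines() {
  local first="$1" last="$2" n
  for n in "$first" "$last"; do
    case "$n" in
      ''|0*|*[!0-9]*)
        echo "range_for_lines: expected a positive integer, got '$n'" >&2
        return 1 ;;
    esac
  done
  if [ "$last" -lt "$first" ]; then
    echo "range_for_lines: LAST must be >= FIRST" >&2
    return 1
  fi
  # The start shifts down by one; the exclusive end equals the inclusive last.
  printf '%s:%s\n' "$((first - 1))" "$last"
}
```

For example, kdx document lines --range "$(range_for_lines 11 20)" doc.kddb requests the same span as --range 10:20.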
Lines by Page
Get all lines from a specific page:
# All lines from page 5
kdx document lines --page 5 doc.kddb
Lines by UUID
Get specific lines by their UUID:
# Single UUID
kdx document lines --uuid abc-123 doc.kddb
# Multiple UUIDs
kdx document lines --uuid abc-123,def-456,ghi-789 doc.kddb
All Lines
Without flags, returns all lines in the document:
kdx document lines doc.kddb
Output Format
Both commands output JSON Lines (JSONL) format by default, one JSON object per line:
{"uuid":"abc-123","content":"Line content here","page":3}
{"uuid":"def-456","content":"Another line of text","page":3}
{"uuid":"ghi-789","content":"Third line content","page":4}
Output Fields
| Field | Description |
|---|---|
| uuid | Unique identifier for the line (from spatial:uuid feature) |
| content | Text content of the line |
| page | Page number (1-based) |
Shell Pipeline Examples
# Print the UUID of every line that mentions "risk"
kdx document grep -i "risk" doc.kddb | jq -r '.uuid'
Count Matches per Page
kdx document grep "revenue" doc.kddb | jq -s 'group_by(.page) | map({page: .[0].page, count: length})'
[
{"page": 3, "count": 5},
{"page": 7, "count": 2},
{"page": 12, "count": 8}
]
Get Content from UUIDs
# Read UUIDs from a file and fetch their content
UUIDS=$(paste -sd, uuids.txt)
kdx document lines --uuid "$UUIDS" doc.kddb | jq -r '.content'
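If the UUID file may contain blank lines, stray whitespace, Windows line endings, or duplicates, it can help to normalize the list before handing it to --uuid. The following is a sketch only; join_uuids is a hypothetical helper name:

```shell
# join_uuids FILE
# Hypothetical helper: print the UUIDs listed in FILE (one per line) as a
# single comma-separated string suitable for --uuid.
# Steps: drop carriage returns, trim each line, skip blank lines and
# repeats (keeping first-seen order), then join with commas.
join_uuids() {
  tr -d '\r' < "$1" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    awk 'NF && !seen[$0]++' |
    paste -sd, -
}
```

Usage: kdx document lines --uuid "$(join_uuids uuids.txt)" doc.kddb | jq -r '.content'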
# Collect the lines from pages 10 through 15 into a single JSON array
for p in {10..15}; do
  kdx document lines --page "$p" doc.kddb
done | jq -s '.'
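A page loop like this keeps going even if one kdx call fails. As a sketch under that concern, the hypothetical helper lines_for_pages below wraps the loop, takes the page bounds as arguments, and stops at the first failing call:

```shell
# lines_for_pages FILE FIRST LAST
# Hypothetical helper: stream the JSON Lines for pages FIRST..LAST (inclusive)
# and stop with a non-zero status as soon as one kdx call fails.
lines_for_pages() {
  local file="$1" p="$2" last="$3"
  while [ "$p" -le "$last" ]; do
    kdx document lines --page "$p" "$file" || return 1
    p=$((p + 1))
  done
}
```

The output is still JSON Lines, so lines_for_pages doc.kddb 10 15 | jq -s '.' yields a single array.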
# Find all amounts and extract the values
kdx document grep -E '\$[\d,]+' doc.kddb | jq -r '.content'
# Get first 5 lines from each page
for p in $(seq 1 10); do
  kdx document lines --page "$p" doc.kddb | head -5
done
Build Context for LLM
# Extract specific content by UUID for LLM context
kdx document lines --uuid abc-123,def-456 doc.kddb | jq -r '.content' | paste -sd '\n' -
Search and Retrieve Full Context
# Find lines mentioning "revenue", then get their page content
PAGES=$(kdx document grep "revenue" doc.kddb | jq -r '.page' | sort -u)
for p in $PAGES; do
  echo "=== Page $p ==="
  kdx document lines --page "$p" doc.kddb | jq -r '.content'
done
Use Cases
Document Search and Analysis
Quickly search across document content:
# Find all mentions of a term
kdx document grep "depreciation" financials.kddb
# Case-insensitive search for variations
kdx document grep -i "ebitda" report.kddb
# Find patterns like phone numbers
kdx document grep -E "\(\d{3}\)\s*\d{3}-\d{4}" contract.kddb
Building LLM Prompts
Extract relevant content for LLM context windows:
# Get all lines from executive summary (page 2)
CONTEXT=$(kdx document lines --page 2 doc.kddb | jq -r '.content' | paste -sd '\n' -)
# Search for relevant lines and build context
CONTEXT=$(kdx document grep -i "key finding" doc.kddb | jq -r '.content' | paste -sd '\n' -)
Data Pipeline Integration
Use in automated workflows:
#!/bin/bash
# Extract all lines mentioning amounts
kdx document grep -E '\$[\d,]+' "$1" | while IFS= read -r line; do
  # printf avoids echo's shell-dependent handling of backslashes in the JSON
  UUID=$(printf '%s\n' "$line" | jq -r '.uuid')
  CONTENT=$(printf '%s\n' "$line" | jq -r '.content')
  PAGE=$(printf '%s\n' "$line" | jq -r '.page')
  # Process each match...
  echo "Found amount on page $PAGE: $CONTENT"
done
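Starting jq three times per record becomes slow on large result sets. One possible alternative, sketched here with a hypothetical helper named report_matches, runs jq once over the whole stream and splits the fields in the shell:

```shell
# report_matches: read kdx JSON Lines on stdin and print one summary per record.
# Hypothetical helper; it runs jq once for the whole stream instead of
# three times per line.
report_matches() {
  local sep page uuid content
  # ASCII unit separator (0x1F): unlikely to occur in document text and,
  # unlike a tab, it does not collapse empty fields such as a blank uuid.
  sep=$(printf '\037')
  jq -r '[.page, .uuid, .content] | map(tostring) | join("\u001f")' |
    while IFS="$sep" read -r page uuid content; do
      printf 'page=%s uuid=%s content=%s\n' "$page" "$uuid" "$content"
    done
}
```

Usage: kdx document grep -E '\$[\d,]+' "$1" | report_matches. The sketch assumes that content values contain no raw newlines, since each record must stay on one line for read.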
Quality Assurance
Validate document content:
# Check if required sections exist
if kdx document grep -c "Terms and Conditions" contract.kddb | grep -q "^0$"; then
  echo "ERROR: Missing Terms and Conditions section"
  exit 1
fi
# Count pages with content (-E makes "." match any character rather than a literal period)
PAGES_WITH_CONTENT=$(kdx document grep -E "." doc.kddb | jq -r '.page' | sort -u | wc -l)
echo "Document has content on $PAGES_WITH_CONTENT pages"
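To check several required sections in one pass, the existence test can be wrapped in a function. The sketch below uses a hypothetical helper, require_sections, and assumes that kdx document grep prints nothing when there are no matches; --max 1 keeps each check short because only the first hit matters:

```shell
# require_sections FILE SECTION...
# Hypothetical helper: return non-zero if any SECTION string has no match in FILE.
# Assumption: kdx document grep prints nothing when nothing matches.
require_sections() {
  local file="$1" section missing=0
  shift
  for section in "$@"; do
    # Without -E the pattern is matched literally, so headings need no escaping.
    if [ -z "$(kdx document grep --max 1 "$section" "$file")" ]; then
      echo "ERROR: missing section: $section" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_sections contract.kddb "Terms and Conditions" "Governing Law" || exit 1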
Tips
Use -E (extended regex) when you need pattern matching. Without it, special characters like . and * are treated literally.
JSON Lines output streams well: you can process results as they arrive without loading everything into memory.
Use grep with --max 1 to check quickly whether a pattern exists without processing the entire document.
The uuid field comes from the spatial:uuid feature on line nodes. Documents without this feature will have empty UUIDs.
Page numbers are 1-based in the output and flags, matching how users typically refer to pages.
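Because documents without the spatial:uuid feature produce empty UUIDs, it can be worth checking for them before relying on --uuid lookups. A minimal sketch, using a hypothetical helper named count_empty_uuids that reads JSON Lines on stdin:

```shell
# count_empty_uuids: read kdx JSON Lines on stdin and print how many records
# have an empty or missing uuid. Hypothetical helper, not part of kdx.
count_empty_uuids() {
  # select() keeps only records whose uuid is "", null, or absent;
  # -c emits one record per line so wc -l can count them.
  jq -c 'select((.uuid // "") == "")' | wc -l | tr -d ' '
}
```

Usage: kdx document lines doc.kddb | count_empty_uuids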