ImageDecomposition
Recursively analyze images to find fine details, perform deep OCR on complex documents, or identify small objects through iterative vision model prompting.
Category: Vision
Side-Effect Safe
Vision Model Required
⚙️ ImageDecompositionConfig.json
{
"files": ["docs/complex_form.pdf"],
"segmentation_query": "Find all handwritten fields",
"analysis_query": "Transcribe the handwriting in this region",
"max_depth": 3,
"dpi": 300,
"min_region_size": 100,
"output_file": "form_analysis.json"
}
→
👁️ Session UI: Visual Analysis
[Annotated Image Preview]
✔ Found 12 regions of interest
- Depth 1: Header Section
- Depth 2: Signature Block (Zoomed)
- Depth 2: Date Field (Zoomed)
- Depth 2: Signature Block (Zoomed)
- Depth 2: Date Field (Zoomed)
Test Workspace Browser
Explore actual artifacts, debug images, and structured JSON generated by the ImageDecompositionTask.
Configuration Parameters
| Field | Type | Description |
|---|---|---|
files * |
List<String> | The image or PDF file(s) to analyze (relative path). |
segmentation_query |
String | The goal for identifying regions (e.g., "Find all line items"). |
analysis_query |
String | The goal for the final detailed description. |
dpi |
Float | DPI for rendering document pages (default: 150). |
max_depth |
Int | Recursion limit (1-5). Default: 2. |
min_region_size |
Int | Minimum pixel size (width/height) to trigger a recursive zoom. |
output_file |
String | Path to save the hierarchical JSON result. |
Task Lifecycle
- Initialization: Loads the target image or renders PDF pages at specified DPI.
- Segmentation: The Vision Model identifies regions of interest based on the
segmentation_query. - Recursive Zoom: For regions marked
requires_zoom, the task crops the image and repeats the process untilmax_depthis reached. - Leaf Analysis: Smallest regions are analyzed for specific content (OCR, object description).
- Synthesis: Results are stitched into a hierarchical JSON tree and a final markdown report is generated.
Embedded Execution (UnifiedHarness)
Use this to invoke the task programmatically in a headless environment.
import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask
import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask.Companion.ImageDecomposition
// 1. Define Runtime Input
val executionConfig = ImageDecompositionTask.ImageDecompositionConfig(
files = listOf("input/schematic.png"),
segmentation_query = "Identify all electrical components",
analysis_query = "Describe component specs and labels",
max_depth = 3,
output_file = "schematic_analysis.json"
)
// 2. Run via Harness
harness.runTask(
taskType = ImageDecomposition,
typeConfig = TaskTypeConfig(model = OpenAIModels.GPT4o),
executionConfig = executionConfig,
workspace = projectDir,
autoFix = true
)
Prompt Segment
"IterativeImageDecomposition - Recursively analyzes images for fine details. Use for: OCR on complex documents, finding small objects, or crowd analysis."