⚙️ ImageDecompositionConfig.json

{
  "files": ["docs/complex_form.pdf"],
  "segmentation_query": "Find all handwritten fields",
  "analysis_query": "Transcribe the handwriting in this region",
  "max_depth": 3,
  "dpi": 300,
  "min_region_size": 100,
  "output_file": "form_analysis.json"
}
            
👁️ Session UI: Visual Analysis
[Annotated Image Preview]
✔ Found 12 regions of interest
- Depth 1: Header Section
- Depth 2: Signature Block (Zoomed)
- Depth 2: Date Field (Zoomed)

Test Workspace Browser

Explore actual artifacts, debug images, and structured JSON generated by the ImageDecompositionTask.

Configuration Parameters

Field Type Description
files * List<String> The image or PDF file(s) to analyze (relative path).
segmentation_query String The goal for identifying regions (e.g., "Find all line items").
analysis_query String The goal for the final detailed description.
dpi Float DPI for rendering document pages (default: 150).
max_depth Int Recursion limit (1-5). Default: 2.
min_region_size Int Minimum pixel size (width/height) to trigger a recursive zoom.
output_file String Path to save the hierarchical JSON result.

Task Lifecycle

  1. Initialization: Loads the target image or renders PDF pages at specified DPI.
  2. Segmentation: The Vision Model identifies regions of interest based on the segmentation_query.
  3. Recursive Zoom: For regions marked requires_zoom, the task crops the image and repeats the process until max_depth is reached.
  4. Leaf Analysis: Smallest regions are analyzed for specific content (OCR, object description).
  5. Synthesis: Results are stitched into a hierarchical JSON tree and a final markdown report is generated.

Embedded Execution (UnifiedHarness)

Use this to invoke the task programmatically in a headless environment.


import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask
import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask.Companion.ImageDecomposition

// 1. Define Runtime Input
val executionConfig = ImageDecompositionTask.ImageDecompositionConfig(
    files = listOf("input/schematic.png"),
    segmentation_query = "Identify all electrical components",
    analysis_query = "Describe component specs and labels",
    max_depth = 3,
    output_file = "schematic_analysis.json"
)
// 2. Run via Harness
harness.runTask(
    taskType = ImageDecomposition,
    typeConfig = TaskTypeConfig(model = OpenAIModels.GPT4o),
    executionConfig = executionConfig,
    workspace = projectDir,
    autoFix = true
)
            

Prompt Segment

"IterativeImageDecomposition - Recursively analyzes images for fine details. Use for: OCR on complex documents, finding small objects, or crowd analysis."