Cognotik | ImageDecomposition

⚙️ ImageDecompositionConfig.json


{
  "files": ["docs/complex_form.pdf"],
  "segmentation_query": "Find all handwritten fields",
  "analysis_query": "Transcribe the handwriting in this region",
  "max_depth": 3,
  "dpi": 300,
  "min_region_size": 100,
  "output_file": "form_analysis.json"
}

→

👁️ Session UI: Visual Analysis

[Annotated Image Preview]

✔ Found 12 regions of interest

- Depth 1: Header Section
- Depth 2: Signature Block (Zoomed)
- Depth 2: Date Field (Zoomed)

Test Workspace Browser

Explore actual artifacts, debug images, and structured JSON generated by the ImageDecompositionTask.

Configuration Parameters

Field	Type	Description
`files` *	List<String>	Image or PDF file(s) to analyze, given as relative paths.
`segmentation_query`	String	The goal for identifying regions (e.g., "Find all line items").
`analysis_query`	String	The goal for the final detailed description of each region.
`dpi`	Float	Resolution, in dots per inch, used to render document pages (default: 150).
`max_depth`	Int	Maximum recursion depth, 1-5 (default: 2).
`min_region_size`	Int	Minimum region size in pixels (width/height) required to trigger a recursive zoom.
`output_file`	String	Path where the hierarchical JSON result is saved.

Fields marked * are required.
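The parameters above can be sanity-checked before a run. The following is a minimal Kotlin sketch, not the library's own class: `DecompositionSettings` and `validate` are hypothetical names. The defaults for `dpi` and `max_depth` and the 1-5 range come from the table; the other checks (non-empty `files`, positive `dpi`, non-negative `min_region_size`) are assumptions.

```kotlin
// Hypothetical mirror of the documented fields; NOT the real ImageDecompositionConfig.
data class DecompositionSettings(
    val files: List<String>,            // required
    val segmentationQuery: String? = null,
    val analysisQuery: String? = null,
    val dpi: Float = 150f,              // documented default
    val maxDepth: Int = 2,              // documented default, documented range 1-5
    val minRegionSize: Int? = null,     // no default is documented
    val outputFile: String? = null
)

// Returns a list of human-readable problems; an empty list means no problem was found.
fun validate(s: DecompositionSettings): List<String> {
    val problems = mutableListOf<String>()
    if (s.files.isEmpty()) {
        problems.add("files must list at least one image or PDF") // assumption
    }
    if (s.maxDepth !in 1..5) {
        problems.add("max_depth must be between 1 and 5, got ${s.maxDepth}")
    }
    if (s.dpi <= 0f) {
        problems.add("dpi must be positive, got ${s.dpi}") // assumption
    }
    val minSize = s.minRegionSize
    if (minSize != null && minSize < 0) {
        problems.add("min_region_size must not be negative, got $minSize") // assumption
    }
    return problems
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every misconfigured field at once.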

Task Lifecycle

1. Initialization: Loads the target image, or renders each PDF page at the specified `dpi`.
2. Segmentation: The vision model identifies regions of interest based on the `segmentation_query`.
3. Recursive Zoom: For each region marked `requires_zoom`, the task crops the image to that region and repeats segmentation on the crop, stopping once `max_depth` is reached.
4. Leaf Analysis: Regions that are not zoomed further are analyzed for specific content (OCR, object description) using the `analysis_query`.
5. Synthesis: Results are stitched into a hierarchical JSON tree, and a final markdown report is generated.
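The control flow of the lifecycle can be sketched in a few lines of Kotlin. This is an illustration under stated assumptions, not the task's actual implementation: `Box`, `Region`, `Node`, `Segmenter`, `Analyzer`, and `decompose` are hypothetical names, the vision model calls are replaced by pluggable stand-ins, the full page is treated as depth 0, and a region is zoomed only when both its width and height reach `minRegionSize`.

```kotlin
// Illustrative types only; the real task's internal classes are not shown here.
data class Box(val x: Int, val y: Int, val width: Int, val height: Int)

data class Region(val label: String, val box: Box, val requiresZoom: Boolean)

data class Node(
    val label: String,
    val depth: Int,
    val analysis: String? = null,           // set on leaves only
    val children: List<Node> = emptyList()  // set on zoomed regions only
)

// Stand-ins for the vision model calls (hypothetical signatures).
fun interface Segmenter { fun segment(area: Box, depth: Int): List<Region> }
fun interface Analyzer { fun analyze(area: Box): String }

fun decompose(
    label: String,
    area: Box,
    depth: Int,
    maxDepth: Int,
    minRegionSize: Int,
    segmenter: Segmenter,
    analyzer: Analyzer
): Node {
    // Segmentation: only attempted while the depth limit has not been reached.
    val regions = if (depth < maxDepth) segmenter.segment(area, depth) else emptyList()
    // Leaf analysis: nothing to subdivide, so describe this area directly.
    if (regions.isEmpty()) return Node(label, depth, analysis = analyzer.analyze(area))

    val children = regions.map { r ->
        val bigEnough = r.box.width >= minRegionSize && r.box.height >= minRegionSize
        if (r.requiresZoom && bigEnough) {
            // Recursive zoom: repeat the process on the cropped region.
            decompose(r.label, r.box, depth + 1, maxDepth, minRegionSize, segmenter, analyzer)
        } else {
            Node(r.label, depth + 1, analysis = analyzer.analyze(r.box))
        }
    }
    // Synthesis: child results are collected under their parent, forming the tree.
    return Node(label, depth, children = children)
}
```

Because the segmenter and analyzer are passed in as interfaces, the recursion can be exercised with canned responses before any model is wired in.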

Embedded Execution (UnifiedHarness)

Use this to invoke the task programmatically in a headless environment.


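import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask
import com.simiacryptus.cognotik.plan.tools.file.ImageDecompositionTask.Companion.ImageDecomposition
// TaskTypeConfig and OpenAIModels must also be imported; their packages are not shown here.
// `harness` (the UnifiedHarness instance) and `projectDir` (the workspace directory)
// are assumed to be defined by the surrounding code.

// 1. Define Runtime Input
val executionConfig = ImageDecompositionTask.ImageDecompositionConfig(
    files = listOf("input/schematic.png"),                      // relative path
    segmentation_query = "Identify all electrical components",  // what to look for
    analysis_query = "Describe component specs and labels",     // what to report per region
    max_depth = 3,                                              // recursion limit (1-5)
    output_file = "schematic_analysis.json"                     // hierarchical JSON result
)

// 2. Run via Harness
harness.runTask(
    taskType = ImageDecomposition,
    typeConfig = TaskTypeConfig(model = OpenAIModels.GPT4o),
    executionConfig = executionConfig,
    workspace = projectDir,
    autoFix = true
)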

Prompt Segment

"IterativeImageDecomposition - Recursively analyzes images for fine details. Use for: OCR on complex documents, finding small objects, or crowd analysis."