OCRTask
Convert documents (PDF, Images) to Markdown text using Vision models. Supports high-DPI rendering, figure extraction, and metadata parsing.
Category: File
Model: Vision-Capable (GPT-4o/Claude 3.5)
Output: Markdown + Assets
OCRTaskExecutionConfigData.json
● Config
{
"files": ["internal/q3_report.pdf"],
"dpi": 200,
"extract_figures": true,
"extract_metadata": true,
"extract_text": true
}
→
Session Output
● Result
✔ Processed 12 pages
Generated Files:
- 📄 q3_report.md
- 📄 q3_report_metadata.json
- 📄 q3_report_text.txt
- 📁 q3_report_figures/
- └─ p1_1_Revenue_Chart.png
# Quarterly Report
## Financial Highlights
The revenue for Q3 exceeded expectations...
Configuration Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| files * | List<String> | - | List of workspace-relative paths to PDF or image files. |
| dpi | Float | 150.0 | Rendering resolution for PDF pages before Vision processing. |
| extract_figures | Boolean | false | If true, identifies and crops charts/images into a sub-directory. |
| extract_metadata | Boolean | false | Extracts form fields and key-value pairs into a JSON file. |
| extract_text | Boolean | false | Extracts raw embedded text from PDFs (non-OCR) to a .txt file. |
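Only `files` is required; the other fields fall back to the defaults listed above. A minimal sketch of the two extremes, assuming the constructor shown in the Kotlin Usage section below exposes these parameters with the table's defaults:

```kotlin
import com.simiacryptus.cognotik.plan.tools.file.OCRTask

// Minimal config: only the required `files` list is set; dpi stays at 150.0
// and figure/metadata/text extraction remain disabled (per the table above).
val minimalConfig = OCRTask.OCRTaskExecutionConfigData(
    files = listOf("internal/q3_report.pdf")
)

// Full config mirroring the JSON example at the top of this page.
val fullConfig = OCRTask.OCRTaskExecutionConfigData(
    files = listOf("internal/q3_report.pdf"),
    dpi = 200f,
    extract_figures = true,
    extract_metadata = true,
    extract_text = true
)
```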
Task Lifecycle
- Initialization: Validates file existence and prepares the TabbedDisplay.
- Rendering: Converts PDF pages to high-resolution images using RenderableDocumentReader.
- Vision OCR: Sends page images to the LLM with a system prompt optimized for Markdown conversion.
- Asset Analysis: If enabled, runs ParsedImageAgent to detect bounding boxes for figures and metadata.
- Persistence: Saves the final Markdown, JSON metadata, and cropped figure PNGs to the workspace.
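Read linearly, the lifecycle is a pipeline from input file to workspace artifacts. The sketch below is illustrative only: `renderPages`, `ocrPage`, `analyzeAssets`, and `persist` are hypothetical stand-ins for the RenderableDocumentReader, Vision call, and ParsedImageAgent stages named above, not Cognotik APIs.

```kotlin
import java.awt.image.BufferedImage
import java.io.File

// Illustrative pipeline mirroring the five lifecycle stages.
fun runOcrPipeline(input: File, workspace: File): File {
    require(input.exists()) { "Input file not found: $input" }               // Initialization
    val pages: List<BufferedImage> = renderPages(input, dpi = 200f)          // Rendering
    val markdown = pages.joinToString("\n\n") { ocrPage(it) }                // Vision OCR
    val figures = pages.flatMap { analyzeAssets(it) }                        // Asset Analysis
    return persist(workspace, input.nameWithoutExtension, markdown, figures) // Persistence
}

// Hypothetical stage signatures, shown only to make the data flow explicit.
fun renderPages(input: File, dpi: Float): List<BufferedImage> = TODO()
fun ocrPage(page: BufferedImage): String = TODO()
fun analyzeAssets(page: BufferedImage): List<File> = TODO()
fun persist(workspace: File, baseName: String, markdown: String, figures: List<File>): File = TODO()
```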
Kotlin Usage
Invoke as a standalone tool using the UnifiedHarness.
import com.simiacryptus.cognotik.plan.tools.file.OCRTask
import com.simiacryptus.cognotik.plan.tools.file.OCRTask.Companion.OCR
// 1. Define Runtime Input
val executionConfig = OCRTask.OCRTaskExecutionConfigData(
files = listOf("test_document.pdf"),
dpi = 300f,
extract_figures = true,
extract_metadata = true,
extract_text = true
)
// 2. Run via Harness
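// Note: `harness` is assumed to be a pre-configured UnifiedHarness instance.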
harness.runTask(
taskType = OCR,
typeConfig = TaskTypeConfig(name = "DocumentProcessor"),
executionConfig = executionConfig,
workspace = File("./workspaces/test-20260110_143732"),
autoFix = true // Automatically save results to disk
)
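Once the run completes, the generated artifacts (the .md file, optional metadata JSON, text dump, and figures directory) land in the workspace directory passed above. A quick way to inspect them using only the standard library, with no Cognotik APIs assumed:

```kotlin
import java.io.File

// Print every file the task wrote into the workspace, e.g. test_document.md
// and its companion metadata/figure outputs.
val workspace = File("./workspaces/test-20260110_143732")
workspace.walkTopDown()
    .filter { it.isFile }
    .forEach { println(it.relativeTo(workspace)) }
```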
Prompt Segment
The internal description used by the AI orchestrator:
OCR - Convert documents (PDF, Images) to Markdown text.
* Extracts text from images and PDFs using Vision models.
* Preserves formatting as Markdown.
* Optionally extracts figures as images and metadata/form fields.
* Saves output to a .md file with the same name.