OCRTaskExecutionConfigData.json ● Config
{
  "files": ["internal/q3_report.pdf"],
  "dpi": 200,
  "extract_figures": true,
  "extract_metadata": true,
  "extract_text": true
}
Session Output ● Result
✔ Processed 12 pages

Generated Files:

  • 📄 q3_report.md
  • 📄 q3_report_metadata.json
  • 📄 q3_report_text.txt
  • 📁 q3_report_figures/
      └─ p1_1_Revenue_Chart.png
# Quarterly Report
## Financial Highlights
The revenue for Q3 exceeded expectations...

Live Results Showcase

Explore actual artifacts generated by the OCR engine, including extracted figures and structured metadata.

Configuration Parameters

| Field            | Type         | Default | Description                                                        |
|------------------|--------------|---------|--------------------------------------------------------------------|
| files *          | List<String> | -       | List of workspace-relative paths to PDF or image files.            |
| dpi              | Float        | 150.0   | Rendering resolution for PDF pages before Vision processing.       |
| extract_figures  | Boolean      | false   | If true, identifies and crops charts/images into a sub-directory.  |
| extract_metadata | Boolean      | false   | Extracts form fields and key-value pairs into a JSON file.         |
| extract_text     | Boolean      | false   | Extracts raw embedded text from PDFs (non-OCR) to a .txt file.     |
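
To make the defaults and the effect of dpi concrete, here is a minimal sketch. OcrConfigSketch and renderedPixels are hypothetical names introduced for illustration only; they are not part of the library, and the validation rules are assumptions beyond the fact that files is required. The pixel arithmetic relies on the standard PDF convention of 72 points per inch.

```kotlin
import kotlin.math.roundToInt

// Hypothetical mirror of the documented fields; NOT the real OCRTaskExecutionConfigData.
data class OcrConfigSketch(
    val files: List<String>,              // required (marked * in the table)
    val dpi: Float = 150.0f,              // documented default
    val extractFigures: Boolean = false,  // documented default
    val extractMetadata: Boolean = false, // documented default
    val extractText: Boolean = false      // documented default
) {
    init {
        // Assumed validation; the documentation only states that files is required.
        require(files.isNotEmpty()) { "files must list at least one path" }
        require(dpi > 0f) { "dpi must be positive" }
    }
}

// PDF page sizes are given in points (72 per inch), so pixels = points * dpi / 72.
fun renderedPixels(points: Float, dpi: Float): Int = (points * dpi / 72f).roundToInt()

fun main() {
    val config = OcrConfigSketch(files = listOf("internal/q3_report.pdf"), dpi = 200f)
    // A US Letter page measures 612 x 792 points.
    val width = renderedPixels(612f, config.dpi)
    val height = renderedPixels(792f, config.dpi)
    println("$width x $height px at ${config.dpi} DPI") // prints "1700 x 2200 px at 200.0 DPI"
}
```

At the default of 150 DPI the same page would render to 1275 x 1650 pixels, so raising dpi increases the image size sent to the Vision model roughly quadratically in pixel count.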

Task Lifecycle

  1. Initialization: Validates file existence and prepares the TabbedDisplay.
  2. Rendering: Converts PDF pages to high-resolution images using RenderableDocumentReader.
  3. Vision OCR: Sends page images to the LLM with a system prompt optimized for Markdown conversion.
  4. Asset Analysis: If enabled, runs ParsedImageAgent to detect bounding boxes for figures and metadata.
  5. Persistence: Saves the final Markdown, JSON metadata, and cropped figure PNGs to the workspace.
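
The persistence step can be illustrated with a small sketch that predicts which artifacts to expect for a given input. The helper expectedArtifacts is hypothetical, and the naming rules are inferred from the sample session output (q3_report.md, q3_report_metadata.json, q3_report_text.txt, q3_report_figures/) rather than taken from the task's source; it also assumes the Markdown file is always written.

```kotlin
import java.io.File

// Hypothetical helper: derives artifact names from the input path and the
// extraction flags, following the naming seen in the sample session output.
fun expectedArtifacts(
    inputPath: String,
    extractFigures: Boolean = false,
    extractMetadata: Boolean = false,
    extractText: Boolean = false
): List<String> {
    val base = File(inputPath).nameWithoutExtension
    // Assumption: the Markdown conversion is always produced.
    val artifacts = mutableListOf("${base}.md")
    if (extractMetadata) artifacts += "${base}_metadata.json"
    if (extractText) artifacts += "${base}_text.txt"
    if (extractFigures) artifacts += "${base}_figures/"
    return artifacts
}

fun main() {
    // Mirrors the sample configuration with all three extraction flags enabled.
    expectedArtifacts(
        inputPath = "internal/q3_report.pdf",
        extractFigures = true,
        extractMetadata = true,
        extractText = true
    ).forEach(::println)
}
```

A check like this can be useful in tests that assert a run produced everything the configuration asked for, though the individual figure file names inside the figures directory are not predictable from the configuration alone.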

Kotlin Usage

Invoke the task as a standalone tool through the UnifiedHarness.

import com.simiacryptus.cognotik.plan.tools.file.OCRTask
import com.simiacryptus.cognotik.plan.tools.file.OCRTask.Companion.OCR
import java.io.File

// 1. Define Runtime Input
val executionConfig = OCRTask.OCRTaskExecutionConfigData(
    files = listOf("test_document.pdf"),
    dpi = 300f,
    extract_figures = true,
    extract_metadata = true,
    extract_text = true
)

// 2. Run via Harness (`harness` is assumed to be an existing UnifiedHarness instance)
harness.runTask(
    taskType = OCR,
    typeConfig = TaskTypeConfig(name = "DocumentProcessor"),
    executionConfig = executionConfig,
    workspace = File("./workspaces/test-20260110_143732"),
    autoFix = true // Automatically save results to disk
)

Prompt Segment

The internal description used by the AI orchestrator:

OCR - Convert documents (PDF, Images) to Markdown text.
* Extracts text from images and PDFs using Vision models.
* Preserves formatting as Markdown.
* Optionally extracts figures as images and metadata/form fields.
* Saves output to a .md file with the same name.