DataTableCompilationTask

Automated extraction and synthesis of structured data from heterogeneous file collections into unified JSON, CSV, or Markdown tables.

Category: Writing · Side-Effect Safe · Model: GPT-4 Preferred
⚙️ ExecutionConfig.json
{
  "file_patterns": ["docs/reports/*.md"],
  "output_file": "q3_summary.csv",
  "row_identification_instructions": "Each project mentioned.",
  "column_identification_instructions": "Status, Budget, Owner.",
  "cell_extraction_instructions": "Extract exact USD values."
}
👁️ Compiled Output (Markdown Preview)
| Project ID | Status | Budget | Owner |
|---|---|---|---|
| Alpha-7 | Completed | $45,000 | A. Chen |
| Beta-9 | In Progress | $12,200 | J. Smith |

✔ Saved to /workspace/q3_summary.csv


Configuration Parameters

| Field | Type | Description |
|---|---|---|
| file_patterns* | List<String> | List of file glob patterns (e.g., "src/**/*.kt") to include in the data compilation. |
| output_file | String | Path where the compiled table will be saved. Supports .json, .csv, and .md. Default: compiled_data.json. |
| row_identification_instructions | String | Natural language instructions for identifying what constitutes a "row" in the source data. |
| column_identification_instructions | String | Instructions for identifying the schema/columns of the resulting table. |
| cell_extraction_instructions | String | Specific logic for how to extract or format data for individual cells. |

(* = required)
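
Putting the fields together, a complete configuration might look like the following. The paths and instructions here are illustrative, not taken from a real project; omitting output_file would fall back to the default compiled_data.json.

illustrative_config.json
{
  "file_patterns": ["docs/reports/*.md", "docs/archive/**/*.md"],
  "output_file": "report_index.md",
  "row_identification_instructions": "Each quarterly report document.",
  "column_identification_instructions": "Quarter, Author, Headline Metric.",
  "cell_extraction_instructions": "Quote values verbatim; leave a cell empty if the source does not state it."
}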

Task Execution Flow

  1. File Collection: Scans the workspace using provided glob patterns to build a candidate file list.
  2. Schema Discovery: Analyzes file samples to identify distinct columns based on column_identification_instructions.
  3. Entity Resolution: Identifies all unique rows across all files, mapping which files contribute to which row.
  4. Iterative Extraction: For each identified row, the agent performs a targeted extraction from its source files to fill the schema.
  5. Serialization: Compiles the results into the requested format (JSON, CSV, or Markdown) and writes to the workspace.
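
Steps 1–4 are driven by the model; only step 5 is purely mechanical. As a stand-alone illustration of that serialization stage, the sketch below shows how a compiled table (rows keyed by column name) might map onto the three supported output formats. The function name and shape are hypothetical stand-ins, not the task's actual serializer, and values are emitted without quoting or escaping.

SerializationSketch.kt (illustrative)
import java.io.File

// Hypothetical stand-in for step 5: write rows (keyed by column name)
// as CSV, Markdown, or JSON depending on the output file's extension.
// Values are written verbatim (no escaping); illustrative only.
fun serialize(columns: List<String>, rows: List<Map<String, String>>, out: File) {
    when (out.extension.lowercase()) {
        "csv" -> out.writeText(
            (listOf(columns) + rows.map { r -> columns.map { c -> r[c].orEmpty() } })
                .joinToString("\n") { it.joinToString(",") }
        )
        "md" -> out.writeText(buildString {
            appendLine(columns.joinToString(" | ", "| ", " |"))
            appendLine(columns.joinToString(" | ", "| ", " |") { "---" })
            rows.forEach { r ->
                appendLine(columns.joinToString(" | ", "| ", " |") { c -> r[c].orEmpty() })
            }
        })
        else -> out.writeText( // default: JSON array of objects
            rows.joinToString(",\n", "[\n", "\n]") { r ->
                columns.joinToString(", ", "  { ", " }") { c -> "\"$c\": \"${r[c].orEmpty()}\"" }
            }
        )
    }
}

fun main() {
    val rows = listOf(
        mapOf("Project ID" to "Alpha-7", "Status" to "Completed"),
        mapOf("Project ID" to "Beta-9", "Status" to "In Progress")
    )
    serialize(listOf("Project ID", "Status"), rows, File("q3_summary.csv"))
}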

Direct Task Instantiation

DataTableCompilationTask.kt
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask.DataTableCompilationTaskExecutionConfigData

// Compile every unique error code found in the log files into a JSON table.
val task = DataTableCompilationTask(
    orchestrationConfig = config,
    planTask = DataTableCompilationTaskExecutionConfigData(
        file_patterns = listOf("logs/*.txt"),
        output_file = "error_report.json",
        row_identification_instructions = "Unique error codes",
        column_identification_instructions = "Code, Frequency, Last Seen"
    )
)
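
Given the .json extension, the compiled table is serialized as an array of objects, one per unique error code. The codes, counts, and dates below are purely illustrative values, not real output.

error_report.json (illustrative values)
[
  { "Code": "E1042", "Frequency": "17", "Last Seen": "2024-03-02" },
  { "Code": "E2001", "Frequency": "3", "Last Seen": "2024-02-18" }
]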

Embedded Execution (UnifiedHarness)

Use the UnifiedHarness to run this task in a headless environment (CI/CD, CLI).

EmbeddedExample.kt
import com.simiacryptus.cognotik.plan.TaskType
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask.Companion.DataTableCompilation
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask.DataTableCompilationTaskExecutionConfigData
import java.io.File

// Note: UnifiedHarness, TaskTypeConfig, and OpenAIModels are assumed to be on
// the classpath; their import paths depend on your harness module.

// Inventory every public class/interface in the Kotlin sources as a Markdown table.
fun runCompilation(harness: UnifiedHarness, projectDir: File) {
    val config = DataTableCompilationTaskExecutionConfigData(
        file_patterns = listOf("src/**/*.kt"),
        output_file = "architecture_inventory.md",
        row_identification_instructions = "Each public class or interface",
        column_identification_instructions = "Name, Package, Primary Responsibility",
        task_description = "Generate architecture inventory"
    )
    harness.runTask(
        taskType = DataTableCompilation,
        typeConfig = TaskTypeConfig(model = OpenAIModels.GPT4o.asApiChatModel()),
        executionConfig = config,
        workspace = projectDir,
        autoFix = true
    )
}
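
Because the compiled table is written directly into the supplied workspace (architecture_inventory.md in this sketch), a CI pipeline can pick the file up as a build artifact once the call completes.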

Prompt Segment

The following text is injected into the LLM context to define this capability:

DataTableCompilation - Compile structured data tables from multiple files
  ** Specify file glob patterns to include in the compilation
  ** Define instructions for identifying rows in the data
  ** Define instructions for identifying columns in the data
  ** Define instructions for extracting cell data
  ** Specify output file path for the compiled table