DataTableCompilationTask
Automated extraction and synthesis of structured data from heterogeneous file collections into unified JSON, CSV, or Markdown tables.
Category: Writing
Side-Effect Safe
Model: GPT-4 Preferred
⚙️ ExecutionConfig.json
{
  "file_patterns": ["docs/reports/*.md"],
  "output_file": "q3_summary.csv",
  "row_identification_instructions": "Each project mentioned.",
  "column_identification_instructions": "Status, Budget, Owner.",
  "cell_extraction_instructions": "Extract exact USD values."
}
👁️ Compiled Output (Markdown Preview)
| Project ID | Status | Budget | Owner |
|---|---|---|---|
| Alpha-7 | Completed | $45,000 | A. Chen |
| Beta-9 | In Progress | $12,200 | J. Smith |
✔ Saved to /workspace/q3_summary.csv
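Since the configured output_file ends in .csv, the artifact written to disk is comma-separated rather than the Markdown preview shown above. A plausible rendering of the same rows (CSV quoting of comma-containing values assumed):
q3_summary.csv
Project ID,Status,Budget,Owner
Alpha-7,Completed,"$45,000",A. Chen
Beta-9,In Progress,"$12,200",J. Smith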
Configuration Parameters
| Field | Type | Description |
|---|---|---|
| file_patterns* | List&lt;String&gt; | List of file glob patterns (e.g., "src/**/*.kt") to include in the data compilation. |
| output_file | String | Path where the compiled table will be saved. Supports .json, .csv, and .md. Default: compiled_data.json. |
| row_identification_instructions | String | Natural language instructions for identifying what constitutes a "row" in the source data. |
| column_identification_instructions | String | Instructions for identifying the schema/columns of the resulting table. |
| cell_extraction_instructions | String | Specific logic for how to extract or format data for individual cells. |
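If the asterisk marks file_patterns as the only required field, a minimal configuration can lean on the defaults documented above; the compiled table would then be written to compiled_data.json:
{
  "file_patterns": ["docs/reports/*.md"]
}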
Task Execution Flow
- File Collection: Scans the workspace using provided glob patterns to build a candidate file list.
- Schema Discovery: Analyzes file samples to identify distinct columns based on column_identification_instructions.
- Entity Resolution: Identifies all unique rows across all files, mapping which files contribute to which row.
- Iterative Extraction: For each identified row, the agent performs a targeted extraction from its source files to fill the schema.
- Serialization: Compiles the results into the requested format (JSON, CSV, or Markdown) and writes to the workspace.
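To make the flow concrete, here is a minimal Kotlin sketch of the five stages. Everything in it is illustrative: compileTable and the llm callback are hypothetical stand-ins, not the task's actual internals or the Cognotik API.
PipelineSketch.kt (illustrative)
import java.io.File
import java.nio.file.FileSystems

// Hypothetical sketch only: `llm` stands in for the model call and returns one
// string per requested item (column names, row keys, or a single cell value).
fun compileTable(
    workspace: File,
    filePatterns: List<String>,
    rowInstructions: String,
    columnInstructions: String,
    llm: (String) -> List<String>
): String {
    // 1. File Collection: resolve glob patterns relative to the workspace
    val matchers = filePatterns.map { FileSystems.getDefault().getPathMatcher("glob:$it") }
    val files = workspace.walkTopDown()
        .filter { it.isFile }
        .filter { f -> matchers.any { it.matches(workspace.toPath().relativize(f.toPath())) } }
        .toList()

    // 2. Schema Discovery: infer column names from sampled file content
    val columns = llm("Columns ($columnInstructions) for:\n" +
        files.take(3).joinToString("\n") { it.readText().take(500) })

    // 3. Entity Resolution: identify the unique rows across all files
    val rows = llm("Rows ($rowInstructions) among: ${files.map { it.name }}")

    // 4. Iterative Extraction: targeted extraction to fill each cell
    val cells = rows.map { row ->
        columns.map { col -> llm("Value of '$col' for '$row'").firstOrNull().orEmpty() }
    }

    // 5. Serialization: render as a Markdown table
    return buildString {
        appendLine(columns.joinToString(" | ", "| ", " |"))
        appendLine(columns.joinToString(" | ", "| ", " |") { "---" })
        cells.forEach { appendLine(it.joinToString(" | ", "| ", " |")) }
    }
}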
Direct Task Instantiation
DataTableCompilationTask.kt
val task = DataTableCompilationTask(
    orchestrationConfig = config,
    planTask = DataTableCompilationTaskExecutionConfigData(
        file_patterns = listOf("logs/*.txt"),
        output_file = "error_report.json",
        row_identification_instructions = "Unique error codes",
        column_identification_instructions = "Code, Frequency, Last Seen"
        // cell_extraction_instructions is optional and omitted here
    )
)
Embedded Execution (UnifiedHarness)
Use the UnifiedHarness to run this task in a headless environment (CI/CD, CLI).
EmbeddedExample.kt
import com.simiacryptus.cognotik.plan.TaskType
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask.Companion.DataTableCompilation
import com.simiacryptus.cognotik.plan.tools.writing.DataTableCompilationTask.DataTableCompilationTaskExecutionConfigData
import java.io.File
// Imports for UnifiedHarness, TaskTypeConfig, and OpenAIModels are elided here.

fun runCompilation(harness: UnifiedHarness, projectDir: File) {
    val config = DataTableCompilationTaskExecutionConfigData(
        file_patterns = listOf("src/**/*.kt"),
        output_file = "architecture_inventory.md",
        row_identification_instructions = "Each public class or interface",
        column_identification_instructions = "Name, Package, Primary Responsibility",
        task_description = "Generate architecture inventory"
    )
    harness.runTask(
        taskType = DataTableCompilation,
        typeConfig = TaskTypeConfig(model = OpenAIModels.GPT4o.asApiChatModel()),
        executionConfig = config,
        workspace = projectDir,
        autoFix = true
    )
}
Prompt Segment
The following text is injected into the LLM context to define this capability:
DataTableCompilation - Compile structured data tables from multiple files
** Specify file glob patterns to include in the compilation
** Define instructions for identifying rows in the data
** Define instructions for identifying columns in the data
** Define instructions for extracting cell data
** Specify output file path for the compiled table