DataTableAccumulationStrategy
Extracts and accumulates structured tabular data across multiple web pages with automated normalization, deduplication, and multi-format export.
Stateful: Accumulative
Model: GPT-4o Preferred
Output: CSV/JSON/Markdown
⚙️ ExecutionConfig.json
{
  "column_names": "Model, Price, Range",
  "column_types": { "Price": "number" },
  "deduplicate": true,
  "key_columns": "Model",
  "export_format": "csv",
  "include_source_urls": true
}
→
📊 Session UI Output
✔ Extraction Successful
Confidence: 98% | Rows: 12 | Total: 45
| Model | Price | Range |
|---|---|---|
| Model 3 | 38990 | 333 mi |
| Ioniq 6 | 42450 | 361 mi |
| Air Pure | 69900 | 419 mi |
⚠️ Normalized 'Price' from "$38,990" to "38990"
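The normalization warning above can be illustrated with a small sketch. This is not the strategy's actual implementation: `normalize_number` is a hypothetical helper, and it assumes that normalizing a `number` column means stripping currency symbols and thousands separators before coercing the value.

```python
import re


def normalize_number(raw: str):
    """Hypothetical helper: coerce a scraped string to a number.

    Assumption: every character other than digits, '.', and '-' is noise
    (currency symbols, thousands separators, whitespace).
    Returns None when no numeric value can be recovered.
    """
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        value = float(cleaned)
    except ValueError:
        # Covers empty strings and malformed input such as "1.2.3".
        return None
    # Prefer an int when the value has no fractional part.
    return int(value) if value.is_integer() else value


print(normalize_number("$38,990"))  # 38990
```

A value that cannot be coerced comes back as `None`, which a caller could use to decide whether to reject the row when `validate_types` is enabled.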
DataTableConfig Fields
| Field | Type | Description |
|---|---|---|
| column_names * | String | Comma-separated list of columns to extract. |
| column_descriptions | Map<String, String> | Guidance for the LLM on what each column represents. |
| column_types | Map<String, String> | Mapping of column names to types (string, number, boolean, date). |
| extraction_instructions | String | Natural language guidance for the LLM extraction engine. |
| auto_detect_tables | Boolean | If true, prioritizes existing HTML table structures. Default: true. |
| min_rows | Int | Minimum rows required to consider a table valid. Default: 1. |
| max_rows_per_page | Int? | Limit extraction per page. Default: null (unlimited). |
| deduplicate | Boolean | Prevents duplicate rows across multiple pages. Default: true. |
| key_columns | String? | Comma-separated columns used for deduplication identity. |
| validate_types | Boolean | Enforce type checking on extracted values. Default: true. |
| normalize_data | Boolean | Clean values (e.g. strip currency symbols). Default: true. |
| export_format | String | Final output format: csv, json, or markdown. Default: csv. |
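
To make the interaction between `deduplicate`, `key_columns`, and `export_format` concrete, here is a minimal sketch of cross-page accumulation followed by export. The function names `accumulate` and `export` are hypothetical, and the sketch assumes that row identity is an exact match on the key column values; how the strategy behaves when `key_columns` is null is not covered here.

```python
import csv
import io
import json


def accumulate(existing, new_rows, key_columns):
    """Append rows from a new page, skipping rows whose key was already seen.

    Assumption: identity is the tuple of exact values in the key columns.
    """
    keys = [k.strip() for k in key_columns.split(",")]
    seen = {tuple(row.get(k) for k in keys) for row in existing}
    for row in new_rows:
        identity = tuple(row.get(k) for k in keys)
        if identity not in seen:
            seen.add(identity)
            existing.append(row)
    return existing


def export(rows, columns, export_format="csv"):
    """Render accumulated rows as csv, json, or markdown text."""
    if export_format == "json":
        return json.dumps(rows, indent=2)
    if export_format == "markdown":
        lines = ["| " + " | ".join(columns) + " |", "|" + "---|" * len(columns)]
        lines += [
            "| " + " | ".join(str(r.get(c, "")) for c in columns) + " |"
            for r in rows
        ]
        return "\n".join(lines)
    if export_format == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported export_format: {export_format}")


# Two pages that overlap on "Ioniq 6"; values taken from the sample output.
page1 = [
    {"Model": "Model 3", "Price": 38990, "Range": "333 mi"},
    {"Model": "Ioniq 6", "Price": 42450, "Range": "361 mi"},
]
page2 = [
    {"Model": "Ioniq 6", "Price": 42450, "Range": "361 mi"},
    {"Model": "Air Pure", "Price": 69900, "Range": "419 mi"},
]

table = accumulate([], page1, "Model")
table = accumulate(table, page2, "Model")
print(export(table, ["Model", "Price", "Range"], "csv"))
```

Keeping the set of seen keys separate from the row list keeps each page's merge linear in the number of rows, which matters once a session has accumulated many pages.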