DataTableAccumulationStrategy

Extracts and accumulates structured tabular data across multiple web pages with automated normalization, deduplication, and multi-format export.

Stateful: Accumulative | Model: GPT-4o Preferred | Output: CSV/JSON/Markdown
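The Output badge lists three export targets. As a rough illustration, here is one way a hypothetical `export_rows` helper could render accumulated rows in each format. It assumes the rows are held as a list of Python dicts keyed by column name. This is not the strategy's actual exporter, and the Markdown table layout is an assumption.

```python
import csv
import io
import json


def export_rows(rows, columns, fmt="csv"):
    """Hypothetical exporter: render rows (dicts) as csv, json, or markdown.

    Only the configured columns are emitted, in order; extra keys such as a
    source URL are dropped.
    """
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        # default=str keeps non-JSON values (e.g. dates) serializable.
        projected = [{c: r.get(c) for c in columns} for r in rows]
        return json.dumps(projected, indent=2, default=str)
    if fmt == "markdown":
        lines = [
            "| " + " | ".join(columns) + " |",
            "| " + " | ".join("---" for _ in columns) + " |",
        ]
        for r in rows:
            cells = ("" if r.get(c) is None else str(r.get(c)) for c in columns)
            lines.append("| " + " | ".join(cells) + " |")
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported export format: {fmt!r}")
```

Projecting onto `columns` keeps the three formats consistent with each other even when a row carries bookkeeping fields.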
⚙️ ExecutionConfig.json
{
  "column_names": "Model, Price, Range",
  "column_types": { "Price": "number" },
  "deduplicate": true,
  "key_columns": "Model",
  "export_format": "csv",
  "include_source_urls": true
}
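With `deduplicate` enabled and `key_columns` set to `Model`, a row from a later page should not be added if its `Model` value was already seen. The sketch below shows one way this stateful accumulation could work. `TableAccumulator`, `add_page`, and the `source_url` field are hypothetical names. Three behaviors are also assumptions:

- the first occurrence of a duplicate is kept;
- the whole row serves as the identity when `key_columns` is unset;
- `include_source_urls` defaults to false.

```python
def split_columns(spec):
    """Split a comma-separated spec such as "Model, Price, Range"."""
    return [c.strip() for c in spec.split(",") if c.strip()] if spec else []


class TableAccumulator:
    """Hypothetical accumulator that merges rows from many pages."""

    def __init__(self, config):
        self.columns = split_columns(config["column_names"])
        self.deduplicate = config.get("deduplicate", True)
        # Assumption: without key_columns, every column forms the identity.
        self.key_columns = split_columns(config.get("key_columns")) or self.columns
        # Assumption: source URLs are off unless requested.
        self.include_source_urls = config.get("include_source_urls", False)
        self.rows = []
        self._seen = set()

    def add_page(self, url, page_rows):
        """Merge one page's extracted rows; return how many were kept."""
        added = 0
        for row in page_rows:
            key = tuple(row.get(c) for c in self.key_columns)
            if self.deduplicate and key in self._seen:
                continue  # first occurrence wins
            self._seen.add(key)
            record = {c: row.get(c) for c in self.columns}
            if self.include_source_urls:
                record["source_url"] = url
            self.rows.append(record)
            added += 1
        return added


config = {"column_names": "Model, Price, Range", "deduplicate": True,
          "key_columns": "Model", "include_source_urls": True}
acc = TableAccumulator(config)
acc.add_page("https://example.com/p1", [
    {"Model": "Model 3", "Price": 38990, "Range": "333 mi"},
    {"Model": "Ioniq 6", "Price": 42450, "Range": "361 mi"},
])
acc.add_page("https://example.com/p2", [
    {"Model": "Model 3", "Price": 38990, "Range": "333 mi"},  # duplicate key
    {"Model": "Air Pure", "Price": 69900, "Range": "419 mi"},
])
print(len(acc.rows))  # → 3
```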
📊 Session UI Output

✔ Extraction Successful

Confidence: 98% | Rows: 12 | Total: 45

Model | Price | Range
Model 3 | 38990 | 333 mi
Ioniq 6 | 42450 | 361 mi
Air Pure | 69900 | 419 mi
⚠️ Normalized 'Price' from "$38,990" to "38990"
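The warning above shows `normalize_data` turning "$38,990" into a plain number. Below is a minimal sketch of that kind of type-driven coercion for the four documented column types. `coerce` and `normalize_row` are hypothetical helpers. Stripping every non-numeric character assumes US-style separators, so a European "38.990,00" would be misread. Parsing dates as ISO 8601 is also an assumption.

```python
import re
from datetime import date


def coerce(raw, col_type="string"):
    """Hypothetical coercion for the types string, number, boolean, date."""
    text = str(raw).strip()
    if col_type == "number":
        # Drop currency symbols, thousands separators, units, etc.
        cleaned = re.sub(r"[^0-9.\-]", "", text)
        if not cleaned:
            raise ValueError(f"not a number: {raw!r}")
        value = float(cleaned)  # raises ValueError on leftovers like "1.2.3"
        return int(value) if value.is_integer() else value
    if col_type == "boolean":
        lowered = text.lower()
        if lowered in {"true", "yes", "1"}:
            return True
        if lowered in {"false", "no", "0"}:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if col_type == "date":
        # Assumption: dates arrive as ISO 8601 (YYYY-MM-DD).
        return date.fromisoformat(text)
    return text


def normalize_row(row, column_types):
    """Coerce each cell; columns without a declared type stay strings."""
    return {k: coerce(v, column_types.get(k, "string")) for k, v in row.items()}


print(coerce("$38,990", "number"))  # → 38990
```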

DataTableConfig Fields

Field | Type | Description
column_names * | String | Comma-separated list of columns to extract.
column_descriptions | Map<String, String> | Guidance for the LLM on what each column represents.
column_types | Map<String, String> | Maps column names to types: string, number, boolean, or date.
extraction_instructions | String | Natural-language guidance for the LLM extraction engine.
auto_detect_tables | Boolean | If true, prioritizes existing HTML table structures. Default: true.
min_rows | Int | Minimum number of rows a table must have to be considered valid. Default: 1.
max_rows_per_page | Int? | Maximum number of rows extracted per page. Default: null (unlimited).
deduplicate | Boolean | Prevents duplicate rows across multiple pages. Default: true.
key_columns | String? | Comma-separated columns whose values identify a row for deduplication.
validate_types | Boolean | Enforces type checking on extracted values. Default: true.
normalize_data | Boolean | Cleans values (e.g., strips currency symbols). Default: true.
export_format | String | Final output format: csv, json, or markdown. Default: csv.

* Required field.
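Here is a hypothetical Python mirror of these fields that uses the defaults from the table. Treating `column_names` as required follows its asterisk. Several choices are assumptions rather than documented behavior:

- the empty defaults for `column_descriptions`, `column_types`, and `extraction_instructions`;
- the validation rules in `__post_init__`;
- silently ignoring keys not listed in the table, such as `include_source_urls` from the example config.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Dict, Optional

ALLOWED_TYPES = {"string", "number", "boolean", "date"}
ALLOWED_FORMATS = {"csv", "json", "markdown"}


@dataclass
class DataTableConfig:
    """Hypothetical mirror of the DataTableConfig fields and defaults."""
    column_names: str  # required (marked * in the table)
    column_descriptions: Dict[str, str] = field(default_factory=dict)
    column_types: Dict[str, str] = field(default_factory=dict)
    extraction_instructions: str = ""  # assumed default
    auto_detect_tables: bool = True
    min_rows: int = 1
    max_rows_per_page: Optional[int] = None  # None means unlimited
    deduplicate: bool = True
    key_columns: Optional[str] = None
    validate_types: bool = True
    normalize_data: bool = True
    export_format: str = "csv"

    def __post_init__(self):
        if not self.column_names.strip():
            raise ValueError("column_names is required")
        if self.export_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported export_format: {self.export_format!r}")
        for col, typ in self.column_types.items():
            if typ not in ALLOWED_TYPES:
                raise ValueError(f"unknown type {typ!r} for column {col!r}")

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        known = {f.name for f in fields(cls)}
        # Assumption: keys outside the table are ignored, not rejected.
        return cls(**{k: v for k, v in data.items() if k in known})
```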