DefaultSummarizerStrategy
High-density web content synthesis. Implements recursive chunking for large pages and multi-pass LLM analysis to generate structured, goal-oriented summaries across multiple sources.
{
  "content_queries": [
    "Identify core technical specs",
    "List API rate limits"
  ],
  "model": "gpt-4o-mini",
  "max_final_output_size": 15000
}
The platform utilizes a distributed microservices architecture. API rate limits are enforced at 1000 req/min per endpoint. Sub-millisecond latency is achieved via edge caching...
- /docs/api-reference
- /pricing/enterprise
Configuration Parameters
| Field | Type | Description |
|---|---|---|
| `content_queries` (Optional) | `List<String>` | Specific questions or data points to extract from the page content. |
| `task_description` (Optional) | `String` | General goal for the analysis if specific queries aren't provided. |
| `model` (Optional) | `String` | The LLM instance to use. Defaults to the orchestration's defaultFast model. |
| `max_final_output_size` (Default: 15k) | `Int` | Maximum character length for the synthesized final summary. |
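The parameters above can be pictured as a small Kotlin data class. This is a minimal sketch only: `SummarizerConfig` and `analysisGoals` are hypothetical names introduced for illustration, and the real configuration type may be shaped differently. It shows the documented default and the documented fallback from `content_queries` to `task_description`.

```kotlin
// Hypothetical mirror of the parameter table; not the library's actual type.
data class SummarizerConfig(
    val content_queries: List<String>? = null,  // optional: specific questions to answer
    val task_description: String? = null,       // optional: general goal when no queries are given
    val model: String? = null,                  // optional: null stands for "use the default fast model"
    val max_final_output_size: Int = 15_000,    // documented default of 15k characters
) {
    // Prefer the specific queries; fall back to the general task description.
    fun analysisGoals(): List<String> =
        content_queries?.takeIf { it.isNotEmpty() } ?: listOfNotNull(task_description)
}
```

With this sketch, `SummarizerConfig(task_description = "Summarize the docs").analysisGoals()` yields a single-element list, while supplying both fields returns only the queries.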
The Industrial Pipeline
1. Intelligent Chunking
Content exceeding 50,000 characters is automatically split into chunks. To choose each split point, the strategy runs a weighted search for paragraph breaks (\n\n), newlines, or sentence ends, so that each chunk preserves its semantic context.
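The splitting step might look roughly like the sketch below. The function names, the "second half of the window" search range, and the strict preference order (paragraph break, then newline, then sentence end) are assumptions standing in for the weighted search, whose actual weights are not documented here.

```kotlin
// Hypothetical sketch of boundary-aware chunking; not the strategy's actual code.
fun findSplitPoint(text: String, start: Int, maxChunkSize: Int): Int {
    val hardLimit = start + maxChunkSize
    if (hardLimit >= text.length) return text.length
    // Only accept separators in the second half of the window so chunks stay reasonably large.
    val searchFrom = start + maxChunkSize / 2
    // Separators in descending order of preference (an assumed ordering).
    for (separator in listOf("\n\n", "\n", ". ")) {
        val index = text.lastIndexOf(separator, hardLimit - separator.length)
        if (index >= searchFrom) return index + separator.length
    }
    return hardLimit // no natural boundary found: fall back to a hard cut
}

fun splitIntoChunks(text: String, maxChunkSize: Int = 50_000): List<String> {
    require(maxChunkSize > 0) { "maxChunkSize must be positive" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = findSplitPoint(text, start, maxChunkSize)
        chunks += text.substring(start, end)
        start = end
    }
    return chunks
}
```

Because every chunk is a contiguous substring, joining the chunks reproduces the original text, and no chunk exceeds `maxChunkSize`.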
2. Recursive Analysis
Each chunk is analyzed by a ParsedAgent. If multiple chunks exist, their intermediate summaries are recursively synthesized into a single ParsedPage object to maintain global context.
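The recursive synthesis can be sketched as a batched reduction. This is an illustration under stated assumptions: `synthesizeRecursively` and `batchSize` are hypothetical, the `merge` lambda stands in for the LLM call that combines intermediate summaries, and the generic type `T` stands in for whatever the real `ParsedPage` contains.

```kotlin
// Hypothetical helper: merges items in batches until a single result remains.
fun <T> synthesizeRecursively(
    items: List<T>,
    batchSize: Int = 4,
    merge: (List<T>) -> T,
): T {
    require(items.isNotEmpty()) { "at least one item is required" }
    require(batchSize >= 2) { "batchSize must be at least 2 so each pass shrinks the list" }
    if (items.size == 1) return items.single()
    // One pass: merge each batch, leaving a lone trailing item untouched.
    val merged = items.chunked(batchSize).map { batch ->
        if (batch.size == 1) batch.single() else merge(batch)
    }
    return synthesizeRecursively(merged, batchSize, merge)
}
```

A single-chunk page returns immediately without calling `merge`, which matches the text's condition that synthesis only happens when multiple chunks exist.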
3. Multi-Source Synthesis
After processing all URLs, the strategy aggregates individual results and performs a final synthesis pass using a ChatAgent to create a cohesive, thematic summary.
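As a rough sketch of this final pass, the per-page results can be joined into one prompt and handed to a chat function. Everything named here is an assumption: `PageResult`, `buildSynthesisInput`, and `synthesizeSources` are hypothetical, the `chat` lambda stands in for the ChatAgent, and enforcing `max_final_output_size` by truncation is a guess at how the cap is applied.

```kotlin
// Hypothetical stand-in for one page's analysis result.
data class PageResult(val url: String, val summary: String)

// Joins per-page summaries into one block, labelling each with its source URL.
fun buildSynthesisInput(results: List<PageResult>): String =
    results.joinToString("\n\n") { "Source: ${it.url}\n${it.summary}" }

fun synthesizeSources(
    results: List<PageResult>,
    maxFinalOutputSize: Int = 15_000,
    chat: (String) -> String,  // stand-in for the ChatAgent call
): String {
    val prompt = "Organize information by themes rather than by source.\n\n" +
        buildSynthesisInput(results)
    // Assumed enforcement of the size cap: simple truncation of the response.
    return chat(prompt).take(maxFinalOutputSize)
}
```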
4. Structured Metadata
The strategy extracts page_type, tags, and link_data, enabling the crawler to make informed decisions about subsequent navigation and link prioritization.
Implementation
Register the strategy within a CrawlerAgentTask to enable intelligent synthesis.
// Example: Configuring a crawler with the summarizer
val crawlerTask = CrawlerAgentTask(
    processingStrategy = DefaultSummarizerStrategy(),
    executionConfig = CrawlerAgentTask.ExecutionConfig(
        content_queries = listOf("Find pricing information"),
        max_pages = 5
    )
)
Synthesis Prompt Segment
The final synthesis pass is driven by a prompt that includes the following instructions (excerpted):
"Create a comprehensive summary... Organize information by themes rather than by source... Include the most important links that should be followed up on."