⚙️ Strategy Config

{
  "content_queries": [
    "Identify core technical specs",
    "List API rate limits"
  ],
  "model": "gpt-4o-mini",
  "max_final_output_size": 15000
}
            
👁️ Synthesized Result
Technical Analysis Summary

The platform utilizes a distributed microservices architecture. API rate limits are enforced at 1000 req/min per endpoint. Sub-millisecond latency is achieved via edge caching...

Extracted Links:
  • /docs/api-reference
  • /pricing/enterprise
Tags: Architecture, API, Performance

Configuration Parameters

Field | Type | Optional / Default | Description
content_queries | List<String> | Optional | Specific questions or data points to extract from the page content.
task_description | String | Optional | General goal for the analysis if specific queries aren't provided.
model | String | Optional | The LLM instance to use. Defaults to the orchestration's defaultFast model.
max_final_output_size | Int | Default: 15000 | Maximum character length for the synthesized final summary.
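
The parameters above could be modeled roughly as follows. This is only an illustrative sketch: the class name SummarizerConfig, the camelCase property names, and the analysisGoal helper are hypothetical and are not taken from the library. The one behavior it encodes from the table is that specific queries take precedence and task_description acts as the fallback goal.

```kotlin
// Hypothetical mirror of the configuration table; not the library's actual class.
data class SummarizerConfig(
    val contentQueries: List<String>? = null,  // optional: specific questions to answer
    val taskDescription: String? = null,       // optional: general goal, used when no queries are given
    val model: String? = null,                 // optional: null means "use the orchestration's defaultFast model"
    val maxFinalOutputSize: Int = 15_000       // default from the table: 15k characters
) {
    init {
        require(maxFinalOutputSize > 0) { "maxFinalOutputSize must be positive" }
    }

    // Returns the queries (one per line) when present, otherwise the task description, otherwise null.
    fun analysisGoal(): String? =
        contentQueries?.takeIf { it.isNotEmpty() }?.joinToString("\n") ?: taskDescription
}
```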

The Industrial Pipeline

1. Intelligent Chunking

Content exceeding 50,000 characters is automatically split into chunks. To preserve semantic context, the strategy picks each split point with a weighted search over paragraph breaks (\n\n), newlines, and sentence ends, rather than cutting at an arbitrary character offset.
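
A minimal sketch of boundary-aware chunking, assuming a simple preference order (paragraph break, then newline, then sentence end) in place of the strategy's actual weighting, which is not documented here. The names findBreak and chunkContent and the minFraction parameter are hypothetical.

```kotlin
// Hypothetical sketch; the real strategy's weighting may differ.
// Returns the index just past the preferred separator whose match starts at or after
// `earliest` and ends at or before `hardEnd`; falls back to `hardEnd` if none is found.
fun findBreak(text: String, earliest: Int, hardEnd: Int): Int {
    for (sep in listOf("\n\n", "\n", ". ")) {
        // Search backwards so the chunk is as large as the limit allows.
        val idx = text.lastIndexOf(sep, hardEnd - sep.length)
        if (idx >= earliest) return idx + sep.length
    }
    return hardEnd
}

fun chunkContent(text: String, maxChunkSize: Int = 50_000, minFraction: Double = 0.5): List<String> {
    require(maxChunkSize > 0) { "maxChunkSize must be positive" }
    require(minFraction in 0.0..1.0) { "minFraction must be between 0 and 1" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (text.length - start > maxChunkSize) {
        val hardEnd = start + maxChunkSize
        // Reject break points so early that the chunk would be very small.
        val earliest = start + (maxChunkSize * minFraction).toInt()
        val end = findBreak(text, earliest, hardEnd)
        chunks += text.substring(start, end)
        start = end
    }
    if (start < text.length) chunks += text.substring(start)
    return chunks
}
```

Separators stay attached to the end of the preceding chunk, so concatenating the chunks reproduces the original content exactly.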

2. Recursive Analysis

Each chunk is analyzed by a ParsedAgent. If a page yields multiple chunks, their intermediate summaries are recursively synthesized into a single ParsedPage object so that the result reflects the whole page rather than any one chunk.
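
The recursive step can be pictured as a batched reduction. The sketch below is an assumption about the general shape, not the library's code: ChunkSummary is a hypothetical stand-in for ParsedPage, and the analyze and merge lambdas stand in for the LLM calls made through ParsedAgent. The batch size of 4 is arbitrary.

```kotlin
// Hypothetical stand-in for ParsedPage; only a summary field is modeled.
data class ChunkSummary(val summary: String)

// Merges summaries in batches, then recurses on the merged results until one remains.
fun synthesize(
    summaries: List<ChunkSummary>,
    batchSize: Int = 4,
    merge: (List<ChunkSummary>) -> ChunkSummary
): ChunkSummary {
    require(summaries.isNotEmpty()) { "nothing to synthesize" }
    require(batchSize >= 2) { "batchSize must be at least 2 to make progress" }
    if (summaries.size == 1) return summaries.single()
    val merged = summaries.chunked(batchSize).map { group ->
        // A leftover single summary is carried forward without another merge call.
        if (group.size == 1) group.single() else merge(group)
    }
    return synthesize(merged, batchSize, merge)
}

// Analyzes every chunk, then folds the intermediate summaries into one result.
fun analyzePage(
    chunks: List<String>,
    analyze: (String) -> ChunkSummary,
    merge: (List<ChunkSummary>) -> ChunkSummary
): ChunkSummary = synthesize(chunks.map(analyze), merge = merge)
```

Batching keeps each merge input bounded, which matters when a page produces more intermediate summaries than fit comfortably in one model request.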

3. Multi-Source Synthesis

After processing all URLs, the strategy aggregates individual results and performs a final synthesis pass using a ChatAgent to create a cohesive, thematic summary.
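
As a rough illustration of this final pass, the sketch below gathers per-URL summaries into one input and hands it to a chat function. PageResult, buildSynthesisInput, and finalSynthesis are hypothetical names, the chat lambda stands in for the ChatAgent call, and enforcing max_final_output_size by truncating the answer is an assumption about how the limit might be applied.

```kotlin
// Hypothetical per-URL result; the real strategy works with ParsedPage objects.
data class PageResult(val url: String, val summary: String)

// Labels each summary with its source so the model can still cite links
// while organizing the output by theme.
fun buildSynthesisInput(results: List<PageResult>): String =
    results.joinToString("\n\n") { "Source: ${it.url}\n${it.summary}" }

fun finalSynthesis(
    results: List<PageResult>,
    maxFinalOutputSize: Int = 15_000,
    chat: (String) -> String
): String {
    if (results.isEmpty()) return ""          // nothing crawled, nothing to synthesize
    val answer = chat(buildSynthesisInput(results))
    return answer.take(maxFinalOutputSize)    // assumption: the limit is applied by truncation
}
```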

4. Structured Metadata

The strategy extracts page_type, tags, and link_data, enabling the crawler to make informed decisions about subsequent navigation and link prioritization.
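
One way such metadata could feed link prioritization is sketched below. The field shapes are assumptions: the source names page_type, tags, and link_data but does not describe their structure, so LinkData, its relevance score, and the nextLinks function are hypothetical.

```kotlin
// Hypothetical shapes for the extracted metadata.
data class LinkData(val url: String, val relevance: Double)
data class PageMetadata(val pageType: String, val tags: List<String>, val linkData: List<LinkData>)

// Picks the highest-scoring unvisited links across all analyzed pages.
// A URL seen on several pages keeps its best score.
fun nextLinks(pages: List<PageMetadata>, visited: Set<String>, limit: Int): List<String> =
    pages.flatMap { it.linkData }
        .filter { it.url !in visited }
        .groupBy { it.url }
        .map { (url, links) -> url to links.maxOf { it.relevance } }
        .sortedByDescending { it.second }
        .take(limit)
        .map { it.first }
```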

Implementation

Register the strategy within a CrawlerAgentTask to enable intelligent synthesis.


// Example: Configuring a crawler with the summarizer
val crawlerTask = CrawlerAgentTask(
    processingStrategy = DefaultSummarizerStrategy(),
    executionConfig = CrawlerAgentTask.ExecutionConfig(
        content_queries = listOf("Find pricing information"),
        max_pages = 5
    )
)
            

Synthesis Prompt Segment

The final synthesis pass is driven by a prompt that includes the following instructions (excerpt):

"Create a comprehensive summary... Organize information by themes rather than by source... Include the most important links that should be followed up on."