ExecutionConfig.json
```json
{
  "search_query": "Latest LLM benchmarks 2024",
  "content_queries": "Extract performance metrics for Llama-3 and GPT-4o.",
  "direct_urls": ["https://arxiv.org/abs/..."],
  "max_pages_per_task": 15
}
```
Session UI (TabbedDisplay)

The live session view renders three tabs: Final Output, Crawl Details, and Queue Details. A typical Final Output tab looks like this:

```markdown
# Research Summary

Based on 12 analyzed pages, Llama-3 70B shows a 15% improvement in MMLU over previous iterations...

  • Source: HuggingFace Blog (Relevance: 98%)
  • Source: Arxiv (Relevance: 92%)
```

Execution Configuration

| Field | Type | Description |
| --- | --- | --- |
| search_query | String? | The primary query used to seed the crawl via Google Search. |
| direct_urls | List<String>? | Explicit URLs to analyze alongside or instead of search results. |
| content_queries * | Any | Detailed instructions for the LLM on how to transform page content into insights. |

\* Required field.
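For orientation, here is a minimal sketch of how these fields could be modeled in Kotlin. The class name matches the CrawlerTaskExecutionConfigData used in the example further below, but the exact declaration (nullability, defaults, the concrete type behind Any) is an assumption, not Cognotik's actual source:

```kotlin
// Hypothetical shape of the per-task execution config; the real
// CrawlerTaskExecutionConfigData declaration in Cognotik may differ.
data class CrawlerTaskExecutionConfigData(
    val search_query: String? = null,       // seeds the crawl via Google Search
    val direct_urls: List<String>? = null,  // explicit URLs to analyze
    val content_queries: Any                // required: LLM extraction instructions
)
```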

Type Configuration (Static)

| Field | Default | Description |
| --- | --- | --- |
| processing_strategy | DefaultSummarizer | Strategy for page analysis (e.g., FactChecking, JobMatching). |
| max_pages_per_task | 30 | Hard limit on the number of unique pages to process. |
| concurrent_page_processing | 3 | Number of parallel threads for fetching and analysis. |
| respect_robots_txt | true | Ensures compliance with site crawling policies. |
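Likewise, a minimal sketch of the static type config, assuming a plain data class carrying the documented defaults; the real CrawlerTaskTypeConfig and ProcessingStrategyType declarations may differ:

```kotlin
// Hypothetical declarations; the strategy names are the ones documented above.
enum class ProcessingStrategyType { DefaultSummarizer, FactChecking, JobMatching }

data class CrawlerTaskTypeConfig(
    val processing_strategy: ProcessingStrategyType = ProcessingStrategyType.DefaultSummarizer,
    val max_pages_per_task: Int = 30,         // hard cap on unique pages
    val concurrent_page_processing: Int = 3,  // parallel fetch/analysis threads
    val respect_robots_txt: Boolean = true    // honor site crawling policies
)
```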

Lifecycle of a Crawler Task

  1. Seeding: The agent uses SeedMethod (Google Proxy or Direct URLs) to populate the initial PriorityQueue.
  2. Priority Discovery: Links are scored by relevance_score; higher relevance and lower depth are prioritized (see the queue sketch after this list).
  3. Concurrent Execution: The ExecutorService manages parallel crawlPage operations. Each page is fetched via HttpClient or Selenium.
  4. LLM Synthesis: Each page is processed by the selected PageProcessingStrategy, which extracts data and discovers new links.
  5. Final Aggregation: Once limits are reached or the queue is exhausted, a ChatAgent synthesizes all individual reports into a final comprehensive summary.
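The ordering in step 2 and the bounded parallelism in step 3 can be illustrated with plain JDK primitives. This is a self-contained sketch, not Cognotik's internals: CrawlTarget, the comparator, and the pool sizing are illustrative stand-ins for the real PriorityQueue ordering and ExecutorService wiring.

```kotlin
import java.util.PriorityQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Illustrative frontier entry; not Cognotik's actual type.
data class CrawlTarget(val url: String, val relevanceScore: Double, val depth: Int)

fun main() {
    // Step 2: higher relevance_score first, lower depth breaking ties.
    val frontier = PriorityQueue(
        compareByDescending<CrawlTarget> { it.relevanceScore }.thenBy { it.depth }
    )
    frontier += CrawlTarget("https://example.com/a", relevanceScore = 0.92, depth = 1)
    frontier += CrawlTarget("https://example.com/b", relevanceScore = 0.92, depth = 0)
    frontier += CrawlTarget("https://example.com/c", relevanceScore = 0.55, depth = 0)

    // Step 3: a fixed-size pool caps concurrent page processing (default 3).
    val executor = Executors.newFixedThreadPool(3)
    while (frontier.isNotEmpty()) {
        val target = frontier.poll()
        executor.submit { println("crawlPage(${target.url}) at depth ${target.depth}") }
    }
    executor.shutdown()
    executor.awaitTermination(10, TimeUnit.SECONDS)
}
```

Because poll() always returns the current best-scoring target, newly discovered high-relevance links jump ahead of deeper, lower-value pages in the frontier.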

Kotlin Implementation

OrchestrationConfig.kt
```kotlin
// Build a crawler task seeded from a Google search query, then override
// the static type configuration to keep the run small.
val crawlerTask = CrawlerAgentTask(
    orchestrationConfig = config,
    planTask = CrawlerTaskExecutionConfigData(
        search_query = "Cognotik AI Framework",
        content_queries = "Summarize core features and architecture."
    )
).apply {
    typeConfig = CrawlerTaskTypeConfig(
        max_pages_per_task = 10,  // stop after 10 unique pages (default is 30)
        processing_strategy = ProcessingStrategyType.DefaultSummarizer
    )
}
```
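The two-stage construction mirrors the split above: planTask carries the per-run inputs from the Execution Configuration table, while the apply block overrides the static defaults from the Type Configuration table (here capping the run at 10 pages instead of 30).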