# CrawlerAgentTask

An autonomous research agent that performs multi-threaded web crawling, content synthesis, and link discovery to answer complex queries with live data.

- **Category:** Online
- **Model:** GPT-4 preferred
- **Side-Effect:** Safe
**ExecutionConfig.json**

```json
{
  "search_query": "Latest LLM benchmarks 2024",
  "content_queries": "Extract performance metrics for Llama-3 and GPT-4o.",
  "direct_urls": ["https://arxiv.org/abs/..."],
  "max_pages_per_task": 15
}
```
## Session UI (TabbedDisplay)

The session renders results in four tabs: **Live Render**, **Final Output**, **Crawl Details**, and **Queue Details**. An example final output:

```markdown
# Research Summary
Based on 12 analyzed pages, Llama-3 70B shows a 15% improvement in MMLU over previous iterations...
- Source: HuggingFace Blog (Relevance: 98%)
- Source: Arxiv (Relevance: 92%)
```
## Execution Configuration

| Field | Type | Description |
|---|---|---|
| `search_query` | `String?` | The primary query used to seed the crawl via Google Search. |
| `direct_urls` | `List<String>?` | Explicit URLs to analyze alongside or instead of search results. |
| `content_queries` \* | `Any` | Detailed instructions for the LLM on how to transform page content into insights. |

\* Required field.
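Assuming the execution config deserializes into the `CrawlerTaskExecutionConfigData` class shown in the Kotlin example at the end of this page, the optionality rules in the table can be sketched as a data class (field types follow the table; this is an illustrative sketch, not the actual source):

```kotlin
// Hypothetical sketch of the execution-config payload. Field names mirror
// the table above; content_queries is left as Any because that is its
// documented type.
data class CrawlerTaskExecutionConfigData(
    val search_query: String? = null,      // optional: seeds the crawl via search
    val direct_urls: List<String>? = null, // optional: explicit seed URLs
    val content_queries: Any               // required: LLM extraction instructions
)
```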
## Type Configuration (Static)

| Field | Default | Description |
|---|---|---|
| `processing_strategy` | `DefaultSummarizer` | Strategy for page analysis (e.g., `FactChecking`, `JobMatching`). |
| `max_pages_per_task` | `30` | Hard limit on the number of unique pages to process. |
| `concurrent_page_processing` | `3` | Number of parallel threads for fetching and analysis. |
| `respect_robots_txt` | `true` | Ensures compliance with site crawling policies. |
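The interplay of `concurrent_page_processing` and `max_pages_per_task` can be sketched as a bounded worker pool. This is a minimal illustration, not the framework's implementation; `processPages` and `fetchAndAnalyze` are hypothetical names:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Minimal sketch: a fixed thread pool sized by concurrent_page_processing
// works through at most max_pages_per_task URLs. fetchAndAnalyze stands in
// for the real fetch + LLM analysis pipeline.
fun processPages(
    urls: List<String>,
    concurrentPageProcessing: Int = 3,
    maxPagesPerTask: Int = 30,
    fetchAndAnalyze: (String) -> Unit,
): Int {
    val pool = Executors.newFixedThreadPool(concurrentPageProcessing)
    val processed = AtomicInteger(0)
    urls.take(maxPagesPerTask).forEach { url ->
        pool.submit {
            fetchAndAnalyze(url)
            processed.incrementAndGet()
        }
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    return processed.get()
}
```

Capping the input with `take(maxPagesPerTask)` enforces the hard page limit before any work is submitted.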
## Lifecycle of a Crawler Task

1. **Seeding:** The agent uses the `SeedMethod` (Google Proxy or Direct URLs) to populate the initial `PriorityQueue`.
2. **Priority Discovery:** Discovered links are scored by `relevance_score`; links with higher relevance and lower depth are prioritized.
3. **Concurrent Execution:** The `ExecutorService` manages parallel `crawlPage` operations. Each page is fetched via `HttpClient` or Selenium.
4. **LLM Synthesis:** Each page is processed by the selected `PageProcessingStrategy`, which extracts data and discovers new links.
5. **Final Aggregation:** Once limits are reached or the queue is exhausted, a `ChatAgent` synthesizes all individual reports into a final comprehensive summary.
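The priority-discovery step above can be sketched with a comparator over a standard `PriorityQueue`: higher `relevance_score` first, ties broken by shallower depth. The `CrawlLink` type and `newCrawlQueue` helper are hypothetical names for illustration:

```kotlin
import java.util.PriorityQueue

// Hypothetical link record; fields mirror the scoring inputs described above.
data class CrawlLink(val url: String, val relevanceScore: Double, val depth: Int)

// Higher relevance dequeued first; ties broken by lower depth.
val crawlOrder = compareByDescending<CrawlLink> { it.relevanceScore }
    .thenBy { it.depth }

fun newCrawlQueue(seeds: List<CrawlLink>): PriorityQueue<CrawlLink> =
    PriorityQueue(crawlOrder).apply { addAll(seeds) }
```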
## Kotlin Implementation

**OrchestrationConfig.kt**

```kotlin
val crawlerTask = CrawlerAgentTask(
    orchestrationConfig = config,
    planTask = CrawlerTaskExecutionConfigData(
        search_query = "Cognotik AI Framework",
        content_queries = "Summarize core features and architecture."
    )
).apply {
    typeConfig = CrawlerTaskTypeConfig(
        max_pages_per_task = 10,
        processing_strategy = ProcessingStrategyType.DefaultSummarizer
    )
}
```