⚙️ CrawlerTaskExecutionConfigData

{
  "direct_urls": [
    "https://cognotik.com/docs",
    "https://cognotik.com/api/v1"
  ]
}
            
🌱 Seed Strategy Output
[LOG] Successfully processed 2 direct URLs
  • Item 1: Direct URL 1
    https://cognotik.com/docs
  • Item 2: Direct URL 2
    https://cognotik.com/api/v1

Configuration Parameters

Field           Type            Description
direct_urls *   List<String>    A list of absolute URLs used as the crawler's starting points. Each URL must use the http:// or https:// scheme.

Validation Rules

  • URLs are trimmed of leading/trailing whitespace.
  • Blank or empty strings are ignored.
  • Malformed URIs are logged as warnings and skipped.
  • Only http and https schemes are accepted.
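The rules above can be sketched as a single per-entry check. This is a minimal illustration and not the library's actual code: sanitizeDirectUrl is a hypothetical helper, the warning text is invented for the example, and java.net.URI is assumed as the parser.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical helper (not part of the documented API): returns the cleaned
// URL, or null when the entry should be skipped.
fun sanitizeDirectUrl(raw: String): String? {
    // Rule 1: trim leading/trailing whitespace.
    val trimmed = raw.trim()

    // Rule 2: blank or empty strings are ignored.
    if (trimmed.isEmpty()) return null

    // Rule 3: malformed URIs are logged as warnings and skipped.
    val uri = try {
        URI(trimmed)
    } catch (e: URISyntaxException) {
        println("[WARN] Skipping malformed URI: $trimmed")
        return null
    }

    // Rule 4: only the http and https schemes are accepted.
    val scheme = uri.scheme?.lowercase()
    if (scheme != "http" && scheme != "https") {
        println("[WARN] Skipping non-HTTP(S) URL: $trimmed")
        return null
    }
    return trimmed
}
```

Under these assumptions, an entry with no scheme at all (for example "cognotik.com/docs") parses as a relative URI and is rejected by the scheme check rather than by the parser.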

Seed Generation Lifecycle

1. Initialization
The system checks taskConfig.direct_urls. If the list is null or empty, an error is logged and an empty seed list is returned.
2. Sanitization & Filtering
Each string is trimmed. The system filters out blank entries and validates the URI format. Non-HTTP(S) URLs are discarded with a warning.
3. Mapping
Valid URLs are converted into SeedItem objects. Titles are auto-generated as "Direct URL {index + 1}".
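The three stages can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the actual implementation: SeedItemSketch is a hypothetical stand-in for the real SeedItem type, seedItemsFrom and isHttpUrl are invented names, the log messages are made up, and the title index is assumed to count only the URLs that survive filtering.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical stand-in for the real SeedItem type; only the fields used here.
data class SeedItemSketch(val title: String, val url: String)

// Hypothetical check: true only for well-formed http/https URIs.
fun isHttpUrl(candidate: String): Boolean {
    val scheme = try {
        URI(candidate).scheme
    } catch (e: URISyntaxException) {
        println("[WARN] Malformed URI skipped: $candidate")
        return false
    }
    val accepted = scheme.equals("http", ignoreCase = true) ||
        scheme.equals("https", ignoreCase = true)
    if (!accepted) println("[WARN] Non-HTTP(S) URL skipped: $candidate")
    return accepted
}

fun seedItemsFrom(directUrls: List<String>?): List<SeedItemSketch> {
    // 1. Initialization: a null or empty list yields an error and no seeds.
    if (directUrls.isNullOrEmpty()) {
        println("[ERROR] No direct URLs configured")
        return emptyList()
    }

    // 2. Sanitization & filtering: trim, drop blanks, keep valid HTTP(S) URLs.
    val validUrls = directUrls
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .filter { isHttpUrl(it) }

    // 3. Mapping: titles follow the "Direct URL {index + 1}" pattern.
    return validUrls.mapIndexed { index, url ->
        SeedItemSketch(title = "Direct URL ${index + 1}", url = url)
    }
}
```

Numbering after filtering keeps titles contiguous when some entries are rejected. Whether the real implementation numbers by original position instead is not stated here.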

Kotlin Implementation


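// Example: configuring a CrawlerAgentTask with DirectUrls.
// `task`, `user`, and `orchestrationConfig` are not defined in this snippet;
// they are assumed to come from the surrounding task context.
val taskConfig = CrawlerTaskExecutionConfigData(
    direct_urls = listOf(
        "https://example.com/start",
        "https://example.com/about"
    )
)

// Build the seed strategy, then resolve the configured URLs into seed items.
val seedStrategy = DirectUrls().createStrategy(task, user)
val seedItems = seedStrategy.getSeedItems(taskConfig, orchestrationConfig)

// Result: a list of SeedItems ready for the CrawlerAgent,
// titled "Direct URL 1" and "Direct URL 2".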