Cognotik | DirectUrls Seed Method

⚙️ CrawlerTaskExecutionConfigData


{
  "direct_urls": [
    "https://cognotik.com/docs",
    "https://cognotik.com/api/v1"
  ]
}

→

🌱 Seed Strategy Output

[LOG] Successfully processed 2 direct URLs

Item 1: Direct URL 1
https://cognotik.com/docs
Item 2: Direct URL 2
https://cognotik.com/api/v1

Configuration Parameters

Field	Type	Description
`direct_urls` (required)	`List<String>`	Absolute URLs used as the crawler's starting points. Each must use the `http://` or `https://` scheme.

Validation Rules

URLs are trimmed of leading/trailing whitespace.
Blank or empty strings are ignored.
Malformed URIs are logged as warnings and skipped.
Only http and https schemes are accepted.
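
The rules above can be sketched as a single per-entry check. This is a minimal illustration, not the Cognotik implementation: `validateDirectUrl` is a hypothetical helper, the `println` warnings stand in for whatever logger the real code uses, and comparing the scheme case-insensitively is an assumption.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical helper (not part of the Cognotik API): returns the cleaned URL,
// or null when the entry should be skipped.
fun validateDirectUrl(raw: String): String? {
    // Rule 1: trim leading/trailing whitespace.
    val trimmed = raw.trim()

    // Rule 2: blank or empty strings are ignored.
    if (trimmed.isEmpty()) return null

    // Rule 3: malformed URIs are logged as warnings and skipped.
    val uri = try {
        URI(trimmed)
    } catch (e: URISyntaxException) {
        println("[WARN] Skipping malformed URI: $trimmed")
        return null
    }

    // Rule 4: only http and https schemes are accepted.
    // Assumption: the scheme comparison ignores case.
    val scheme = uri.scheme?.lowercase()
    if (scheme != "http" && scheme != "https") {
        println("[WARN] Skipping non-HTTP(S) URL: $trimmed")
        return null
    }
    return trimmed
}
```

Returning `null` for every rejected entry lets a caller apply the check with `mapNotNull` and keep only the accepted URLs.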

Seed Generation Lifecycle

1. Initialization

The system checks taskConfig.direct_urls. If the list is null or empty, an error is logged and an empty seed list is returned.

2. Sanitization & Filtering

Each string is trimmed. The system filters out blank entries and validates the URI format. Non-HTTP(S) URLs are discarded with a warning.

3. Mapping

Valid URLs are converted into SeedItem objects. Titles are auto-generated as "Direct URL {index + 1}".
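
The three stages can be read as one small pipeline. The sketch below is illustrative only: `SeedItem` here is a hypothetical two-field stand-in for the real class, `buildSeedItems` is an invented name, and `println` replaces the real logger. It also assumes that the index used in the title is the position among the URLs that survive filtering, which the description above does not state explicitly.

```kotlin
import java.net.URI

// Hypothetical stand-in for the real SeedItem class; its actual fields are not shown here.
data class SeedItem(val title: String, val link: String)

// Hypothetical function name; mirrors the three lifecycle stages described above.
fun buildSeedItems(directUrls: List<String>?): List<SeedItem> {
    // 1. Initialization: a null or empty list yields an error log and no seeds.
    if (directUrls.isNullOrEmpty()) {
        println("[ERROR] No direct URLs configured")
        return emptyList()
    }

    // 2. Sanitization & filtering: trim, drop blanks, keep only parseable http(s) URLs.
    val validUrls = directUrls
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .filter { url ->
            // A URI that fails to parse yields a null scheme and is rejected below.
            val scheme = runCatching { URI(url).scheme }.getOrNull()
            val accepted = scheme == "http" || scheme == "https"
            if (!accepted) println("[WARN] Skipping invalid or non-HTTP(S) URL: $url")
            accepted
        }

    // 3. Mapping: titles are numbered from 1.
    // Assumption: numbering follows the filtered list, not the original input positions.
    return validUrls.mapIndexed { index, url -> SeedItem("Direct URL ${index + 1}", url) }
}
```

Under that numbering assumption, a list containing one blank entry and one `ftp://` entry between two valid URLs would still produce items titled "Direct URL 1" and "Direct URL 2".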

Kotlin Implementation


// Example: Configuring a CrawlerAgentTask with DirectUrls
val taskConfig = CrawlerTaskExecutionConfigData(
    direct_urls = listOf(
        "https://example.com/start",
        "https://example.com/about"
    )
)

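// `task`, `user`, and `orchestrationConfig` are assumed to be supplied by the surrounding task context.
val seedStrategy = DirectUrls().createStrategy(task, user)
val seedItems = seedStrategy.getSeedItems(taskConfig, orchestrationConfig)

// Result: a list of SeedItems titled "Direct URL 1" and "Direct URL 2", ready for the CrawlerAgent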