DirectUrls
Explicitly define the entry points for a web crawl. This method bypasses discovery logic to focus the agent on a specific set of target URLs.
Properties: Side-Effect Safe, Deterministic, Input-Driven
Example input (CrawlerTaskExecutionConfigData):
{
  "direct_urls": [
    "https://cognotik.com/docs",
    "https://cognotik.com/api/v1"
  ]
}
Seed strategy output:
[LOG] Successfully processed 2 direct URLs
- Item 1: Direct URL 1 (https://cognotik.com/docs)
- Item 2: Direct URL 2 (https://cognotik.com/api/v1)
Configuration Parameters
| Field | Type | Description |
|---|---|---|
| direct_urls * | List<String> | A list of absolute URLs to be used as the starting points for the crawler. Each URL must use the http:// or https:// protocol. |
Validation Rules
- URLs are trimmed of leading/trailing whitespace.
- Blank or empty strings are ignored.
- Malformed URIs are logged as warnings and skipped.
- Only http and https schemes are accepted.
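The validation rules above can be sketched in Kotlin with java.net.URI. This is a minimal sketch, not the crawler's actual code: the ValidationReport type and the validateDirectUrls function are hypothetical, warnings are collected into a list instead of being sent to a logger, and treating the scheme case-insensitively is an assumption.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical result type: accepted URLs plus one warning per skipped entry.
data class ValidationReport(val accepted: List<String>, val warnings: List<String>)

// Applies the four validation rules to raw direct_urls entries (sketch only).
fun validateDirectUrls(raw: List<String>): ValidationReport {
    val accepted = mutableListOf<String>()
    val warnings = mutableListOf<String>()
    for (entry in raw) {
        // Rule 1: trim leading/trailing whitespace.
        val url = entry.trim()
        // Rule 2: blank or empty strings are ignored without a warning.
        if (url.isEmpty()) continue
        // Rule 3: malformed URIs make the URI constructor throw; warn and skip.
        val uri = try {
            URI(url)
        } catch (e: URISyntaxException) {
            warnings += "Skipping malformed URL '$url': ${e.reason}"
            continue
        }
        // Rule 4: only http and https are accepted (case-insensitive: an assumption).
        // A relative URI such as "example.com" has a null scheme and is rejected here.
        val scheme = uri.scheme?.lowercase()
        if (scheme != "http" && scheme != "https") {
            warnings += "Skipping non-HTTP(S) URL '$url'"
            continue
        }
        accepted += url
    }
    return ValidationReport(accepted, warnings)
}
```

For example, given "  https://cognotik.com/docs  ", "ftp://example.com/file", and "example.com", only the first entry (trimmed) is accepted; the other two each produce a warning, the last one because a bare host parses as a relative URI with no scheme.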
Seed Generation Lifecycle
1. Initialization
The system checks taskConfig.direct_urls. If the list is null or empty, an error is logged and an empty seed list is returned.
2. Sanitization & Filtering
Each string is trimmed. The system filters out blank entries and validates the URI format. Non-HTTP(S) URLs are discarded with a warning.
3. Mapping
Valid URLs are converted into SeedItem objects. Titles are auto-generated as "Direct URL {index + 1}".
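The three lifecycle steps can be combined into one function. This is again a sketch under stated assumptions: SeedItem here is a hypothetical two-field data class standing in for the real type, directUrlSeeds and its log callback are invented for illustration, and titles are assumed to be numbered over the URLs that survive filtering.

```kotlin
import java.net.URI

// Hypothetical stand-in for the crawler's SeedItem; the real class may differ.
data class SeedItem(val title: String, val link: String)

// Sketch of the lifecycle: initialization, sanitization & filtering, mapping.
fun directUrlSeeds(
    directUrls: List<String>?,
    log: (String) -> Unit = { println(it) }
): List<SeedItem> {
    // 1. Initialization: a null or empty list logs an error and yields no seeds.
    if (directUrls.isNullOrEmpty()) {
        log("ERROR: direct_urls is null or empty")
        return emptyList()
    }
    // 2. Sanitization & filtering: trim, drop blanks, keep parseable http(s) URIs.
    val valid = directUrls
        .map { it.trim() }
        .filter { it.isNotBlank() }
        .filter { url ->
            // runCatching turns a parse failure into a null scheme.
            val scheme = runCatching { URI(url).scheme }.getOrNull()?.lowercase()
            val ok = scheme == "http" || scheme == "https"
            if (!ok) log("WARN: skipping invalid or non-HTTP(S) URL '$url'")
            ok
        }
    // 3. Mapping: titles are numbered from 1 over the surviving URLs (assumption).
    return valid.mapIndexed { index, url ->
        SeedItem(title = "Direct URL ${index + 1}", link = url)
    }
}
```

Because mapping runs after filtering in this sketch, a skipped entry leaves no gap in the numbering. Whether the real implementation numbers over the filtered list or the original list is not stated above.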
Kotlin Implementation
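// Example: configuring a CrawlerAgentTask with DirectUrls.
// `task`, `user`, and `orchestrationConfig` are assumed to come from the
// surrounding agent context; they are not defined in this snippet.
val taskConfig = CrawlerTaskExecutionConfigData(
    direct_urls = listOf(
        "https://example.com/start",
        "https://example.com/about"
    )
)
// Build the DirectUrls seed strategy for this task and user.
val seedStrategy = DirectUrls().createStrategy(task, user)
// Validate the configured URLs and turn them into seed items.
val seedItems = seedStrategy.getSeedItems(taskConfig, orchestrationConfig)
// Result: a list of SeedItems ready for the CrawlerAgent.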