PageProcessingStrategy
A pluggable strategy interface for orchestrating web page analysis, link extraction, and crawl termination logic within the CrawlerAgentTask.
Category: Interface
Lifecycle: Stateful
Usage: High Density
⚙️ Implementation Example (Kotlin)
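class DocumentationStrategy : PageProcessingStrategy {
    // Classify every page as content and collect its Markdown links
    // (extractMarkdownLinks is a helper not shown here).
    override fun processPage(url: String, content: String, context: ProcessingContext) =
        PageProcessingResult(
            url = url,
            pageType = PageType.Content,
            extractedLinks = extractMarkdownLinks(content)
        )

    // Keep crawling until the page budget from the context is used up.
    override fun shouldContinueCrawling(results: List<PageProcessingResult>, context: ProcessingContext) =
        ContinuationDecision(results.size < context.maxPages)

    // generateFinalOutput and validateConfig are omitted from this example.
}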
👁️ Session Task UI
[CrawlerAgentTask] Executing...
✔ Processed: https://docs.example.com/api
└─ Extracted 14 links
✔ Processed: https://docs.example.com/setup
└─ Extracted 5 links
Status: CONTINUING (2/10 pages)
Core Interface Methods
| Method | Return Type | Description |
|---|---|---|
| `processPage(url, content, context)` | `PageProcessingResult` | Analyzes a single page's content and extracts metadata/links. |
| `shouldContinueCrawling(results, context)` | `ContinuationDecision` | Determines if the crawler should proceed to the next URL. |
| `generateFinalOutput(results, context)` | `String` | Aggregates all results into the final task output. |
| `validateConfig(config)` | `String?` | Validates strategy-specific settings before execution. |
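
The four methods above can be read as one interface. The following is a minimal, self-contained sketch of that shape, not the actual source: `ProcessingContext`, `PageProcessingResult`, and `ContinuationDecision` are reduced to hypothetical stand-ins, the parameter types (including the `Map` passed to `validateConfig`) are assumptions, and so is the convention that `validateConfig` returns `null` for a valid config and an error message otherwise. `TitleListStrategy` and the `minTitleLength` setting are invented for illustration.

```kotlin
// Hypothetical stand-ins so this sketch compiles on its own.
// The real types carry more fields than shown here.
class ProcessingContext(val maxPages: Int)
data class PageProcessingResult(val url: String, val title: String)
data class ContinuationDecision(val shouldContinue: Boolean)

// Method names and return types follow the table above; parameter types are assumed.
interface PageProcessingStrategy {
    fun processPage(url: String, content: String, context: ProcessingContext): PageProcessingResult
    fun shouldContinueCrawling(results: List<PageProcessingResult>, context: ProcessingContext): ContinuationDecision
    fun generateFinalOutput(results: List<PageProcessingResult>, context: ProcessingContext): String
    // Assumption: null means the config is valid; a non-null value is an error message.
    fun validateConfig(config: Map<String, Any?>): String?
}

class TitleListStrategy : PageProcessingStrategy {
    // Uses the first non-blank line of the page as its title.
    override fun processPage(url: String, content: String, context: ProcessingContext) =
        PageProcessingResult(
            url,
            content.lineSequence().firstOrNull { it.isNotBlank() }?.trim() ?: "(untitled)"
        )

    // Continue while the page budget has not been reached.
    override fun shouldContinueCrawling(results: List<PageProcessingResult>, context: ProcessingContext) =
        ContinuationDecision(results.size < context.maxPages)

    // One bullet per processed page.
    override fun generateFinalOutput(results: List<PageProcessingResult>, context: ProcessingContext) =
        results.joinToString("\n") { "- ${it.title} (${it.url})" }

    // "minTitleLength" is a made-up setting used only for illustration.
    override fun validateConfig(config: Map<String, Any?>): String? {
        val value = config["minTitleLength"] ?: return null
        return if (value is Int && value >= 0) null else "minTitleLength must be a non-negative integer"
    }
}

fun main() {
    val strategy = TitleListStrategy()
    val context = ProcessingContext(maxPages = 2)
    // Report a rejected config before doing any work.
    strategy.validateConfig(mapOf("minTitleLength" to -1))?.let { println("Config rejected: $it") }
    val results = listOf(
        strategy.processPage("https://docs.example.com/api", "API Reference\nbody", context)
    )
    println(strategy.shouldContinueCrawling(results, context))
    println(strategy.generateFinalOutput(results, context))
}
```

Returning a nullable message from `validateConfig` lets a caller surface the problem before any page is fetched; whether the real interface follows that convention should be checked against its source.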
ProcessingContext (State & Config)
| Field | Type | Description |
|---|---|---|
| `executionConfig` | `CrawlerTaskExecutionConfigData` | Runtime parameters for the crawl. |
| `typeConfig` | `CrawlerTaskTypeConfig` | Static configuration defined at registration. |
| `orchestrationConfig` | `OrchestrationConfig` | Global system configuration and available tools. |
| `task` | `SessionTask` | The active UI task for reporting progress. |
| `maxPages` | `Int` | The maximum number of pages allowed for this crawl. |
PageProcessingResult
| Field | Type | Description |
|---|---|---|
| `pageType` | `PageType` | Categorization (Content, Index, Error, etc.). |
| `extractedLinks` | `List<LinkData>?` | New URLs discovered on this page. |
| `metadata` | `Map<String, Any>` | Arbitrary data extracted (e.g., page title, author). |
| `shouldTerminate` | `Boolean` | If true, stops the entire crawl immediately. |
| `error` | `Throwable?` | Captured exception if processing failed. |
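
To illustrate how a caller might consume these fields, here is a hedged sketch of a simplified crawl loop. It is not the `CrawlerAgentTask` implementation: the `crawl` function, the single-field `LinkData`, the three-value `PageType` enum, and the choice to skip links from failed pages are all assumptions made for the example. It shows the nullable `extractedLinks` being treated as empty, `shouldTerminate` acting as a hard stop, and a thrown exception being captured in `error`.

```kotlin
// Hypothetical stand-ins mirroring the fields in the table above.
enum class PageType { Content, Index, Error }
data class LinkData(val url: String)
data class PageProcessingResult(
    val url: String,
    val pageType: PageType,
    val extractedLinks: List<LinkData>? = null,
    val metadata: Map<String, Any> = emptyMap(),
    val shouldTerminate: Boolean = false,
    val error: Throwable? = null
)

// Simplified driver; `process` stands in for PageProcessingStrategy.processPage.
fun crawl(
    seed: String,
    maxPages: Int,
    process: (String) -> PageProcessingResult
): List<PageProcessingResult> {
    val queue = ArrayDeque(listOf(seed))
    val seen = mutableSetOf(seed)
    val results = mutableListOf<PageProcessingResult>()
    while (queue.isNotEmpty() && results.size < maxPages) {
        val url = queue.removeFirst()
        // Record a failure as an Error result instead of aborting the crawl.
        val result = try {
            process(url)
        } catch (e: Exception) {
            PageProcessingResult(url, PageType.Error, error = e)
        }
        results += result
        if (result.shouldTerminate) break   // hard stop requested by the strategy
        if (result.error != null) continue  // assumption: ignore links from failed pages
        // extractedLinks is nullable, so null is treated as "no links found".
        result.extractedLinks.orEmpty()
            .map { it.url }
            .filter { seen.add(it) }        // add() returns false for URLs already queued
            .forEach { queue.addLast(it) }
    }
    return results
}

fun main() {
    // Tiny in-memory "site": each URL maps to the links found on that page.
    val pages = mapOf(
        "https://docs.example.com/" to listOf(
            "https://docs.example.com/api",
            "https://docs.example.com/setup"
        ),
        "https://docs.example.com/api" to listOf("https://docs.example.com/")
    )
    val results = crawl("https://docs.example.com/", maxPages = 10) { url ->
        if (url.endsWith("/setup")) error("fetch failed")
        PageProcessingResult(url, PageType.Content, pages[url]?.map(::LinkData))
    }
    results.forEach {
        println("${it.pageType} ${it.url} links=${it.extractedLinks?.size ?: 0}")
    }
}
```

Keeping `error` on the result rather than rethrowing means one failing page does not lose the results already collected, while `shouldTerminate` remains available for cases where the strategy wants the whole crawl to stop.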