Selenium Fetch Strategy
High-fidelity web content extraction using headless Chrome. Designed for JavaScript-heavy applications whose DOM is built client-side, which standard HTTP clients cannot render.
Side-Effect: Safe
JS-Enabled
Auto-Fallback
⚙️ Fetch Configuration
{
  "method": "Selenium",
  "url": "https://app.example.com/dashboard",
  "wait_for_selector": ".main-content",
  "timeout": 30000
}
👁️ Rendered Output
✔ DOM Fully Rendered (142kb)
<div id="root">
<nav class="sidebar">...</nav>
<main class="main-content">
<h1>Welcome, User</h1>
...
</main>
</div>
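The configuration above pairs a `wait_for_selector` with a `timeout`, but the page does not show how the wait is performed. One plausible reading is a polling loop that gives up once the timeout elapses. The sketch below illustrates that idea only: `waitUntil` and its `pollMs` parameter are hypothetical names, and treating `timeout` as milliseconds is an assumption. In real Selenium code the same role is usually played by `WebDriverWait` with an expected condition.

```kotlin
// Hypothetical helper, not part of the documented API.
// Polls `condition` until it returns true or `timeoutMs` milliseconds have elapsed.
fun waitUntil(timeoutMs: Long, pollMs: Long = 100, condition: () -> Boolean): Boolean {
    val start = System.nanoTime()
    val timeoutNanos = timeoutMs * 1_000_000
    while (true) {
        // Check first so an already-satisfied condition returns immediately.
        if (condition()) return true
        // Give up once the time budget is spent.
        if (System.nanoTime() - start >= timeoutNanos) return false
        Thread.sleep(pollMs)
    }
}
```

With a driver in hand, the condition would be a check that the `.main-content` selector matches at least one element. A `false` result would then be treated as a failed fetch.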
Fetch Parameters
| Parameter | Type | Description |
|---|---|---|
| `url` * | String | The target URL to navigate to and extract source from. |
| `pool` | ExecutorService | Thread pool used for managing the Selenium driver lifecycle. |
| `webSearchDir` | File | Directory for storing temporary artifacts or search context. |
| `isSeleniumEnabled` | Boolean | Global toggle. If a fetch fails, this is automatically set to false to trigger fallback. |
Resource Usage
Token Consumption: Low (Retrieval only). Output length depends on target DOM density.
Compute: High (Requires local Chrome/Chromium instance).
Task Lifecycle
- Initialization: Checks if `task.selenium` is null. If so, instantiates `Selenium2S3` with a fresh `chromeDriver`.
- Navigation: Commands the driver to navigate to the target `url`.
- Extraction: Retrieves the full page source via `getPageSource()`.
- Cleanup: Executes `quit()` on the driver in a `finally` block to ensure no zombie processes remain.
- Error Handling: If an exception occurs, `FetchConfig.isSeleniumEnabled` is disabled and the system retries using the next available strategy (e.g., HttpClient).
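The error-handling step can be illustrated with a small, self-contained sketch. It is not the project's actual code: `PageFetcher`, `FetchToggle`, and `fetchWithFallback` are hypothetical stand-ins for the real `FetchStrategy` and `FetchConfig` types, whose shapes are not shown here. It assumes the toggle is checked before each Selenium attempt and that a single failure disables Selenium for later fetches.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a fetch strategy: takes a URL, returns page source.
fun interface PageFetcher {
    fun fetch(url: String): String
}

// Hypothetical stand-in for the global isSeleniumEnabled toggle.
class FetchToggle(enabled: Boolean = true) {
    private val seleniumEnabled = AtomicBoolean(enabled)
    val isSeleniumEnabled: Boolean get() = seleniumEnabled.get()
    fun disableSelenium() = seleniumEnabled.set(false)
}

// Tries the Selenium fetcher while the toggle is on; on any exception,
// turns the toggle off and retries with the fallback fetcher.
fun fetchWithFallback(
    url: String,
    toggle: FetchToggle,
    selenium: PageFetcher,
    fallback: PageFetcher,
): String {
    if (toggle.isSeleniumEnabled) {
        try {
            return selenium.fetch(url)
        } catch (e: Exception) {
            // Mirror the documented behaviour: later fetches skip Selenium entirely.
            toggle.disableSelenium()
        }
    }
    return fallback.fetch(url)
}
```

Because the toggle is global in this sketch, one failed Selenium fetch sends every later URL to the fallback as well. This trades rendering fidelity for not paying the Chrome start-up cost on each subsequent failure.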
Kotlin Implementation
Selenium.kt
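class Selenium : FetchMethodFactory {
    override fun createStrategy(task: CrawlerAgentTask): FetchStrategy = object : FetchStrategy {
        override fun fetch(url: String, ...): String {
            return try {
                // Start a fresh Chrome-backed driver for this fetch.
                task.selenium = Selenium2S3(pool, driver = chromeDriver())
                // Load the page, then return the rendered source (empty string if no driver is set).
                task.selenium?.navigate(url)
                task.selenium?.getPageSource() ?: ""
            } finally {
                // Always shut the driver down so no Chrome process is left behind.
                task.selenium?.quit()
            }
        }
    }
}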