GoogleSearch
High-precision web discovery using the Google Custom Search API. Automatically populates crawler queues with relevant entry points based on targeted search queries.
Category: Seed Strategy
Auth: Google API Key
Side-Effect: Safe
CrawlerTaskExecutionConfigData (JSON)

    {
      "search_query": "site:github.com 'cognotik' architecture",
      "max_seeds": 20
    }
→ SeedItem Discovery (UI Render)

- Cognotik/docs/architecture.md
  https://github.com/Cognotik/docs/blob/main/architecture.md
  Technical overview of the Cognotik task orchestration engine and bento-grid UI implementation...
- Industrial Design System - Cognotik
  https://github.com/Cognotik/design/wiki/Industrial-Theme
  Standards for utility-first developer interfaces, focusing on density and IDE-native aesthetics...
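
Each rendered entry above corresponds to the `title`, `link`, and `snippet` of one search result. The sketch below shows one way already-parsed result objects might be turned into seeds. The `SeedItem` shape and the `toSeedItems` helper are assumptions made for illustration, and the real classes may differ.

```kotlin
// Hypothetical shape for illustration; the real SeedItem class may carry other fields.
data class SeedItem(
    val link: String,
    val title: String? = null,
    val snippet: String? = null,
    val pagemap: Map<*, *>? = null,
)

/**
 * Maps parsed search results (one map per result) to seeds.
 * Results without a string `link` are skipped, and at most maxSeeds are kept.
 */
fun toSeedItems(items: List<Map<String, Any?>>, maxSeeds: Int = 20): List<SeedItem> =
    items.mapNotNull { item ->
        val link = item["link"] as? String ?: return@mapNotNull null
        SeedItem(
            link = link,
            title = item["title"] as? String,
            snippet = item["snippet"] as? String,
            pagemap = item["pagemap"] as? Map<*, *>,
        )
    }.take(maxSeeds)

fun main() {
    val items = listOf(
        mapOf(
            "title" to "Cognotik/docs/architecture.md",
            "link" to "https://github.com/Cognotik/docs/blob/main/architecture.md",
        ),
        mapOf("title" to "Entry without a link"), // dropped: nothing to crawl
    )
    toSeedItems(items).forEach { println("${it.title} -> ${it.link}") }
}
```

Skipping results that lack a link is a design choice of this sketch: a seed without a URL gives the crawler nothing to fetch.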
Execution Parameters
| Field | Type | Description |
|---|---|---|
| search_query* | String | The search terms used to query Google. Supports advanced operators (site:, filetype:, etc.). |
| max_seeds | Int | Maximum number of results to return. Hard-capped at 20 to respect API limits. Default: 20. |

\* Required field.
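
The parameter rules above (a required, non-blank query and a seed limit that defaults to and is capped at 20) can be sketched as a small normalization step. `GoogleSearchParams` and `normalize` are hypothetical names, not part of the documented API, and the lower bound of 1 is an assumption.

```kotlin
// Hypothetical mirror of the two documented fields; the real
// CrawlerTaskExecutionConfigData likely carries more options.
data class GoogleSearchParams(
    val searchQuery: String?,
    val maxSeeds: Int? = null,
)

const val MAX_SEEDS_CAP = 20 // documented hard cap, also the documented default

/** Returns the trimmed query and a seed limit clamped to 1..MAX_SEEDS_CAP. */
fun normalize(params: GoogleSearchParams): Pair<String, Int> {
    val query = params.searchQuery?.trim().orEmpty()
    require(query.isNotEmpty()) { "search_query is required and must not be blank" }
    // Assumption: values below 1 are raised to 1 rather than rejected.
    val limit = (params.maxSeeds ?: MAX_SEEDS_CAP).coerceIn(1, MAX_SEEDS_CAP)
    return query to limit
}

fun main() {
    // A request for 50 seeds is reduced to the cap of 20.
    println(normalize(GoogleSearchParams("site:github.com 'cognotik' architecture", 50)))
}
```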
Required Credentials
This task requires the following configured in User Settings:
- APIProvider.GoogleKey: A valid API key from the Google Cloud Console.
- Search Engine ID (CX): The unique identifier for your Custom Search Engine.
Task Lifecycle
- Validation: Ensures `search_query` is present and non-blank.
- Authentication: Retrieves and decrypts the Google API key and Engine ID from `UserSettingsManager`.
- Request Construction: Encodes the query and builds the URI for `googleapis.com/customsearch/v1`.
- Execution: Performs an asynchronous GET request with a 30-second timeout and custom User-Agent.
- Error Handling:
  - `401`: Invalid API Key.
  - `403`: Quota exceeded or access forbidden.
  - `429`: Rate limit hit.
- Mapping: Parses JSON response into `SeedItem` objects, capturing `link`, `title`, `snippet`, and `pagemap` data.
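
The request construction, execution, and error handling steps above could look roughly like the following, using the JDK's `java.net.http` client. This is a sketch under stated assumptions, not the task's actual implementation: `buildSearchUri`, `describeStatus`, `fetchAsync`, and the User-Agent string are illustrative. The `key`, `cx`, `q`, `num`, and `start` query parameters belong to the Custom Search JSON API, whose `num` accepts 1 to 10 per request, so reaching 20 seeds would take a second page with `start=11`. How the real task pages is not stated in this document.

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.charset.StandardCharsets
import java.time.Duration
import java.util.concurrent.CompletableFuture

/** Builds a Custom Search request URI with URL-encoded parameters. */
fun buildSearchUri(apiKey: String, engineId: String, query: String, num: Int, start: Int = 1): URI {
    fun enc(s: String) = URLEncoder.encode(s, StandardCharsets.UTF_8)
    return URI.create(
        "https://www.googleapis.com/customsearch/v1" +
            "?key=${enc(apiKey)}&cx=${enc(engineId)}&q=${enc(query)}" +
            "&num=${num.coerceIn(1, 10)}&start=$start"
    )
}

/** Maps an HTTP status to the documented error meaning; null means success. */
fun describeStatus(status: Int): String? = when (status) {
    in 200..299 -> null
    401 -> "Invalid API key"
    403 -> "Quota exceeded or access forbidden"
    429 -> "Rate limit hit"
    else -> "Unexpected HTTP status $status"
}

/** Sends an asynchronous GET; the future fails if the status maps to an error. */
fun fetchAsync(client: HttpClient, uri: URI): CompletableFuture<String> {
    val request = HttpRequest.newBuilder(uri)
        .timeout(Duration.ofSeconds(30))            // documented 30-second timeout
        .header("User-Agent", "ExampleCrawler/1.0") // placeholder value
        .GET()
        .build()
    return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
        .thenApply { response ->
            describeStatus(response.statusCode())?.let { error(it) }
            response.body()
        }
}

fun main() {
    // No network call here: only show the URI and a status mapping.
    println(buildSearchUri("API_KEY", "ENGINE_ID", "site:github.com 'cognotik' architecture", 10))
    println(describeStatus(403))
}
```

The response body returned by `fetchAsync` is the raw JSON that the mapping step would then parse.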
Kotlin Implementation
Add this to your OrchestrationConfig to enable Google-backed discovery:

    val searchTask = CrawlerAgentTask(
        config = CrawlerTaskExecutionConfigData(
            search_query = "latest transformer research 2024",
            seed_method = "GoogleSearch"
        )
    )
    // The engine automatically uses the GoogleSearch factory
    // to generate the SeedStrategy.