Agent Types
Cognotik is designed around a strongly typed, object-oriented approach to LLM interaction.
At the core is the abstract BaseAgent<I, R>.
1. Core Abstraction: BaseAgent
File: BaseAgent.kt
BaseAgent standardizes how inputs are converted into chat messages and how responses are returned.
Generic Types
- I: Input type (e.g., List<String>, CodeRequest)
- R: Return type (e.g., String, ParsedResponse<T>)
Key Methods
- respond(input, messages): Core processing method.
- answer(input): Convenience method for auto-generating messages.
- withModel(model): Returns a new instance with a different model.
- response(messages, model): Sends raw messages to the model.
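How these methods relate can be sketched as follows. This is a simplified illustration under stated assumptions, not the contents of BaseAgent.kt: ChatMessage, Model, SketchAgent, SketchChatAgent, and chatMessages are hypothetical stand-ins, and the real signatures may differ.

```kotlin
// Hypothetical stand-ins for the library's message and model types.
data class ChatMessage(val role: String, val content: String)

fun interface Model {
    fun complete(messages: List<ChatMessage>): String
}

// Simplified sketch of the abstraction; not the real BaseAgent.kt.
abstract class SketchAgent<I, R>(val prompt: String, val model: Model) {
    // Converts typed input into chat messages.
    abstract fun chatMessages(input: I): List<ChatMessage>

    // Core processing method: turns the model's reply into the return type.
    abstract fun respond(input: I, messages: List<ChatMessage>): R

    // Returns a new agent bound to a different model.
    abstract fun withModel(model: Model): SketchAgent<I, R>

    // Convenience method: builds the messages, then delegates to respond.
    fun answer(input: I): R = respond(input, chatMessages(input))

    // Sends raw messages to a model (defaults to this agent's own model).
    fun response(messages: List<ChatMessage>, model: Model = this.model): String =
        model.complete(messages)
}

// A chat-style agent: a list of strings in, a raw string out.
class SketchChatAgent(prompt: String, model: Model) :
    SketchAgent<List<String>, String>(prompt, model) {

    override fun chatMessages(input: List<String>): List<ChatMessage> =
        listOf(ChatMessage("system", prompt)) + input.map { ChatMessage("user", it) }

    override fun respond(input: List<String>, messages: List<ChatMessage>): String =
        response(messages)

    override fun withModel(model: Model): SketchChatAgent = SketchChatAgent(prompt, model)
}
```

In this sketch, answer is the entry point most callers would use, while respond remains available when the caller wants to supply its own message list.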
2. Text & Conversational Agents
ChatAgent is the standard agent for conversational text generation. It takes a history of strings and returns a raw string response.
val agent = ChatAgent(prompt = "Helpful assistant.", model = myModel)
val response = agent.answer(listOf("Hello", "Tell me a joke"))
Use cases: chatbots, summarization, and general Q&A.
3. Structured Data Agents (JSON/POJO)
ParsedAgent<T> converts natural language into an instance of a specific class (T).
Key Features
- Schema Generation: Uses TypeDescriber to inject YAML schemas into prompts.
- Single vs. Two-Stage: singleStage = false (default) uses a dedicated parser agent for reliability.
- Validation: Runs ValidatedObject logic after deserialization.
ParsedResponse<T> is a wrapper holding both the raw text and the deserialized obj.
val response: ParsedResponse<User> = agent.answer(listOf("Extract user"))
val transformed = response.map(UserDTO::class.java) { user -> UserDTO(user.name) }
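The wrapper-and-map pattern used above can be sketched in isolation. This is an assumption-laden illustration, not the library's ParsedResponse: SketchParsedResponse and parsedUser are hypothetical names, and the User and UserDTO classes are minimal guesses at the types the example refers to.

```kotlin
// Simplified sketch of a text-plus-object wrapper; not the library's actual class.
abstract class SketchParsedResponse<T>(val clazz: Class<T>) {
    abstract val text: String   // raw model output
    abstract val obj: T         // deserialized value

    // Transforms the parsed object lazily while keeping the raw text unchanged.
    fun <V> map(target: Class<V>, transform: (T) -> V): SketchParsedResponse<V> {
        val source = this
        return object : SketchParsedResponse<V>(target) {
            override val text: String get() = source.text
            override val obj: V by lazy { transform(source.obj) }
        }
    }
}

// Hypothetical payload types with default values for every field.
data class User(val name: String = "", val age: Int = 0)
data class UserDTO(val name: String = "")

// Helper that wraps an already-parsed user together with its raw text.
fun parsedUser(raw: String, user: User): SketchParsedResponse<User> =
    object : SketchParsedResponse<User>(User::class.java) {
        override val text = raw
        override val obj = user
    }
```

Keeping the raw text alongside the mapped object means a caller can still log or re-parse the original model output after transforming the result.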
ProxyAgent<T> creates a dynamic Java Proxy. Method calls are serialized to JSON, executed by the LLM, and returned as typed results.
- Metrics: Tracks performance and request counts per method.
- Examples: Supports addExample() for few-shot learning.
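The proxy mechanism itself is plain JDK functionality and can be sketched without an LLM. In the sketch below, llmProxy, Calculator, and the backend lambda are hypothetical; backend stands in for the model round trip, and the request string is a hand-built approximation of a JSON payload rather than the format the library actually sends.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Proxy

// Hypothetical interface whose methods the "LLM" will implement.
interface Calculator {
    fun add(a: Int, b: Int): Int
}

// Builds a dynamic proxy that forwards every call, as a request string, to `backend`.
// A real agent would also deserialize the reply into the method's declared return type.
fun <T : Any> llmProxy(type: Class<T>, backend: (String) -> Any?): T {
    val handler = InvocationHandler { _, method, args ->
        // args is null for zero-argument methods
        val argList = args?.joinToString(",") ?: ""
        backend("""{"method":"${method.name}","args":[$argList]}""")
    }
    val proxy = Proxy.newProxyInstance(type.classLoader, arrayOf<Class<*>>(type), handler)
    return type.cast(proxy)
}
```

A caller would write llmProxy(Calculator::class.java) { request -> ... } and then invoke calc.add(2, 3) as if it were an ordinary implementation.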
Schema Best Practices
- Constructors: Provide default values for all fields.
- Naming: Use user_name (snake_case) for JSON compatibility.
- Documentation: Use @Description for semantic guidance.
Validation Tip: Use validation to canonicalize data (e.g., fixing formatting) rather than just rejecting it.
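A minimal sketch of canonicalizing validation, combined with the constructor and naming advice above. Canonicalizable and Contact are hypothetical; the library's ValidatedObject interface is not reproduced here and may be shaped differently.

```kotlin
// Hypothetical validation hook; the library's ValidatedObject interface may differ.
interface Canonicalizable<T> {
    // Returns a cleaned-up copy, or throws if the data cannot be repaired.
    fun canonicalize(): T
}

data class Contact(
    val user_name: String = "",   // snake_case to match the JSON field name
    val email: String = ""        // default values let partial JSON deserialize
) : Canonicalizable<Contact> {
    override fun canonicalize(): Contact {
        // Fix formatting first instead of rejecting outright.
        val cleaned = copy(
            user_name = user_name.trim(),
            email = email.trim().lowercase()
        )
        // Reject only what normalization cannot fix.
        require('@' in cleaned.email) { "email is missing '@': ${cleaned.email}" }
        return cleaned
    }
}
```

Normalizing before rejecting means minor formatting noise in model output does not trigger a retry.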
4. Action & Code Agents
CodeAgent is an autonomous agent that writes, executes, and fixes code in a sandboxed environment.
Key Components
- CodeRuntime: The execution environment (Kotlin, JS, etc.).
- Symbols: Objects injected into the script context for tool use.
- Self-Correction: Automatically feeds exceptions back to the LLM to generate fixes.
- Interception: codeInterceptor allows transforming code before execution.
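The generate, execute, and fix cycle described by these components might look roughly like the loop below. It is a sketch under assumptions, not CodeAgent's implementation: runWithSelfCorrection, RunResult, and maxAttempts are hypothetical, generate stands in for the LLM, execute for the CodeRuntime, and intercept for the codeInterceptor hook.

```kotlin
// Hypothetical result of a successful run.
data class RunResult(val code: String, val output: String, val attempts: Int)

// Sketch of a generate-execute-fix loop.
// `generate` receives the request plus the previous error message, if any.
fun runWithSelfCorrection(
    request: String,
    maxAttempts: Int = 3,
    intercept: (String) -> String = { it },
    generate: (request: String, previousError: String?) -> String,
    execute: (String) -> String
): RunResult {
    var lastError: Exception? = null
    for (attempt in 1..maxAttempts) {
        // Transform the generated code before it reaches the runtime.
        val code = intercept(generate(request, lastError?.message))
        try {
            return RunResult(code, execute(code), attempt)
        } catch (e: Exception) {
            lastError = e   // feed the failure back into the next generation
        }
    }
    throw IllegalStateException("No working code after $maxAttempts attempts", lastError)
}
```

Bounding the number of attempts keeps a persistently failing request from looping indefinitely.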
5. Media Agents
ImageAndText is a data class pairing text: String with an optional image: BufferedImage?.
ImageGenerationAgent uses a text LLM to refine prompts before sending them to an image model (e.g., DALL-E 3).
Vision tasks such as captioning or OCR are handled by encoding images as Base64 PNG.
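The Base64 PNG encoding step uses only standard JDK classes and can be sketched as below. The function name toBase64Png and the ImageAndTextSketch class are illustrative assumptions; how the library wraps the encoded string into a request is not shown.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.util.Base64
import javax.imageio.ImageIO

// Encodes an image as a Base64 PNG string.
fun toBase64Png(image: BufferedImage): String {
    val buffer = ByteArrayOutputStream()
    // ImageIO.write returns false when no suitable writer is found.
    check(ImageIO.write(image, "png", buffer)) { "No PNG writer available" }
    return Base64.getEncoder().encodeToString(buffer.toByteArray())
}

// Sketch of the text-plus-optional-image pair described above.
data class ImageAndTextSketch(val text: String, val image: BufferedImage? = null)
```

PNG is lossless, which helps preserve fine detail such as small text in OCR inputs.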
Summary Table
| Agent Class | Input | Output | Use Case |
|---|---|---|---|
| ChatAgent | List<String> | String | Conversation |
| ParsedAgent<T> | List<String> | ParsedResponse<T> | Data Extraction |
| CodeAgent | CodeRequest | CodeResult | Tool Use / Automation |
| ImageGenerationAgent | List<String> | ImageAndText | Asset Creation |
| ProxyAgent<T> | Method Args | Method Return | Dynamic Logic |
Advanced Topics
Code Interception
Wrap execution with logging or sanitization:
codeInterceptor = { code -> "println(\"Starting...\")\n$code" }
Fallback Models
Use a cheaper model first, falling back to a more capable one on failure:
CodeAgent(model = gpt35, fallbackModel = gpt4)
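The fallback behavior amounts to a try-then-retry wrapper. A minimal sketch, assuming that any exception from the primary call triggers the fallback; withFallback is a hypothetical helper, and the conditions under which CodeAgent actually switches models are not specified here.

```kotlin
// Sketch of try-cheap-then-capable; both lambdas stand in for model calls.
fun <T> withFallback(primary: () -> T, fallback: () -> T): T =
    try {
        primary()
    } catch (e: Exception) {
        // The primary model failed, so retry once with the more capable one.
        fallback()
    }
```

An exception thrown by the fallback call propagates to the caller, so a failure of both models is still visible.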