Agent Types
Cognotik is designed around a strongly typed, object-oriented approach to LLM interaction.
At the core is the abstract BaseAgent<I, R>, where I is the Input type and R is the Result type.
1. Core Abstraction: BaseAgent
File: BaseAgent.kt
All agents inherit from this class. It standardizes how inputs are converted into chat messages and how responses are returned.
Generic Types
- I: Input type (e.g., List<String>, CodeRequest)
- R: Return type (e.g., String, ParsedResponse<T>)
Key Properties
- model: The ChatInterface (the LLM provider, e.g., OpenAI, Anthropic).
- temperature: Controls randomness (0.0 for deterministic, 1.0 for creative).
- prompt: The system prompt/instructions.
- name: Optional identifier for the agent.
Key Methods
- respond(input, messages): Core method that processes input and returns a result. Can be called with custom messages.
- answer(input): Convenience method that automatically generates chat messages from input and calls respond().
- chatMessages(input): Abstract method that subclasses must implement to convert input into chat messages.
- withModel(model): Abstract method that returns a new agent instance with a different model.
- response(messages, model): Protected method that sends messages to the model and returns the raw response.
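To make the relationship between these methods concrete, here is a minimal sketch of the pattern. It is not the Cognotik source: SketchAgent, SketchMessage, SketchModel, and UppercaseAgent are hypothetical stand-ins, and the default temperature is an arbitrary placeholder.

```kotlin
// Hypothetical stand-ins for illustration only; these are not the Cognotik classes.
data class SketchMessage(val role: String, val content: String)

fun interface SketchModel {
    fun chat(messages: List<SketchMessage>): String
}

abstract class SketchAgent<I, R>(
    val prompt: String,
    val model: SketchModel,
    val temperature: Double = 0.3, // arbitrary placeholder value
    val name: String? = null,
) {
    // Subclasses decide how an input becomes a list of chat messages.
    abstract fun chatMessages(input: I): List<SketchMessage>

    // Subclasses return a new instance bound to a different model.
    abstract fun withModel(model: SketchModel): SketchAgent<I, R>

    // Core method: turn the given messages into a typed result.
    abstract fun respond(input: I, messages: List<SketchMessage>): R

    // Convenience method: build the messages, then delegate to respond().
    fun answer(input: I): R = respond(input, chatMessages(input))

    // Sends messages to the model and returns the raw reply.
    protected fun response(
        messages: List<SketchMessage>,
        model: SketchModel = this.model,
    ): String = model.chat(messages)
}

// A toy agent whose result is the model's reply in upper case.
class UppercaseAgent(model: SketchModel) :
    SketchAgent<String, String>("Reply briefly.", model) {

    override fun chatMessages(input: String) =
        listOf(SketchMessage("system", prompt), SketchMessage("user", input))

    override fun withModel(model: SketchModel) = UppercaseAgent(model)

    override fun respond(input: String, messages: List<SketchMessage>) =
        response(messages).uppercase()
}
```

With a stub model that echoes the last message, UppercaseAgent(stub).answer("hi") goes through chatMessages(), respond(), and response() in that order.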
2. Text & Conversational Agents
File: ChatAgent.kt
The standard agent for conversational text generation. It takes a history of strings and returns a raw string response.
- Input: List<String> (A list of user messages/questions)
- Output: String (The raw content of the LLM's response)
It constructs a chat history starting with the system prompt, followed by each string in the input list as a separate user message.
val agent = ChatAgent(
    prompt = "You are a helpful assistant.",
    model = myChatModel,
    temperature = 0.7
)
// Each string becomes its own user message after the system prompt
val response = agent.respond(listOf("Hello", "Tell me a joke"))
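The history construction described above (system prompt first, then one user message per input string) can be sketched as follows. ChatTurn and buildHistory are hypothetical names used only for this illustration; the library's own message type may differ.

```kotlin
// Hypothetical message type for illustration; the library's own class may differ.
data class ChatTurn(val role: String, val content: String)

// System prompt first, then one user message per input string, in order.
fun buildHistory(systemPrompt: String, inputs: List<String>): List<ChatTurn> =
    listOf(ChatTurn("system", systemPrompt)) + inputs.map { ChatTurn("user", it) }
```

For the two inputs in the example above, this yields three messages: one system message and two user messages.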
Use cases: chatbots, summarization, creative writing, and general Q&A.
3. Structured Data Agents (JSON/POJO)
These agents are designed to force the LLM to output structured data (JSON) which is then automatically deserialized into Kotlin/Java objects.
File: ParsedAgent.kt
Converts natural language input into a specific class instance (T).
- Input: List<String> (Instructions)
- Output: ParsedResponse<T> (Contains the raw text and the deserialized object obj)
Key Features
- Schema Generation: Uses TypeDescriber to inject YAML schemas into prompts.
- Validation: Runs ValidatedObject logic after deserialization.
- Retries: If JSON parsing fails, it can retry with adjusted parameters (deserializerRetries).
- Single vs. Two-Stage: singleStage = false (default) uses a dedicated parser agent to extract and format JSON from the response.
- Custom Parser Prompt: The parserPrompt parameter passes additional instructions to the parser agent.
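The retry behavior in the list above can be pictured as a generate-then-parse loop. This is a sketch under assumptions, not ParsedAgent's implementation: parseWithRetries is a hypothetical helper, and how the real agent adjusts its parameters between attempts is not specified here.

```kotlin
// Hypothetical helper, not part of Cognotik: retry a generate-then-parse step.
// `generate` receives the attempt index so a caller can adjust parameters
// (for example, temperature) on later attempts.
fun <T> parseWithRetries(
    retries: Int,
    generate: (attempt: Int) -> String,
    parse: (String) -> T,
): T {
    var lastError: Exception? = null
    for (attempt in 0..retries) {
        try {
            return parse(generate(attempt))
        } catch (e: Exception) {
            lastError = e // remember the failure and try again
        }
    }
    throw IllegalStateException("Parsing failed after ${retries + 1} attempts", lastError)
}
```

The sketch treats retries as the number of additional attempts after the first; whether deserializerRetries counts attempts the same way is an assumption.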
Use cases: data extraction, converting unstructured text to structured data, API payload generation.
File: ParsedResponse.kt
A wrapper holding both the raw text and the deserialized obj.
- text: The raw response from the LLM.
- obj: The deserialized object of type T.
- clazz: The class of the deserialized object.
- map(cls, fn): Transforms the object using a function while preserving the original text.
val response: ParsedResponse<User> = agent.respond(listOf("Extract user info"))
// map() keeps the original text and replaces obj with the transformed value
val transformed: ParsedResponse<UserDTO> = response.map(UserDTO::class.java) { user ->
    UserDTO(user.name.uppercase(), user.email)
}
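For readers who want to see why map can preserve the raw text, here is a small self-contained model of such a wrapper. SketchParsedResponse, SketchUser, and SketchUserDTO are hypothetical stand-ins; the real ParsedResponse may be shaped differently.

```kotlin
// Hypothetical stand-in for illustration; the real ParsedResponse may differ.
class SketchParsedResponse<T>(val clazz: Class<T>, val text: String, val obj: T) {
    // Transform the object while carrying the original raw text along unchanged.
    fun <V> map(cls: Class<V>, fn: (T) -> V): SketchParsedResponse<V> =
        SketchParsedResponse(cls, text, fn(obj))
}

data class SketchUser(val name: String, val email: String)
data class SketchUserDTO(val name: String, val email: String)
```

Because map builds a new wrapper from the old text and the transformed object, downstream code can still log or display exactly what the model returned.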
File: ParsedImageAgent.kt
Similar to ParsedAgent, but accepts images as input. It performs Visual Question Answering (VQA) where the answer is a structured object.
- Input: List<ImageAndText> (Text prompts paired with BufferedImage)
- Output: ParsedResponse<T>
- Images are automatically encoded to Base64 PNG format and embedded in the request.
- Supports vision-capable models (e.g., GPT-4-Vision, Claude 3).
- Includes automatic retry logic with deserializerRetries.
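The Base64 PNG encoding step mentioned above can be reproduced with standard JDK classes. This sketch shows only the encoding; how the resulting string is embedded in the request is not covered, and toBase64Png is a hypothetical helper name.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.util.Base64
import javax.imageio.ImageIO

// Encode a BufferedImage as PNG bytes, then as a Base64 string.
fun toBase64Png(image: BufferedImage): String {
    val buffer = ByteArrayOutputStream()
    // ImageIO.write returns false when no writer for the format is available.
    check(ImageIO.write(image, "png", buffer)) { "No PNG writer available" }
    return Base64.getEncoder().encodeToString(buffer.toByteArray())
}
```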
Use cases: extracting data from invoices, describing UI elements in JSON, categorizing visual content, OCR with structured output.
File: ProxyAgent.kt
Note: Does not inherit BaseAgent.
This is a "Magic" agent. It creates a dynamic Java Proxy for a given interface or class. When you call a method on the proxy, the arguments are serialized, sent to the LLM, and the LLM "executes" the logic, returning the result.
- Dynamic Proxy Creation: Uses Java reflection to intercept method calls.
- Automatic Serialization: Arguments are converted to JSON and sent to the LLM.
- Retry Logic: Automatically retries failed calls with adjusted temperature (maxRetries).
- Validation: If the return type implements ValidatedObject, validation is performed.
- Examples: You can provide example input/output pairs via addExample() to improve accuracy.
- Metrics: Track performance via the metrics property, which includes request counts per method.
interface SentimentAnalyzer {
    fun analyze(text: String): SentimentResult
}
// create() returns a proxy that implements SentimentAnalyzer
val proxy = ProxyAgent(SentimentAnalyzer::class.java, model).create()
val result = proxy.analyze("I love this library!") // LLM determines the return value
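The interception mechanism behind this can be sketched with the JDK's java.lang.reflect.Proxy. This is an illustration under assumptions, not ProxyAgent's code: Greeter and proxyOf are hypothetical, and the backend here is a plain function so the sketch runs offline, where ProxyAgent would call an LLM.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Proxy

// Hypothetical interface used only for this sketch.
interface Greeter {
    fun greet(name: String): String
}

// Build a proxy whose method calls are forwarded to `backend` as
// (method name, arguments). The backend's return value becomes the
// method's return value, so it must match the declared return type.
fun <T : Any> proxyOf(type: Class<T>, backend: (String, List<Any?>) -> Any?): T {
    val handler = InvocationHandler { _, method, args ->
        // args is null for methods that take no parameters
        backend(method.name, args?.toList() ?: emptyList())
    }
    return type.cast(Proxy.newProxyInstance(type.classLoader, arrayOf(type), handler))
}
```

A call such as proxyOf(Greeter::class.java) { method, args -> "$method:${args[0]}" }.greet("Ada") never touches a hand-written implementation; the handler alone decides what is returned.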
Use cases: rapid prototyping, implementing complex logic without writing code, semantic routing, simulating service behavior.
Schema Best Practices
To ensure reliable parsing and validation with ParsedAgent and ParsedImageAgent, follow these guidelines when defining your data classes:
- Constructors: All fields should have a default value to ensure a no-argument constructor exists.
- Mutability: Using var in data objects is recommended.
- Nullability: Nullable types are fully supported and handled well by Kotlin.
- Validation: Implement ValidatedObject to ensure validity. Use validation logic to modify var properties for canonicalization rather than just rejecting data.
- Documentation: Use @Description to provide semantic guidance to the Parser LLM.
- Naming: Use snake_case names (e.g., user_name) for JSON compatibility.
- Dynamic Types: Using Any types is appropriate for dynamic schemas. These will be deserialized as Lists and Maps according to Jackson defaults.
Validation Tip: Do not be too strict. Use validation to canonicalize data (e.g., fixing formatting) rather than just rejecting it.
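A data class that follows these guidelines might look like the sketch below. FieldDescription and SelfValidating are local stand-ins so the example compiles on its own; in a real project they would be the library's @Description annotation and ValidatedObject interface, whose exact signatures are not shown in this document and are assumed here.

```kotlin
// Local stand-ins (assumptions) for the library's annotation and interface.
annotation class FieldDescription(val value: String)

interface SelfValidating {
    // Returns null when valid, otherwise a message describing the problem.
    fun validate(): String?
}

data class UserRecord(
    @FieldDescription("Full name as written in the source text")
    var user_name: String = "",          // default value, var, snake_case
    @FieldDescription("Contact email, lower-cased during validation")
    var email: String? = null,           // nullable field
    var extra: Any? = null,              // free-form data for dynamic schemas
) : SelfValidating {
    override fun validate(): String? {
        // Canonicalize first: trim and normalize instead of rejecting.
        user_name = user_name.trim()
        email = email?.trim()?.lowercase()
        // Reject only what cannot be repaired.
        return if (user_name.isEmpty()) "user_name is required" else null
    }
}
```

Because every field has a default, UserRecord() is a valid constructor call, which is what the Constructors guideline asks for.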
4. Action & Code Agents
File: CodeAgent.kt
An autonomous agent capable of writing, executing, and fixing code in a sandboxed runtime environment.
Input: CodeRequest
- messages: Chat history with role information.
- codePrefix: Code to prepend before execution (e.g., imports, setup).
- autoEvaluate: Whether to automatically execute and fix code.
- fixIterations: Number of times to attempt fixing failed code.
- fixRetries: Number of times to retry the entire coding process.
Output: CodeResult
- code: The final generated code.
- status: One of Coding, Correcting, Success, or Failure.
- result: Contains resultValue and resultOutput.
- renderedResponse: The LLM's text response alongside the code.
Key Components
- CodeRuntime: The environment where code runs (e.g., a Kotlin script engine, JavaScript engine).
- symbols: A map of objects injected into the script context (allows the agent to control your application).
- language: Automatically determined from the CodeRuntime (e.g., "kotlin", "javascript").
- codeInterceptor: A function that can transform generated code before execution (useful for logging, sanitization, or instrumentation).
- evalFormat: When true, instructs the LLM to structure code as parameterized functions with a final invocation.
- fallbackModel: An optional secondary model to use if the primary model fails to generate valid code.
- describer: A TypeDescriber that generates API documentation for the injected symbols.
Self-Correction Loop
If autoEvaluate is true, the agent executes the code. If it throws an exception, the agent feeds the error back to the LLM to generate a fix (up to fixIterations times). If the code still fails, it retries the entire process (up to fixRetries times).
val agent = CodeAgent(
    codeRuntime = KotlinScriptRuntime(),
    symbols = mapOf("api" to myApiClient), // objects exposed to the generated script
    model = myChatModel,
    temperature = 0.1
)
val result = agent.respond(CodeAgent.CodeRequest(
    messages = listOf("Calculate the sum of 1 to 100" to Role.user),
    autoEvaluate = true,
    fixIterations = 3
))
println(result.code)
println(result.result.resultOutput)
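The self-correction loop described above amounts to two nested loops: an inner one that feeds errors back for a fix, and an outer one that starts over. The sketch below models that control flow only. It is not the CodeAgent implementation: runWithSelfCorrection and its lambda parameters are hypothetical, and it assumes both counters mean "additional attempts after the first".

```kotlin
// Hypothetical sketch of the two nested loops; names and signatures are assumptions.
fun runWithSelfCorrection(
    fixRetries: Int,
    fixIterations: Int,
    generate: () -> String,
    execute: (String) -> Any?,
    fix: (code: String, error: Throwable) -> String,
): Result<Any?> {
    var lastError: Throwable = IllegalStateException("No attempt was made")
    repeat(fixRetries + 1) {
        var code = generate() // outer loop: start over with freshly generated code
        for (iteration in 0..fixIterations) {
            val outcome = runCatching { execute(code) }
            if (outcome.isSuccess) return outcome
            lastError = outcome.exceptionOrNull()!!
            // inner loop: feed the error back and try the corrected code
            if (iteration < fixIterations) code = fix(code, lastError)
        }
    }
    return Result.failure(lastError)
}
```

With fixIterations = 3 and fixRetries = 0, the code is executed at most four times: once as generated and once after each of three fixes.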
Use cases: data analysis, complex math, controlling external APIs via script, tasks requiring iterative logic, automation.
5. Media Agents
File: ImageAndText.kt
A simple data class that pairs text with an optional image.
- text: Text content.
- image: Optional BufferedImage (null if not present).
Use case: passing multimodal data to agents that accept both text and images.
File: ImageGenerationAgent.kt
Generates images from text descriptions.
- Input: List<String> (User instructions)
- Output: ImageAndText (The generated image and the refined prompt used)
Key Components
- textModel: A ChatInterface used to refine the user's request into an optimized image generation prompt.
- imageModel: The image generation model (e.g., DALL-E 3).
- imageClient: The ImageClientInterface that communicates with the image generation service.
- width / height: Dimensions of the generated image (default: 1024x1024).
Workflow
- Refinement: Uses the text LLM to transform the user request into an optimized image generation prompt.
- Length Validation: If the prompt exceeds the model's maxPrompt limit, it's automatically shortened.
- Generation: Sends the refined prompt to the ImageClientInterface.
- Decoding: Handles both URL-based and Base64-encoded image responses.
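The final decoding step of the workflow above has to cope with two payload shapes. One way to handle both with JDK classes is sketched here; decodeImagePayload is a hypothetical helper, and the rule used to tell a URL from Base64 data (an http or https prefix) is an assumption rather than the agent's documented behavior.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.net.URI
import java.util.Base64
import javax.imageio.ImageIO

// Hypothetical decoder: treat http(s) payloads as URLs, anything else as
// Base64-encoded image bytes.
fun decodeImagePayload(payload: String): BufferedImage {
    val isUrl = payload.startsWith("http://") || payload.startsWith("https://")
    val image: BufferedImage? = if (isUrl) {
        ImageIO.read(URI(payload).toURL())
    } else {
        ImageIO.read(ByteArrayInputStream(Base64.getDecoder().decode(payload)))
    }
    // ImageIO.read returns null when no registered reader understands the data.
    return image ?: throw IllegalArgumentException("Payload is not a decodable image")
}
```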
Use cases: creating assets, visualizing concepts, generating illustrations.
File: ImageProcessingAgent.kt
Handles Vision tasks. It can analyze images or (depending on the backend model) edit them.
- Input: List<ImageAndText> (Text prompts paired with images)
- Output: ImageAndText (Analyzed text and optionally a modified image)
- Encodes images to Base64 PNG format and sends them alongside text to a vision-capable model.
- Supports multimodal responses (text and/or image).
- Falls back to input image if the model doesn't return an image.
Use cases: image captioning, visual analysis, describing scenes, OCR, image editing (with capable models).
Summary Table
| Agent Class | Input Type | Output Type | Primary Use Case |
|---|---|---|---|
| ChatAgent | List<String> | String | Conversation, Q&A |
| ParsedAgent<T> | List<String> | ParsedResponse<T> | Text-to-Object, Data Extraction |
| CodeAgent | CodeRequest | CodeResult | Writing & Executing Code, Tool Use |
| ImageGenerationAgent | List<String> | ImageAndText | Creating Images from text |
| ImageProcessingAgent | List<ImageAndText> | ImageAndText | Analyzing/Captioning Images |
| ParsedImageAgent<T> | List<ImageAndText> | ParsedResponse<T> | Image-to-Object (Visual Data Extraction) |
| ProxyAgent<T> | Method Args | Method Return | Implementing Interfaces via LLM |
Advanced Topics
Code Interception in CodeAgent
The codeInterceptor function allows you to transform code before execution. This is useful for:
- Logging: Wrap code execution with logging statements.
- Sanitization: Remove or modify dangerous operations.
- Instrumentation: Add performance monitoring.
val agent = CodeAgent(
    codeRuntime = runtime,
    model = model,
    // Wrap the generated code in log statements before it is executed
    codeInterceptor = { code ->
        "println(\"Executing code...\")\n$code\nprintln(\"Code executed.\")"
    }
)
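Since an interceptor has the shape of a function from code to code, several concerns can be combined by composing such functions. The sketch below assumes that shape, as suggested by the lambda above; rejecting, logging, and chain are hypothetical helpers, not part of the library.

```kotlin
// Sanitization: refuse code that contains any forbidden call.
fun rejecting(vararg forbidden: String): (String) -> String = { code ->
    val hit = forbidden.firstOrNull { it in code }
    require(hit == null) { "Generated code uses forbidden call: $hit" }
    code
}

// Logging: wrap the code in print statements.
val logging: (String) -> String = { code ->
    "println(\"Executing code...\")\n$code\nprintln(\"Code executed.\")"
}

// Apply interceptors left to right.
fun chain(vararg steps: (String) -> String): (String) -> String = { code ->
    steps.fold(code) { acc, step -> step(acc) }
}
```

A composed value such as chain(rejecting("System.exit"), logging) could then be passed wherever a single interceptor is expected. Plain substring matching is easy to bypass, so treat it as a convenience check rather than a security boundary.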
Fallback Models in CodeAgent
If the primary model fails to generate valid code after all retries, the fallbackModel is used. This is useful for:
- Reliability: Use a more capable model as a fallback.
- Cost Optimization: Use a cheaper model first, then fall back to a more expensive one.
val agent = CodeAgent(
    codeRuntime = runtime,
    model = cheaperModel,             // tried first
    fallbackModel = moreCapableModel, // used only if the primary model fails
    temperature = 0.1
)
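The cheaper-first strategy can be pictured as trying generators in order until one produces an acceptable result. This is a hypothetical sketch of that idea, not how CodeAgent selects models internally; firstValid is an invented helper name.

```kotlin
// Hypothetical helper: try each generator in order and return the first result
// that passes validation. With two entries this is "primary, then fallback".
fun <T> firstValid(generators: List<() -> T>, isValid: (T) -> Boolean): T? {
    for (generate in generators) {
        // A generator that throws is treated the same as one that fails validation.
        val candidate = runCatching(generate).getOrNull() ?: continue
        if (isValid(candidate)) return candidate
    }
    return null
}
```

Later generators are invoked only when earlier ones fail, which is what keeps the more expensive model off the common path.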
Example-Based Learning in ProxyAgent
Improve ProxyAgent accuracy by providing examples:
val agent = ProxyAgent(SentimentAnalyzer::class.java, model)
// Add examples: the expected result first, then the call that should produce it
agent.addExample(SentimentResult(score = 0.9, label = "positive")) { proxy ->
    proxy.analyze("I love this!")
}
agent.addExample(SentimentResult(score = 0.1, label = "negative")) { proxy ->
    proxy.analyze("This is terrible.")
}
// Create the proxy after all examples have been registered
val finalProxy = agent.create()