Agent Types

Cognotik is designed around a strongly typed, object-oriented approach to LLM interaction. At its core is the abstract class BaseAgent<I, R>.

1. Core Abstraction: BaseAgent

BaseAgent<I, R> Abstract

File: BaseAgent.kt

Standardizes how inputs are converted into chat messages and how responses are returned.

Generic Types

  • I: Input type (e.g., List<String>, CodeRequest)
  • R: Return type (e.g., String, ParsedResponse<T>)

Key Methods

  • respond(input, messages): Core processing method.
  • answer(input): Convenience method that generates the messages from the input automatically.
  • withModel(model): Returns a new instance with a different model.
  • response(messages, model): Sends raw messages to the model.
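
The relationship between these methods can be sketched with a small, self-contained stand-in. Everything below (SketchAgent, Message, EchoAgent, the chatMessages helper, and the use of a plain String as the model) is a hypothetical illustration, not the actual signatures in BaseAgent.kt.

```kotlin
// Hypothetical stand-ins for illustration; the real BaseAgent.kt may differ in every signature.
data class Message(val role: String, val content: String)

abstract class SketchAgent<I, R>(val prompt: String, val model: String) {
    // Converts a typed input into the chat messages sent to the model (assumed helper).
    abstract fun chatMessages(input: I): List<Message>

    // Core processing: turns the input plus explicit messages into a typed result.
    abstract fun respond(input: I, messages: List<Message>): R

    // Returns a new agent bound to a different model.
    abstract fun withModel(model: String): SketchAgent<I, R>

    // Convenience: build the messages from the input, then delegate to respond().
    fun answer(input: I): R = respond(input, chatMessages(input))
}

// Toy subclass that echoes the last message instead of calling an LLM.
class EchoAgent(prompt: String, model: String) :
    SketchAgent<List<String>, String>(prompt, model) {

    override fun chatMessages(input: List<String>): List<Message> =
        listOf(Message("system", prompt)) + input.map { Message("user", it) }

    override fun respond(input: List<String>, messages: List<Message>): String =
        "[$model] " + messages.last().content

    override fun withModel(model: String): EchoAgent = EchoAgent(prompt, model)
}

fun main() {
    val agent = EchoAgent(prompt = "Helpful assistant.", model = "model-a")
    println(agent.answer(listOf("Hello", "Tell me a joke")))
    println(agent.withModel("model-b").answer(listOf("Hi")))
}
```

In this sketch withModel returns a fresh instance rather than mutating the agent, which matches the "returns a new instance" wording above.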

2. Text & Conversational Agents

ChatAgent Stable
Inheritance: BaseAgent<List<String>, String>

The standard agent for conversational text generation. Takes a history of strings and returns a raw string response.

kotlin
val agent = ChatAgent(prompt = "Helpful assistant.", model = myModel)
// answer() builds the chat messages from the input; respond() expects them explicitly.
val response = agent.answer(listOf("Hello", "Tell me a joke"))

Best For

Chatbots, summarization, and general Q&A.

3. Structured Data Agents (JSON/POJO)

ParsedAgent<T>
Inheritance: BaseAgent<List<String>, ParsedResponse<T>>

Converts natural language into a specific class instance (T).

Key Features

  • Schema Generation: Uses TypeDescriber to inject YAML schemas into prompts.
  • Single vs. Two-Stage: With singleStage = false (the default), a dedicated parser agent handles the parsing step, which improves reliability.
  • Validation: Runs ValidatedObject logic after deserialization.
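
The two-stage flow followed by validation can be pictured roughly as below. This is only a sketch under stated assumptions: draftAnswer, parseUser, extractUser, and the validate() contract are hypothetical stand-ins (plain functions and a regex instead of LLM calls and TypeDescriber), not Cognotik APIs.

```kotlin
// Hypothetical stand-ins for illustration; none of these names are Cognotik APIs.
data class User(val name: String = "", val age: Int = 0) {
    // Assumed contract: returns null when valid, otherwise a description of the problem.
    fun validate(): String? = when {
        name.isBlank() -> "name is required"
        age < 0 -> "age must be non-negative"
        else -> null
    }
}

// Stage 1 stand-in: a "model" that answers in free-form prose.
fun draftAnswer(question: String): String = "The user is Alice, who is 30 years old."

// Stage 2 stand-in: a "parser" that extracts structured fields from the prose.
fun parseUser(text: String): User {
    val name = Regex("""is (\w+),""").find(text)?.groupValues?.get(1) ?: ""
    val age = Regex("""(\d+) years""").find(text)?.groupValues?.get(1)?.toInt() ?: 0
    return User(name, age)
}

// Runs both stages, then validates the deserialized object before returning it.
fun extractUser(question: String): User {
    val text = draftAnswer(question)
    val user = parseUser(text)
    user.validate()?.let { error("Validation failed: $it") }
    return user
}

fun main() {
    println(extractUser("Extract user"))
}
```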

ParsedResponse<T>

A wrapper holding both the raw text and the deserialized obj.

kotlin
val response: ParsedResponse<User> = agent.answer(listOf("Extract user"))
val transformed = response.map(UserDTO::class.java) { user -> UserDTO(user.name) }
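
To make the wrapper idea concrete, here is a minimal sketch of a response type that carries both the raw text and the deserialized object and supports a map call shaped like the one above. SketchParsedResponse, sampleResponse, and the User and UserDTO classes are hypothetical; the real ParsedResponse<T> may be defined differently.

```kotlin
// Hypothetical sketch; the real ParsedResponse<T> may be defined differently.
abstract class SketchParsedResponse<T>(val clazz: Class<T>) {
    abstract val text: String // raw model output
    abstract val obj: T       // deserialized value

    // Derives a new response with a transformed object and the same raw text.
    fun <V> map(cls: Class<V>, fn: (T) -> V): SketchParsedResponse<V> {
        val parent = this
        return object : SketchParsedResponse<V>(cls) {
            override val text: String get() = parent.text
            override val obj: V by lazy { fn(parent.obj) } // computed once, on first access
        }
    }
}

data class User(val name: String, val email: String)
data class UserDTO(val name: String)

// Builds a canned response in place of a real model call.
fun sampleResponse(): SketchParsedResponse<User> =
    object : SketchParsedResponse<User>(User::class.java) {
        override val text = """{"name": "Alice", "email": "alice@example.com"}"""
        override val obj = User("Alice", "alice@example.com")
    }

fun main() {
    val response = sampleResponse()
    val transformed = response.map(UserDTO::class.java) { user -> UserDTO(user.name) }
    println(transformed.obj)
    println(transformed.text == response.text)
}
```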

ProxyAgent<T> Advanced

Creates a dynamic Java Proxy. Each method call is serialized to JSON and answered by the LLM, and the answer is returned as a typed result.

  • Metrics: Tracks performance and request counts per method.
  • Examples: Supports addExample() for few-shot learning.
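
The mechanism can be illustrated with the JDK's java.lang.reflect.Proxy. The sketch below is not how ProxyAgent is implemented: Calculator, fakeModel, llmProxy, and callCounts are hypothetical names, the "model" is a local function that sums the numbers it sees, and the JSON is built by hand rather than by a serializer.

```kotlin
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Proxy

// Hypothetical service interface; dynamic proxies can only implement interfaces.
interface Calculator {
    fun add(a: Int, b: Int): Int
}

// Stand-in for the LLM round trip: receives the serialized call, returns raw text.
fun fakeModel(request: String): String =
    Regex("""-?\d+""").findAll(request).map { it.value.toInt() }.sum().toString()

// Per-method request counts, a toy version of the metrics mentioned above.
val callCounts = mutableMapOf<String, Int>()

fun <T : Any> llmProxy(type: Class<T>, model: (String) -> String): T {
    val handler = InvocationHandler { _, method, args ->
        callCounts[method.name] = (callCounts[method.name] ?: 0) + 1
        // Serialize the call; args is null for zero-argument methods.
        val argList = args?.toList() ?: emptyList<Any?>()
        val request = """{"method": "${method.name}", "args": $argList}"""
        val raw = model(request)
        // Convert the raw text back into the method's declared return type.
        when (method.returnType) {
            Int::class.javaPrimitiveType, Int::class.javaObjectType -> raw.trim().toInt()
            else -> raw
        }
    }
    return type.cast(Proxy.newProxyInstance(type.classLoader, arrayOf(type), handler))
}

fun main() {
    val calc = llmProxy(Calculator::class.java, ::fakeModel)
    println(calc.add(2, 3))
    println(callCounts)
}
```

The caller only ever sees the typed Calculator interface; the serialization and parsing stay inside the invocation handler.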

Schema Best Practices

  • Constructors: Provide default values for all fields.
  • Naming: Use user_name (snake_case) for JSON compatibility.
  • Documentation: Use @Description for semantic guidance.

Validation Tip: Use validation to canonicalize data (e.g., fixing formatting) rather than just rejecting it.
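
As a sketch of this tip, the class below repairs what it can and rejects only what it cannot. Contact, canonical(), and the validate() contract are assumptions made for illustration; they are not the ValidatedObject interface itself.

```kotlin
// Hypothetical data class; the validate() contract is an assumption, not the ValidatedObject API.
// Every field has a default value and uses snake_case, per the practices above.
data class Contact(val user_name: String = "", val phone: String = "") {

    // Canonicalize first: trim whitespace and keep only digits (plus a leading '+').
    fun canonical(): Contact = copy(
        user_name = user_name.trim(),
        phone = phone.trim().let { p ->
            (if (p.startsWith("+")) "+" else "") + p.filter(Char::isDigit)
        }
    )

    // Reject only what canonicalization cannot repair; null means valid.
    fun validate(): String? = when {
        user_name.isBlank() -> "user_name is required"
        phone.count(Char::isDigit) < 7 -> "phone has too few digits"
        else -> null
    }
}

fun main() {
    val fixed = Contact("  alice ", " +1 (555) 010-9999 ").canonical()
    println(fixed)
    println(fixed.validate() ?: "valid")
}
```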

4. Action & Code Agents

CodeAgent Core
Inheritance: BaseAgent<CodeRequest, CodeResult>

Autonomous agent that writes, executes, and fixes code in a sandboxed environment.

Key Components

  • CodeRuntime: The execution environment (Kotlin, JS, etc.).
  • Symbols: Objects injected into the script context for tool use.
  • Self-Correction: Automatically feeds exceptions back to the LLM to generate fixes.
  • Interception: codeInterceptor allows transforming code before execution.
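
The write, execute, and fix cycle can be sketched as a retry loop that feeds each failure back into the next generation step. This is a simplified illustration, not the CodeAgent implementation: Attempt, runWithFixes, and the generate, execute, and interceptor parameters are hypothetical stand-ins for the LLM, the CodeRuntime, and codeInterceptor.

```kotlin
// Hypothetical stand-ins: `generate` plays the LLM, `execute` plays the CodeRuntime.
data class Attempt(val code: String, val error: String?)

fun <T> runWithFixes(
    request: String,
    maxAttempts: Int,
    interceptor: (String) -> String = { it }, // transforms code before execution
    generate: (request: String, history: List<Attempt>) -> String,
    execute: (String) -> T,
): T {
    val history = mutableListOf<Attempt>()
    repeat(maxAttempts) {
        // The generator sees every earlier attempt together with its error message.
        val code = interceptor(generate(request, history))
        try {
            return execute(code)
        } catch (e: Exception) {
            history += Attempt(code, e.message)
        }
    }
    error("No working code after $maxAttempts attempts: ${history.lastOrNull()?.error}")
}

fun main() {
    val result = runWithFixes(
        request = "Compute 6 * 7",
        maxAttempts = 3,
        // Toy "LLM": the first draft is wrong, the second is fixed once an error is in the history.
        generate = { _, history -> if (history.isEmpty()) "6 * seven" else "6 * 7" },
        // Toy "runtime": multiplies two integers and throws on anything else.
        execute = { code ->
            val (a, b) = code.split(" * ").map { it.trim().toInt() }
            a * b
        },
    )
    println(result)
}
```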

5. Media Agents

ImageAndText

Data class pairing text: String with an optional image: BufferedImage?.

ImageGenerationAgent

Uses a text LLM to refine prompts before sending them to an image model (e.g., DALL-E 3).

ImageProcessingAgent

Handles vision tasks like captioning or OCR by encoding images to Base64 PNG.
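
The encoding step itself needs only the JDK. The sketch below shows one way to turn a BufferedImage into a Base64 PNG string; toBase64Png is a hypothetical helper name, and the data URL prefix in main is an assumption about how a payload might be framed, not something the agent is documented to do.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import java.util.Base64
import javax.imageio.ImageIO

// Hypothetical helper: serializes the image as PNG bytes, then Base64-encodes them.
fun toBase64Png(image: BufferedImage): String {
    val bytes = ByteArrayOutputStream().use { out ->
        // ImageIO.write returns false when no writer for the format is available.
        check(ImageIO.write(image, "png", out)) { "No PNG writer available" }
        out.toByteArray()
    }
    return Base64.getEncoder().encodeToString(bytes)
}

fun main() {
    val image = BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB)
    val encoded = toBase64Png(image)
    // Assumed framing: a data URL, as commonly used for inline images.
    println("data:image/png;base64,$encoded")
}
```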

Summary Table

Agent Class            Input          Output             Use Case
ChatAgent              List<String>   String             Conversation
ParsedAgent<T>         List<String>   ParsedResponse<T>  Data Extraction
CodeAgent              CodeRequest    CodeResult         Tool Use / Automation
ImageGenerationAgent   List<String>   ImageAndText       Asset Creation
ProxyAgent<T>          Method Args    Method Return      Dynamic Logic

Advanced Topics

Code Interception

Wrap execution with logging or sanitization:

kotlin
codeInterceptor = { code -> "println(\"Starting...\")\n$code" }

Fallback Models

Use a cheaper model first, falling back to a more capable one on failure:

kotlin
CodeAgent(model = gpt35, fallbackModel = gpt4)
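
The fallback idea reduces to trying each model in order and moving on when one fails. The helper below is a hypothetical sketch: withFallback and the model names are made up for illustration, and what CodeAgent actually counts as a failure is not specified here.

```kotlin
// Hypothetical helper; what CodeAgent treats as a "failure" is not specified in this document.
fun <T> withFallback(models: List<String>, attempt: (model: String) -> T): T {
    var lastError: Exception? = null
    for (model in models) {
        try {
            return attempt(model)
        } catch (e: Exception) {
            lastError = e // remember the failure and move on to the next model
        }
    }
    throw IllegalStateException("All models failed", lastError)
}

fun main() {
    // Toy task: the cheaper model always fails, so the capable one is used.
    val result = withFallback(listOf("cheap-model", "capable-model")) { model ->
        if (model == "cheap-model") throw RuntimeException("could not solve task")
        "solved by $model"
    }
    println(result)
}
```

Ordering the list from cheapest to most capable keeps the expensive model for the cases that need it.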