Frontmatter Schema

This document describes the YAML frontmatter schema used by DocProcessor to process markdown documentation files and manage relationships between documentation and source code.

Overview

DocProcessor processes markdown files that contain YAML frontmatter blocks. The frontmatter specifies how the documentation relates to source files — either as specifications that drive code generation, as documentation that should be updated based on source files, or as transformation rules between files. The processor also supports fetching and caching URL-based related resources, allowing documentation to reference external web content as context.

Frontmatter Format

Frontmatter must be enclosed between --- delimiters at the start of the markdown file:

---
key: value
list_key:
  - item1
  - item2
---

# Document content starts here
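
The delimiter rule can be illustrated with a minimal Kotlin sketch. The helper name `splitFrontmatter` is hypothetical and not part of DocProcessor's documented API; it only assumes the behavior described in this document, namely that a file without an opening or closing --- yields null.

```kotlin
// Hypothetical helper: splits a markdown file's text into (frontmatter, body).
// Returns null when the first line is not `---` or no closing `---` is found.
fun splitFrontmatter(text: String): Pair<String, String>? {
    val lines = text.lines()
    if (lines.isEmpty() || lines[0].trim() != "---") return null
    // Find the closing delimiter, skipping the opening line.
    val close = (1 until lines.size).firstOrNull { lines[it].trim() == "---" } ?: return null
    val frontmatter = lines.subList(1, close).joinToString("\n")
    val body = lines.subList(close + 1, lines.size).joinToString("\n")
    return frontmatter to body
}
```

For example, `splitFrontmatter("---\nkey: value\n---\nbody")` returns the pair `("key: value", "body")`, while text that does not begin with --- returns null.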

Supported Keys

specifies String | List<String>

Defines glob patterns for files that this documentation specifies. The matched files will be created or updated based on the documentation content.

Examples

# Single file
specifies: ../src/utils/helper.kt

# Single glob pattern
specifies: ../src/**/*.kt

# Multiple patterns
specifies:
  - ../src/models/*.kt
  - ../src/utils/*.kt

Glob Pattern Support

  • Simple patterns: *.kt, helper.kt
  • Recursive patterns: **/*.kt (matches files in all subdirectories)
  • Paths are resolved relative to the markdown file's directory
  • Bracket patterns: file[0-9].txt (matches character ranges)
  • Question mark patterns: file?.txt (matches single character)
  • Literal paths (without wildcards) are returned even if the file doesn't exist yet, enabling creation of new files
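
The pattern kinds above can be tried out with the JDK's built-in glob matcher. This is a sketch under the assumption that the JDK "glob:" syntax is a reasonable stand-in; DocProcessor's own matcher may differ in details. The helper names `hasWildcard` and `matchesGlob` are hypothetical.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Paths

// Hypothetical helper: true when the pattern contains glob metacharacters.
// A literal path (no wildcards) can be returned as-is, even if the file is missing.
fun hasWildcard(pattern: String): Boolean = pattern.any { it in "*?[" }

// Hypothetical helper: tests a relative path against a glob using the JDK's
// "glob:" syntax, an assumed stand-in for DocProcessor's own matching.
fun matchesGlob(pattern: String, relativePath: String): Boolean =
    FileSystems.getDefault()
        .getPathMatcher("glob:$pattern")
        .matches(Paths.get(relativePath))
```

With the JDK matcher, `*` does not cross directory separators, and `**/*.kt` requires at least one separator, so it does not match a file that sits directly in the base directory. Whether DocProcessor behaves the same way in that edge case is not stated here.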

documents String | List<String>

Defines glob patterns for source files that this documentation describes. This is the inverse of specifies — the documentation file itself becomes the target to be updated based on the matched source files.

Examples

# Single file
documents: ../src/main/kotlin/MyClass.kt

# Multiple source files
documents:
  - ../src/**/*.kt
  - ../src/**/*.java

Use Case: Keep documentation in sync with source code changes. When source files change, the documentation can be automatically updated to reflect the current implementation.

transforms String | List<String>

Defines regex-based transformation rules that map source files to destination files. Uses regex capture groups and backreferences for flexible file mapping.

Format

sourcePattern -> destinationPattern

  • sourcePattern: A regex pattern to match source file paths (relative to the doc file's directory)
  • destinationPattern: The destination path with backreferences ($0, $1, $2, etc.)

Examples

# Single transform
transforms: src/(.+)\.java -> generated/$1.kt

# Multiple transforms
transforms:
  - src/models/(.+)\.java -> kotlin/models/$1.kt
  - src/utils/(.+)\.java -> kotlin/utils/$1.kt

Backreference Support

  • $0: The entire matched string
  • $1, $2, etc.: Captured groups from the regex pattern

Note: Transform source patterns use Java regex syntax (not glob patterns). The regex is matched against file paths relative to the documentation file's parent directory. When rebasing, transform patterns are preserved as-is since they are resolved relative to the doc file at usage time.

Data File Detection: If a transform matches a JSON source file, that file can be used automatically as the data source for template processing (see data_file below).

generates Map | List<Map>

Defines explicit output files to generate from specified input files. Unlike transforms, this doesn't use pattern matching — it explicitly lists the output file and its input sources.

Structure

generates:
  output: path/to/output/file
  inputs:
    - input/pattern/*.kt
    - another/input.kt

Examples

# Single generate spec
generates:
  output: ../generated/combined.kt
  inputs:
    - ../src/models/*.kt
    - ../src/utils/*.kt

# Multiple generate specs
generates:
  - output: ../generated/models.kt
    inputs:
      - ../src/models/**/*.kt
  - output: ../generated/utils.kt
    inputs:
      - ../src/utils/**/*.kt

Input Pattern Support

  • Simple globs: *.kt, models/*.kt
  • Recursive globs: **/*.kt (matches files in all subdirectories)
  • Paths are resolved relative to the markdown file's directory
  • A single string input is also accepted (converted to a single-element list)

Validation: A generate spec requires both output and inputs fields. Specs missing either field are skipped with a warning.

Use Case: Generate aggregate files, combined outputs, or files that depend on multiple input sources.
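
The validation rule can be sketched in Kotlin as follows. The function name `toGenerateSpec` and the use of stderr for the warning are assumptions made for illustration; only the field names and the skip-on-missing behavior come from this document.

```kotlin
// Mirrors the GenerateSpec fields described in this document.
data class GenerateSpec(val output: String, val inputs: List<String>)

// Hypothetical helper: converts one parsed `generates` entry into a GenerateSpec.
// Returns null (after printing a warning) when `output` or `inputs` is missing.
fun toGenerateSpec(entry: Map<String, Any?>): GenerateSpec? {
    val output = entry["output"] as? String
    val inputs: List<String>? = when (val raw = entry["inputs"]) {
        is String -> listOf(raw)                     // single string becomes a one-element list
        is List<*> -> raw.filterIsInstance<String>() // keep only string items
        else -> null
    }
    if (output == null || output.isBlank() || inputs == null || inputs.isEmpty()) {
        System.err.println("Skipping generate spec with missing output or inputs: $entry")
        return null
    }
    return GenerateSpec(output, inputs)
}
```

An entry such as `mapOf("output" to "out.kt", "inputs" to "a.kt")` yields a spec with a single-element input list, while an entry without `inputs` yields null.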

task_type String

Specifies which task type to use for processing the target files. This allows customization of how the AI processes the modification task.

Default

FileModification

Examples

# Use default file modification task
task_type: FileModification

# Use a different task type
task_type: CodeReview

Resolution Priority

When multiple specifications apply to a single target file, the task type is resolved in this order:

  1. specifies frontmatter (first non-null)
  2. transforms frontmatter (first non-null)
  3. documents frontmatter (first non-null)
  4. generates frontmatter (first non-null)
  5. Default: FileModification

Task Type Resolution

The task type name is resolved using TaskType.valueOf() with spaces removed. Unknown task type names log a warning and fall back to FileModification.
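
The priority order and the fallback can be sketched together in Kotlin. The enum `TaskKind` is a hypothetical stand-in for the real task type registry and lists only the two names used in this document's examples; the function names are likewise assumptions.

```kotlin
// Hypothetical stand-in for the real task type registry.
enum class TaskKind { FileModification, CodeReview }

// Resolves a frontmatter task_type string: spaces are removed before lookup,
// and unknown names fall back to FileModification with a warning.
fun resolveTaskKind(name: String?): TaskKind {
    if (name == null) return TaskKind.FileModification
    return try {
        TaskKind.valueOf(name.replace(" ", ""))
    } catch (e: IllegalArgumentException) {
        System.err.println("Unknown task type '$name', falling back to FileModification")
        TaskKind.FileModification
    }
}

// Picks the first non-null task type in the documented priority order:
// specifies, transforms, documents, generates.
fun pickTaskType(
    fromSpecifies: List<String?>,
    fromTransforms: List<String?>,
    fromDocuments: List<String?>,
    fromGenerates: List<String?>
): TaskKind {
    val first = sequenceOf(fromSpecifies, fromTransforms, fromDocuments, fromGenerates)
        .flatMap { it.asSequence() }
        .firstOrNull { it != null }
    return resolveTaskKind(first)
}
```

In this sketch, `resolveTaskKind("Code Review")` resolves to `CodeReview` because the space is stripped before the lookup.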

Use Case: Customize the AI's behavior when processing files. Different task types may have different prompts, validation rules, or processing strategies.

task_config_json String

Specifies a relative file path to a JSON file containing additional task type configuration. This allows for more complex configuration that would be unwieldy in YAML frontmatter.

Examples

# Reference a JSON config file
task_config_json: ./config/my-task-config.json

# Config file in parent directory
task_config_json: ../shared/task-settings.json

Use Case: Provide detailed task configuration without cluttering the frontmatter. Useful for complex task types that require many parameters or when sharing configuration across multiple documentation files.

overwrite String

Specifies the overwrite mode for this documentation file's targets. This controls how existing files are handled during processing.

Valid Values

| Value | Description |
| --- | --- |
| SkipExisting | Skip files that already exist (no processing) |
| OverwriteExisting | Always overwrite existing files with full replacement |
| OverwriteToUpdate | Overwrite only if source/related files are newer than target |
| PatchExisting | Always apply fuzzy patch to existing files |
| PatchToUpdate | Apply fuzzy patch only if source/related files are newer than target (default) |

Examples

# Always apply patches to existing files
overwrite: PatchExisting

# Skip processing if target exists
overwrite: SkipExisting

# Always fully overwrite
overwrite: OverwriteExisting

Use Case: Control how the processor handles existing target files. Use PatchExisting or PatchToUpdate for incremental updates that preserve manual changes. Use OverwriteExisting or OverwriteToUpdate for complete regeneration. Use SkipExisting to prevent accidental overwrites.

prompt String

Specifies a custom prompt string to use as the task description instead of the auto-generated one. Only used when there is exactly one spec for the target file.

Examples

# Custom prompt for the AI
specifies: ../src/Main.kt
prompt: Refactor this file to use coroutines instead of callbacks

Use Case: Override the default task description with a specific instruction for the AI.

template_file String

Specifies a template file to use when processing the target. The path is resolved relative to the markdown file's directory.

Examples

specifies: ../src/Generated.kt
template_file: ./templates/class-template.kt

Use Case: Provide a template that guides the structure of generated files.

data_file String

Specifies a JSON data file to use as structured data input for template processing. The path is resolved relative to the markdown file's directory.

Examples

specifies: ../src/Generated.kt
template_file: ./templates/class-template.kt
data_file: ./data/model-config.json

Implicit Detection: If no explicit data_file is specified and a transform matches a JSON source file, that JSON file is automatically used as the data source.

Use Case: Provide structured data that can be used in conjunction with templates for code generation.

Complete Example

api-documentation.md
---
specifies:
  - ../src/api/*.kt
  - ../src/models/*.kt
documents:
  - ../src/core/Engine.kt
transforms:
  - src/legacy/(.+)\.java -> src/modern/$1.kt
generates:
  output: ../generated/api-index.md
  inputs:
    - ../src/api/**/*.kt
related:
  - ../config/api-config.yaml
  - ./api-conventions.md
  - https://example.com/api-spec
overwrite: PatchExisting
task_type: FileModification
task_config_json: ./config/api-task-config.json
prompt: Update the API layer to conform to the latest specification
---

# API Documentation

This document specifies the API layer implementation...

Processing Behavior

  • Dependency Resolution: Tasks are sorted topologically so dependencies are processed before dependents. Cycles are detected and broken automatically.
  • File Resolution: All paths in frontmatter are resolved relative to the markdown file's parent directory.
  • Glob Expansion: Simple globs (*.kt) match files in the specified directory. Recursive globs (**/*.kt) match files in all subdirectories. Bracket patterns (file[0-9].txt) match character ranges. Question mark patterns (file?.txt) match single characters. For transforms, the source pattern is a regex (not a glob) that matches against file paths relative to the doc file's directory. Literal paths (without wildcards) are returned even if the target file doesn't exist, enabling file creation.
  • Multiple Specifications: A single target file can be specified by multiple documentation files. All specifications are combined when processing.
  • Overwrite Modes: The overwrite mode determines whether an existing target file is skipped, fully replaced, or patched (see Overwrite Modes below).
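
The dependency ordering described in the first bullet can be sketched as a depth-first topological sort. This is an illustrative sketch only: the function name `topoSort` is hypothetical, and skipping back edges is one possible way to break cycles, not necessarily the strategy DocProcessor uses.

```kotlin
// Orders tasks so that each task's dependencies come first. `dependsOn` maps a
// task name to the names it depends on. Edges that would close a cycle are
// skipped, which is an assumed cycle-breaking strategy.
fun topoSort(tasks: List<String>, dependsOn: Map<String, List<String>>): List<String> {
    val done = LinkedHashSet<String>()  // finished tasks, in output order
    val inProgress = HashSet<String>()  // tasks on the current DFS path

    fun visit(task: String) {
        // Already placed, or reached again through a cycle edge: stop here.
        if (task in done || task in inProgress) return
        inProgress.add(task)
        for (dep in dependsOn[task].orEmpty()) visit(dep)
        inProgress.remove(task)
        done.add(task)
    }

    tasks.forEach { visit(it) }
    return done.toList()
}
```

Given tasks `c`, `b`, `a` where `c` depends on `b` and `b` depends on `a`, the sketch returns `a`, `b`, `c`. A two-task cycle still produces an ordering containing both tasks rather than failing.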

Overwrite Modes

| Mode | Description |
| --- | --- |
| SkipExisting | Skip files that already exist (no processing) |
| OverwriteExisting | Always overwrite existing files with full replacement |
| OverwriteToUpdate | Overwrite only if source/related files are newer than target |
| PatchExisting | Always apply fuzzy patch to existing files |
| PatchToUpdate | Apply fuzzy patch only if source/related files are newer than target (default) |

File Modification Time Checking

For OverwriteToUpdate and PatchToUpdate modes, the processor compares the target file's last modified time against:

  • The documentation file itself
  • All related files specified in the frontmatter
  • All source/input files that contribute to the target

If any of these are newer than the target, the target will be processed.
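
The mode table and the modification-time rule can be combined into one decision function. The sketch below is an assumption-laden illustration: `shouldProcess` is a hypothetical name, timestamps are plain epoch milliseconds, and a missing target is assumed to always be processed.

```kotlin
enum class OverwriteMode {
    SkipExisting, OverwriteExisting, OverwriteToUpdate, PatchExisting, PatchToUpdate
}

// Hypothetical helper: decides whether a target should be processed.
// `targetModified` is null when the target does not exist yet; `inputsModified`
// holds the last-modified times of the doc file, related files, and source/input files.
fun shouldProcess(mode: OverwriteMode, targetModified: Long?, inputsModified: List<Long>): Boolean {
    if (targetModified == null) return true  // assumption: missing targets are always created
    return when (mode) {
        OverwriteMode.SkipExisting -> false
        OverwriteMode.OverwriteExisting, OverwriteMode.PatchExisting -> true
        // The "ToUpdate" modes run only when some input is newer than the target.
        OverwriteMode.OverwriteToUpdate, OverwriteMode.PatchToUpdate ->
            inputsModified.any { it > targetModified }
    }
}
```

Whether the result is a full overwrite or a fuzzy patch is a separate concern and is left out of this sketch.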

Task Description Generation

The processor automatically generates appropriate task descriptions based on the frontmatter type:

  • For specifies/transforms: Updates target files based on documentation and specifications
  • For documents: Updates documentation to reflect current source code state
  • For generates: Generates output files based on documentation and input files
  • Custom prompt: If a single spec has a prompt frontmatter key, that prompt is used directly as the task description
  • Non-file task types: Processes the file according to the task type with documentation as context

URL Fetching and Caching

Related resources specified as URLs (http:// or https://) are automatically fetched and cached locally:

  • Cache location: .doc-processor-cache/url-cache within the root directory
  • Cache TTL: 1 hour (cached content older than 1 hour is re-fetched)
  • HTML content is automatically simplified (scripts, styles, interactive elements removed)
  • Non-HTML content is stored as-is
  • Failed fetches log a warning and return null (the resource is skipped)
  • Cache files use a SHA-256 hash prefix for uniqueness
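
A rough Kotlin sketch of the cache naming and freshness checks follows. The 16-character prefix length, the ".txt" suffix, and all function names are assumptions for illustration; only the SHA-256 prefix idea, the one-hour TTL, and the URL schemes come from this document.

```kotlin
import java.security.MessageDigest

// Cache lifetime stated in this document: one hour, in milliseconds.
const val CACHE_TTL_MS = 60L * 60L * 1000L

// Hypothetical helper: derives a cache file name from a URL using the first
// 16 hex characters of its SHA-256 hash (prefix length and suffix are assumed).
fun cacheFileName(url: String): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(url.toByteArray(Charsets.UTF_8))
    val hex = digest.joinToString("") { "%02x".format(it) }
    return hex.take(16) + ".txt"
}

// A cached entry is reused until it becomes older than the TTL.
fun isFresh(cachedAtMs: Long, nowMs: Long): Boolean = nowMs - cachedAtMs <= CACHE_TTL_MS

// Related entries are treated as URLs only when they use http:// or https://.
fun isUrl(resource: String): Boolean =
    resource.startsWith("http://") || resource.startsWith("https://")
```

Hashing the URL keeps cache file names short and free of characters that are invalid in file paths, while the same URL always maps to the same file.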

Rebasing

Both DocSpec and ModificationTask support rebasing from one root directory to another. This is used when the IntelliJ action needs to adjust paths for a different working directory. URL-based related resources are preserved as-is during rebasing.

Primary Source Resolution

When determining the primary source file for overwrite mode checks, the priority is:

  1. First transform's source file
  2. First spec's doc file
  3. First document match's first supporting file (or doc file if no supporting files)
  4. First generate match's first input file (or doc file if no input files)

Error Handling

  • Invalid frontmatter YAML will cause parsing to fail
  • generates entries that are missing a required field (output or inputs) are skipped with a warning
  • Invalid regex patterns in transforms will cause matching to fail for those rules
  • Unknown task_type values will log a warning and fall back to FileModification
  • Invalid task_config_json paths will be stored but may cause errors during task execution
  • Files without frontmatter (not starting with ---) return null (silently skipped)
  • Files with unclosed frontmatter (no closing ---) return null
  • Files with frontmatter but no specifies, transforms, documents, or generates keys return null
  • Non-existent files referenced in related are still returned (downstream code handles them)
  • URL fetch failures log a warning and return null (the resource is skipped)
  • Errors processing individual target files are caught and logged; other targets continue processing

Data Structures

The frontmatter is parsed into a DocSpec containing:

| Field | Type | Description |
| --- | --- | --- |
| docFile | File | The markdown file itself |
| specifies | List<String> | Glob patterns for files this doc specifies |
| documents | List<String> | Glob patterns for files this doc describes |
| transforms | List<TransformSpec> | Source-to-destination transformation rules |
| generates | List<GenerateSpec> | Explicit generation specifications |
| related | List<String> | Additional context files or URLs |
| taskType | String? | Task type to use for processing (nullable, defaults to FileModification) |
| taskConfigJson | String? | Path to JSON file with additional task configuration (nullable) |
| content | String | The markdown body (after frontmatter) |
| frontmatter | Map<String, Any> | Raw parsed frontmatter |

Note: The overwrite mode is not stored in DocSpec. It is configured at the DocProcessor level and applies to all targets processed by that instance.

TransformSpec

| Field | Type | Description |
| --- | --- | --- |
| sourcePattern | String | Regex pattern to match source files |
| destinationPattern | String | Destination pattern with backreferences |

GenerateSpec

| Field | Type | Description |
| --- | --- | --- |
| output | String | The output file path (relative to doc file) |
| inputs | List<String> | Glob patterns for input files |
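
Read as Kotlin, the three structures above might look like the following data classes. The field names and types are taken from the tables; the default values are assumptions added so that the sketch is easy to construct, and the real declarations may differ.

```kotlin
import java.io.File

data class TransformSpec(val sourcePattern: String, val destinationPattern: String)

data class GenerateSpec(val output: String, val inputs: List<String>)

// Defaults below are assumed for illustration; only names and types are documented.
data class DocSpec(
    val docFile: File,
    val specifies: List<String> = emptyList(),
    val documents: List<String> = emptyList(),
    val transforms: List<TransformSpec> = emptyList(),
    val generates: List<GenerateSpec> = emptyList(),
    val related: List<String> = emptyList(),
    val taskType: String? = null,        // null means the FileModification default applies
    val taskConfigJson: String? = null,
    val content: String = "",
    val frontmatter: Map<String, Any> = emptyMap()
)
```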

ModificationTaskConfig

Represents the configuration for a single modification task:

| Field | Type | Description |
| --- | --- | --- |
| files | List<String>? | Target file paths (relative to root) |
| related_files | List<String>? | Related/context file paths (relative to root) |
| task_description | String | Generated or custom task description |
| template_file | String? | Path to template file (nullable) |
| data | Map<String, Any>? | Structured data from data_file or JSON source (nullable) |

ModificationTask

Represents a complete modification task ready for execution:

| Field | Type | Description |
| --- | --- | --- |
| data | ModificationTaskConfig | Task configuration |
| message | String | Message content (context files or execute command) |
| patchProcessor | PatchProcessors | Patch processing strategy (default: Fuzzy) |
| shouldDeleteTarget | Boolean | Whether to delete the target file (default: false) |
| taskType | TaskType<*, *> | The resolved task type (default: FileModification) |

Additional Processing Classes

TransformMatch

Represents a matched transformation from source to destination:

| Field | Type | Description |
| --- | --- | --- |
| sourceFile | File | The matched source file |
| destinationFile | File | The computed destination file |
| spec | DocSpec | The originating doc specification |

GenerateMatch

Represents a matched generation specification:

| Field | Type | Description |
| --- | --- | --- |
| outputFile | File | The output file to generate |
| inputFiles | List<File> | The resolved input files |
| spec | DocSpec | The originating doc specification |

DocumentMatch

Represents a documentation update specification:

| Field | Type | Description |
| --- | --- | --- |
| docSpec | DocSpec | The doc specification (target is the doc file itself) |
| supportingFiles | List<File> | Source files that provide context |

Implementation Notes

Frontmatter Parsing

The frontmatter is parsed using a custom simple YAML parser (not SnakeYAML). The parser handles string values, list values, and map values (for generates).

  • Lines are split on the first colon to extract key-value pairs
  • Lines without colons are ignored
  • If the value after the colon is empty, the parser looks for subsequent list items (lines starting with - )
  • Keys with an empty value (nothing after the colon and no subsequent list items) are not added to the result map
  • Values are trimmed of whitespace
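
The parsing rules above can be sketched as a small line-based parser. This is not the actual DocProcessor parser: the name `parseSimpleFrontmatter` is hypothetical, and the sketch handles only string and list values, leaving out the map form used by generates.

```kotlin
// Minimal sketch of the described line-based parsing (strings and lists only).
fun parseSimpleFrontmatter(frontmatter: String): Map<String, Any> {
    val result = LinkedHashMap<String, Any>()
    val lines = frontmatter.lines()
    var i = 0
    while (i < lines.size) {
        val line = lines[i]
        i++
        val colon = line.indexOf(':')
        if (colon < 0) continue                      // lines without a colon are ignored
        val key = line.substring(0, colon).trim()    // split on the first colon only
        val value = line.substring(colon + 1).trim()
        if (value.isNotEmpty()) {
            result[key] = value
            continue
        }
        // Empty value: collect the list items that follow.
        val items = mutableListOf<String>()
        while (i < lines.size && lines[i].trim().startsWith("- ")) {
            items.add(lines[i].trim().removePrefix("- ").trim())
            i++
        }
        if (items.isNotEmpty()) result[key] = items  // keys with nothing attached are dropped
    }
    return result
}
```

Splitting on the first colon only is what lets a scalar URL value such as `related: https://example.com/api-spec` survive intact, since the colon inside the URL stays part of the value.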

Transform Pattern Matching

Transform patterns use Java regex syntax. The source pattern is matched against file paths relative to the documentation file's directory. When a match is found:

  1. The regex is applied to the relative file path
  2. Capture groups are extracted from the match
  3. Backreferences ($0, $1, etc.) in the destination pattern are replaced with the captured values
  4. The destination path is resolved relative to the documentation file's directory
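
Steps 1 to 3 can be sketched in Kotlin as shown below. The function names are hypothetical, and requiring the regex to match the entire relative path (`matchEntire`) is an assumption; the actual implementation may accept partial matches. Step 4, resolving the result against the documentation file's directory, is omitted.

```kotlin
data class TransformSpec(val sourcePattern: String, val destinationPattern: String)

// Parses a "sourcePattern -> destinationPattern" rule; returns null if the arrow is missing.
fun parseTransform(rule: String): TransformSpec? {
    val parts = rule.split("->", limit = 2)
    if (parts.size != 2) return null
    return TransformSpec(parts[0].trim(), parts[1].trim())
}

// Applies a transform to a path relative to the doc file's directory.
// Assumption: the regex must match the whole relative path.
fun applyTransform(spec: TransformSpec, relativePath: String): String? {
    val match = Regex(spec.sourcePattern).matchEntire(relativePath) ?: return null
    // Replace $0, $1, ... with the corresponding captured group.
    return Regex("""\$(\d+)""").replace(spec.destinationPattern) { ref ->
        val index = ref.groupValues[1].toInt()
        match.groupValues.getOrElse(index) { "" }
    }
}
```

With the rule `src/(.+)\.java -> generated/$1.kt`, the relative path `src/models/User.java` maps to `generated/models/User.kt`, and a path that does not match the source pattern yields null.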

IntelliJ Integration

The DocProcessorAction provides an IntelliJ IDE action that:

  1. Filters selected files to markdown files (.md or .markdown extensions)
  2. Creates a DocProcessor instance with the configured fast and smart models
  3. Calls getAll() to collect all modification tasks from the selected files
  4. Shows a DocProcessorTaskDialog with a checklist of tasks for user selection
  5. Executes the first selected task via SingleTaskApp in a browser session

The action is available through the DocProcessorActionGroup which provides a submenu with all overwrite mode options:

| Label | Mode |
| --- | --- |
| 🚫 Skip Existing Files | SkipExisting |
| 🔄 Overwrite All Files | OverwriteExisting |
| 📅 Overwrite Outdated Files | OverwriteToUpdate |
| 🩹 Patch Existing Files | PatchExisting |
| 📝 Patch Outdated Files | PatchToUpdate |

The dialog includes an "Auto-fix issues" checkbox and displays task details including target files and related files.