Frontmatter Schema
This document describes the YAML frontmatter schema used by DocProcessor
to process markdown documentation files and manage relationships between documentation
and source code.
Overview
DocProcessor processes markdown files that contain YAML frontmatter blocks.
The frontmatter specifies how the documentation relates to source files — either as
specifications that drive code generation, as documentation that should be updated
based on source files, or as transformation rules between files.
The processor also supports fetching and caching URL-based related resources, allowing
documentation to reference external web content as context.
Frontmatter Format
Frontmatter must be enclosed between --- delimiters at the start of the markdown file:
---
key: value
list_key:
- item1
- item2
---
# Document content starts here
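The delimiter rule above can be illustrated with a minimal Python sketch. It is not DocProcessor's own code: the function name `split_frontmatter` is hypothetical, and the behavior for missing or unclosed blocks follows the null-return cases listed under Error Handling.

```python
from typing import Optional, Tuple


def split_frontmatter(text: str) -> Optional[Tuple[str, str]]:
    """Return (frontmatter, body), or None when there is no closed block."""
    lines = text.splitlines()
    # The opening delimiter must be the very first line of the file.
    if not lines or lines[0].strip() != "---":
        return None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return frontmatter, body
    # An opening delimiter without a closing one counts as no frontmatter.
    return None
```

Returning `None` in both failure cases lets a caller skip such files without raising an error.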
Supported Keys
specifies
String | List<String>
Defines glob patterns for files that this documentation specifies. The matched files will be created or updated based on the documentation content.
Examples
# Single file
specifies: ../src/utils/helper.kt
# Single glob pattern
specifies: ../src/**/*.kt
# Multiple patterns
specifies:
- ../src/models/*.kt
- ../src/utils/*.kt
Glob Pattern Support
- Simple patterns: *.kt, helper.kt
- Recursive patterns: **/*.kt (matches files in all subdirectories)
- Paths are resolved relative to the markdown file's directory
- Bracket patterns: file[0-9].txt (matches character ranges)
- Question mark patterns: file?.txt (matches a single character)
- Literal paths (without wildcards) are returned even if the file doesn't exist yet, enabling creation of new files
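As a rough illustration of these rules, the following Python sketch resolves one pattern relative to the documentation file's directory. Python's `glob` module stands in for whatever matcher DocProcessor actually uses, and the name `resolve_pattern` is hypothetical.

```python
import glob
import os
from typing import List

# Characters that mark a pattern as a glob rather than a literal path.
WILDCARD_CHARS = "*?["


def resolve_pattern(doc_dir: str, pattern: str) -> List[str]:
    """Resolve one frontmatter pattern relative to the doc file's directory."""
    full_pattern = os.path.join(doc_dir, pattern)
    if not any(ch in pattern for ch in WILDCARD_CHARS):
        # Literal path: returned even when the file does not exist yet,
        # so that a new file can be created.
        return [os.path.normpath(full_pattern)]
    # recursive=True lets ** match files in all subdirectories.
    matches = glob.glob(full_pattern, recursive=True)
    return sorted(os.path.normpath(m) for m in matches if os.path.isfile(m))
```

For a doc file in `docs/`, `resolve_pattern("docs", "../src/**/*.kt")` would list every `.kt` file under `src/`, while `resolve_pattern("docs", "../src/New.kt")` returns the single normalized path whether or not the file exists.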
documents
String | List<String>
Defines glob patterns for source files that this documentation describes.
This is the inverse of specifies — the documentation file itself
becomes the target to be updated based on the matched source files.
Examples
# Single file
documents: ../src/main/kotlin/MyClass.kt
# Multiple source files
documents:
- ../src/**/*.kt
- ../src/**/*.java
transforms
String | List<String>
Defines regex-based transformation rules that map source files to destination files. Uses regex capture groups and backreferences for flexible file mapping.
Format
sourcePattern -> destinationPattern
- sourcePattern: A regex pattern to match source file paths (relative to the doc file's directory)
- destinationPattern: The destination path with backreferences ($0, $1, $2, etc.)
Examples
# Single transform
transforms: src/(.+)\.java -> generated/$1.kt
# Multiple transforms
transforms:
- src/models/(.+)\.java -> kotlin/models/$1.kt
- src/utils/(.+)\.java -> kotlin/utils/$1.kt
Backreference Support
- $0: The entire matched string
- $1, $2, etc.: Captured groups from the regex pattern
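The rule format and backreference substitution can be sketched in Python as follows. This is an approximation: the processor uses Java regex syntax, whereas this sketch uses Python's `re` module, and it assumes the source pattern must match the whole relative path. The function name `apply_transform` is hypothetical.

```python
import re
from typing import Optional


def apply_transform(rule: str, relative_path: str) -> Optional[str]:
    """Apply one 'sourcePattern -> destinationPattern' rule to a relative path."""
    if "->" not in rule:
        return None
    source_pattern, destination_pattern = (
        part.strip() for part in rule.split("->", 1)
    )
    # Assumption: the pattern has to match the entire path, not a substring.
    match = re.fullmatch(source_pattern, relative_path)
    if match is None:
        return None
    # Replace $0, $1, $2, ... with the text of the corresponding group.
    return re.sub(
        r"\$(\d+)",
        lambda ref: match.group(int(ref.group(1))) or "",
        destination_pattern,
    )
```

With the first example rule, `apply_transform(r"src/(.+)\.java -> generated/$1.kt", "src/foo/Bar.java")` yields `generated/foo/Bar.kt`; a path that does not match the source pattern yields `None`.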
(See data_file under implicit frontmatter keys.)
generates
Map | List<Map>
Defines explicit output files to generate from specified input files.
Unlike transforms, this doesn't use pattern matching — it explicitly
lists the output file and its input sources.
Structure
generates:
  output: path/to/output/file
  inputs:
    - input/pattern/*.kt
    - another/input.kt
Examples
# Single generate spec
generates:
  output: ../generated/combined.kt
  inputs:
    - ../src/models/*.kt
    - ../src/utils/*.kt
# Multiple generate specs
generates:
  - output: ../generated/models.kt
    inputs:
      - ../src/models/**/*.kt
  - output: ../generated/utils.kt
    inputs:
      - ../src/utils/**/*.kt
Input Pattern Support
- Simple globs: *.kt, models/*.kt
- Recursive globs: **/*.kt (matches files in all subdirectories)
- Paths are resolved relative to the markdown file's directory
- A single string input is also accepted (converted to a single-element list)
Each generate spec must include both the output and inputs fields. Specs missing either field are skipped with a warning.
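One way to picture the accepted shapes (a single map, a list of maps, and a single-string input) is the normalization sketch below. It is written in Python for illustration only; `normalize_generates` is a hypothetical name, and Python's `warnings` module stands in for the processor's logging.

```python
import warnings
from typing import Any, Dict, List


def normalize_generates(value: Any) -> List[Dict[str, Any]]:
    """Normalize a generates value into a list of {output, inputs} specs."""
    # A single map is treated like a one-element list of maps.
    entries = value if isinstance(value, list) else [value]
    specs: List[Dict[str, Any]] = []
    for entry in entries:
        if (
            not isinstance(entry, dict)
            or not entry.get("output")
            or not entry.get("inputs")
        ):
            # Specs missing either field are skipped with a warning.
            warnings.warn(f"Skipping generates entry without output/inputs: {entry!r}")
            continue
        inputs = entry["inputs"]
        if isinstance(inputs, str):
            # A single string input becomes a single-element list.
            inputs = [inputs]
        specs.append({"output": entry["output"], "inputs": list(inputs)})
    return specs
```

Normalizing up front means later stages only ever deal with a list of specs whose `inputs` is always a list.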
task_type
String
Specifies which task type to use for processing the target files. This allows customization of how the AI processes the modification task.
Default
FileModification
Examples
# Use default file modification task
task_type: FileModification
# Use a different task type
task_type: CodeReview
Resolution Priority
When multiple specifications apply to a single target file, the task type is resolved in this order:
1. specifies frontmatter (first non-null)
2. transforms frontmatter (first non-null)
3. documents frontmatter (first non-null)
4. generates frontmatter (first non-null)
5. Default: FileModification
Task Type Resolution
The task type name is resolved using TaskType.valueOf() with spaces removed.
Unknown task type names log a warning and fall back to FileModification.
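The priority order and name lookup might be pictured with the Python sketch below. The `TaskType` enum here is a hypothetical two-member stand-in for the real one, and the sketch assumes that an unknown name on the first non-null candidate falls back to the default rather than moving on to the next candidate.

```python
import logging
from enum import Enum
from typing import Iterable, Optional


class TaskType(Enum):
    # Hypothetical subset; the real set of task types is not listed here.
    FileModification = "FileModification"
    CodeReview = "CodeReview"


def resolve_task_type(candidates: Iterable[Optional[str]]) -> TaskType:
    """Pick a task type from names ordered specifies, transforms, documents, generates."""
    for name in candidates:
        if name is None:
            continue
        # Spaces are removed before the lookup, so "Code Review" also resolves.
        key = name.replace(" ", "")
        try:
            return TaskType[key]
        except KeyError:
            logging.warning("Unknown task type %r; using FileModification", name)
            return TaskType.FileModification
    return TaskType.FileModification
```

For example, `resolve_task_type([None, "Code Review"])` resolves to `TaskType.CodeReview`, and an empty candidate list resolves to `TaskType.FileModification`.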
task_config_json
String
Specifies a relative file path to a JSON file containing additional task type configuration. This allows for more complex configuration that would be unwieldy in YAML frontmatter.
Examples
# Reference a JSON config file
task_config_json: ./config/my-task-config.json
# Config file in parent directory
task_config_json: ../shared/task-settings.json
overwrite
String
Specifies the overwrite mode for this documentation file's targets. This controls how existing files are handled during processing.
Valid Values
| Value | Description |
|---|---|
| SkipExisting | Skip files that already exist (no processing) |
| OverwriteExisting | Always overwrite existing files with full replacement |
| OverwriteToUpdate | Overwrite only if source/related files are newer than target |
| PatchExisting | Always apply fuzzy patch to existing files |
| PatchToUpdate | Apply fuzzy patch only if source/related files are newer than target (default) |
Examples
# Always apply patches to existing files
overwrite: PatchExisting
# Skip processing if target exists
overwrite: SkipExisting
# Always fully overwrite
overwrite: OverwriteExisting
Use PatchExisting or PatchToUpdate for incremental updates that
preserve manual changes.
Use OverwriteExisting or OverwriteToUpdate for complete regeneration.
Use SkipExisting to prevent accidental overwrites.
prompt
String
Specifies a custom prompt string to use as the task description instead of the auto-generated one. Only used when there is exactly one spec for the target file.
Examples
# Custom prompt for the AI
specifies: ../src/Main.kt
prompt: Refactor this file to use coroutines instead of callbacks
template_file
String
Specifies a template file to use when processing the target. The path is resolved relative to the markdown file's directory.
Examples
specifies: ../src/Generated.kt
template_file: ./templates/class-template.kt
data_file
String
Specifies a JSON data file to use as structured data input for template processing. The path is resolved relative to the markdown file's directory.
Examples
specifies: ../src/Generated.kt
template_file: ./templates/class-template.kt
data_file: ./data/model-config.json
If no data_file is specified and a transform matches a JSON source file, that JSON file is automatically used as the data source.
Complete Example
---
specifies:
- ../src/api/*.kt
- ../src/models/*.kt
documents:
- ../src/core/Engine.kt
transforms:
- src/legacy/(.+)\.java -> src/modern/$1.kt
generates:
  output: ../generated/api-index.md
  inputs:
    - ../src/api/**/*.kt
related:
- ../config/api-config.yaml
- ./api-conventions.md
- https://example.com/api-spec
overwrite: PatchExisting
task_type: FileModification
task_config_json: ./config/api-task-config.json
prompt: Update the API layer to conform to the latest specification
---
# API Documentation
This document specifies the API layer implementation...
Processing Behavior
- Dependency Resolution: Tasks are sorted topologically so dependencies are processed before dependents. Cycles are detected and broken automatically.
- File Resolution: All paths in frontmatter are resolved relative to the markdown file's parent directory.
- Glob Expansion: Simple globs (*.kt) match files in the specified directory. Recursive globs (**/*.kt) match files in all subdirectories. Bracket patterns (file[0-9].txt) match character ranges. Question mark patterns (file?.txt) match single characters. For transforms, the source pattern is a regex (not a glob) that matches against file paths relative to the doc file's directory. Literal paths (without wildcards) are returned even if the target file doesn't exist, enabling file creation.
- Multiple Specifications: A single target file can be specified by multiple documentation files. All specifications are combined when processing.
- Overwrite Modes: The processor supports different overwrite strategies for handling existing files.
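The Dependency Resolution item can be illustrated with a small depth-first sketch in Python. How the processor actually breaks cycles is not specified here; this version assumes that an edge pointing back to a task still being visited is simply ignored. The name `sort_tasks` and the dictionary shape are hypothetical.

```python
from typing import Dict, List, Set


def sort_tasks(dependencies: Dict[str, List[str]]) -> List[str]:
    """Order tasks so that each task comes after the tasks it depends on.

    `dependencies` maps a task name to the names of the tasks it depends on.
    """
    ordered: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(task: str) -> None:
        if task in done:
            return
        in_progress.add(task)
        for dependency in dependencies.get(task, []):
            if dependency in in_progress:
                # This edge closes a cycle; ignoring it breaks the cycle.
                continue
            visit(dependency)
        in_progress.discard(task)
        done.add(task)
        ordered.append(task)

    for task in dependencies:
        visit(task)
    return ordered
```

A chain such as `{"c": ["b"], "b": ["a"], "a": []}` comes out as `["a", "b", "c"]`, and a two-task cycle still yields both tasks exactly once.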
Overwrite Modes
| Mode | Description |
|---|---|
| SkipExisting | Skip files that already exist (no processing) |
| OverwriteExisting | Always overwrite existing files with full replacement |
| OverwriteToUpdate | Overwrite only if source/related files are newer than target |
| PatchExisting | Always apply fuzzy patch to existing files |
| PatchToUpdate | Apply fuzzy patch only if source/related files are newer than target (default) |
File Modification Time Checking
For OverwriteToUpdate and PatchToUpdate modes, the processor compares the
target file's last modified time against:
- The documentation file itself
- All related files specified in the frontmatter
- All source/input files that contribute to the target
If any of these are newer than the target, the target will be processed.
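A minimal Python sketch of this decision follows. It assumes that a target which does not exist yet is always processed and that non-file sources (such as URLs) are ignored in the comparison; neither detail is stated above, and the name `should_process` is hypothetical.

```python
import os
from typing import Iterable


def should_process(mode: str, target: str, sources: Iterable[str]) -> bool:
    """Decide whether a target needs processing under the given overwrite mode.

    `sources` holds the doc file, related files, and source/input files.
    """
    if not os.path.exists(target):
        # Assumption: a target that does not exist yet is always processed.
        return True
    if mode == "SkipExisting":
        return False
    if mode in ("OverwriteExisting", "PatchExisting"):
        return True
    if mode in ("OverwriteToUpdate", "PatchToUpdate"):
        target_time = os.path.getmtime(target)
        # Process when any existing source is newer than the target.
        return any(
            os.path.exists(source) and os.path.getmtime(source) > target_time
            for source in sources
        )
    raise ValueError(f"Unknown overwrite mode: {mode}")
```

The two "ToUpdate" modes differ only in what happens after this check passes (full overwrite versus fuzzy patch), so a single predicate covers both.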
Task Description Generation
The processor automatically generates appropriate task descriptions based on the frontmatter type:
- For specifies/transforms: Updates target files based on documentation and specifications
- For documents: Updates documentation to reflect current source code state
- For generates: Generates output files based on documentation and input files
- Custom prompt: If a single spec has a prompt frontmatter key, that prompt is used directly as the task description
- Non-file task types: Processes the file according to the task type with documentation as context
URL Fetching and Caching
Related resources specified as URLs (http:// or https://) are automatically
fetched and cached locally:
- Cache location: .doc-processor-cache/url-cache within the root directory
- Cache TTL: 1 hour (cached content older than 1 hour is re-fetched)
- HTML content is automatically simplified (scripts, styles, interactive elements removed)
- Non-HTML content is stored as-is
- Failed fetches log a warning and return null (the resource is skipped)
- Cache files use a SHA-256 hash prefix for uniqueness
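The cache location and TTL rules could be sketched in Python as shown below. The exact file naming is not documented, so the 16-character hash prefix and the `.txt` suffix are assumptions, as are the function names `cache_path` and `is_fresh`. The sketch does not perform any network fetch.

```python
import hashlib
import os
import time
from typing import Optional

CACHE_TTL_SECONDS = 60 * 60  # 1 hour


def cache_path(root_dir: str, url: str) -> str:
    """Compute where the cached copy of a URL would live."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Assumption: a 16-character hash prefix plus ".txt" names the file.
    file_name = digest[:16] + ".txt"
    return os.path.join(root_dir, ".doc-processor-cache", "url-cache", file_name)


def is_fresh(path: str, now: Optional[float] = None) -> bool:
    """Return True when a cached file exists and is at most one hour old."""
    if not os.path.exists(path):
        return False
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) <= CACHE_TTL_SECONDS
```

A caller would re-fetch the URL whenever `is_fresh` returns `False` and otherwise read the cached file.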
Rebasing
Both DocSpec and ModificationTask support rebasing from one root directory
to another. This is used when the IntelliJ action needs to adjust paths for a different working
directory. URL-based related resources are preserved as-is during rebasing.
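As an illustration of what rebasing a single path might involve, the Python sketch below re-expresses a path that was relative to one root as relative to another, leaving URLs untouched. The real DocSpec and ModificationTask rebasing logic is not shown in this document; `rebase_path` is a hypothetical helper.

```python
import os


def rebase_path(path: str, old_root: str, new_root: str) -> str:
    """Re-express a path relative to old_root as a path relative to new_root."""
    if path.startswith(("http://", "https://")):
        # URL-based related resources are preserved as-is.
        return path
    absolute = os.path.normpath(os.path.join(old_root, path))
    return os.path.relpath(absolute, new_root)
```

For instance, `src/Main.kt` under the root `/repo/module` becomes `module/src/Main.kt` when rebased to `/repo`.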
Primary Source Resolution
When determining the primary source file for overwrite mode checks, the priority is:
- First transform's source file
- First spec's doc file
- First document match's first supporting file (or doc file if no supporting files)
- First generate match's first input file (or doc file if no input files)
Error Handling
- Invalid frontmatter YAML will cause parsing to fail
- Missing required fields in generates entries (like output) will result in incomplete specifications
- Invalid regex patterns in transforms will cause matching to fail for those rules
- Unknown task_type values will log a warning and fall back to FileModification
- Invalid task_config_json paths will be stored but may cause errors during task execution
- Files without frontmatter (not starting with ---) return null (silently skipped)
- Files with unclosed frontmatter (no closing ---) return null
- Files with frontmatter but no specifies, transforms, documents, or generates keys return null
- Non-existent files referenced in related are still returned (downstream code handles them)
- URL fetch failures log a warning and return null (the resource is skipped)
- Errors processing individual target files are caught and logged; other targets continue processing
Data Structures
The frontmatter is parsed into a DocSpec containing:
| Field | Type | Description |
|---|---|---|
| docFile | File | The markdown file itself |
| specifies | List<String> | Glob patterns for files this doc specifies |
| documents | List<String> | Glob patterns for files this doc describes |
| transforms | List<TransformSpec> | Source-to-destination transformation rules |
| generates | List<GenerateSpec> | Explicit generation specifications |
| related | List<String> | Additional context files or URLs |
| taskType | String? | Task type to use for processing (nullable, defaults to FileModification) |
| taskConfigJson | String? | Path to JSON file with additional task configuration (nullable) |
| content | String | The markdown body (after frontmatter) |
| frontmatter | Map<String, Any> | Raw parsed frontmatter |
The overwrite mode is not stored in DocSpec. It is configured at the DocProcessor level and applies to all targets processed by
that instance.
TransformSpec
| Field | Type | Description |
|---|---|---|
| sourcePattern | String | Regex pattern to match source files |
| destinationPattern | String | Destination pattern with backreferences |
GenerateSpec
| Field | Type | Description |
|---|---|---|
| output | String | The output file path (relative to doc file) |
| inputs | List<String> | Glob patterns for input files |
ModificationTaskConfig
Represents the configuration for a single modification task:
| Field | Type | Description |
|---|---|---|
| files | List<String>? | Target file paths (relative to root) |
| related_files | List<String>? | Related/context file paths (relative to root) |
| task_description | String | Generated or custom task description |
| template_file | String? | Path to template file (nullable) |
| data | Map<String, Any>? | Structured data from data_file or JSON source (nullable) |
ModificationTask
Represents a complete modification task ready for execution:
| Field | Type | Description |
|---|---|---|
| data | ModificationTaskConfig | Task configuration |
| message | String | Message content (context files or execute command) |
| patchProcessor | PatchProcessors | Patch processing strategy (default: Fuzzy) |
| shouldDeleteTarget | Boolean | Whether to delete the target file (default: false) |
| taskType | TaskType<*, *> | The resolved task type (default: FileModification) |
Additional Processing Classes
TransformMatch
Represents a matched transformation from source to destination:
| Field | Type | Description |
|---|---|---|
| sourceFile | File | The matched source file |
| destinationFile | File | The computed destination file |
| spec | DocSpec | The originating doc specification |
GenerateMatch
Represents a matched generation specification:
| Field | Type | Description |
|---|---|---|
| outputFile | File | The output file to generate |
| inputFiles | List<File> | The resolved input files |
| spec | DocSpec | The originating doc specification |
DocumentMatch
Represents a documentation update specification:
| Field | Type | Description |
|---|---|---|
| docSpec | DocSpec | The doc specification (target is the doc file itself) |
| supportingFiles | List<File> | Source files that provide context |
Implementation Notes
Frontmatter Parsing
The frontmatter is parsed using a custom simple YAML parser (not SnakeYAML). The parser handles
string values, list values, and map values (for generates).
- Lines are split on the first colon to extract key-value pairs
- Lines without colons are ignored
- If the value after the colon is empty, the parser looks for subsequent list items (lines starting with -)
- Empty keys (colon with no value and no subsequent list items) are not added to the result map
- Values are trimmed of whitespace
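The parsing rules listed above can be approximated with the Python sketch below. It covers string and list values only; the map values used by generates are left out to keep the example short. The function name `parse_simple_frontmatter` is hypothetical, and the real parser may differ in details such as comment handling.

```python
from typing import Dict, List, Optional, Union

Value = Union[str, List[str]]


def parse_simple_frontmatter(block: str) -> Dict[str, Value]:
    """Parse 'key: value' lines and '- item' lists from a frontmatter block."""
    result: Dict[str, Value] = {}
    pending_key: Optional[str] = None  # key whose value was empty
    for raw_line in block.splitlines():
        line = raw_line.strip()
        if line.startswith("- ") and pending_key is not None:
            # List item for the most recent key that had an empty value.
            items = result.setdefault(pending_key, [])
            if isinstance(items, list):
                items.append(line[2:].strip())
            continue
        if ":" not in line:
            # Lines without colons are ignored.
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = (part.strip() for part in line.split(":", 1))
        if value:
            result[key] = value
            pending_key = None
        else:
            # Added to the result only if list items follow.
            pending_key = key
    return result
```

Checking for list items before looking for a colon matters because items such as URLs contain colons themselves.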
Transform Pattern Matching
Transform patterns use Java regex syntax. The source pattern is matched against file paths relative to the documentation file's directory. When a match is found:
- The regex is applied to the relative file path
- Capture groups are extracted from the match
- Backreferences ($0, $1, etc.) in the destination pattern are replaced with the captured values
- The destination path is resolved relative to the documentation file's directory
IntelliJ Integration
The DocProcessorAction provides an IntelliJ IDE action that:
- Filters selected files to markdown files (.md or .markdown extensions)
- Creates a DocProcessor instance with the configured fast and smart models
- Calls getAll() to collect all modification tasks from the selected files
- Shows a DocProcessorTaskDialog with a checklist of tasks for user selection
- Executes the first selected task via SingleTaskApp in a browser session
The action is available through the DocProcessorActionGroup which provides a submenu
with all overwrite mode options:
| Label | Mode |
|---|---|
| 🚫 Skip Existing Files | SkipExisting |
| 🔄 Overwrite All Files | OverwriteExisting |
| 📅 Overwrite Outdated Files | OverwriteToUpdate |
| 🩹 Patch Existing Files | PatchExisting |
| 📝 Patch Outdated Files | PatchToUpdate |
The dialog includes an "Auto-fix issues" checkbox and displays task details including target files and related files.