mdwright
mdwright is a Markdown linter and round-trip formatter for any Markdown project.
Four commitments shape the tool.
Fast. On a 79-file corpus of math-heavy technical prose, mdwright fmt-check runs ≥
50× faster than mdformat --check. The multiplier scales with file count and core count;
see Performance for the measurement, host, and reproducer.
Design choices that buy this are in Architecture.
Round-trip safe. mdwright fmt renders to the same HTML before and after; every
change in the rendered DOM is treated as a bug. Whitespace inside a paragraph may shift
(the run of whitespace between a and b may change), but word boundaries and the rendered tree do not. Where the
formatter cannot prove equivalence it refuses to rewrite; the
deviation table lists every exception with a reproducer.
Configurable, preserve by default. Source style choices — emphasis delimiters, list
markers, thematic breaks, link-destination angle brackets — pass through untouched.
Canonicalisation is opt-in one knob at a time in .mdwright.toml, or via
fmt.profile = "mdformat" for mdformat-compatible spelling where verified rewrites
preserve the parsed document.
Math-resilient. \( … \), \[ … \], and \begin{NAME} … \end{NAME} pass through
verbatim. The scanner identifies math regions before any other pass touches the document,
so the formatter never reflows \frac{a}{b} into \\frac{a}{b} and the linter never
flags a backslash inside \begin{align*}. See Math regions
for the design.
Who this site is for
- Users writing Markdown with math, code, or strict formatting requirements: start with Getting started.
- CI operators wiring mdwright into pre-commit, GitHub Actions, or other automation: Integration.
- Rule authors extending mdwright with project-specific lints: Extending → Lint rules.
The narrative pages (concepts, extending) explain the why; the reference pages (rules, CLI, public API, diagnostic schema) are the source-of-truth what.
Stability
mdwright is pre-1.0. The release surface (public Rust API, CLI, configuration schema, diagnostic JSON, and lint-rule trait) is documented descriptively at Public API. Until 1.0, minor versions may include breaking changes; patch releases never do. See Semver policy.
Installation
mdwright has no runtime dependencies: it ships as a single binary. Pick whichever channel matches your environment.
One-line install (recommended)
No Rust toolchain required. The cargo-dist shell installer pulls the prebuilt binary for your platform from the latest
release and places it on your $PATH:
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/jcreinhold/mdwright/releases/latest/download/mdwright-installer.sh | sh
Supported targets: Linux x86_64, macOS aarch64 (see Platform support below).
From crates.io
cargo install mdwright
Requires Rust 1.91 or later (the MSRV is enforced in CI). The install drops a single binary, mdwright, on your
$PATH. Rust integrations depend on the component crates directly; see Public API surface
for the roster and what each owns.
Via cargo-binstall
cargo-binstall pulls the GitHub-release tarball for your target and falls back to a source build if no prebuilt binary
is available:
cargo binstall mdwright
Release tarball
Download a .tar.xz directly from the GitHub releases page and place
the mdwright binary on your $PATH. Useful for air-gapped environments or when you want to pin a specific build
artifact.
Building from a clone
git clone https://github.com/jcreinhold/mdwright
cd mdwright
cargo build --release -p mdwright
./target/release/mdwright --help
cargo nextest run exercises the full test suite (golden snapshots, GFM spec runner, property tests). cargo bench
runs the Criterion benches; cargo xtask doc-rules --check and cargo xtask doc-cli --check verify that the
auto-generated documentation pages are up to date.
Platform support
| Tier | Targets | Coverage |
|---|---|---|
| 1 | x86_64-unknown-linux-gnu, aarch64-apple-darwin | CI on every push; prebuilt binary attached to each release |
| 2 | x86_64-pc-windows-msvc, x86_64-apple-darwin, aarch64-unknown-linux-gnu | CI on every push; source build via cargo install |
Other targets work in principle but are not tested.
Getting started
This walkthrough takes ten minutes. By the end you will have linted a Markdown file, fixed a diagnostic, reformatted the file, and configured one rule.
Set up
Create a directory with one Markdown file:
mkdir mdwright-demo && cd mdwright-demo
Save the following as README.md:
# Demo
See https://example.com for the spec.
The Euler identity, $e^{i\pi} + 1 = 0$, is famous.
Here is some code:
(The saved file ends with an opening sh code fence followed by the line echo hello, and no closing fence. The second diagnostic below depends on it.)
Lint
mdwright check README.md
You see two diagnostics:
error[bare-url]: bare URL should be wrapped in angle brackets or rendered as a link
--> README.md:3:5
|
3 | See https://example.com for the spec.
| ^^^^^^^^^^^^^^^^^^^
= help: CommonMark autolinks need angle brackets (`<https://example.com>`) to render as a link.
= fix (safe): <https://example.com>
= note: see `mdwright explain bare-url`
error[unbalanced-backtick]: unterminated fenced code block
--> README.md:9:1
...
Every rule has a long-form explanation reachable from the command line:
mdwright explain bare-url
The last line of the output is the documentation URL. Open it for the same content rendered with examples.
Fix the easy one
bare-url carries a safe fix. Apply it:
mdwright fix README.md
Re-run mdwright check; the bare-URL diagnostic is gone. The unbalanced-backtick diagnostic remains because closing a
fence cannot be inferred safely.
Fix the hard one by hand
Add the closing fence to README.md:
Here is some code:
```sh
echo hello
```
Re-run mdwright check. Output is empty: the file is clean.
Reformat
mdwright fmt README.md
fmt rewrites the file in place. Run git diff (in a real project) to see what changed. The defaults preserve source
style, including emphasis delimiters, list markers, thematic breaks, and line wrap, so the diff is usually small.
Display math, inline math, and fenced code blocks pass through verbatim. Opt in to canonicalisation per knob in
.mdwright.toml; see Formatter policy and Style knobs.
Configure one rule
mdwright reads configuration from the nearest .mdwright.toml, mdwright.toml, or pyproject.toml with a
[tool.mdwright] table, walking up from $PWD until it hits a .git/ directory. Create .mdwright.toml:
[lint]
# `default` enables the curated baseline; `ignore` removes rules.
preset = "default"
ignore = ["bare-url"]
Now mdwright check does not flag bare URLs. See Configuration for the complete schema.
Where to go next
- Lint vs. format: when each subcommand fires.
- Math regions: what mdwright protects and why.
- Integration → Pre-commit: wire mdwright into your VCS hooks.
- Rules catalogue: every rule with rationale and examples.
Configuration
mdwright reads configuration from (in precedence order):
- The file given via --config PATH.
- The nearest ancestor config discovered by walking upward from the current directory. At each ancestor, candidates are tried in this order: .mdwright.toml, mdwright.toml, pyproject.toml containing a [tool.mdwright] table. The walk stops at the filesystem root or at the first directory containing .git/ (the workspace boundary).
- Built-in defaults.
A pyproject.toml without [tool.mdwright] does not stop the walk;
discovery continues to the parent directory. A .mdwright.toml wins
over a pyproject.toml in the same directory (matching ruff's
"more-specific-name first" rule).
Run mdwright config init to create a documented .mdwright.toml
starter file with every option set to its default.
Single-file integration via pyproject.toml
For projects that already use pyproject.toml, the entire mdwright
configuration can live there under [tool.mdwright]:
# pyproject.toml
[tool.mdwright]
lint.preset = "default"
lint.extend-select = ["latex-command"]
[tool.mdwright.fmt]
wrap = 100
CLI overrides
The following knobs accept CLI flags that take precedence over the config file:
- lint.preset, lint.select, lint.extend-select, lint.ignore: --rules
- render.profile: mdwright render --render-profile
- --no-suppress toggles whether <!-- mdwright: allow ... --> comments are honoured; there is no config-file equivalent.
All [fmt] knobs are config-file-only.
Schema reference
[lint] and nested tables
| Key | Type | Default | CLI override | Description |
|---|---|---|---|---|
| lint.preset | "default" \| "all" \| "none" | "default" | --rules | Baseline lint rule set. Use default for curated defaults, all for every registered rule, or none with lint.select for an explicit set. |
| lint.select | array of string | [] | --rules | Exact lint rule names to enable when lint.preset = "none". Preset names are not valid rule names here. |
| lint.extend-select | array of string | [] | --rules | Lint rule names to add on top of lint.preset. |
| lint.ignore | array of string | [] | --rules | Lint rule names to remove after applying lint.preset, lint.select, and lint.extend-select. |
| lint.exclude | array of string | [] | none | Gitignore-style patterns. Matching files are dropped from lint runs. Patterns are anchored to the directory containing the config file. |
| lint.info-strings.extra | array of string | [] | none | Project-specific additions to the info-string-typo allowlist. The stdlib default allowlist still applies. |
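The way the four selection keys combine can be sketched as follows. This is a reading of the table, not mdwright's code: resolve_rules is a hypothetical name, the contents of the curated default set are supplied by the caller because this page does not list them, and rejecting unknown rule names is an assumption.

```python
def resolve_rules(registry, default_rules, preset="default",
                  select=(), extend_select=(), ignore=()):
    """Return the enabled rule names, sorted, for one lint configuration."""
    if preset == "all":
        enabled = set(registry)          # every registered rule
    elif preset == "default":
        enabled = set(default_rules)     # curated baseline
    elif preset == "none":
        enabled = set(select)            # explicit set only
    else:
        raise ValueError(f"unknown preset: {preset!r}")

    # extend-select adds on top of whatever the preset produced.
    enabled |= set(extend_select)

    # Assumption: names outside the registry are a configuration error.
    unknown = (enabled | set(ignore)) - set(registry)
    if unknown:
        raise ValueError(f"unknown rule names: {sorted(unknown)}")

    # ignore is applied last, after preset, select, and extend-select.
    return sorted(enabled - set(ignore))
```

With the Getting started configuration (preset = "default", ignore = ["bare-url"]), the result is the baseline minus bare-url.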
[fmt] and nested tables
| Key | Type | Default | CLI override | Description |
|---|---|---|---|---|
| fmt.profile | "preserve" \| "mdformat" | "preserve" | none | Formatter style profile. preserve keeps mdwright's identity-oriented defaults; mdformat applies mdformat-compatible defaults where verified rewrites can preserve semantics. Explicit [fmt] keys override profile defaults. |
| fmt.wrap | "keep" \| "no" \| int | "keep" | none | Wrap mode for prose paragraphs. keep leaves existing breaks alone; no forbids new breaks; an integer enforces that display-column budget for breakable lines in every formatter profile. |
| fmt.wrap-strategy | "stable" \| "balanced" | "stable" | none | Reflow strategy used when fmt.wrap is an integer. stable greedily fills soft-break runs and is the default; balanced rebalances paragraphs for more even line lengths. |
| fmt.italic | "asterisk" \| "underscore" \| "preserve" | "preserve" | none | Italic delimiter canonicalisation. preserve leaves source bytes; asterisk or underscore opts into the post-pass rewrite. See Style knobs. |
| fmt.strong | "asterisk" \| "underscore" \| "preserve" | "preserve" | none | Strong-emphasis delimiter canonicalisation. Independent of fmt.italic: *italic* with __strong__ is expressible. |
| fmt.list-marker | "dash" \| "asterisk" \| "plus" \| "preserve" | "preserve" | none | Unordered-list bullet canonicalisation. Each marker is rewritten through a marker-local fact and the family commits only after verification. |
| fmt.ordered-list | "one" \| "consistent" \| "preserve" | "preserve" | none | Ordered-list number canonicalisation. one rewrites markers to 1. only when verification preserves the list start; consistent renumbers each list from the source's first item; preserve keeps source numbering verbatim. |
| fmt.thematic-break | "dash" \| "asterisk" \| "underscore" \| "underscore-70" \| "preserve" | "preserve" | none | Thematic-break canonicalisation. Fixed character modes preserve the source repeat count and spacing; underscore-70 rewrites the whole break line to mdformat's 70 underscores. |
| fmt.trailing-newline | "preserve" \| "strip" \| "ensure" \| bool | "preserve" | none | Trailing-newline policy at the document boundary. true is accepted as a synonym for ensure and false for strip. |
| fmt.end-of-line | "lf" \| "crlf" \| "keep" | "lf" | none | Line-ending normalisation. keep adopts the first newline seen in the source. |
| fmt.exclude | array of string | [] | none | Formatter-specific exclude globs, independent of [lint] exclude. |
| fmt.heading-attrs | "preserve" \| "canonicalise" | "preserve" | none | ATX heading {#id .class key=val} trailer emission. preserve emits the source trailer byte-verbatim. canonicalise emits id first, then classes, then key-value pairs. |
| fmt.refs.placement | "end" \| "preserve" | "end" | none | Where reference-link definitions are emitted: gathered and sorted at the end of the document, or kept in source order. |
| fmt.refs.style | "bare" \| "angle" \| "preserve" | "preserve" | none | Destination style for reference-link and inline-link URLs. preserve keeps each destination's source form; bare strips wrapping <...> where the bare form still parses; angle wraps every destination in <...>. |
| fmt.footnotes.placement | "end" \| "preserve" | "preserve" | none | Where footnote definitions are emitted. Default is preserve because pulldown-cmark's HTML renderer ties footnote position to parse order; moving definitions would change the rendered HTML. |
| fmt.tables.style | "preserve" \| "pad" | "preserve" | none | GFM table spacing policy. preserve keeps source cell spacing; pad aligns cells and delimiter rows to mdformat-compatible widths when verification preserves semantics. |
| fmt.lists.continuation-indent | "marker-width" \| "four-space" | "marker-width" | none | Continuation indentation for wrapped list-item paragraphs. marker-width aligns to the source marker width; four-space matches mdformat's list continuation spelling. |
| fmt.frontmatter.preserve | bool | true | none | Whether to emit document frontmatter byte-verbatim. false strips it. |
| fmt.math.normalise | bool | false | none | Whether whole-block math regions are normalised. Off by default because math bytes are opaque to CommonMark. |
| fmt.math.render | "none" \| "commonmark-katex" \| "dollar" | "none" | none | Math delimiter rendering policy for downstream renderers. none preserves source math regions; commonmark-katex records intent without rewriting; dollar rewrites bracket and paren math to dollar delimiters. |
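As one worked reading of a row in this table, the fmt.end-of-line policy might look like the sketch below. It is illustrative only: normalise_eol is a hypothetical name, lone carriage returns are ignored, and leaving a source with no newline untouched under keep is an assumption.

```python
import re

_NEWLINE = re.compile(r"\r\n|\n")


def normalise_eol(text: str, mode: str = "lf") -> str:
    """Rewrite every line ending in `text` according to `mode`."""
    if mode == "lf":
        target = "\n"
    elif mode == "crlf":
        target = "\r\n"
    elif mode == "keep":
        first = _NEWLINE.search(text)
        if first is None:
            return text              # nothing to adopt, nothing to rewrite
        target = first.group(0)      # adopt the first newline seen in the source
    else:
        raise ValueError(f"unknown end-of-line mode: {mode!r}")
    # A callable replacement avoids any escape processing of the target string.
    return _NEWLINE.sub(lambda _match: target, text)
```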
[parse] and nested tables
| Key | Type | Default | CLI override | Description |
|---|---|---|---|---|
| parse.math.delimiters | "tex" \| "github" | "tex" | none | Math delimiter recognition policy. tex recognises \(...\), \[...\], and LaTeX environments; github also recognises $...$ and $$...$$. |
| parse.extensions.definition-lists | bool | true | none | Recognise Term\n: definition\n definition lists. Turn off on non-mkdocs corpora to suppress recognition. |
| parse.extensions.abbreviation-lists | bool | true | none | Recognise *[ABBR]: definition abbreviation declarations as a scan-and-preserve overlay. mdwright does not expand occurrences; the downstream renderer does. |
| parse.extensions.heading-attribute-lists | bool | true | none | Recognise # Heading {#id .class} trailers via pulldown's heading-attribute extension. When off, the trailer reads as plain text in the heading body. |
| parse.extensions.block-attribute-lists | bool | true | none | Recognise { .class } on a line by itself after a non-empty block as a scan-and-preserve overlay. Inline attribute lists are out of scope. |
| parse.extensions.gfm.autolinks | "disabled" \| "urls" \| "urls-and-emails" | "urls-and-emails" | none | Recognise GFM bare URL and email autolinks as document facts and render them as links. Use urls to leave bare emails as text or disabled for strict CommonMark-style text treatment. |
| parse.extensions.gfm.tagfilter | bool | true | none | Apply GFM tagfiltering when rendering or building semantic signatures. This escapes the raw HTML tags that cmark-gfm filters, without rewriting source bytes. |
| parse.extensions.myst.directive-containers | bool | true | none | Recognise MyST :::{name} directive containers with :KEY: value options as a scan-and-preserve overlay. mdwright does not expand directives; downstream renderers do. |
| parse.extensions.myst.inline-roles | bool | true | none | Recognise MyST {role}`payload` inline roles as a scan-and-preserve overlay inside paragraph text. |
| parse.extensions.myst.substitution-references | bool | true | none | Recognise MyST {{name}} inline substitution references as a scan-and-preserve overlay. Declarations live in YAML frontmatter and round-trip through the frontmatter path. |
| parse.extensions.myst.comments | bool | true | none | Recognise MyST % line comments at line-start as a scan-and-preserve overlay. |
| parse.extensions.pandoc.fenced-divs | bool | true | none | Recognise Pandoc ::: {.cls} fenced div openers. The closer is a colon-only line of matching count. |
| parse.extensions.pandoc.short-form-divs | bool | true | none | Recognise Pandoc :::name fenced div openers. |
| parse.extensions.pandoc.inline-attribute-spans | bool | true | none | Recognise Pandoc [content]{.cls} inline attribute spans as a scan-and-preserve overlay. |
[render] and nested tables
| Key | Type | Default | CLI override | Description |
|---|---|---|---|---|
| render.profile | "pulldown" \| "cmark-gfm" | "pulldown" | --render-profile | HTML spelling profile for mdwright render. pulldown preserves the default renderer; cmark-gfm matches cmark-gfm spelling where parser semantics already agree. |
Round-trip safety
mdwright fmt is a semantic rewriter, not a string-level one. The contract: the rendered HTML of the output matches
the rendered HTML of the input, modulo whitespace inside a paragraph that does not change word boundaries. The
gfm_spec_snapshot test enforces it on every commit. Any input that fails the gate is either fixed at the root or
recorded in the deviation table with a one-line reason.
The HTML-equivalence gate
For every document mdwright formats, the gate runs:
1. Render the input to HTML.
2. Format the input, then render the output to HTML.
3. Assert (1) and (2) match, ignoring whitespace-only differences inside text paragraphs.
"Render" here means parse with pulldown-cmark and emit HTML from the event stream, so a parse divergence is caught in
the same comparison. If the assertion fails the formatter has changed semantics; there is no exception path.
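The comparison step can be illustrated with a small sketch that works on two already-rendered HTML strings. It is not the gate itself: the real gate renders with pulldown-cmark, whereas this sketch uses Python's html.parser, and the names html_signature and html_equivalent are hypothetical. Whitespace runs in ordinary text collapse to one space, so word boundaries still count, and text inside pre is compared verbatim.

```python
import re
from html.parser import HTMLParser

_WS_RUN = re.compile(r"[ \t\r\n]+")


class _EventCollector(HTMLParser):
    """Record tags and text as a flat event list."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self._pre_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1
        self.events.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        if tag == "pre":
            self._pre_depth = max(0, self._pre_depth - 1)
        self.events.append(("end", tag))

    def handle_data(self, data):
        if self._pre_depth == 0:
            # Outside <pre>, only the presence of whitespace matters, not its shape.
            data = _WS_RUN.sub(" ", data)
        self.events.append(("text", data))


def html_signature(html: str) -> list:
    parser = _EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events


def html_equivalent(before: str, after: str) -> bool:
    """True when both renders agree up to whitespace shape inside text."""
    return html_signature(before) == html_signature(after)
```

Under this sketch `<p>a  b</p>` and `<p>a b</p>` are equivalent, while `<p>ab</p>` is not, which matches the word-boundary wording of the contract.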
What "semantic" buys you
Some syntactically-equivalent rewrites are not applied. The clearest case: mdwright leaves a setext heading as-is rather than converting it to ATX when the conversion would change the HTML id-anchor that external links point at. The cost of round-trip safety is that the formatter sometimes declines a clean-up it could otherwise perform.
Reading deviation errors
When the gate trips during development, the test output names the input file, the formatted output, and the divergent line of HTML. The fix lives in one of three places:
- Document recognition misclassified a span: fix the document facts, not the formatter.
- A rewrite producer proposed a stale or over-broad byte edit: fix the candidate owner/range or the verification signature, not the caller.
- A new spec case the existing rules do not handle: extend the recognised facts, then the formatter or linter.
See also
- Deviations: every documented exception, with rationale.
- Architecture: the two-IR design that makes the gate enforceable.
- Lint vs. format: the formatter never relies on linter output.
Math regions
This is mdwright's reason for existing. Generic Markdown formatters mangle LaTeX: they reflow \frac{a}{b} into
\\frac{a}{b}, collapse the blank line before \begin{align*}, and apply emphasis rules inside \(\alpha\). mdwright
treats math as opaque: recognised before any other pass runs, emitted verbatim.
What counts as math
The default math grammar:
- Inline. \( … \) (paired backslash-paren, single line).
- Display. \[ … \] (paired backslash-bracket, may span lines).
- Environments. \begin{NAME} … \end{NAME} for any NAME matching [A-Za-z][A-Za-z0-9*]*, paired with a non-overlapping \end{NAME}.
$ … $ and $$ … $$ are not math by default. Dollar-delimited math is common in academic prose but collides with
literal-dollar use (prices, shell prompts). Opt in via configuration:
[lint]
math.dollar = true
The stray-dollar lint flags lone dollar signs when this option is off, so authors migrating from a dollar-delimited dialect catch the change.
How the scanner runs
The math crate recognises candidate math spans over strings and byte ranges. The document crate supplies Markdown exclusion ranges (code, HTML, other opaque regions), then stores the accepted math regions as document facts with stable coordinates back to the original source. The exact source bytes, including whitespace, casing, comment chars, and trailing backslashes, pass through unchanged. The formatter cannot accidentally apply emphasis, escape, or wrap logic inside a math region: rewrite candidates are verified against the document's math-region signature before they commit. Lint rules that match on text see the same opaque region; latex-command, for instance, only fires outside math.
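A rough sketch of the scan, under stated assumptions: it uses regular expressions for the default grammar, takes the exclusion ranges as plain (start, end) pairs supplied by the caller, and ignores escaped backslashes. The name find_math_regions is hypothetical and the real scanner is certainly more careful.

```python
import re

# Default grammar: inline \( … \) on one line, display \[ … \] across lines,
# and \begin{NAME} … \end{NAME} with a matching NAME.
MATH = re.compile(
    r"(?P<inline>\\\([^\n]*?\\\))"
    r"|(?P<display>\\\[.*?\\\])"
    r"|(?P<env>\\begin\{(?P<name>[A-Za-z][A-Za-z0-9*]*)\}.*?\\end\{(?P=name)\})",
    re.DOTALL,
)


def find_math_regions(text, excluded=()):
    """Return (start, end, kind) for each math span outside `excluded` ranges."""
    regions = []
    pos = 0
    while True:
        match = MATH.search(text, pos)
        if match is None:
            break
        # Does the candidate overlap a code/HTML range supplied by the caller?
        hit = next(
            ((s, e) for s, e in excluded if s < match.end() and match.start() < e),
            None,
        )
        if hit is not None:
            pos = hit[1]          # resume scanning after the excluded range
            continue
        if match.group("inline") is not None:
            kind = "inline"
        elif match.group("display") is not None:
            kind = "display"
        else:
            kind = "env"
        regions.append((match.start(), match.end(), kind))
        pos = match.end()
    return regions
```

The returned offsets index the original string, so text[start:end] yields the exact source bytes of each region.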
Block-level math
A math environment whose start delimiter sits at column 1 of an otherwise-blank line is a block. The formatter emits blocks with one blank line above and below, never indented inside a list item unless the source already indented it. This avoids the canonical bug:
input:
A paragraph.

\begin{align*}
E &= mc^2
\end{align*}

generic formatter (blank line stripped):
A paragraph.
\begin{align*}
E &= mc^2
\end{align*}

mdwright (blank line kept):
A paragraph.

\begin{align*}
E &= mc^2
\end{align*}
Stripping the blank line above \begin{align*} rolls the environment into the paragraph and breaks the rendered DOM.
Math-adjacent rules
Three rules check math without parsing it:
- math/unbalanced-delim: \( without \).
- math/unbalanced-env: \begin{x} without matching \end{x}.
- math/unbalanced-braces: { count diverges from } count inside a region.
Each runs on the recognised region as a string; none of them care about the math semantics.
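Because the checks are purely textual, they reduce to counting. The sketch below is an assumption-laden illustration: math_diagnostics is a hypothetical helper, it treats escaped braces like any other brace, and the real rules may define their inputs differently.

```python
import re


def math_diagnostics(region: str) -> list[str]:
    """Return the names of the math-adjacent rules that `region` would trip."""
    found = []
    # \( without \): compare opener and closer counts.
    if region.count("\\(") != region.count("\\)"):
        found.append("math/unbalanced-delim")
    # \begin{x} without a matching \end{x}: compare the environment names.
    begins = re.findall(r"\\begin\{([^}]*)\}", region)
    ends = re.findall(r"\\end\{([^}]*)\}", region)
    if sorted(begins) != sorted(ends):
        found.append("math/unbalanced-env")
    # { count diverges from } count.
    if region.count("{") != region.count("}"):
        found.append("math/unbalanced-braces")
    return found
```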
Math inside code blocks
If \( appears inside a fenced code block or inline code (`\(x\)`), mdwright does not treat it as math; code
regions are recognised earlier still. The math scanner consults the same exclusion ranges the formatter does, so it
never produces false positives inside code or HTML.
See also
- Round-trip safety: the gate that catches math corruption.
- Configuration: math options under [fmt.math] and [lint.math].
- stray-dollar rule: migration aid for dollar-delimited corpora.
Math rendering
mdwright does not try to be TeX. It shapes math regions so a downstream renderer, such as KaTeX, MathJax,
mkdocs-material's math plugin, or jupyter-book, can do browser-quality typesetting. --math-render chooses the source
delimiter shape for formatter and HTML-render checks.
For terminal inspection, mdwright preview --math=unicode has a first-party Unicode renderer for a large common subset
of MathJax-style TeX math input: symbols, Greek letters, scripts, accents, fractions, roots, delimiters, arrows,
relations, operators, and matrix-like environments where Unicode terminal text can represent the result honestly.
Unsupported math falls back to source text instead of guessing.
For editable source translation, use mdwright math. It translates math bodies between LaTeX commands and Unicode
source while preserving Markdown math delimiters. Unicode-to-LaTeX translation is parser-backed for the supported
subset, so scripts, styled alphabets, accents, arrows, and direct symbols are recognised as source structure before
canonical LaTeX is emitted. Normal mdwright fmt never rewrites math notation silently.
For what mdwright treats as math, see Math regions. This page is about how those regions are emitted.
The two modes
| Mode | Behaviour |
|---|---|
| none | Pass math regions through verbatim. Default. |
| dollar | Rewrite \[ … \] to $$ … $$ and \( … \) to $ … $. Environments stay. |
A third value, commonmark-katex, is a documentation alias: the behaviour matches none exactly, but the name leaves a
greppable signal in CI logs that the build expects KaTeX downstream.
When to use which
- none fits most projects. KaTeX (via auto-render), MathJax v3's auto-renderer, mkdocs-material's math plugin, jupyter-book, and Pelican all recognise \[ … \] and \( … \) out of the box.
- dollar fits Pandoc-style pipelines that expect $ delimiters. The rewrite is one-directional: \[ becomes $$, \( becomes $, source already in dollar form passes through unchanged, and LaTeX environments stay environments (there is no dollar form of \begin{align*}).
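The one-directional nature of the dollar rewrite is easy to see in a toy version. This sketch only swaps delimiters with regular expressions; unlike mdwright it does not consult code or HTML exclusion ranges, and to_dollar is a hypothetical name.

```python
import re

_DISPLAY = re.compile(r"\\\[(.*?)\\\]", re.DOTALL)   # \[ … \], may span lines
_INLINE = re.compile(r"\\\(([^\n]*?)\\\)")           # \( … \), single line


def to_dollar(source: str) -> str:
    """Rewrite bracket and paren math delimiters to dollar delimiters."""
    out = _DISPLAY.sub(lambda m: "$$" + m.group(1) + "$$", source)
    return _INLINE.sub(lambda m: "$" + m.group(1) + "$", out)
```

Input already in dollar form and LaTeX environments contain neither delimiter pair, so they pass through untouched, and applying the function twice gives the same result as applying it once.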
CLI and config
mdwright fmt --math-render=dollar path/to/notes.md
[fmt.math]
render = "dollar" # or "none", "commonmark-katex"
The CLI flag overrides the config file; both fall back to MathRender::None.
Inspecting the rendered HTML
mdwright render pipes the formatted output through mdwright's HTML renderer to stdout:
mdwright render notes.md > notes.html
mdwright render --math-render=dollar notes.md
mdwright render --render-profile=cmark-gfm notes.md
mdwright render --open notes.md
Captured stdout is raw HTML by default. --color=always highlights the HTML for terminal reading, and --open writes
the HTML to a temporary file and opens it in the system browser.
This is a diagnostic surface, not a production renderer. mdwright's HTML emitter does not enable pulldown-cmark's math extension: math regions land in the HTML as plain text in whatever delimiter form the formatter produced. Feed that HTML through KaTeX, MathJax, or your static-site generator's math plugin to see browser-quality typeset output.
--render-profile=cmark-gfm changes HTML spelling only. It is useful when comparing diagnostic HTML with
cmark-gfm-based tools, but it does not change parser semantics or formatter source rewrites.
Terminal preview
mdwright preview renders static terminal text:
mdwright preview notes.md
mdwright preview --color=always notes.md
mdwright preview --math=source notes.md
preview is for fast local inspection. It renders headings, lists, block quotes, links, code blocks, tables, and simple
math as terminal text. It does not claim CSS layout, images, browser fonts, KaTeX, or MathJax equivalence. Use
render --open when the browser view matters.
Source translation
mdwright math is the explicit command for changing math notation:
printf '$\alpha_i$\n' | mdwright math --to-unicode -
printf '$αᵢ$\n' | mdwright math --to-latex -
mdwright math --to-unicode --diff notes.md
mdwright math --to-unicode --write notes.md
File mode translates recognised Markdown math bodies and preserves their delimiters, so \( \alpha_i \) becomes
\( αᵢ \). Use --check for CI, --diff to inspect a patch-compatible diff, and --write to mutate files. Stdin
without recognised Markdown math delimiters is treated as one math source body.
Translation is conservative. Direct symbols, scripts, styled alphabets, accents, roots, aliases, and other constructs with honest editable Unicode forms are translated. Unsupported Unicode, ambiguous accent/prime ownership, diagrams, fractions, complex environments, macros, colour/style commands, and other constructs without a plain source form remain visible and are reported on stderr rather than being approximated.
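The conservative flavour of the translation can be shown with a deliberately tiny table-driven sketch. The tables below cover only a handful of symbols chosen for illustration, to_unicode is a hypothetical name, and the real command is parser-backed rather than regex-based. Anything not in the tables is left exactly as written.

```python
import re

# Illustrative subsets only; the supported set in mdwright is much larger.
GREEK = {"alpha": "α", "beta": "β", "pi": "π"}
SUBSCRIPT = {"0": "₀", "1": "₁", "2": "₂", "i": "ᵢ"}


def to_unicode(body: str) -> str:
    """Translate known commands and single-character subscripts; keep the rest."""
    # Known \command names become symbols; unknown commands stay visible.
    body = re.sub(
        r"\\([A-Za-z]+)",
        lambda m: GREEK.get(m.group(1), m.group(0)),
        body,
    )
    # _x becomes a subscript character only when one exists in the table.
    return re.sub(
        r"_([0-9A-Za-z])",
        lambda m: SUBSCRIPT.get(m.group(1), m.group(0)),
        body,
    )
```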
The gate under dollar mode
The HTML-equivalence gate in Round-trip safety compares pre-format HTML against post-format
HTML. Under --math-render=dollar that comparison would always diverge, because the formatter intentionally rewrites
math. The gate's actual contract is idempotence-on-mode: formatting the output a second time with the same options
must produce the same canonical event stream. Divergence between the first and second pass is still a hard failure. See
mdwright_format::format_validated for the entry point.
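The shape of that idempotence check is simple to state in code. The helper below is a generic sketch, not the format_validated entry point: check_idempotent is a hypothetical name, and both the formatter and the signature function (standing in for the canonical event stream) are passed in by the caller.

```python
from typing import Callable


def check_idempotent(
    fmt: Callable[[str], str],
    signature: Callable[[str], object],
    source: str,
) -> bool:
    """True when a second formatting pass leaves the signature unchanged."""
    once = fmt(source)
    twice = fmt(once)
    return signature(once) == signature(twice)
```

Any formatter whose second pass changes the signature, for example one that appends a character on every run, fails the check.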
Markdown extensions
mdformat-mkdocs (the formatter most mkdocs-material projects reach for today) recognises a few constructs that plain CommonMark / GFM does not. mdwright matches it for each, so a project can swap one tool for the other without visible churn.
Recognition is preservation, not interpretation: mdwright knows the constructs exist, emits them canonically, and
gates each via a per-extension toggle. It does not expand abbreviations, render {...} to HTML, or change semantics.
The downstream renderer (Python-Markdown, mkdocs-material, jupyter-book) does that work.
GFM URL and email autolinks are recognised by default. mdwright also applies GFM tagfiltering when rendering or building semantic signatures. These behaviours close the cmark-gfm rendering gap while keeping formatter output byte-preserving.
The four extensions
| Extension | Source shape | Default |
|---|---|---|
| Definition lists | Term\n: definition\n | on |
| Heading attribute lists | # Heading {#id .class key=val} | on |
| Abbreviation lists | *[HTML]: Hyper Text Markup Language\n | on |
| Non-heading attribute lists | Paragraph\n{ .note .important }\n | on |
Defaults are on: each recognises something the source is already doing, not a formatter opinion. Turn them off in
.mdwright.toml when running mdwright on non-mkdocs corpora where false positives matter more than coverage:
[parse.extensions]
definition-lists = false
abbreviation-lists = false
heading-attribute-lists = false
block-attribute-lists = false
Definition lists
Source:
Term
:   Single-paragraph definition body. Continuation lines are
    indented four spaces and aligned with the body column.

Operating system
:   The software that manages hardware resources. Notable examples:

    - Linux
    - macOS
    - Windows

    Run `uname -a` to see your kernel version.
Canonical emission matches mdformat-mkdocs:
- Tight form (Term\n: body) for single-paragraph definitions.
- Loose form (blank line between term and the : marker) when the definition has multiple block children: a paragraph plus a nested list / code block, or multi-paragraph text. The blank line is the syntactic boundary that makes the multi-block body parse correctly.
Multiple definitions for one term emit on consecutive : lines with no blank between them; blank lines separate term
groups.
Heading attribute lists
Source:
# Heading {#section-one}
## Multiple classes {.warning .important}
### Mixed shape {#mix .alpha .beta key=val}
The trailer parses through pulldown-cmark's ENABLE_HEADING_ATTRIBUTES flag, lands on the typed Heading, and re-emits
based on [fmt] heading-attrs:
| Mode | Behaviour |
|---|---|
| preserve (default) | Emit the source trailer byte-verbatim between the inline body and the line break. |
| canonicalise | Emit {#id .class₁ .class₂ k=v}: id first, then classes (source order), then key=value pairs (source order). Values containing whitespace are double-quoted. |
[fmt]
heading-attrs = "preserve" # or "canonicalise"
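The canonicalise ordering can be written down directly. This is a sketch of the emission rule as described in the table, with a hypothetical function name and a simplified input shape (an optional id, a list of classes, and a list of key/value pairs already in source order).

```python
def canonical_trailer(ident, classes, pairs):
    """Build a heading trailer: id first, then classes, then key=value pairs."""
    parts = []
    if ident:
        parts.append(f"#{ident}")
    parts.extend(f".{name}" for name in classes)      # source order kept
    for key, value in pairs:                          # source order kept
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'                      # quote values with whitespace
        parts.append(f"{key}={value}")
    return "{" + " ".join(parts) + "}"
```

For the third heading in the source example, canonical_trailer("mix", ["alpha", "beta"], [("key", "val")]) returns {#mix .alpha .beta key=val}.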
Pulldown limitation. pulldown-cmark 0.13's heading-attribute parser splits the trailer on whitespace and does not
honour double-quoted values. # H {title="hello world"} parses as two attributes, title="hello and world", not one.
mdformat-mkdocs (which uses python-markdown's attr_list) handles the quoted form correctly. Until pulldown upstream
lands the fix, mdwright's heading-attribute output for quoted values diverges from mdformat-mkdocs; documented in
Deviations from spec.
Abbreviation lists
Source:
The HTML standard is maintained by the W3C.
*[HTML]: Hyper Text Markup Language
*[W3C]: World Wide Web Consortium
mdwright recognises the *[TERM]: definition shape and preserves the declarations verbatim. It does not expand
occurrences (the downstream renderer wraps them in <abbr title="…">…</abbr>). Each declaration is one source line;
continuation lines are not supported, matching python-markdown's abbr extension.
Consecutive abbreviation lines (no blank line between them) are bundled into one source paragraph by pulldown and emitted as one verbatim block. A blank line above the first declaration is conventional but not required.
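Recognising the one-line declaration shape takes a single pattern. The sketch below is an assumption about how such a recogniser could look, not mdwright's scanner; parse_abbreviation is a hypothetical name.

```python
import re

# *[TERM]: definition, all on one source line.
ABBR = re.compile(r"^\*\[([^\]]+)\]:[ \t]*(.+)$")


def parse_abbreviation(line: str):
    """Return (term, definition) for a declaration line, else None."""
    match = ABBR.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).rstrip()
```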
Non-heading attribute lists
Source:
This paragraph carries a class trailer used by the renderer to style it.
{ .note .important }
The trailer must:
- sit on the line immediately after a non-empty block (no blank-line separator), and
- contain only the brace-delimited attribute list and optional surrounding whitespace.
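The two conditions above translate into a short predicate. This is a sketch with a hypothetical name; it looks only at the trailer line and the line before it, and whether an empty {} counts is an open question that the sketch answers permissively.

```python
import re

# A line holding nothing but one brace-delimited list and optional whitespace.
TRAILER = re.compile(r"^\s*\{[^{}\n]*\}\s*$")


def is_block_attr_trailer(previous_line: str, line: str) -> bool:
    """True when `line` is a trailer directly under a non-empty block line."""
    follows_block = bool(previous_line.strip())   # no blank-line separator
    return follows_block and TRAILER.match(line) is not None
```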
When mdwright recognises the pattern, the entire block (body + trailer) is emitted as a single verbatim source slice. Other paragraph-level rewrites (line wrap, link normalisation, escape rewrites) are skipped for that paragraph, so preservation narrows the formatter's active surface on annotated blocks.
Inline attribute lists (some *emphasised* { .em } text mid-paragraph) are explicitly out of scope. mdwright's
inline formatter has no overlay mechanism today; adding one is a separate design exercise. Inline {...} tokens flow
through as plain text.
Round-trip and idempotence
Reformatting under any combination of these extensions still goes through the HTML-equivalence gate. Verbatim overlays satisfy it trivially, and the canonical emission shape for typed-block constructs is a fixed point of its own parser by construction.
Parity with mdformat-mkdocs
The parity goal is concrete: an mkdocs-material site running mdformat-mkdocs swaps in mdwright with no visible diff. The
parity test at tests/extension_parity.rs byte-compares mdwright's output against mdformat-mkdocs reference output for
the five extension regression fixtures; any divergence is fixed in mdwright or recorded in
Deviations from spec.
MyST + Pandoc directives
MyST (Markedly Structured Text) is the substrate for jupyter-book and Sphinx-MyST. Pandoc has overlapping syntax for the same shapes. mdwright recognises the common constructs from both flavours and preserves their bytes verbatim; it does not expand directives, render roles, or resolve substitutions. The downstream renderer (Sphinx, jupyter-book, Pandoc) does that work.
Like Markdown extensions and math rendering, recognition is preservation, not interpretation. Every recogniser is on by default, because each one only acknowledges what the source already says rather than imposing a formatter opinion.
What mdwright recognises
| Construct | Source shape | Default |
|---|---|---|
| MyST directive container | :::{name}\n…\n::: | on |
| Pandoc fenced div (attr form) | ::: {.warning}\n…\n::: | on |
| Pandoc fenced div (short) | :::note\n…\n::: | on |
| MyST inline role | {term}`Vector Space` | on |
| MyST substitution reference | {{name}} | on |
| Pandoc inline attribute span | [content]{.cls} | on |
| MyST line comment | % comment text | on |
Turn individual recognisers off in .mdwright.toml when running mdwright on non-MyST corpora:
[parse.extensions.myst]
directive-containers = false
inline-roles = false
substitution-references = false
comments = false
[parse.extensions.pandoc]
fenced-divs = false
short-form-divs = false
inline-attribute-spans = false
Block directive containers
Source:
:::{note}
This is a MyST note. It can contain *inline* and
multiple paragraphs.
:::
Pandoc variants (attr form and short form) are also recognised:
::: {.warning}
Pandoc fenced div, attribute form.
:::
:::note
Pandoc short form.
:::
Directives with options round-trip verbatim:
:::{figure} ./img.png
:alt: A diagram of the system
:width: 300px
:align: center
The figure caption text.
:::
Nested directives use opener / closer counts that increase outward: :::: outside, ::: inside. mdwright preserves the
nesting:
::::{note}
Outer body.
:::{tip}
Inner body.
:::
::::
mdwright records the outermost directive's byte range and emits it verbatim; inner directives sit inside that range and are preserved implicitly. Two directives at the same colon count separated by a blank line are sibling regions, not a nested pair.
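The outermost-range rule can be sketched as a line scan. This is a simplified model under stated assumptions: the closer is taken to be a line holding exactly the opener's colon run, nesting at the same colon count is not handled, and the function name is hypothetical.

```python
import re

OPENER = re.compile(r"^(:{3,})\s*[^\s:]")  # colons followed by a name or attributes

def outermost_directives(lines: list[str]) -> list[tuple[int, int]]:
    """Return (first_line, last_line) index pairs for top-level directives."""
    regions = []
    i = 0
    while i < len(lines):
        match = OPENER.match(lines[i])
        if match:
            fence = match.group(1)
            # The closer is a line holding exactly the opener's colon run.
            for j in range(i + 1, len(lines)):
                if lines[j].strip() == fence:
                    regions.append((i, j))
                    i = j  # skip everything inside the region
                    break
        i += 1
    return regions
```

On the nested example above this yields a single region covering all six lines; two same-count directives separated by a blank line yield two regions.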
Inline overlays
Inline roles attach a role name to a backtick-delimited payload. The role name is unrestricted: mdwright does not know
what {term} or {download} means; that is downstream's job. The bytes round-trip:
The {term}`Vector Space` is a fundamental concept.
Substitution references look the same but with double braces and no backticks:
Some content with {{my-sub}}.
The declaration lives in YAML frontmatter under myst_substitutions: and round-trips through the same verbatim path
mdwright uses for frontmatter:
---
myst_substitutions:
  my-sub: "Replacement text"
  another: "{{my-sub}} again"
---
Body content uses {{my-sub}} and {{another}}.
Pandoc inline attribute spans wrap a fragment in square brackets and follow it with a brace attribute list. mdwright
distinguishes them from CommonMark links (where the brackets are followed by an opening parenthesis) and preserves the byte sequence:
Highlight a [span of text]{.note} in the middle of a paragraph.
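The three inline shapes in this section can be approximated with regular expressions. The patterns below are illustrative assumptions, not mdwright's scanner, and they ignore the exclusion zones (code spans, math, HTML) a real recogniser has to respect.

```python
import re

# Illustrative patterns only; the real recognisers may accept more shapes.
ROLE = re.compile(r"\{([^{}\s]+)\}`([^`]+)`")        # {name}`payload`
SUBSTITUTION = re.compile(r"\{\{([^{}]+)\}\}")       # {{name}}
ATTR_SPAN = re.compile(r"\[([^\]]+)\]\{([^{}]*)\}")  # [content]{attrs}

def classify_inline(text: str) -> list[tuple[str, str]]:
    """List (kind, matched_source) pairs found in `text`."""
    hits = []
    for kind, pattern in (("role", ROLE), ("substitution", SUBSTITUTION), ("span", ATTR_SPAN)):
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits
```

Because the span pattern requires a brace immediately after the closing bracket, an ordinary [text](url) link does not match.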
Line comments
MyST's % line comment is a line whose first non-whitespace byte is %. mdwright preserves it verbatim:
% This line is dropped by MyST renderers but mdwright keeps it.
Unlike LaTeX, % is only a comment at the start of a line; inline % characters in prose are literal text and
survive untouched.
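The line-start rule is small enough to state as code. A minimal sketch, assuming the line has already been found to lie outside fenced code and other excluded regions; the function name is hypothetical.

```python
def is_myst_comment(line: str) -> bool:
    """True when the first non-whitespace character of the line is '%'."""
    return line.lstrip().startswith("%")
```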
What mdwright does not do
Expansion, role rendering, substitution resolution, and directive-name validation are all the downstream renderer's job.
A :::{figure} is emitted as :::{figure}; the image is not inlined and the options are not rendered;
{term}`Vector Space` stays as-is; {{my-sub}} is preserved even when the frontmatter declares a replacement; any
directive name matching [a-zA-Z0-9_-]+ is accepted, and an unknown name is downstream's problem.
Run mdwright before Sphinx, jupyter-book, or Pandoc: it normalises the surrounding Markdown without touching the MyST / Pandoc constructs the downstream renderer needs.
Round-trip and idempotence
Every MyST / Pandoc construct passes through the same idempotence-on-mode contract as the rest of the formatter; see Round-trip safety. Verbatim preservation overlays satisfy it trivially as long as the recogniser classifies the same bytes the same way on both passes. It does, since the scanner is fully deterministic over source bytes plus the exclusion vectors (fenced code, inline code, HTML, math).
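The idempotence property itself is easy to state as a check. The sketch below uses a stand-in formatter (toy_fmt, which only strips trailing whitespace) because mdwright's formatter is not callable from Python here; both names are assumptions for the example.

```python
from typing import Callable

def is_idempotent(fmt: Callable[[str], str], source: str) -> bool:
    """A formatter is idempotent on `source` when a second pass changes nothing."""
    once = fmt(source)
    return fmt(once) == once

# Stand-in formatter for the demonstration: strip trailing whitespace per line.
def toy_fmt(source: str) -> str:
    return "\n".join(line.rstrip() for line in source.split("\n"))

print(is_idempotent(toy_fmt, ":::{note}  \nBody.\n:::\n"))
```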
Lint vs. format
mdwright has two pipelines and four subcommands. They share one event walk over pulldown-cmark but otherwise do not
interact: a lint diagnostic never blocks a format pass, and the formatter never depends on lint state.
The four subcommands
| Subcommand | Writes | Exit non-zero when |
|---|---|---|
| mdwright check | nothing | --check is set and a non-advisory diagnostic fires |
| mdwright fix | files (safe fixes only) | --check is set and a non-advisory diagnostic still remains |
| mdwright fmt | files (every input) | parse fails or the safety gate refuses the rewrite (exit 2) |
| mdwright fmt-check | nothing | any input would be reformatted (exit 1) |
check is the audit; fix is the audit that may mutate; fmt is the unconditional rewrite; fmt-check is the
rewrite-or-fail-CI variant. By default check and fix exit 0 even with diagnostics present; pass --check to make
them fail CI.
Why the pipelines are separate
The linter answers a local question: does this Markdown have problems? A bare URL, a mismatched code fence, a
duplicate heading id. Diagnostics carry locations and optional fixes. Rules implement the
LintRule trait and operate on a flat IR (events with byte spans).
The formatter answers a whole-document question: which verified byte rewrites should apply? Structural emit is identity: default formatting preserves source bytes modulo document-boundary normalisation. Canonicalisation and wrapping are proposed as rewrite candidates and committed only after document-level verification.
The two pipelines share a parse but nothing else.
When you want both
Most projects run both in CI; the two are independent. A project can format with mdwright and disable every default-on lint, or run a tight lint set without ever invoking the formatter.
mdwright check . && mdwright fmt-check .
For pre-commit hooks, see Integration → Pre-commit.
What --check means
--check on mdwright check (or mdwright fix) makes the command exit 1 when any non-advisory diagnostic fires.
Without it, check prints diagnostics and exits 0, which is useful for tooling that wants to consume the output without
aborting.
mdwright fmt-check has no --check flag; it always exits non-zero when any file would be reformatted, matching
rustfmt --check's contract.
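The exit-code rules for check and fmt-check can be modelled in a few lines. This is a model of the behaviour described on this page, not mdwright's implementation; the Diagnostic class and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    rule: str
    advisory: bool

def check_exit_code(diagnostics: list[Diagnostic], check_flag: bool) -> int:
    """Model of `mdwright check`: fail only with --check and a non-advisory hit."""
    if check_flag and any(not d.advisory for d in diagnostics):
        return 1
    return 0

def fmt_check_exit_code(files_needing_reformat: int) -> int:
    """Model of `mdwright fmt-check`: fail when any file would be reformatted."""
    return 1 if files_needing_reformat > 0 else 0
```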
See also
- Suppression comments: silencing a diagnostic without disabling the rule entirely.
- Configuration: separate [lint] and [fmt] tables.
- Rules catalogue: every shipping lint rule.
Suppression comments
A suppression comment silences one or more lint rules on the next block, the next line, or a range. Suppression comments are written as HTML comments, so they are invisible in the rendered document.
Forms
Next block. Silence one rule on the block immediately following:
<!-- mdwright: allow bare-url -->
See https://example.com for the spec.
Next line. Silence on the next non-blank line only:
<!-- mdwright: allow-next-line bare-url -->
See https://example.com for the spec.
Range. Open with allow-begin, close with allow-end. Useful for tables, generated content, or vendored sections:
<!-- mdwright: allow-begin bare-url -->
| Source | URL |
| --- | --- |
| Spec | https://spec.commonmark.org/ |
| GFM | https://github.github.com/gfm/ |
<!-- mdwright: allow-end bare-url -->
Separate multiple rules with commas: <!-- mdwright: allow bare-url, latex-command -->. Use the literal all to
silence every rule (rarely the right choice): <!-- mdwright: allow all -->.
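The comment forms above share one shape: a form keyword followed by a comma-separated rule list. A minimal parsing sketch, assuming only the four forms shown on this page; the regular expression and function name are illustrative, not mdwright's parser.

```python
import re

MARKER = re.compile(
    r"<!--\s*mdwright:\s*(allow(?:-next-line|-begin|-end)?)\s+([^>]*?)\s*-->"
)

def parse_suppression(line: str):
    """Return (form, rules) for a suppression comment, or None for other lines."""
    match = MARKER.search(line)
    if match is None:
        return None
    rules = [name.strip() for name in match.group(2).split(",") if name.strip()]
    return match.group(1), rules
```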
Auditing what you have silenced
mdwright check --no-suppress .
ignores every suppression marker and reports the full diagnostic set. Use this to find suppressions that no longer correspond to a real diagnostic.
mdwright check itself reports unused suppressions: a <!-- mdwright: allow bare-url --> whose target block has no
bare URLs surfaces as an advisory, so you can delete the marker.
Suppression vs. disabling
Use a suppression marker when a rule is right project-wide but wrong at one location, and add a sibling HTML comment explaining why:
<!-- mdwright: allow bare-url -->
<!-- The renderer in this project linkifies bare URLs itself. -->
See https://example.com for the spec.
When the same suppression appears in dozens of places, disable the rule in configuration instead:
[lint]
preset = "default"
ignore = ["bare-url"]
See Configuration.
See also
- Lint vs. format: suppression only affects linting; the formatter has no per-document opt-out.
- Rules catalogue: every rule's kebab-case name (the literal that goes in the suppression comment).
Formatter policy
mdwright's formatter has two responsibilities, in this order:
1. Identity Emit: Preserve
Start with the user's source bytes. With every style knob at its default and wrap = "keep", formatting returns those
bytes unchanged except for the document-boundary policies: line endings, trailing newline handling, and end-of-line
selection.
This is the load-bearing invariant. Default formatting is idempotent by construction because the formatter does not synthesise Markdown for recognised structures.
You opt out of preservation by setting the rewrite knobs below. There is no "semi-preserve" mode.
2. Verified Rewrite Families: Opt In
The formatter crate runs style-canonicalisation and wrapping through private rewrite families: inline delimiters, list markers, thematic breaks, link destinations, heading attributes, tables, math, frontmatter, and terminal wrap. Each canonical family builds a local normal-form edit plan, proves its edits do not overlap within the family, applies the plan to a scratch buffer, and verifies the result before it can commit.
If verification fails, the whole family skips. The engine never commits half of a family plan. If the family pipeline cannot reach a pass with no commits before its guard trips, mdwright leaves the original source bytes unchanged instead of returning a partial normal form.
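The all-or-nothing commit can be sketched as follows. This is a simplified model, not the formatter crate's code: edits are (start, end, replacement) tuples over a string, verify is a caller-supplied stand-in for document-level verification, and every name is hypothetical.

```python
from typing import Callable

Edit = tuple[int, int, str]  # (start, end, replacement) over the source string

def apply_family(source: str, edits: list[Edit], verify: Callable[[str, str], bool]) -> str:
    """Commit all of a family's edits or none of them."""
    ordered = sorted(edits)
    # Reject the plan if any two edits overlap.
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return source
    # Apply back to front on a scratch copy so earlier offsets stay valid.
    scratch = source
    for start, end, replacement in reversed(ordered):
        scratch = scratch[:start] + replacement + scratch[end:]
    # Commit only if the whole result verifies; otherwise skip the family.
    return scratch if verify(source, scratch) else source
```

Returning the untouched source on any failure is what keeps a half-applied plan from ever reaching the output.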
Tables are parent normal forms. The table family runs after inline canonicalisers, reads cell contents from the current snapshot, and rewrites each table block only when document-owned table facts account for the full table shape. It does not emit row- or cell-level edits that could race inline rewrites.
Wrap is terminal. It runs only after a full canonical-family scan commits no edits for the current snapshot. If wrap commits paragraph edits, the engine returns to the first canonical family on a fresh parse before wrapping again. Paragraph shapes the wrap pass cannot model stay unchanged and are counted in the formatter report.
An integer wrap setting is a line-budget contract, not a profile-specific preference. With wrap = 120, breakable
paragraph lines are kept at or below 120 display columns in both the default formatter profile and the mdformat profile.
The only accepted overflow is one indivisible atomic token, such as a code span, URL, math atom, or single long word.
The default wrap strategy is stable soft-break reflow: ordinary source newlines inside a paragraph may be joined, hard
breaks stay hard boundaries, and overlong breakable runs are wrapped to the configured budget. wrap-strategy = "balanced" opts into a paragraph rebalancer for authors who prefer more even line lengths.
Default: every style knob is Preserve and wrapping is Keep. With the default config the rewrite-family pipeline
short-circuits before running. Set per-knob targets in .mdwright.toml to opt in.
Why the separation
Synthesising structural output during canonicalisation creates a bug class where one emit decision perturbs the parse
context of another: rewriting _foo_ to *foo* can change an adjacent site's emphasis-flanking class, so the next
rewrite reads a different pulldown event stream than the one it planned against.
Identity emit removes that perturbation source. Rewrite families keep the remaining byte changes in formatter-owned normal-form plans, so a stale local string edit cannot commit without reparsing and verification.
How to opt in
In .mdwright.toml:
[fmt]
italic = "asterisk" # _foo_ → *foo*
strong = "underscore" # **bar** → __bar__
list-marker = "dash" # * x → - x
thematic-break = "dash" # *** → ---
ordered-list = "consistent" # 3. a / 5. b / 9. c → 3. a / 4. b / 5. c
[fmt.refs]
style = "angle" # [ref]: url → [ref]: <url>
Each knob also accepts "preserve" to explicitly disable canonicalisation. See Style knobs for the per-knob
reference, including the rewrites that verification may skip (e.g. intraword underscore that can't safely become
asterisk).
What the canonicalisation pass does NOT do
- Does not rewrap prose (wrap is a separate knob; see Configuration).
- Does not change content semantics: every rewrite must reparse to the same canonical event stream as the bytes it replaces, or it is skipped.
- Does not expose rewrite families, snapshot ownership, or verification signatures as public API. Those details stay private to mdwright-format.
For mdformat-compatible spelling where verified rewrites preserve the parsed document, use [fmt] profile = "mdformat".
Style knobs
This page documents each style knob in [fmt]. Every knob defaults to "preserve", which means the canonicalisation
pass leaves source bytes unchanged for that construct. Set a non-preserve value to opt into rewriting.
See Formatter policy for the overall design (structural emit + opt-in canonicalisation) and
Configuration for the full .mdwright.toml schema.
[fmt] italic
| Value | Effect |
|---|---|
| "preserve" (default) | Emphasis delimiters round-trip from source. _foo_ stays _foo_; *foo* stays *foo*. |
| "asterisk" | Rewrite _…_ to *…* when verification preserves the parse. |
| "underscore" | Rewrite *…* to _…_ when verification preserves the parse. |
Verification skips when: the rewrite would change the parse of the enclosing paragraph window. Intraword underscore
(id_S, Hom_{cart}) looks like the obvious case but is not one: pulldown already treats these as plain text under
CM §6.2 rule 6, so no rewrite is proposed and nothing skips. Rewrites do skip silently in dense multi-delimiter runs
(*_*…*_*-style chains) whose pairing depends on flanking neighbours; verification catches these and leaves the source
bytes in place.
[fmt]
italic = "asterisk"
[fmt] strong
| Value | Effect |
|---|---|
| "preserve" (default) | Strong delimiters round-trip from source. **foo** stays **foo**; __foo__ stays __foo__. |
| "asterisk" | Rewrite __…__ to **…**. |
| "underscore" | Rewrite **…** to __…__. |
Independent of italic. With italic = "asterisk" and strong = "underscore" you get *italic* alongside
__strong__.
[fmt]
italic = "asterisk"
strong = "underscore"
[fmt] list-marker
| Value | Effect |
|---|---|
| "preserve" (default) | Each unordered list keeps its source bullet character. |
| "dash" | Rewrite each bullet to -. |
| "asterisk" | Rewrite each bullet to *. |
| "plus" | Rewrite each bullet to +. |
Marker-local. The document crate exposes one fact per list-item marker. The formatter rewrites those marker bytes only, then verifies the full document before committing the family plan. Nested list markers are separate facts, so an outer list rewrite cannot cover child markers accidentally.
[fmt]
list-marker = "dash"
[fmt] ordered-list
| Value | Effect |
|---|---|
| "preserve" (default) | Each ordered list keeps its source numbering. 3. a / 5. b / 9. c stays. |
| "one" | Rewrite markers to 1. when verification preserves the list start. This matches mdformat's default spelling for ordinary lists that already start at 1.. |
| "consistent" | Renumber so item k (0-indexed) becomes start_num + k, where start_num is the source's first item's number. 3. a / 5. b / 9. c → 3. a / 4. b / 5. c. |
Marker-local: each ordered item exposes its digit range, list start, and ordinal. The family plan rewrites those digit ranges and commits only after full-document verification. The starting number is preserved; only the increment is canonicalised.
[fmt]
ordered-list = "consistent"
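The "consistent" renumbering rule is plain arithmetic. A minimal sketch over the list of source numbers; the function name is hypothetical, and the real family rewrites digit byte ranges and then verifies the document.

```python
def renumber_consistent(numbers: list[int]) -> list[int]:
    """Item k (0-indexed) becomes start_num + k; the first number is kept."""
    if not numbers:
        return []
    start_num = numbers[0]
    return [start_num + k for k in range(len(numbers))]

print(renumber_consistent([3, 5, 9]))  # [3, 4, 5]
```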
[fmt] thematic-break
| Value | Effect |
|---|---|
| "preserve" (default) | Thematic breaks keep their source character (---, ***, ___). |
| "dash" | Rewrite to ---. |
| "asterisk" | Rewrite to ***. |
| "underscore" | Rewrite to ___. |
| "underscore-70" | Rewrite the whole line to 70 underscores, matching mdformat's default thematic-break spelling. |
The repeat count and internal spacing are preserved; only the character changes. So * * * becomes _ _ _ under
"underscore", not ___. Use "underscore-70" when you want the mdformat spelling.
[fmt]
thematic-break = "dash"
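The character-only rewrite can be sketched as a substitution that leaves spacing and repeat count alone. This assumes the input line has already been identified as a thematic break; the function name and the CHARS table are hypothetical.

```python
import re

CHARS = {"dash": "-", "asterisk": "*", "underscore": "_"}

def rewrite_thematic_break(line: str, style: str) -> str:
    """Swap the break character, keeping repeat count and internal spacing."""
    if style == "preserve":
        return line
    if style == "underscore-70":
        return "_" * 70  # whole-line replacement, mdformat spelling
    return re.sub(r"[-*_]", CHARS[style], line)
```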
[fmt.refs] style
| Value | Effect |
|---|---|
| "preserve" (default) | Each link destination keeps its source form: [ref]: url or [ref]: <url> survives. |
| "bare" | Strip angle brackets where the bare form would still parse. [ref]: <url> → [ref]: url. |
| "angle" | Wrap destinations in angle brackets. [ref]: url → [ref]: <url>. |
Applies to both reference-link definitions ([ref]: dest) and inline link destinations ([text](dest)). Verification
skips when the bare form contains whitespace, unbalanced parentheses, or other bytes that would prevent pulldown from
parsing it as a bare destination; the angle-wrapped form is kept in those cases.
[fmt.refs]
style = "angle"
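The "bare" safety condition can be approximated with a check modelled on CommonMark's rules for unbracketed link destinations. This is a rough sketch with hypothetical function names; mdwright decides by reparsing and verifying, not by a predicate like this one.

```python
def can_be_bare(destination: str) -> bool:
    """Rough check that a destination would survive without angle brackets."""
    if not destination or destination.startswith("<"):
        return False
    depth = 0
    escaped = False
    for ch in destination:
        # Whitespace and control characters are never allowed, escaped or not.
        if ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
        if escaped:
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def to_bare(destination: str) -> str:
    """Strip <...> only when the inner destination passes the check."""
    inner = destination[1:-1]
    if destination.startswith("<") and destination.endswith(">") and can_be_bare(inner):
        return inner
    return destination
```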
[fmt.tables] style
| Value | Effect |
|---|---|
| "preserve" (default) | GFM table spacing round-trips from source. |
| "pad" | Pad cells and delimiter rows to mdformat-compatible widths when verification preserves the parse. |
Padding is a table-level operation. Inline delimiter and link destination rewrites run first; table padding then reads the current cell bytes and rewrites the table block as one verified replacement. Tables with source cells the document facts cannot account for are left unchanged rather than partially rewritten.
[fmt.tables]
style = "pad"
[fmt] wrap
| Value | Effect |
|---|---|
| "keep" (default) | Preserve existing paragraph line breaks. |
| "no" | Collapse soft line breaks inside paragraphs where verification preserves the parse. |
| integer | Wrap breakable prose lines at that display-column width. |
wrap = 120 means breakable output lines should fit within 120 columns in every formatter profile. The accepted
exception is an indivisible atomic token, such as a long code span, URL, math atom, or single long word. Those tokens
are left intact rather than split into invalid Markdown.
The default wrap strategy is "stable": ordinary source newlines inside a paragraph are soft break positions, hard
breaks stay hard boundaries, and each hard-break-bounded run is filled greedily up to the configured column. Use
wrap-strategy = "balanced" when you want mdwright to rebalance paragraphs for more even line lengths.
[fmt]
wrap = 120
[fmt]
wrap = 120
wrap-strategy = "balanced"
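The greedy fill behind the "stable" strategy, including the atomic-token overflow, can be sketched as below. The sketch makes two simplifying assumptions that mdwright does not: it measures width with len() rather than display columns, and it treats every whitespace-separated word as an atom, so a code span containing spaces would be split. The function name is hypothetical.

```python
def fill_greedy(paragraph: str, width: int) -> list[str]:
    """Greedy fill; a token longer than `width` is kept whole on its own line."""
    lines: list[str] = []
    current = ""
    for token in paragraph.split():
        if not current:
            current = token
        elif len(current) + 1 + len(token) <= width:
            current += " " + token
        else:
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    return lines
```

A hard break would bound each run: split the paragraph at hard breaks first and fill each piece separately.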
[fmt.lists] continuation-indent
| Value | Effect |
|---|---|
| "marker-width" (default) | Continuation lines align under the source list marker width. |
| "four-space" | Continuation lines use four spaces after the containing block prefix. |
This setting only affects paragraphs that are wrapped inside list items. It is separate from list-marker because the
bullet character and continuation indentation are independent style decisions. The mdformat profile defaults this key to
"four-space"; explicit config overrides that default.
[fmt]
wrap = 120
[fmt.lists]
continuation-indent = "four-space"
Combined example
[fmt]
profile = "mdformat"
This keeps mdformat's default wrap = keep, sets list continuation indentation to four spaces, and applies mdformat
spelling for supported style knobs. Explicit keys override the profile:
[fmt]
profile = "mdformat"
wrap = 120
[fmt.lists]
continuation-indent = "marker-width"
A per-knob spelling can also be written without the profile:
[fmt]
list-marker = "dash"
thematic-break = "underscore-70"
ordered-list = "one"
[fmt.refs]
style = "angle"
[fmt.tables]
style = "pad"
This is mdformat-compatible where mdwright has verified rewrite support. It does not move orphan footnotes or copy mdformat behaviours that would change the parsed document.
How verification skips become visible
When a rewrite would change the parse of the enclosing paragraph window, the canonicalisation pass logs a
tracing::warn! with the byte span and skipped rewrite. Capture these in production with
RUST_LOG=mdwright_format=warn. A high skip rate on one document usually points at a structural-emit edge case worth
filing as a regression input.
Lint rules
Every rule shipped by mdwright's standard library, grouped by how they behave on
a fresh install. Each link points to the rule's long-form explanation;
mdwright explain <name> prints the same text from the command line.
Default rules
On by default. A diagnostic from one of these fails mdwright check --check.
| Rule | Fix | Description |
|---|---|---|
| unbalanced-backtick | no | Backtick in prose that could not be paired with a closing fence. |
| math/unbalanced-delim | no | TeX-style math open delimiter (\[, \(, $$, $) with no matching close. |
| math/unbalanced-env | no | LaTeX \begin{env} with no matching \end{env} at the same nesting depth. |
| math/unbalanced-braces | no | { / } inside a math body do not balance; math body normalisation is skipped for that region. |
| adjacent-code-no-space | no | Inline code span adjacent to a letter without whitespace. |
| heading-punctuation | no | Trailing . or : on a heading. |
| orphan-reference-link | no | Reference-style link with no matching [label]: definition. |
| duplicate-link-label | no | Two [label]: definitions with the same label. |
| bare-url | yes | Bare URL in prose; wrap in <…> for a CommonMark autolink. |
| trailing-whitespace | yes | Trailing whitespace at end of line. |
| inconsistent-list-marker | no | Mixed - / * / + markers in one bullet list. |
Default advisories
On by default but informational: they report but do not fail --check.
| Rule | Fix | Description |
|---|---|---|
| duplicate-heading | no | Two headings at the same level under the same parent with the same text. |
| unicodeable-subscript | yes | Braced super/subscript that has a single-codepoint Unicode form. |
| info-string-typo | no | Fenced code block info string not in the known-languages allowlist. |
Opt-in rules
Off by default. Enable with lint.extend-select = ["name"] in configuration.
| Rule | Fix | Description |
|---|---|---|
| list-tightness-flipped | no | list tightness from the tree disagrees with tightness from source bytes |
| stray-dollar | yes | Literal $ in prose (opt-in for projects that don't use $…$ math). |
| latex-command | yes | LaTeX control sequence in prose (opt-in for Unicode-math projects). |
| escaped-emphasis | yes | Literal \_, \*, or \` escape in prose (mdformat damage). |
| subscript-damage | yes | Identifier with * where a _ subscript was expected (formatter damage). |
name: adjacent-code-no-space default: true advisory: false fix: false since: 0.1.0
adjacent-code-no-space
Inline code span adjacent to a letter without whitespace.
What it does
Flags inline code spans whose backticks touch an adjacent alphanumeric or backtick character
without an intervening space, e.g. `foo`bar or `foo``bar`.
Why
CommonMark renders `foo`bar as <code>foo</code>bar; the code runs into the prose with no
visual break, which is almost always a typo for `foo` bar or `foobar`. Two consecutive code
spans with no space between them (`foo``bar`) are even more ambiguous: the result depends on
backtick counting and renders inconsistently across implementations.
Example (bad)
Call `vec.push(x)`afterwards.
Example (good)
Call `vec.push(x)` afterwards.
Configuration
- Disable inline: <!-- mdwright: allow adjacent-code-no-space -->.
- Disable in config: [lint] ignore = ["adjacent-code-no-space"].
- Severity: non-advisory.
References
name: bare-url default: true advisory: false fix: true since: 0.1.0
bare-url
Bare URL in prose; wrap in <…> for a CommonMark autolink.
What it does
Flags http:// and https:// URLs that appear in prose without being wrapped in a CommonMark
autolink (<https://example.com>) or a [text](url) link.
Why
mdwright recognises GFM bare URL autolinks for rendering, but whether the same source renders as
a clickable link still depends on each downstream renderer's extension set. Wrapping the URL in
<…> makes the link explicit and portable across CommonMark renderers.
The autofix (safe: true) wraps the URL in angle brackets in place; mdwright fix applies it.
Example (bad)
See https://example.com for details.
Example (good)
See <https://example.com> for details.
Configuration
- Disable inline: <!-- mdwright: allow bare-url -->.
- Disable in config: [lint] ignore = ["bare-url"].
- Severity: non-advisory. Safe autofix available.
References
name: duplicate-heading default: true advisory: true fix: false since: 0.2.0
duplicate-heading
Two headings at the same level under the same parent with the same text.
What it does
Flags two or more headings whose slugs (lowercase, hyphenated text) collide within the same document.
Why
Markdown renderers (GitHub, mdBook, GitLab) assign each heading a URL fragment derived from its
text. Two headings with the same text collide on the fragment: only one is reachable, and which
one depends on whether the renderer disambiguates with a -1 suffix or silently overwrites.
External links to the document then drift unpredictably as new sections are added.
Example (bad)
## Examples
…
## Examples
Example (good)
## Examples
…
## More examples
Configuration
- Disable inline: <!-- mdwright: allow duplicate-heading -->.
- Disable in config: [lint] ignore = ["duplicate-heading"].
- Severity: advisory. Math/theorem documents legitimately repeat ### Proof or ### Corollary under one chapter, so the diagnostic surfaces but does not fail mdwright check --check.
References
- GitHub's slug algorithm: lowercase, replace whitespace with -, strip non-word characters.
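The slug recipe above can be sketched directly. This is an approximation for illustration, not GitHub's or mdwright's exact algorithm, and it ignores the rule's level and parent scoping; both function names are hypothetical.

```python
import re
from collections import Counter

def slugify(heading: str) -> str:
    """Lowercase, replace whitespace with '-', strip non-word characters."""
    lowered = heading.strip().lower()
    hyphenated = re.sub(r"\s+", "-", lowered)
    return re.sub(r"[^\w-]", "", hyphenated)

def colliding_slugs(headings: list[str]) -> list[str]:
    """Slugs produced by more than one heading, in first-seen order."""
    counts = Counter(slugify(h) for h in headings)
    return [slug for slug, n in counts.items() if n > 1]
```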
name: duplicate-link-label default: true advisory: false fix: false since: 0.1.0
duplicate-link-label
Two [label]: definitions with the same label.
What it does
Flags [label]: … link definitions that share a label (case-insensitive, normalised) with
another definition in the same document.
Why
CommonMark says the first definition wins; later duplicates are silently discarded. The author usually intended for one of them to be a different label, so a duplicate is almost always a copy-paste mistake. Worse, the discarded definition often documents the intended target, so the link still resolves, but to the wrong URL.
Example (bad)
See the [docs][readme] and the [tutorial][readme].
[readme]: https://example.com/readme
[readme]: https://example.com/tutorial
Example (good)
See the [docs][readme] and the [tutorial][tutorial].
[readme]: https://example.com/readme
[tutorial]: https://example.com/tutorial
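Label comparison is the core of this rule. The sketch below approximates CommonMark label matching (case folding, with internal whitespace collapsed) and reports later definitions whose normalised label was already seen; the function names are hypothetical and the normalisation is a simplification.

```python
def normalise_label(label: str) -> str:
    """Case-fold and collapse internal whitespace, as label matching does."""
    return " ".join(label.split()).casefold()

def duplicate_labels(labels: list[str]) -> list[str]:
    """Labels (as written) whose normalised form was already defined."""
    seen: set[str] = set()
    duplicates = []
    for label in labels:
        key = normalise_label(label)
        if key in seen:
            duplicates.append(label)
        seen.add(key)
    return duplicates
```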
Configuration
- Disable inline: <!-- mdwright: allow duplicate-link-label -->.
- Disable in config: [lint] ignore = ["duplicate-link-label"].
- Severity: non-advisory.
References
name: escaped-emphasis default: false advisory: false fix: true since: 0.1.0
escaped-emphasis
Literal \_, \*, or \` escape in prose (mdformat damage).
What it does
Flags \_ and \* escape sequences in prose that look like a writer trying to escape an
emphasis marker, but where the surrounding context confirms the writer meant the emphasis to
fire, e.g. \_text\_ (two escapes around a word) reading as a damaged italic.
Why
mdformat and a few other roundtrip tools used to defensively escape _ and * in prose,
even where CommonMark would not have parsed them as emphasis. After enough roundtrips,
documents accumulate \_word\_ patterns that no longer render as italic; they render as
literal _word_. The rule finds these and proposes the unescaped form.
The autofix removes the escapes (\_text\_ → _text_); safe to apply, but review first if
the prose genuinely contains literal underscores (filenames, identifiers).
Example (bad)
This is \_actually italic\_, despite the escapes.
Example (good)
This is _actually italic_, despite the escapes.
Configuration
- This rule is off by default. Enable with [lint] extend-select = ["escaped-emphasis"].
- Disable inline: <!-- mdwright: allow escaped-emphasis -->.
- Severity: non-advisory. Safe autofix available.
References
name: heading-punctuation default: true advisory: false fix: false since: 0.1.0
heading-punctuation
Trailing . or : on a heading.
What it does
Flags ATX or setext headings that end with terminal sentence punctuation: ., !, or ?.
Why
A heading is a title, not a sentence; terminal punctuation on titles reads as a typo, breaks heading-anchor slugs in some renderers, and inflates the table of contents with stray characters. The convention is shared by Microsoft, GitHub, and Google's documentation style guides.
Example (bad)
## Configuring the linter.
Example (good)
## Configuring the linter
If the heading genuinely needs to be a question, this rule still fires; either reword it as a declarative title or suppress on that block.
Configuration
- Disable inline (for one heading): place <!-- mdwright: allow heading-punctuation --> on the line before the heading.
- Disable in config: [lint] ignore = ["heading-punctuation"].
- Severity: non-advisory.
References
name: inconsistent-list-marker default: true advisory: false fix: false since: 0.1.0
inconsistent-list-marker
Mixed - / * / + markers in one bullet list.
What it does
Flags bulleted lists whose items use a mix of marker characters (-, *, +) within the same
list. Ordered lists are not affected.
Why
CommonMark says any of -, *, + is a valid bullet marker; switching markers between items
either (a) reads as a typo, or (b) actually starts a new list under CommonMark's rules, which
produces a visible gap in the rendered output that the author rarely intended. Either way the
fix is to pick one marker per list and stick to it.
The formatter normalises markers across the whole document; this rule fires when the source as written would render two adjacent lists when one was meant.
Example (bad)
- first
* second
- third
Example (good)
- first
- second
- third
Configuration
- Disable inline: <!-- mdwright: allow inconsistent-list-marker -->.
- Disable in config: [lint] ignore = ["inconsistent-list-marker"].
- Severity: non-advisory.
References
- CommonMark §5.2: List items.
- [fmt] list-marker config key controls the canonical marker for the formatter.
name: info-string-typo default: true advisory: true fix: false since: 0.2.0
info-string-typo
Fenced code block info string not in the known-languages allowlist.
What it does
Flags fenced code blocks whose info string (the word after the opening fence, e.g. rust in
```rust) is not in mdwright's allowlist of known languages and tools.
Why
Renderers ignore unknown info strings (no syntax highlighting); the most common cause is a
typo like ```pyhton or ```jsons. Catching them in the linter is faster than spotting the
unhighlighted block in a preview. The rule is advisory; projects use long-tail languages
mdwright cannot anticipate, so a flagged unknown info string might be legitimate.
The shipped allowlist covers the languages this project uses (Rust, Python, Lean, …) plus the
usual web stack (HTML, CSS, JS/TS, JSON, YAML, …) and common shell/console variants. Extend the
allowlist via [lint.info-strings] extra = […] in your config rather than disabling the rule.
Example (bad)
```pyhton
def f(): pass
```
Example (good)
```python
def f(): pass
```
Configuration
- Extend allowlist: [lint.info-strings] extra = ["mycustomlang", "vendor-dsl"].
- Disable inline: <!-- mdwright: allow info-string-typo -->.
- Disable in config: [lint] ignore = ["info-string-typo"].
- Severity: advisory.
References
- CommonMark §4.5: Fenced code blocks.
- Default allowlist: DEFAULT_ALLOWLIST in src/stdlib/info_string_typo.rs.
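To make the allowlist-plus-extras lookup concrete, here is a minimal sketch. SAMPLE_ALLOWLIST is a hypothetical stand-in for the shipped list (the real one is longer), the function names are invented for illustration, and only backtick fences are handled.

```rust
// Hypothetical stand-in for the shipped allowlist; not the real contents.
const SAMPLE_ALLOWLIST: &[&str] = &["rust", "python", "lean", "html", "css", "json", "yaml"];

/// Extracts the first word of the info string from an opening fence line.
/// Returns None when the line is not a backtick fence or has no info string.
fn info_word(fence_line: &str) -> Option<&str> {
    let trimmed = fence_line.trim_start();
    let rest = trimmed.trim_start_matches('`');
    if trimmed.len() - rest.len() < 3 {
        return None; // fewer than three backticks: not a fence
    }
    rest.split_whitespace().next()
}

/// True when `word` is in the sample allowlist or in the user's `extra` list.
fn is_known(word: &str, extra: &[&str]) -> bool {
    SAMPLE_ALLOWLIST.iter().any(|k| *k == word) || extra.iter().any(|k| *k == word)
}

fn main() {
    let line = format!("{}pyhton", "`".repeat(3));
    if let Some(word) = info_word(&line) {
        println!("{} known: {}", word, is_known(word, &[]));
    }
}
```

Passing the project's extra entries as the second argument mirrors the effect of [lint.info-strings] extra: a long-tail language becomes known without disabling the rule.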
name: latex-command default: false advisory: false fix: true since: 0.1.0
latex-command
LaTeX control sequence in prose (opt-in for Unicode-math projects).
What it does
Flags TeX-style \command{…} invocations in prose (outside math regions), for example
\textbf{important} or \emph{see below}.
Why
LaTeX commands in Markdown prose render literally in most renderers, breaking the visual
result. Authors who write \textbf{x} almost always wanted Markdown's **x** instead.
Projects targeting Pandoc may legitimately use LaTeX commands; for them, this rule is opt-in
and stays off.
The autofix is conservative: it replaces the command with the equivalent Markdown construct
where one exists (\textbf{x} → **x**, \emph{x} → *x*) and skips otherwise.
Example (bad)
This is \textbf{important}.
Example (good)
This is **important**.
Configuration
- This rule is off by default. Enable with [lint] extend-select = ["latex-command"].
- Disable inline: <!-- mdwright: allow latex-command -->.
- Severity: non-advisory. Safe autofix where a Markdown equivalent exists.
References
- mdwright's command list (src/stdlib/latex_command.rs).
name: list-tightness-flipped default: false advisory: true fix: false since: 0.2.0
list-tightness-flipped
List tightness from the tree disagrees with tightness from source bytes.
What it does
Flags lists whose items mix tight (single-paragraph) and loose (blank-line-separated) shapes across the same list, leaving CommonMark's spec-defined "tightness" algorithm to make a surprising choice.
Why
CommonMark decides a list is "loose" if any item has a blank line between it and the next.
That single blank line then re-renders every item with <p> wrappers, which adds vertical
padding throughout the list. Authors who write one stray blank line frequently don't notice the
cascading effect on items above and below.
This rule is advisory because the "wrong" tightness is rarely a bug per se, but the surprise is consistent enough that flagging it is worth a one-line nudge.
Example (bad)
- first
- second

- third
(Loose: every item gets <p> wrappers because of the blank line above third.)
Example (good)
Tight throughout:
- first
- second
- third
Or loose throughout:
- first

- second

- third
Configuration
- This rule is off by default (opt-in). Enable with [lint] extend-select = ["list-tightness-flipped"].
- Disable inline: <!-- mdwright: allow list-tightness-flipped -->.
- Severity: advisory (does not fail --check).
References
name: orphan-reference-link default: true advisory: false fix: false since: 0.1.0
orphan-reference-link
Reference-style link with no matching [label]: definition.
What it does
Flags reference-style links of the form [text][label] or shortcut form [label] where
label has no matching [label]: … definition anywhere in the document.
Why
CommonMark renders an unresolved reference link as literal text; [text][label] shows up in
the output as [text][label] rather than as a clickable link. This silently breaks navigation,
usually because the author renamed a link definition without updating its references (or vice
versa).
Example (bad)
See the [installation guide][install] for details.
[setup]: docs/setup.md
Example (good)
See the [installation guide][install] for details.
[install]: docs/install.md
Configuration
- Disable inline: <!-- mdwright: allow orphan-reference-link -->.
- Disable in config: [lint] ignore = ["orphan-reference-link"].
- Severity: non-advisory.
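The idea behind the check (collect every definition, then look up each reference label) can be sketched as follows. This is an illustrative approximation with hypothetical function names: it handles only the full [text][label] form, lowercases labels instead of applying CommonMark's full label normalisation, and does not skip code spans or code blocks.

```rust
use std::collections::HashSet;

/// Collects labels defined by lines of the form `[label]: destination`.
fn defined_labels(src: &str) -> HashSet<String> {
    src.lines()
        .filter_map(|line| {
            let rest = line.trim_start().strip_prefix('[')?;
            let (label, after) = rest.split_once(']')?;
            after.starts_with(':').then(|| label.to_lowercase())
        })
        .collect()
}

/// Returns labels used in `[text][label]` references that have no definition.
fn orphan_labels(src: &str) -> Vec<String> {
    let defined = defined_labels(src);
    let mut orphans = Vec::new();
    let mut rest = src;
    while let Some(pos) = rest.find("][") {
        rest = &rest[pos + 2..];
        if let Some(end) = rest.find(']') {
            let label = rest[..end].to_lowercase();
            if !label.is_empty() && !defined.contains(&label) {
                orphans.push(label);
            }
        }
    }
    orphans
}

fn main() {
    let bad = "See the [installation guide][install] for details.\n[setup]: docs/setup.md";
    println!("{:?}", orphan_labels(bad));
}
```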
References
name: stray-dollar default: false advisory: false fix: true since: 0.1.0
stray-dollar
Literal $ in prose (opt-in for projects that don't use $…$ math).
What it does
Flags $ characters in prose that are not part of a recognised math region.
Why
Some Markdown renderers (Pandoc with --mathjax, KaTeX-aware mdBook, GitHub) treat $…$ as
inline math. Authors who rely on TeX-style \( … \) instead can accidentally produce math
output where they wanted a literal $5, and authors who rely on $…$ for math can produce
prose where they meant math. Either way, a stray single $ in prose is almost always a typo.
This rule is opt-in because projects that consistently use $…$ for math have no use for it:
the linter would flood them with false positives. Turn it on in projects that standardise on
\( … \) or no inline math at all.
The autofix escapes the dollar (\$); review before applying.
Example (bad)
That costs $5.
Example (good)
That costs \$5.
Configuration
- This rule is off by default. Enable with [lint] extend-select = ["stray-dollar"].
- Disable inline: <!-- mdwright: allow stray-dollar -->.
- Severity: non-advisory. Safe autofix available.
References
- Pandoc Markdown: math extension.
- mdwright treats \( … \), \[ … \], and named environments as math regions; $…$ is excluded by default because it conflicts with literal-dollar prose.
name: subscript-damage default: false advisory: false fix: true since: 0.2.0
subscript-damage
Identifier with * where a _ subscript was expected (formatter damage).
What it does
Flags damaged subscript notation produced by older roundtrip Markdown tools: patterns like
x\_i (escaped underscore that was supposed to be a TeX subscript) where the surrounding
context confirms a math intent (digits, single-letter identifiers, sign-and-digit pairs).
Why
Older mdformat versions and a few other tools defensively escaped _ inside what looked
like prose, even when the underscore was a math subscript. The result is x\_i in the
source, which renders as x_i literally, not as a subscript. The rule finds these and
proposes either the TeX form (x_i inside math) or the Unicode form (xᵢ) depending on
context.
The autofix is conservative: it removes the backslash if the surrounding context is unambiguously math; otherwise it leaves the source untouched and prints the diagnostic only.
Example (bad)
Take the i-th element x\_i.
Example (good)
Take the i-th element xᵢ.
(Or, inside math: Take the i-th element $x_i$.)
Configuration
- This rule is off by default. Enable with [lint] extend-select = ["subscript-damage"].
- Disable inline: <!-- mdwright: allow subscript-damage -->.
- Severity: non-advisory. Safe autofix where context is unambiguous.
References
- See also unicodeable-subscript for substituting Unicode subscript characters in prose.
name: trailing-whitespace default: true advisory: false fix: true since: 0.1.0
trailing-whitespace
Trailing whitespace at end of line.
What it does
Flags lines that end with one or more trailing space or tab characters. The exception is the
CommonMark hard-break convention: exactly two trailing spaces followed by a newline introduces
a <br> inside a paragraph, and that case is left alone.
Why
Trailing whitespace is invisible noise that survives copy-paste, complicates diffs (one-byte changes that touch every line), and frequently triggers spurious changes when collaborators have different editor settings. The autofix strips the trailing run while preserving the two-space hard break form.
Example (bad)
A paragraph.···
Another line.·
(· represents a stray trailing space.)
Example (good)
A paragraph.
Another line.
Configuration
- Disable inline: <!-- mdwright: allow trailing-whitespace -->.
- Disable in config: [lint] ignore = ["trailing-whitespace"].
- Severity: non-advisory. Safe autofix available.
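The autofix behaviour described above (strip the trailing run, keep the two-space hard break) can be sketched per line. This is an assumption-laden illustration rather than mdwright's code: strip_trailing is a hypothetical name, it operates on a line without its newline, and it does not check whether the hard break is actually inside a paragraph.

```rust
/// Strips trailing spaces and tabs from one line (newline already removed),
/// keeping the exactly-two-spaces hard-break form after non-blank content.
fn strip_trailing(line: &str) -> &str {
    let trimmed = line.trim_end_matches(|c: char| c == ' ' || c == '\t');
    let tail = &line[trimmed.len()..];
    if tail == "  " && !trimmed.is_empty() {
        line // CommonMark hard break: leave untouched
    } else {
        trimmed
    }
}

fn main() {
    for line in ["A paragraph.   ", "Another line. ", "Hard break  "] {
        println!("{:?}", strip_trailing(line));
    }
}
```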
References
name: unbalanced-backtick default: true advisory: false fix: false since: 0.1.0
unbalanced-backtick
Backtick in prose that could not be paired with a closing fence.
What it does
Flags inline code spans whose backtick fence was not closed before the end of a paragraph or the
end of the document, e.g. `foo with no matching `.
Why
CommonMark's inline code rule requires the same number of opening and closing backticks. An unclosed opener silently turns the rest of the paragraph into prose (it is not rendered as a code span), so the visual result drifts from what the author meant. Worse, the unclosed run often eats markup intended for later constructs (links, emphasis), producing a cascade of silently broken rendering.
Example (bad)
Run `cargo build to compile.
Example (good)
Run `cargo build` to compile.
Configuration
- Disable inline: <!-- mdwright: allow unbalanced-backtick -->.
- Disable in config: [lint] ignore = ["unbalanced-backtick"].
- Severity: non-advisory.
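The pairing rule (an opening backtick run needs a later run of the same length) can be sketched as below. This is a simplified model with a hypothetical function name: it treats its input as a single paragraph and ignores backslash escapes and code blocks, which a real implementation would have to handle.

```rust
/// Returns the byte offsets of backtick runs that have no later run of the
/// same length to close them.
fn unpaired_backtick_runs(paragraph: &str) -> Vec<usize> {
    // Collect (offset, length) for every maximal run of backticks.
    let bytes = paragraph.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let start = i;
            while i < bytes.len() && bytes[i] == b'`' {
                i += 1;
            }
            runs.push((start, i - start));
        } else {
            i += 1;
        }
    }
    // Pair each opener with the next run of equal length.
    let mut unpaired = Vec::new();
    let mut k = 0;
    while k < runs.len() {
        let (start, len) = runs[k];
        match runs[k + 1..].iter().position(|&(_, l)| l == len) {
            Some(rel) => k += rel + 2, // jump past the matching closer
            None => {
                unpaired.push(start);
                k += 1;
            }
        }
    }
    unpaired
}

fn main() {
    println!("{:?}", unpaired_backtick_runs("Run `cargo build to compile."));
    println!("{:?}", unpaired_backtick_runs("Run `cargo build` to compile."));
}
```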
References
name: unicodeable-subscript default: true advisory: true fix: true since: 0.2.0
unicodeable-subscript
Braced super/subscript that has a single-codepoint Unicode form.
What it does
Flags math-region subscripts whose contents are simple enough to express as Unicode subscript characters: single digits, single letters that have a Unicode subscript codepoint, and short sign/digit pairs, when the surrounding context is prose rather than display math.
Why
In running prose like the i-th element x_i, the TeX form x_i renders as a code-styled
fragment in most renderers; the Unicode form xᵢ is cleaner, screen-reader-friendly, and
copyable. The rule fires only when the substitution is unambiguous and the surrounding context
does not already use TeX-style display math.
The autofix substitutes the Unicode form (safe: false); review the change before applying.
Example (bad)
Take the i-th component, x_i.
Example (good)
Take the i-th component, xᵢ.
Configuration
- Disable inline: <!-- mdwright: allow unicodeable-subscript -->.
- Disable in config: [lint] ignore = ["unicodeable-subscript"].
- Severity: advisory.
References
- Unicode subscript block: U+2080–U+209F.
- mdwright's substitution table (mdwright-math/src/unicode.rs).
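A substitution table of this kind might look like the sketch below. The function names and the handful of covered letters are illustrative assumptions; the shipped table may differ. Subscript digits sit at U+2080 to U+2089, while the subscript i used in the example above is U+1D62, outside the U+2080–U+209F range.

```rust
/// Maps a character to its Unicode subscript form, if one exists.
/// Only digits and a few letters are covered in this sketch.
fn to_subscript(c: char) -> Option<char> {
    match c {
        '0'..='9' => char::from_u32(0x2080 + (c as u32 - '0' as u32)),
        'a' => Some('\u{2090}'),
        'e' => Some('\u{2091}'),
        'o' => Some('\u{2092}'),
        'x' => Some('\u{2093}'),
        'i' => Some('\u{1D62}'),
        _ => None,
    }
}

/// Rewrites a token such as "x_i" when the subscript is a single mappable
/// character; returns None for anything else.
fn unicode_form(token: &str) -> Option<String> {
    let (base, sub) = token.split_once('_')?;
    let mut chars = sub.chars();
    let c = chars.next()?;
    if base.is_empty() || chars.next().is_some() {
        return None; // empty base or multi-character subscript
    }
    Some(format!("{}{}", base, to_subscript(c)?))
}

fn main() {
    println!("{:?}", unicode_form("x_i"));
    println!("{:?}", unicode_form("x_ij"));
}
```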
name: math/unbalanced-braces default: true advisory: false fix: false since: 0.1.0
math/unbalanced-braces
{ / } inside a math body do not balance; math body normalisation is skipped for that region.
What it does
Inside a math region, flags { and } whose depths do not balance: either an extra opening
brace with no matching close, or a stray }.
Why
Math renderers (KaTeX, MathJax, pdflatex) all reject unbalanced braces, but they fail with opaque messages far from the source location. Catching the imbalance in the linter pinpoints the offending region in the markdown source, before the math reaches the renderer.
The check only runs inside math regions identified by [math/unbalanced-delim] and
[math/unbalanced-env]; balanced-brace checking in prose would be noise, since prose { and
} are not paired.
Example (bad)
$$\sum_{i=1^n i$$
Example (good)
$$\sum_{i=1}^n i$$
Configuration
- Disable inline: <!-- mdwright: allow math/unbalanced-braces -->.
- Disable in config: [lint] ignore = ["math/unbalanced-braces"].
- Severity: non-advisory.
References
- mdwright's brace scanner (src/stdlib/math_unbalanced_braces.rs).
- TeX: braces group arguments to commands such as \frac{a}{b} and \sum_{i}.
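A depth-counting scan is enough to illustrate the check. The sketch below is not the shipped scanner: BraceError and check_braces are hypothetical names, and the only TeX subtlety it models is that a backslash escapes the next character, so \{ and \} do not count.

```rust
#[derive(Debug, PartialEq)]
enum BraceError {
    /// A `}` at this byte offset had no opener.
    StrayClose(usize),
    /// This many `{` were still open at the end of the body.
    Unclosed(usize),
}

/// Checks that `{` and `}` balance inside one math body.
fn check_braces(math_body: &str) -> Result<(), BraceError> {
    let mut depth = 0usize;
    let mut escaped = false;
    for (i, c) in math_body.char_indices() {
        if escaped {
            escaped = false; // character after a backslash is never a brace
            continue;
        }
        match c {
            '\\' => escaped = true,
            '{' => depth += 1,
            '}' => {
                if depth == 0 {
                    return Err(BraceError::StrayClose(i));
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    if depth == 0 { Ok(()) } else { Err(BraceError::Unclosed(depth)) }
}

fn main() {
    println!("{:?}", check_braces(r"\sum_{i=1^n i"));
    println!("{:?}", check_braces(r"\sum_{i=1}^n i"));
}
```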
name: math/unbalanced-delim default: true advisory: false fix: false since: 0.1.0
math/unbalanced-delim
TeX-style math open delimiter (\[, \(, $$, $) with no matching close.
What it does
Flags display-math openers (\[) and inline-math openers (\() that have no matching
closer (\] / \)) before the end of the document.
Why
\[ … \] and \( … \) are TeX-style math delimiters. mdwright treats the region between an
opener and its closer as math: it suspends prose lint rules inside, and the formatter passes the
bytes through verbatim. An unbalanced opener means we cannot tell where math ends; every
following prose rule misreads the rest of the document, and the formatter might break the
content's rendering.
The check runs before any prose rule fires, so this is the first diagnostic you should fix in a document.
Example (bad)
The Laplacian is \[ \Delta f = \sum_i \partial_i^2 f
Example (good)
The Laplacian is \[ \Delta f = \sum_i \partial_i^2 f \].
If you wanted a literal \[ in prose, escape it: \\[.
Configuration
- Disable inline: <!-- mdwright: allow math/unbalanced-delim -->.
- Disable in config: [lint] ignore = ["math/unbalanced-delim"].
- Severity: non-advisory (fails mdwright check --check).
References
- mdwright's math-region recogniser (src/stdlib/math_unbalanced_delim.rs).
- LaTeX: \[ … \] is the unnumbered display-math environment; \( … \) is the inline form.
name: math/unbalanced-env default: true advisory: false fix: false since: 0.1.0
math/unbalanced-env
LaTeX \begin{env} with no matching \end{env} at the same nesting depth.
What it does
Flags TeX \begin{env} blocks that have no matching \end{env} (or vice versa), where env is
one of the math environments mdwright tracks (equation, align, aligned, cases, matrix,
pmatrix, bmatrix, vmatrix, Vmatrix, gather, multline, split, and their starred
variants).
Why
\begin{align} … \end{align} and friends are math regions. Like \[ … \], they suspend prose
lint rules and are passed through the formatter verbatim. An unmatched \begin leaves the
parser unable to tell where math ends; an unmatched \end is almost always a copy-paste error
that will silently break rendering in any math-aware renderer.
Example (bad)
\begin{align}
a + b &= c \\
d - e &= f
Example (good)
\begin{align}
a + b &= c \\
d - e &= f
\end{align}
Configuration
- Disable inline: <!-- mdwright: allow math/unbalanced-env -->.
- Disable in config: [lint] ignore = ["math/unbalanced-env"].
- Severity: non-advisory.
References
- mdwright's environment recogniser (src/stdlib/math_unbalanced_env.rs).
- amsmath user's guide for the canonical environment list.
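Matching \begin and \end at the same nesting depth is a stack problem. The following sketch shows the idea under simplifying assumptions: first_env_mismatch is a hypothetical name, every environment name is tracked (not just the math environments listed above), and a doubled backslash is skipped so that a TeX line break is not mistaken for a command.

```rust
/// Finds the first `\begin{env}` / `\end{env}` mismatch using a stack.
fn first_env_mismatch(src: &str) -> Option<String> {
    let mut stack: Vec<&str> = Vec::new();
    let mut rest = src;
    while let Some(pos) = rest.find('\\') {
        rest = &rest[pos + 1..];
        // `\\` is a TeX line break: consume the second backslash and move on.
        if let Some(r) = rest.strip_prefix('\\') {
            rest = r;
            continue;
        }
        let (is_begin, after) = if let Some(a) = rest.strip_prefix("begin{") {
            (true, a)
        } else if let Some(a) = rest.strip_prefix("end{") {
            (false, a)
        } else {
            continue;
        };
        let close = match after.find('}') {
            Some(c) => c,
            None => break, // malformed command: stop scanning
        };
        let name = &after[..close];
        rest = &after[close + 1..];
        if is_begin {
            stack.push(name);
        } else if stack.pop() != Some(name) {
            return Some(format!("unmatched \\end{{{}}}", name));
        }
    }
    stack.pop().map(|name| format!("unmatched \\begin{{{}}}", name))
}

fn main() {
    let bad = "\\begin{align}\na + b &= c \\\\\nd - e &= f\n";
    println!("{:?}", first_env_mismatch(bad));
}
```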
Pre-commit
mdwright ships a .pre-commit-hooks.yaml at its repo root, so adding it to a project that uses the
pre-commit framework is a single repos: entry.
Quickest path: prebuilt binary
If contributors already have mdwright on their $PATH (e.g. via cargo binstall mdwright or a GitHub release
tarball), the -system variants avoid any toolchain dance:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/jcreinhold/mdwright
    rev: v0.1.0
    hooks:
      - id: mdwright-check-system
      - id: mdwright-fmt-check-system
mdwright-check-system runs mdwright check --check; mdwright-fmt-check-system runs mdwright fmt-check. Both exit
non-zero on issues, blocking the commit.
Letting pre-commit build mdwright
If you don't want to require an out-of-band install, the source-build hooks invoke cargo run -p mdwright from the
checked-out repository. First commit after a clean cache takes ~30 s; subsequent runs reuse Cargo's cache.
repos:
  - repo: https://github.com/jcreinhold/mdwright
    rev: v0.1.0
    hooks:
      - id: mdwright-check
      - id: mdwright-fmt-check
Each contributor needs a Rust toolchain on the machine running the hook.
Available hook IDs
| ID | Equivalent CLI | Language |
|---|---|---|
| mdwright-check | mdwright check --check | system via Cargo |
| mdwright-fmt | mdwright fmt | system via Cargo |
| mdwright-fmt-check | mdwright fmt-check | system via Cargo |
| mdwright-check-system | mdwright check --check | system |
| mdwright-fmt-system | mdwright fmt | system |
| mdwright-fmt-check-system | mdwright fmt-check | system |
The mdwright-fmt / mdwright-fmt-system hooks rewrite files in place; combine with git add in a post-formatting
workflow, or prefer mdwright-fmt-check in CI gates that should never auto-commit.
Performance notes
pre-commit invokes hooks once per batch of matching files, not once per file, so per-invocation startup cost is paid
once per git commit (not once per changed file). The binary's cold-start is well under 50 ms on Linux release builds.
See also
- GitHub Actions: server-side CI gate.
- Editor integrations: fix-on-save flow.
GitHub Actions
Lint and format-check Markdown on every push and pull request.
Quickest path: the bundled composite action
mdwright publishes a composite action at the repo root (action.yml). It fetches the prebuilt binary from the matching
GitHub release and runs whatever mdwright command you pass:
name: markdown
on:
  push:
    branches: [main]
  pull_request:
jobs:
  mdwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: jcreinhold/mdwright@v0.1.0
        with:
          args: check --check .
      - uses: jcreinhold/mdwright@v0.1.0
        with:
          args: fmt-check .
The default for args is check --check . (lint the current directory). Pin the version to a tag (@v0.1.0) rather than @main so upstream releases
don't silently break your CI.
The action ships prebuilt binaries for ubuntu-latest (x86_64-unknown-linux-gnu) and macos-latest
(aarch64-apple-darwin). Other targets fall back to the source-build recipe below.
Source-build fallback
For Windows runners or any platform we don't ship a prebuilt for, install from source:
jobs:
  mdwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - run: cargo install mdwright --locked
      - run: mdwright check --check .
      - run: mdwright fmt-check .
The Swatinem/rust-cache@v2 step keeps subsequent runs around 5 s once the cache is warm; cold builds take a couple of
minutes.
cargo-binstall
If you want the binary speed of the composite action without depending on the action's action.yml contract, run
cargo-binstall directly:
- uses: cargo-bins/cargo-binstall@main
- run: cargo binstall --no-confirm mdwright
- run: mdwright check --check .
This resolves the same release artifacts and skips the compile step.
Reading the output in PR annotations
mdwright's pretty output is human-readable in the Actions log. For PR annotations (squiggles in the GitHub UI), pipe JSON v2 through a converter; there is no first-class action yet, but the schema is documented at Diagnostic schema and stable across 0.x.
See also
- Pre-commit: client-side gate before push.
- CI recipes: non-GitHub CI providers.
Editor integration
mdwright ships a built-in language server. Point your editor at mdwright lsp and you get diagnostics, hover docs,
quick-fixes, and on-save / on-type formatting without configuring an external formatter command.
The smallest possible config is Helix:
[language-server.mdwright]
command = "mdwright"
args = ["lsp"]
[[language]]
name = "markdown"
language-servers = ["mdwright"]
Position encoding gotcha
mdwright advertises UTF-8 position encoding. Clients that negotiate UTF-8 (VS Code 1.74+, Helix, Zed, neovim 0.10+) get
the full surface: diagnostics, hover, formatting, range formatting, on-type formatting, and code actions. Clients that
only support UTF-16 get diagnostics + hover; formatting and code-action providers are withdrawn rather than risk
corrupting non-ASCII sources, and a warning is logged via window/logMessage. If formatting unexpectedly does nothing,
check your editor's LSP log; it usually means the client never negotiated UTF-8.
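To see why the two encodings cannot be mixed safely, consider how the same position is counted in each. The sketch below is an illustration of the general problem, not mdwright's code; utf8_col_to_utf16 is a hypothetical helper.

```rust
/// Converts a UTF-8 byte column within `line` to a UTF-16 code-unit column.
/// Returns None if `byte_col` does not fall on a character boundary.
fn utf8_col_to_utf16(line: &str, byte_col: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_col) {
        return None;
    }
    Some(line[..byte_col].chars().map(char::len_utf16).sum())
}

fn main() {
    // "x" is 1 byte; the subscript i (U+1D62) is 3 bytes in UTF-8 but a
    // single UTF-16 code unit, so the two column counts diverge after it.
    let line = "x\u{1D62} = 1";
    println!("{:?}", utf8_col_to_utf16(line, 4));
    println!("{:?}", utf8_col_to_utf16(line, 2));
}
```

An edit computed with byte columns and applied by a client that counts UTF-16 units would land at the wrong character on any line containing non-ASCII text, which is the corruption the server avoids by withdrawing the providers.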
VS Code
mdwright does not publish a dedicated VS Code extension. Install a generic LSP-client extension that lets you point at
an arbitrary LSP binary, then configure it to launch mdwright lsp:
{
  "[markdown]": {
    "editor.defaultFormatter": "<your-lsp-client-extension-id>"
  },
  "yourLspClient.servers": [
    {
      "command": "mdwright",
      "args": ["lsp"],
      "languages": ["markdown"]
    }
  ]
}
Helix
Add to ~/.config/helix/languages.toml (or the workspace .helix/languages.toml):
[language-server.mdwright]
command = "mdwright"
args = ["lsp"]
[[language]]
name = "markdown"
language-servers = ["mdwright"]
auto-format = true
Run :lsp-restart after editing. Helix's space-a opens the code-action menu; pick Fix `bare-url`: … for a single
diagnostic or Apply all mdwright safe fixes to run every safe quick-fix at once.
Zed
Add to ~/.config/zed/settings.json:
{
  "lsp": {
    "mdwright": {
      "binary": {
        "path": "mdwright",
        "arguments": ["lsp"]
      }
    }
  },
  "languages": {
    "Markdown": {
      "language_servers": ["mdwright"],
      "format_on_save": "on"
    }
  }
}
Neovim
Using nvim-lspconfig on neovim 0.10+:
local lspconfig = require("lspconfig")
local configs = require("lspconfig.configs")
if not configs.mdwright then
  configs.mdwright = {
    default_config = {
      cmd = { "mdwright", "lsp" },
      filetypes = { "markdown" },
      root_dir = lspconfig.util.find_git_ancestor,
      settings = {},
    },
  }
end
lspconfig.mdwright.setup({
  on_attach = function(_, bufnr)
    vim.api.nvim_create_autocmd("BufWritePre", {
      buffer = bufnr,
      callback = function() vim.lsp.buf.format({ async = false }) end,
    })
  end,
})
Configuration
The server discovers .mdwright.toml, mdwright.toml, or pyproject.toml's [tool.mdwright] table by walking up from
the workspace root, exactly like the CLI. Edit one of those files and the server re-lints every open buffer on the next
file-watcher event; workspace/didChangeConfiguration triggers the same refresh.
The LSP server keeps the same default input-size boundary as the CLI: a single open buffer above 10 MB stays open, but mdwright publishes one document-level diagnostic and suppresses linting, formatting, range formatting, and code actions for that version.
Range-formatting caveats
textDocument/rangeFormatting and textDocument/onTypeFormatting snap the requested range out to the nearest whole
top-level block before formatting. For sources without document-scope reorderable constructs the snapped output is a
verbatim substring of the whole-document format; link definitions ([label]: dest) and footnote definitions
([^label]: …) are document-scope, so a range format may leave them in place where a whole-document format would have
moved them to the canonical location. Save the file (or invoke whole-document formatting) periodically to reconcile.
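The snapping step can be pictured as expanding the requested range to the union of every top-level block it touches. The sketch below assumes the blocks are available as sorted, non-overlapping byte ranges; snap_to_blocks is a hypothetical name and the real server's data structures are not shown here.

```rust
use std::ops::Range;

/// Expands `requested` to cover every top-level block it touches.
/// `blocks` holds sorted, non-overlapping byte ranges of top-level blocks.
/// Returns None when the request touches no block (e.g. a blank gap).
fn snap_to_blocks(blocks: &[Range<usize>], requested: Range<usize>) -> Option<Range<usize>> {
    let mut touched = blocks
        .iter()
        .filter(|b| b.start < requested.end.max(requested.start + 1) && requested.start < b.end);
    let first = touched.next()?;
    let last = touched.last().unwrap_or(first);
    Some(first.start..last.end)
}

fn main() {
    let blocks = vec![0..10, 12..30, 32..40];
    println!("{:?}", snap_to_blocks(&blocks, 5..14));
    println!("{:?}", snap_to_blocks(&blocks, 13..13));
}
```

An empty range (a bare cursor position, as on-type formatting would send) is treated as one byte wide, so it still snaps to the block that contains it.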
Smoke test
Before publishing an editor integration, run this manual check:
- Start the server with mdwright lsp.
- Open a Markdown file that contains https://example.com and confirm the bare-url diagnostic appears.
- Insert - [n]:Z followed by a carriage return, newline, and two tabs. The server should publish one parser diagnostic at the start of the file and keep running.
- Replace the file contents with valid Markdown. Normal diagnostics should return without restarting the server.
- Run whole-document formatting and range formatting on a paragraph that mdwright changes.
- Edit .mdwright.toml and trigger your editor's LSP config reload or file-watcher refresh. Open buffers should be re-linted with the new policy.
- Check the editor's LSP log if formatting is unavailable; the common cause is a client that did not negotiate UTF-8 positions.
See also
- Pre-commit: backstop for missed editor saves.
- Lint vs. format: editor flow drives both pipelines.
CI recipes
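Snippets for CI providers other than GitHub Actions. The container recipes install mdwright with cargo install before running it; the bare-metal script assumes mdwright is already on $PATH.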
GitLab CI
mdwright:
  image: rust:1.91
  cache:
    paths:
      - .cargo/
  script:
    - cargo install --root .cargo mdwright --locked
    - ./.cargo/bin/mdwright check --check .
    - ./.cargo/bin/mdwright fmt-check .
  rules:
    - changes:
        - "**/*.md"
CircleCI
version: 2.1
jobs:
  mdwright:
    docker:
      - image: cimg/rust:1.91
    steps:
      - checkout
      - run: cargo install mdwright --locked
      - run: mdwright check --check .
      - run: mdwright fmt-check .
workflows:
  docs:
    jobs:
      - mdwright
Buildkite
steps:
  - label: ":memo: mdwright"
    command: |
      cargo install mdwright --locked
      mdwright check --check .
      mdwright fmt-check .
    plugins:
      - docker#v5.10.0:
          image: rust:1.91
Drone
kind: pipeline
name: mdwright
steps:
  - name: mdwright
    image: rust:1.91
    commands:
      - cargo install mdwright --locked
      - mdwright check --check .
      - mdwright fmt-check .
Bare-metal / cron
A nightly job that lints a docs corpus and posts a report:
#!/usr/bin/env bash
set -euo pipefail
cd "$DOCS_REPO"
git pull --quiet
mdwright check --format=json . > /tmp/mdwright-report.jsonl
jq -s 'length' /tmp/mdwright-report.jsonl | xargs -I {} \
  echo "mdwright: {} diagnostics in $DOCS_REPO"
The JSON v2 schema is stable; consume it programmatically (see Diagnostic schema).
See also
- GitHub Actions: the most-tested path.
- Pre-commit: local-machine equivalent.
Writing a lint rule
A lint rule is a type that implements LintRule. Rules see the parsed document via a curated query surface and
emit Diagnostic values. mdwright ships nineteen stdlib rules; this page shows how to add an external twentieth
without forking the binary.
The trait
pub trait LintRule: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn check(&self, doc: &Document, out: &mut Vec<Diagnostic>);
    fn is_default(&self) -> bool { true }
    fn is_advisory(&self) -> bool { false }
    fn produces_fix(&self) -> bool { false }
    fn explain(&self) -> &str { "" }
}
- name is the kebab-case identifier ("no-todo-in-prose"); the dispatcher stamps it onto each emitted diagnostic.
- description is the one-line summary shown by mdwright list-rules.
- check reads the Document and pushes Diagnostic values.
- is_default controls whether the rule fires under --rules default and under the config default lint.preset = "default" when --rules is omitted.
- is_advisory makes diagnostics informational; they do not fail mdwright check --check.
- produces_fix claims that at least one diagnostic carries a Fix.
- explain is the long-form markdown shown by mdwright explain <name>.
Worked example: no-todo-in-prose
A rule that flags TODO (case-sensitive) inside paragraph text but not inside code blocks, inline code, math regions,
or HTML blocks: Document::prose_chunks() handles every skip for you.
use mdwright_document::Document;
use mdwright_lint::{Diagnostic, LintRule};

pub struct NoTodoInProse;

impl LintRule for NoTodoInProse {
    fn name(&self) -> &str { "no-todo-in-prose" }
    fn description(&self) -> &str { "Literal TODO in paragraph text" }
    fn explain(&self) -> &str {
        "TODOs in user-facing documentation are usually accidents. \
         Track pending work in an issue tracker, or suppress this \
         rule with `<!-- mdwright: allow no-todo-in-prose -->`."
    }
    fn check(&self, doc: &Document, out: &mut Vec<Diagnostic>) {
        // prose_chunks() already skips code, math, and HTML blocks.
        for slice in doc.prose_chunks() {
            for (offset, _) in slice.text.match_indices("TODO") {
                if let Some(d) = Diagnostic::at(
                    doc,
                    slice.byte_offset,
                    offset..offset + "TODO".len(),
                    "literal `TODO` in prose".to_owned(),
                    None,
                ) {
                    out.push(d);
                }
            }
        }
    }
}
Diagnostic::at performs the byte-offset arithmetic and line-index lookup. It returns Option because pathological
offsets could fall outside the source; on failure the diagnostic is dropped rather than the rule panicking.
Registering the rule
Add it to a RuleSet and lint:
use mdwright_document::Document;
use mdwright_lint::{Diagnostic, LintRule, RuleSet};

// Stub standing in for the NoTodoInProse rule from the worked example.
struct NoTodoInProse;

impl LintRule for NoTodoInProse {
    fn name(&self) -> &str { "no-todo-in-prose" }
    fn description(&self) -> &str { "" }
    fn check(&self, _: &Document, _: &mut Vec<Diagnostic>) {}
}

// The `?` below requires an enclosing function that returns a Result.
let mut rules = RuleSet::stdlib_defaults();
rules.add(Box::new(NoTodoInProse)).expect("unique name");
let doc = Document::parse("My TODO: write the docs.")?;
let diagnostics = rules.check(&doc);
RuleSet::add returns Result<&mut Self, DuplicateRuleName> so two rules with the same name fail fast.
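The fail-fast behaviour on duplicate names can be modelled with a small self-contained registry. Everything in this sketch (Rule, Registry, Named, and this DuplicateRuleName) is a toy stand-in written for illustration; it mirrors the signature described above but is not mdwright's RuleSet.

```rust
use std::collections::HashSet;

/// Toy stand-in for a lint rule; not mdwright's LintRule trait.
trait Rule {
    fn name(&self) -> &str;
}

#[derive(Debug, PartialEq)]
struct DuplicateRuleName(String);

/// Toy registry that rejects a second rule with an already-used name.
#[derive(Default)]
struct Registry {
    names: HashSet<String>,
    rules: Vec<Box<dyn Rule>>,
}

impl Registry {
    fn add(&mut self, rule: Box<dyn Rule>) -> Result<&mut Self, DuplicateRuleName> {
        let name = rule.name().to_owned();
        if !self.names.insert(name.clone()) {
            return Err(DuplicateRuleName(name)); // name already registered
        }
        self.rules.push(rule);
        Ok(self)
    }
}

struct Named(&'static str);

impl Rule for Named {
    fn name(&self) -> &str {
        self.0
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.add(Box::new(Named("no-todo-in-prose"))).expect("unique name");
    let second = registry.add(Box::new(Named("no-todo-in-prose")));
    println!("{:?}", second.err());
}
```

Returning &mut Self on success lets calls be chained, while the error carries the offending name so the caller can report which registration collided.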
Shipping a custom binary
The CLI crate entry point mdwright::run_with_rules takes your assembled RuleSet and runs the whole CLI on top
of it: clap parsing, config discovery, every output format, the LSP server, the suppression machinery. Your main is
ten lines:
use mdwright_lint::stdlib;

struct NoTodoInProse;

impl mdwright_lint::LintRule for NoTodoInProse {
    fn name(&self) -> &str { "no-todo-in-prose" }
    fn description(&self) -> &str { "" }
    fn check(&self, _: &mdwright_document::Document, _: &mut Vec<mdwright_lint::Diagnostic>) {}
}

fn main() -> std::process::ExitCode {
    let mut rules = stdlib::all();
    rules.add(Box::new(NoTodoInProse)).expect("unique name");
    mdwright::run_with_rules(rules)
}
Pass stdlib::all() (not stdlib::defaults()) so every opt-in stdlib rule remains selectable via --rules.
run_with_rules filters down from this pool based on the user's --rules argument and the active configuration file.
A complete working sample lives at examples/extending/ in the mdwright repository. The same crate has
integration tests that prove the rule fires end-to-end.
Publishing your custom binary
Your binary is just a crate. Push to crates.io with cargo publish (we recommend a name like <org>-mdwright so users
distinguish it from the official binary), or distribute the compiled artifact directly. Downstream users install your
binary and run it in place of mdwright; the command-line interface is identical.
Caveats
- Config-driven rule reconfiguration applies to stdlib rules only. The [lint.info-strings] extra option, for example, mutates the stdlib info-string-typo rule even when a downstream binary has registered its own implementation under that name. Downstream rules read their own configuration; mdwright does not route config keys into them.
- mdwright does not load lint rules at runtime. See Plugin loading for the rationale and the comparison of dynamic-loading alternatives we considered.
See also
- Plugin loading: why custom binaries are the supported path; what we rejected and why.
- Architecture: the IR boundary LintRule::check sees.
- Suppression comments: how rules opt out per-document.
- Diagnostic schema: the shape your diagnostics take after the dispatcher stamps them.
Plugin loading
mdwright does not load lint rules at runtime. The supported extension path is the one
Writing a lint rule describes: depend on mdwright-document and mdwright-lint, implement LintRule,
call mdwright::run_with_rules, and ship your own binary. This page explains why: what dynamic loading would buy, what
it would cost, and what would have to change for the decision to flip.
The decision
| Architecture | Verdict | Available in |
|---|---|---|
| A. Component crates + custom binary | Supported | today |
| B. Dynamic cdylib loading via libloading | Rejected | never |
| C. WASM plugins via wasmtime | Not planned | — |
The same trio shipped in ruff, which is mdwright's closest analogue in spirit. ruff thrives without a plugin runtime;
the trait surface plus a documented "ship your own binary" path covers every adopter who hits the limits of the stdlib.
Architecture A: Supported
A user writes a rule in their own crate, depends on mdwright-document, mdwright-lint, and mdwright from crates.io,
and ships a small binary:
use mdwright_lint::stdlib;

struct MyRule;

impl mdwright_lint::LintRule for MyRule {
    fn name(&self) -> &str { "my-rule" }
    fn description(&self) -> &str { "" }
    fn check(&self, _: &mdwright_document::Document, _: &mut Vec<mdwright_lint::Diagnostic>) {}
}

fn main() -> std::process::ExitCode {
    let mut rules = stdlib::all();
    rules.add(Box::new(MyRule)).expect("unique name");
    mdwright::run_with_rules(rules)
}
| Capability | Full library access. Any rule the stdlib could write, an external rule can write. |
| Complexity | One CLI-crate function (run_with_rules). The rest is the trait that already shipped. |
| Cost to user | They ship a Rust binary. CI needs cargo. They pin a major version of mdwright. |
| Cost to maintainer | None new. The LintRule trait and the mdwright::run_with_rules signature are the surface; semver protects them. |
| Semver implications | LintRule is 1.0-grade. mdwright::run_with_rules is a fn(RuleSet) -> ExitCode; that signature is stable. |
This is what mdwright ships, in the mdwright crate and the examples/extending/ workspace member.
Architecture B: Dynamic Loading via libloading (Rejected)
.mdwright.toml:
[plugins]
my_rules = "./target/release/libmy_rules.dylib"
mdwright would load each cdylib at startup and look up an extern "Rust" fn mdwright_register(&mut Registry) symbol.
| Capability | Anything Rust can express. |
| Complexity | A libloading integration, a Registry shim, a plugin ABI versioning story. |
| Cost to user | They build a cdylib and put it in a path. First-run UX is opaque when the path is wrong. |
| Cost to maintainer | Substantial. The ABI surface is every type a plugin touches, including Diagnostic, Document, and every accessor. Rust has no stable ABI. Every Rust release risks breaking every plugin. |
| Semver implications | repr(Rust) types cross the boundary; layout is unspecified. Every release becomes an ABI compatibility check. |
Verdict: rejected. The maintenance burden is high, the gain over Architecture A is small (a cargo build versus an
in-process load), and Rust's lack of a stable ABI makes the contract perpetually fragile. Linking a single cdylib into
the official binary buys nothing a custom binary doesn't already give you.
Architecture C: WASM via wasmtime (Not planned)
.mdwright.toml:
[plugins]
my_rules = "./my-rules.wasm"
The plugin compiles to WebAssembly; mdwright runs it in a wasmtime sandbox, serialising documents and diagnostics
across the boundary.
| | |
|---|---|
| Capability | Restricted to whatever API mdwright exposes through the host bindings. |
| Complexity | Define and document a sandbox API; write host bindings; serialise Document and Diagnostic (no zero-copy across the boundary); manage WASM startup cost per file. |
| Cost to user | Plugin authors learn wasm-bindgen-style discipline; the trait is harder to use than the native one. |
| Cost to maintainer | Maintain the WASM API forever, plus a reference implementation, plus a performance story (parsing each file twice, once natively and once through the boundary, is not free). |
| Semver implications | The WASM API is its own semver surface, parallel to the native LintRule trait. |
Verdict: not planned. The cost is real and the demand is hypothetical. Revisit only when a concrete adopter has tried Architecture A, hit a specific limit (sandbox isolation, language diversity, hot reload), and articulated what the WASM contract would need to address.
What would change this decision
Architecture B is unlikely to ever become attractive: Rust would have to grow a stable ABI, which is not on any horizon, and even then the cost-benefit against custom binaries would barely move.
Architecture C is more interesting in principle. If you have a use case where:
- the rule body is in a language other than Rust, or
- you need to load and unload rules without restarting mdwright, or
- a sandboxed evaluation model is a hard requirement (e.g. running rules submitted by untrusted contributors),
please open an issue describing the concrete adoption story. A motivated maintainer behind a real need is the precondition for revisiting this.
Until then: depend on the library, write the rule, ship a binary. The example at examples/extending/ in the repository
is ready to fork.
Architecture
The design intent. Read this before you change document recognition, linting, or formatting.
Workspace boundaries
Each crate hides a different kind of knowledge. Read the layers top-down as a dependency stack; each layer depends only on the layers below it:
Surfaces mdwright (CLI) mdwright-lsp
Engines mdwright-format mdwright-lint
Glue mdwright-config
Document mdwright-document
Math spans mdwright-math
TeX bodies mdwright-latex
- mdwright-latex: TeX math-body parsing, Unicode layout, command vocabulary, and source translation.
- mdwright-math: Markdown math-span scanning and normalisation.
- mdwright-document: source coordinates, pulldown invocation, parse options, recognised Markdown facts.
- mdwright-config: interprets user config files into document, format, and lint policy.
- mdwright-format: formatting options, rewrite-family planning, and verification.
- mdwright-lint: diagnostics, rule execution, suppression, safe fixes.
- mdwright: the command-line binary: file discovery, terminal output, process exit policy.
- mdwright-lsp: editor-state delivery over LSP.
The repository root is a virtual workspace. There is no facade crate; library users depend directly on the crate that owns the capability they need.
Document facts
Document is parse/query only. It wraps the original source, canonical source mapping, line index, pulldown-derived
events, references, lists, code and HTML exclusion ranges, heading attributes, frontmatter, and math regions. Lint rules
and formatter rewrite producers consume these immutable facts instead of invoking pulldown independently.
Recognition policy lives in ParseOptions. Formatting policy lives in FmtOptions.
Math regions
The Markdown math scanner lives in mdwright-math and knows only about strings and byte ranges. The document crate
supplies Markdown exclusion ranges, stores accepted math regions, and gives downstream crates one stable inventory to
query. TeX body parsing and Unicode rendering belong in mdwright-latex, so Markdown delimiter policy does not leak
into the TeX parser.
This is the design choice that makes mdwright math-resilient. See Math regions for the user-facing view.
Formatting
Default formatting is identity emit: source bytes survive unchanged except for document-boundary policies. Opt-in style canonicalisation and wrapping run through private rewrite families owned by the current document snapshot.
Only private rewrite-family code in mdwright-format may apply formatter byte edits. It runs families in a fixed order,
rejects local overlaps within each family, applies a family plan to a scratch buffer, and verifies Markdown and math
signatures before committing the whole plan. It does not expose partial family progress as successful formatting.
Linting
RuleSet owns rule execution. Callers parse a Document, then call rules.check(&doc) or rules.check_with(&doc, opts). Suppressions, diagnostic sorting, standard-rule registry construction, and safe-fix application are lint-crate
details.
Doc tests
The crates/mdwright/tests/docs_examples.rs suite walks docs/src/**/*.md and validates every fenced code block:
- ```markdown / ```md → must parse with pulldown-cmark (no panic; non-empty event stream for non-empty input).
- ```toml → must parse with Config::load_explicit.
- ```toml,no-check → skipped. Use this fence for non-config TOML (e.g. book.toml, pyproject.toml excerpts that show structure but are not valid config payloads).
A PR that introduces a broken example fails CI. The convention is invisible to mdBook (which treats the language tag as a CSS class) but the test sees it.
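The fence convention amounts to a small dispatch on the info string. A minimal sketch, assuming hypothetical names (`FenceCheck`, `classify_fence`) and assuming that any tag not listed above is skipped; the real suite may be organised differently:

```rust
/// What the docs test does with one fenced block (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum FenceCheck {
    /// Feed the body to the Markdown parser.
    ParseMarkdown,
    /// Load the body as an mdwright config payload.
    ParseConfig,
    /// Leave the block unchecked.
    Skip,
}

/// Map a fence info string to the check described above.
/// Assumption: tags other than the documented ones are skipped.
pub fn classify_fence(info: &str) -> FenceCheck {
    match info.trim() {
        "markdown" | "md" => FenceCheck::ParseMarkdown,
        "toml" => FenceCheck::ParseConfig,
        "toml,no-check" => FenceCheck::Skip,
        _ => FenceCheck::Skip,
    }
}
```

Keeping the opt-out in the info string means the exemption is visible at the fence itself rather than in a separate allow-list.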
Where to look
| Want to change… | Edit… |
|---|---|
| A lint rule | crates/mdwright-lint/src/stdlib/<rule>.rs + its explanation |
| Document recognition | crates/mdwright-document/src/ |
| Math body language | crates/mdwright-latex/src/ |
| Math span recognition | crates/mdwright-math/src/ |
| Formatter rewrites | crates/mdwright-format/src/format/ |
| Wrap algorithm | crates/mdwright-format/src/format/wrap_pass.rs |
| Config schema | crates/mdwright-config/src/config.rs + xtask/src/config_docs.rs |
| CLI surface | crates/mdwright/src/cli.rs |
| LSP surface | crates/mdwright-lsp/src/lsp.rs |
Crate boundaries
mdwright is a virtual workspace. Each crate hides a different volatile decision; library users depend directly on the component crate they need. The repository root is a workspace manifest, not a package—there is no facade library and no root binary.
Cargo.toml # virtual workspace root, no package targets
crates/mdwright # command-line package and `mdwright` binary
crates/mdwright-document # parsed Markdown facts with stable source coords
crates/mdwright-latex # TeX and Unicode math-body lexing, parsing, layout, translation
crates/mdwright-math # Markdown math-span recognition and normalisation
crates/mdwright-format # formatter policy, rewrite-family planning, oracles
crates/mdwright-lint # diagnostics, rule execution, suppression, safe fixes
crates/mdwright-config # TOML schema, discovery, resolved option construction
crates/mdwright-lsp # tower-lsp server and editor-state bridge
Dependency direction:
mdwright-latex
│ │
│ └─────────────┐
│ │
mdwright-math
│
mdwright-document
│ │
mdwright-format mdwright-lint
│ │
mdwright-config
│
mdwright / mdwright-lsp
What each crate owns
| Crate | Hides |
|---|---|
| mdwright-latex | TeX/LaTeX and Unicode math-body lexing, parsing, command vocabulary, Unicode layout, and source translation. |
| mdwright-math | Markdown math delimiter and environment recognition; extraction of math bodies from source. |
| mdwright-document | CommonMark/pulldown quirks, GFM extension overlays, source-coordinate invariants, parser-panic containment. Owns the only production pulldown-cmark chokepoint. |
| mdwright-format | Formatter style policy, rewrite-family planning, local ownership checks, semantic verification. |
| mdwright-lint | Rule dispatch, suppressions, diagnostic shape, safe-fix edit ordering, standard-rule registry. |
| mdwright-config | TOML schema and discovery rules; resolves into the per-crate option types. |
| mdwright | File discovery, argument parsing, terminal output, parallel execution, exit policy. |
| mdwright-lsp | Editor-state delivery over LSP. |
The document crate is a parse/query abstraction; formatting and linting are operations owned by the crates that hide
their algorithms. Other crates consume document facts as domain records (structural spans, paragraphs, list-marker
sites, inline delimiter slots, heading attribute trailers, link destination slots, math regions, frontmatter, code/HTML
exclusions, top-level checkpoints) and do not couple to pulldown's event vocabulary, offset iterator, panic payloads, or
backtraces. Markdown math-region recognition and TeX math-body parsing are separate boundaries: mdwright-math
recognises where math lives in Markdown, while mdwright-latex owns the language inside those regions. That ownership
includes parser-backed Unicode-to-LaTeX translation: unsupported Unicode remains visible and records diagnostics or
losses instead of being silently guessed. Lint rules that need LaTeX vocabulary facts depend on mdwright-latex
directly rather than copying command tables or asking mdwright-math to pass them through.
pulldown_model tests may import pulldown directly because they deliberately probe upstream drift.
Public API entry points
- mdwright_document::Document::parse_with_options(source, ParseOptions) -> Result<Document, ParseError> parses fallibly at the parser trust boundary. The returned Document stores its ParseOptions; formatter entry points read that policy from the Document so every rewrite, semantic signature, verification reparse, and range-format checkpoint uses the same recognition.
- mdwright_format::{format_document, format_validated} over a parsed Document. format_source(source, opts) is the convenience path for default parse policy.
- mdwright_lint::RuleSet::{check, check_with} for lint dispatch; mdwright_lint::apply_safe_fixes for safe-fix application.
- The mdwright package exposes command-extension helpers such as run_with_rules but is otherwise a binary, not a library.
ExtensionOptions, MystOptions, and PandocOptions are document parse policy under ParseOptions. GFM extension
policy is parse.extensions.gfm.autolinks and parse.extensions.gfm.tagfilter; the document crate exposes autolinks as
general AutolinkFact values rather than URL-specific GFM facts. HTML render spelling is document-owned policy exposed
as [render] profile.
Dependency fences
Enforced by crates/mdwright/tests/dependency_fences.rs via cargo tree:
- mdwright-latex depends on no other mdwright-* crate and has no terminal, browser, Markdown parser, formatter, lint, config, CLI, or LSP dependencies.
- mdwright-math may depend on mdwright-latex; it must not depend on document, format, lint, config, CLI, or LSP.
- mdwright-document may depend on mdwright-math; it must not depend on latex body parsing directly, format, lint, config, CLI, LSP, clap, ignore, rayon, serde, toml, tokio, tower-lsp, owo-colors, or anyhow.
- mdwright-format may depend on mdwright-document and mdwright-math; it must not depend on lint, CLI, LSP, clap, tokio, or tower-lsp. It does not import pulldown-cmark or mdwright_document::parse in production code.
- mdwright-lint depends on mdwright-document and may depend directly on mdwright-latex for command vocabulary; it must not depend on format, CLI, LSP, clap, tokio, or tower-lsp, and it must not depend directly on mdwright-math for vocabulary.
- mdwright-config may depend on document/format/lint option types and on the lint rule registry for resolving configured rule selection; it must not depend on CLI or LSP.
- mdwright and mdwright-lsp are delivery crates; heavy delivery dependencies belong there.
- mdwright-document does not publicly export parser helpers.
- mdwright-math does not publicly re-export mdwright-latex as a pass-through facade.
- Config schema and docs hold recognition keys under [parse.extensions], not formatter policy.
- Workspace-internal dependencies carry both path and version.
Packaging
Every publishable crate uses versioned internal dependencies with local path entries for development. Cargo
strips paths during packaging while local workspace builds keep using the checked-out crates. The repository
.cargo/config.toml patches internal packages to local paths so local cargo package --no-verify checks run before
those packages exist on crates.io. Publishing order:
1. mdwright-latex
2. mdwright-math
3. mdwright-document
4. mdwright-format, mdwright-lint
5. mdwright-config
6. mdwright-lsp
7. mdwright
Why these crates, not others
- No mdwright-source / mdwright-source-map / mdwright-text: source canonicalisation and byte mapping are part of the document abstraction. Callers want a recognised document whose spans map back to user bytes, not a separate coordinate package.
- mdwright-latex is a real boundary, not a facade. TeX math-body parsing, Unicode math-source parsing, command vocabulary, Unicode layout, and source translation share grammar knowledge and should change behind one narrow API. mdwright-math remains separate because Markdown delimiter recognition changes for different reasons and has different callers. See latex-boundary-and-dependency-audit.md for the design comparison and dependency audit. The release claim for this crate is evidence-backed common MathJax-style coverage where Unicode has honest representations, plus parser-backed Unicode-to-LaTeX source translation for the supported subset. It is not TeX macro expansion, browser-grade MathJax layout, or diagram interpretation.
- No mdwright-util: a utility crate has no domain responsibility and becomes a junk drawer.
- No mdwright-rules: standard rules and rule dispatch share suppression, diagnostic, and registry semantics; separating them would mirror an old directory layout shallowly.
- No root facade package and no root Document newtype to preserve doc.format() / doc.lint(): either wrapper would expose the same abstraction as mdwright-document::Document and add no hiding.
The alternative was a larger mdwright-engine crate that owned Document, lint, format, and safe-fix operations. That
would keep formatter and linter complected with document recognition: one central crate would know parser byte ranges,
lint suppression semantics, formatter transactional verification, standard-rule registration, and safe-fix edit
ordering. With the current split, a new formatter rewrite touches mdwright-format plus tests; a new lint rule touches
mdwright-lint plus docs; a new config key touches mdwright-config plus the option type it resolves; and a new CLI flag
does not drag parser, formatter, or lint internals into the CLI surface.
Parser Boundary
mdwright-document is the only production crate that invokes pulldown-cmark.
Markdown is semantically total after mdwright canonicalises source bytes, but the parser implementation can still panic on malformed edge cases. The document crate contains that implementation risk:
- source bytes are canonicalised into one parser input;
- parser iteration is collected eagerly inside one private catch_unwind boundary;
- parser panics become mdwright_document::ParseError;
- document facts, signatures, HTML rendering, and block checkpoints are built only after successful collection.
Callers do not catch parser panics. They parse source with:
let doc = mdwright_document::Document::parse(source)?;
Operations over an existing Document stay pure over recognised facts. Formatting a parsed document remains infallible:
let formatted = mdwright_format::format_document(&doc, &opts);
Source-convenience APIs are fallible because they cross the parser boundary:
let formatted = mdwright_format::format_source(source, &opts)?;
let html = mdwright_document::render_html(source)?;
The transactional formatter uses the same policy for verification reparses. If a candidate output cannot be parsed, the
candidate is rejected and the unverified bytes are not committed. CLI and LSP delivery turn ParseError into controlled
file/editor diagnostics; they do not install parser-specific panic handlers.
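The containment pattern can be sketched with the standard library alone. In this sketch `fragile_parse` is a hypothetical stand-in for the pulldown call, and `ParseError` is a local illustration, not the type mdwright exports:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical error type standing in for `mdwright_document::ParseError`.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

/// Hypothetical stand-in for a parser that may panic on an edge case.
fn fragile_parse(source: &str) -> Vec<String> {
    if source.contains('\u{0}') {
        panic!("unexpected NUL byte");
    }
    source.lines().map(str::to_owned).collect()
}

/// Collect parser output eagerly inside one `catch_unwind` boundary and
/// convert any panic into an ordinary error value.
pub fn parse_contained(source: &str) -> Result<Vec<String>, ParseError> {
    panic::catch_unwind(AssertUnwindSafe(|| fragile_parse(source))).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`; anything else is opaque.
        let message = payload
            .downcast_ref::<&str>()
            .map(|text| text.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "parser panicked".to_owned());
        ParseError { message }
    })
}
```

Because the output is collected before the boundary returns, nothing built afterwards can observe a half-iterated parser. The sketch relies on the default unwinding panic strategy; under `panic = "abort"` there is nothing to catch.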
Formatter rewrite boundary
The formatter starts from identity emit. Opt-in style and wrap changes run through private rewrite families in
mdwright-format; each family builds a locally non-overlapping plan, verifies the resulting document, and commits the
whole plan or none of it. Verification is a safety gate. It is not a convergence strategy.
Parser facts stay in mdwright-document. Rewrite policy stays in mdwright-format. The document crate tells the
formatter where the syntactic slots are; the formatter decides whether a configured style should rewrite those slots.
Adopted design
The rewrite subsystem uses ordered families:
- inline delimiters;
- list markers;
- thematic breaks;
- link destinations;
- heading attributes;
- table normal forms;
- math;
- frontmatter;
- terminal wrap.
Each canonical family sees a parsed snapshot of the current bytes. If it produces edits, the family plan checks that those edits do not overlap within the family. A local overlap rejects the family; it does not drop one edit and keep another. If the plan verifies, the whole plan commits and the pipeline starts again from the first canonical family on a fresh parse. If verification fails, the family skips; verification never repairs an incomplete plan.
Terminal wrap is not a peer canonical family. It runs only after a full canonical-family scan commits nothing for the current snapshot. If wrap commits paragraph edits, the pipeline starts again from the first canonical family so any newly exposed syntactic slots are normalized before wrap runs again.
The successful terminal state is a full pass with no family commits. If the guard pass count trips before that state, the formatter leaves the original source bytes unchanged. It does not return the last verified partial output as successful formatting.
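The canonical-family loop described above can be sketched as follows. This is an illustration under stated assumptions, not the formatter's code: `Edit`, `Family`, `apply_plan`, and `run_pipeline` are hypothetical names, the `verify` callback stands in for the Markdown and math signature checks, and terminal wrap is not modelled.

```rust
/// A byte-range replacement proposed by one family (hypothetical shape).
pub struct Edit {
    pub start: usize,
    pub end: usize,
    pub replacement: String,
}

/// A family inspects the current snapshot and proposes edits.
pub type Family = fn(&str) -> Vec<Edit>;

/// Apply a family's whole plan to a scratch buffer, or return `None` if any
/// two edits overlap. No edit is dropped to rescue the others.
fn apply_plan(snapshot: &str, mut edits: Vec<Edit>) -> Option<String> {
    edits.sort_by_key(|edit| edit.start);
    if edits.windows(2).any(|pair| pair[0].end > pair[1].start) {
        return None;
    }
    let mut scratch = snapshot.to_owned();
    // Apply back to front so earlier offsets stay valid.
    for edit in edits.iter().rev() {
        scratch.replace_range(edit.start..edit.end, &edit.replacement);
    }
    Some(scratch)
}

/// Run families in order, restarting from the first after every commit.
/// Returns the original source if the guard trips before a quiet pass.
pub fn run_pipeline(
    source: &str,
    families: &[Family],
    verify: fn(&str, &str) -> bool,
    max_passes: usize,
) -> String {
    let mut current = source.to_owned();
    for _ in 0..max_passes {
        let mut committed = false;
        for &family in families {
            let edits = family(&current);
            if edits.is_empty() {
                continue;
            }
            // An overlapping or unverified plan skips the family entirely.
            if let Some(candidate) = apply_plan(&current, edits) {
                if verify(&current, &candidate) {
                    current = candidate;
                    committed = true;
                    break; // restart from the first family on the new snapshot
                }
            }
        }
        if !committed {
            return current; // a full pass with no commits is the normal form
        }
    }
    source.to_owned() // guard tripped: leave the original bytes unchanged
}
```

The two exits are deliberately asymmetric: a quiet pass returns the verified bytes, while a tripped guard returns the input rather than the last committed snapshot.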
Design comparison
| Design | Result |
|---|---|
| Typed candidates in one global list | Rejected. Enriching the old candidate type still leaves one shared selector that has to compare unrelated edits. It can express "keep this parent edit, drop that child edit" even when neither producer meant to own that relationship. |
| Ordered rewrite families | Chosen. Each family owns one style decision and must prove local non-overlap before commit. Cross-family order is explicit, and a family cannot silently steal ownership from another family through a range sort. |
The old global model was shallow: callers supplied a phase, owner, byte range, replacement, verification mode, and label, then relied on a common engine to interpret those fields correctly. The family pipeline hides that coordination in the formatter implementation. Producers no longer compete in one phase/range list.
Ownership rules
An edit must be created for the owner kind the producer intends. There is no fallback from a requested owner to the smallest containing owner. A list-marker edit asks for a list item; a thematic-break edit asks for a thematic break; a math edit asks for a math region. If the matching owner does not contain the range, no edit exists.
This follows the pattern established by list marker and inline slot facts. mdwright-document exposes marker-local
facts, delimiter slots, and link destination slots, so nested constructs cannot be represented as one enclosing rewrite
that accidentally covers child bytes.
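The no-fallback rule is easy to state in code. A minimal sketch, assuming hypothetical types (`OwnerKind`, `Owner`, `OwnedEdit`) and a hypothetical constructor `edit_for`; the formatter's private types may look different:

```rust
use std::ops::Range;

/// Hypothetical owner kinds; the real set lives in the formatter.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OwnerKind {
    ListItem,
    ThematicBreak,
    MathRegion,
}

/// A recognised owner and the bytes it covers.
pub struct Owner {
    pub kind: OwnerKind,
    pub range: Range<usize>,
}

/// An edit that names the owner kind its producer asked for.
#[derive(Debug, PartialEq)]
pub struct OwnedEdit {
    pub kind: OwnerKind,
    pub range: Range<usize>,
    pub replacement: String,
}

/// Create an edit only when an owner of the requested kind contains the
/// range. There is no fallback to some other containing owner.
pub fn edit_for(
    owners: &[Owner],
    kind: OwnerKind,
    range: Range<usize>,
    replacement: &str,
) -> Option<OwnedEdit> {
    owners
        .iter()
        .any(|o| o.kind == kind && o.range.start <= range.start && range.end <= o.range.end)
        .then(|| OwnedEdit { kind, range, replacement: replacement.to_owned() })
}
```

A math edit whose range spills outside every math region yields `None` even when a list item happens to contain it, so the enclosing owner never acquires an edit it did not ask for.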
Table Normal Forms
Table padding is a parent operation. It runs after inline delimiter and link-destination families, reads cell bytes from the current snapshot, and rewrites the whole table block as one verified operation. Row and cell edits are not exposed as candidates.
The table family uses document-owned table facts: source ranges for the table, rows, cells, and alignments. If a row has source cells beyond the recognised table column count, or a cell range is not contained in its row, the table family skips that table instead of dropping bytes it cannot model.
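The two skip conditions reduce to a containment check over the table facts. A sketch under assumptions: `TableFacts`, `Row`, and `table_is_rewritable` are hypothetical names, and alignments are omitted because they do not affect the check.

```rust
use std::ops::Range;

/// Source ranges for one table row and its cells (hypothetical shape).
pub struct Row {
    pub range: Range<usize>,
    pub cells: Vec<Range<usize>>,
}

/// Document-owned facts for one table (hypothetical shape).
pub struct TableFacts {
    pub columns: usize,
    pub rows: Vec<Row>,
}

/// True when every row fits the recognised column count and every cell
/// range lies inside its row. Otherwise the table is skipped whole.
pub fn table_is_rewritable(table: &TableFacts) -> bool {
    table.rows.iter().all(|row| {
        row.cells.len() <= table.columns
            && row
                .cells
                .iter()
                .all(|cell| row.range.start <= cell.start && cell.end <= row.range.end)
    })
}
```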
Terminal Wrap
Paragraph wrapping is a terminal operation. It reads document-owned paragraph facts from the current snapshot: line ranges, content ranges, prefixes, hard breaks, and inline atomics. It computes paragraph replacements after all earlier canonicalizers have reached local normal form, verifies the paragraph batch, and commits the batch or none of it.
Unsupported paragraph shapes stay unchanged. They are counted in the formatter report rather than widened into paragraph edits whose safety depends on later passes.
Pulldown-cmark model
Reference for the per-construct behaviours of pulldown-cmark 0.13 that mdwright depends on. Every emit-site decision
in crates/mdwright-format either matches a rule on this page or contradicts pulldown. A contradiction is a bug.
This file is paired with crates/mdwright/tests/pulldown_model.rs. Each rule below has one test in that file that feeds
the documented example to pulldown and asserts the documented event-stream shape. When pulldown changes upstream (a
release bump, a bug fix on their side), the test fails and this document must be updated before any mdwright code is
changed in response.
Every production parse flows through private helpers in crates/mdwright-document/src/parse.rs, which take a private
CanonicalSource<'_>. Construction routes through the document crate's source canonicalisation, so pulldown's input is
always CR-free and NUL-free in production. Rules below assume that pre-condition.
§1 Line endings
Source::canonicalise rewrites CR and CRLF to LF and NUL to U+FFFD before pulldown sees the buffer (CM §2.1, §2.3). Inside
HTML blocks, code blocks, math regions, and inline code, pulldown preserves the (now-LF) bytes verbatim in the CowStr
payload. In prose, a single \n between non-blank content lines becomes Event::SoftBreak; two consecutive \ns end
the current block.
Consequence: no CowStr produced by Event::Text, Event::Code, Event::Html, Event::InlineHtml,
Event::InlineMath, or Event::DisplayMath can ever contain a CR byte in production. The semantic-equivalence walker
in crates/mdwright-format relies on this; there is no per-event CR scrub.
Test: line_endings_softbreak_between_lines.
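A minimal sketch of the canonicalisation step, written as a free function rather than the real `Source::canonicalise`, whose signature is not shown here. It assumes a lone CR counts as a line ending, as in CM §2.1:

```rust
/// Normalise line endings and NUL characters before parsing.
/// Assumption: a lone CR is a line ending, so it becomes LF as well.
pub fn canonicalise(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // CRLF collapses to a single LF; a lone CR becomes LF.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push('\n');
            }
            // NUL is replaced by the Unicode replacement character.
            '\0' => out.push('\u{FFFD}'),
            other => out.push(other),
        }
    }
    out
}
```

Doing this once at the boundary is what lets downstream code skip a per-event CR scrub.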
§2 Trailing blank lines in containers
Pulldown strips trailing blank lines from indented code blocks before emitting the final Event::Text. A
whitespace-only line is "blank."
The source "\t|\n\t" produces a single Event::Text("|\n") inside the indented code block: the trailing tab-only line
is consumed as a blank line, but the terminating \n of the content line stays in the payload. The formatter's
normalize_trailing_newline consumes that trailing LF when re-emitting; without it the formatter would emit one
trailing LF too many.
Cite: regression fixture crates/mdwright/tests/regressions/fuzz_indented_code_trailing_ws_drop.in.
Test: indented_code_keeps_content_terminating_newline.
§3 Emphasis pairing scope
CM §6.2 / §6.3: emphasis delimiters pair within their enclosing pairing container. The set of pairing containers pulldown observes: paragraph, heading, table cell, link body, image body, footnote definition.
Strikethrough (~~…~~) is not a pairing container: emphasis delimiters can open inside one strikethrough run and
close inside another, or across a strikethrough boundary entirely. The canonicalisation pass's per-rewrite verification
window includes surrounding bytes so a candidate that would re-pair across a strikethrough boundary is rejected.
Link bodies are a pairing boundary because CM §6.5 gives link text grouping higher precedence than emphasis grouping.
The two are not symmetric: *[foo*](bar) parses with the * not pairing (it's outside the link, the link doesn't
enclose it), but the link text [foo*] does not contribute to an outer *…* pair either.
Tests: emphasis_pairs_within_paragraph, emphasis_pairs_across_strikethrough, and
link_body_breaks_emphasis_pairing.
§4 Reference label normalisation
CM §4.7: trim leading and trailing whitespace; collapse internal runs of whitespace to a single U+0020; case-fold via Unicode default case folding. Two labels resolve to the same definition iff their normalised forms agree.
Pulldown 0.13 does not emit a LinkReferenceDefinition event. Definitions are resolved internally during parse, and
reference uses surface as Tag::Link { id: ".." } where id is the raw label bytes the source used (not the
normalised form). The mdwright-side authoritative scan for definitions lives in
crates/mdwright-document/src/refs.rs::build_reference_table; that module is the sole site that runs CM §4.7
normalisation.
Test: reference_label_normalisation_matches.
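The normalisation itself is short. The sketch below is an approximation, not the code in refs.rs: the standard library has no Unicode default case folding, so `to_lowercase` stands in for it (the two differ for some characters, for example "ß"), and `split_whitespace` uses Unicode whitespace, which is slightly broader than the CommonMark definition.

```rust
/// Approximate CommonMark label normalisation.
/// Assumption: lowercasing approximates Unicode default case folding.
pub fn normalise_label(label: &str) -> String {
    label
        .split_whitespace() // trims both ends and splits on whitespace runs
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ") // internal runs collapse to a single U+0020
}
```

Two labels then resolve to the same definition exactly when their normalised strings are equal.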
§5 HTML block boundaries
CM §4.6 defines seven HTML block types, each with its own start / end conditions. Two of the important asymmetries:
- Type 2 (<!-- … --> or <?…?> style with a multi-char end marker): the block ends at the line containing the matching end marker (or EOF). The block's events are a sequence of Event::Html(line) per source line, each payload including its trailing newline, except possibly the last, which can omit the newline if the source did.
- Type 6 (recognised tag names like <table>): the block ends at the first blank line after the start (or EOF). Recognition is by tag name, not by close-tag matching: <table> opens a type-6 block; the close </table> does not by itself end it. A blank line does.
The block's payload bytes round-trip verbatim (modulo §1 canonicalisation), so the formatter emits HTML blocks by stamping the captured source slice rather than reconstructing from events.
Test: html_block_type2_emits_per_line_events.
§6 Emphasis-event range semantics
Event::Start(Tag::Emphasis) and Event::End(TagEnd::Emphasis) ranges in the offset iterator cover the entire run,
from the byte position of the first character of the opening delimiter, to the byte position after the last character
of the closing delimiter.
- range.start of Start(Emphasis): index of the first * or _ of the opening run.
- range.end of End(Emphasis): index after the last * or _ of the closing run.
- The body bytes occupy [start_range.end, end_range.start).
Same convention for Strong. mdwright-document turns these ranges into inline delimiter-slot facts that name only the
opening and closing delimiter bytes. A pulldown change to either range convention would silently change those facts; the
model test catches the drift first.
Test: emphasis_event_range_spans_delimiters.
§7 Strong vs nested emphasis disambiguation
CM §6.5 disambiguates runs of two through six * / _ characters:
- **foo** → Start(Strong), Text("foo"), End(Strong). Not emphasis-of-emphasis.
- ***foo*** → Start(Strong), Start(Emphasis), Text("foo"), End(Emphasis), End(Strong) (the nesting order depends on pairing direction; pulldown's left-flank rule decides).
- *_foo_* → Start(Emphasis), Start(Emphasis), Text("foo"), End(Emphasis), End(Emphasis). Two distinct delimiter characters pair independently.
Canonicalisation must keep these distinct. Inline delimiter families edit only delimiter slots and verify the resulting document before commit; a rewrite that would let pulldown re-segment the construct differently is skipped.
Test: strong_distinct_from_nested_emphasis.
§8 Definition-list event shape
With Options::ENABLE_DEFINITION_LIST set on the parser, the source
Term
: defn
emits the nested triple Start(DefinitionList) → Start(DefinitionListTitle) → … → End(DefinitionListTitle) →
Start(DefinitionListDefinition) → … → End(DefinitionListDefinition) → End(DefinitionList). Each definition's body
is opened/closed independently, so a definition containing multiple paragraphs emits multiple Start(Paragraph) /
End(Paragraph) pairs inside one DefinitionListDefinition.
The private document tree relies on this nesting shape to construct definition-list nodes in
crates/mdwright-document/src/tree.rs. Public callers consume document facts and signatures; they do not see pulldown's
event nesting directly.
Test: definition_list_emits_tag_triple.
§9 Heading attribute fields
With Options::ENABLE_HEADING_ATTRIBUTES set, the trailing { #id .class₁ .class₂ key=val } on an ATX heading
populates the id: Option<CowStr>, classes: Vec<CowStr>, and attrs: Vec<(CowStr, Option<CowStr>)> fields on
Tag::Heading. With the flag unset, those fields are None / empty regardless of source content (the trailer remains
in the heading text).
mdwright-document records the parsed trailer as a HeadingAttrSite. The mdwright-format heading-attribute family
emits the canonical trailer (#id first, then classes in source order, then key=val pairs in source order) when
FmtOptions::heading_attrs is Canonicalise. Under Preserve (the default), the source bytes round-trip unchanged.
Test: heading_attributes_populate_tag_fields.
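The canonical ordering can be illustrated with a small emitter. This is a sketch under assumptions: `HeadingAttrs` is a hypothetical stand-in for the fields a HeadingAttrSite carries, and the brace spacing chosen here (`{#id …}` with no inner padding) is a guess, not the formatter's documented spelling.

```rust
/// Hypothetical record of a parsed heading attribute trailer.
pub struct HeadingAttrs {
    pub id: Option<String>,
    pub classes: Vec<String>,
    pub attrs: Vec<(String, Option<String>)>,
}

/// Emit the trailer with the id first, then classes and key/value pairs
/// in source order. Returns `None` when nothing is set.
pub fn canonical_trailer(site: &HeadingAttrs) -> Option<String> {
    let mut parts = Vec::new();
    if let Some(id) = &site.id {
        parts.push(format!("#{id}"));
    }
    parts.extend(site.classes.iter().map(|class| format!(".{class}")));
    parts.extend(site.attrs.iter().map(|(key, value)| match value {
        Some(value) => format!("{key}={value}"),
        None => key.clone(), // a bare key has no `=value` part
    }));
    if parts.is_empty() {
        None
    } else {
        // Assumption: no padding inside the braces.
        Some(format!("{{{}}}", parts.join(" ")))
    }
}
```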
§10 MyST / Pandoc directives, roles, substitutions, comments
pulldown-cmark v0.13.3 emits no events for any of the following constructs; mdwright treats them as source-owned extension regions under document parse policy:
| Construct | Owning policy |
|---|---|
| MyST / Pandoc directive containers | ParseOptions::extensions.myst.directive_containers |
| MyST % line comments | ParseOptions::extensions.myst.comments |
| MyST inline roles | ParseOptions::extensions.myst.inline_roles |
| MyST substitution references | ParseOptions::extensions.myst.substitution_references |
| Pandoc inline attribute spans | ParseOptions::extensions.pandoc.inline_attribute_spans |
Pulldown sees these as plain paragraph / text events. mdwright therefore treats their source bytes as opaque unless a document-owned fact proves a narrower rewrite slot nearby.
For directive containers, an opener whose colon count is n matches the next colon-only line of count ≥ n. Nested directive bytes are preserved by source identity.
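The matching rule can be sketched over a slice of lines. `find_closer` is a hypothetical helper, and two details are assumptions rather than documented behaviour: an opener needs at least three colons, and a colon-only line may carry trailing whitespace.

```rust
/// Count the leading colons on a line.
fn leading_colons(line: &str) -> usize {
    line.chars().take_while(|&c| c == ':').count()
}

/// Return the index of the line that closes the directive opened at
/// `opener`, or `None` if that line is not an opener or nothing closes it.
pub fn find_closer(lines: &[&str], opener: usize) -> Option<usize> {
    let n = leading_colons(lines.get(opener)?);
    if n < 3 {
        return None; // assumption: fewer than three colons is not an opener
    }
    lines
        .iter()
        .enumerate()
        .skip(opener + 1)
        .find(|(_, line)| {
            let trimmed = line.trim_end();
            // A closer is colon-only and at least as long as the opener.
            !trimmed.is_empty() && trimmed.chars().all(|c| c == ':') && trimmed.len() >= n
        })
        .map(|(index, _)| index)
}
```

With a four-colon outer container wrapping a three-colon inner one, the inner closer is too short to end the outer container, which is why nested directive bytes can be preserved as one opaque span.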
The formatter starts from source bytes, so unknown extension syntax is preserved by default. Opt-in rewrite families must use document-owned facts and exclusion regions before touching bytes near these constructs.
There is no drift test for these constructs because pulldown emits nothing to drift on. Per-fixture regression coverage
in crates/mdwright/tests/regressions/{directive_*,inline_role_*,myst_*}.in plus the vendored jupyter-book round trip
at crates/mdwright/tests/external_corpora.rs is the safety net.
Test matrix
mdwright's correctness sits on these test surfaces. For each: the invariant it defends, where it lives, and what it does NOT cover. Use this to decide which gate(s) a change to the formatter or canonicalisation pass needs to clear.
Per-construct golden suites
Location: crates/mdwright/tests/golden_inline/, crates/mdwright/tests/golden_block/, crates/mdwright/tests/golden_frontmatter/.
Each fixture is an *.in / *.out pair. Optional *.config.toml overrides FmtOptions::default(). The driver tests
live at crates/mdwright/tests/golden_inline.rs, crates/mdwright/tests/golden_block.rs,
crates/mdwright/tests/golden_frontmatter.rs and assert byte equality of the
formatted input against .out.
Invariant: structural emit and canonicalisation produce the expected bytes for the exact shapes the project cares about. This is where new features and bugfixes land their single load-bearing example.
Does NOT cover: behaviour on random inputs (property tests do that), behaviour under options not represented by a
*.config.toml (the matrix is per-fixture, not per-mode).
Property tests
Location: crates/mdwright/tests/properties.rs, generators at crates/mdwright/tests/common/proptest_gen.rs.
Four families:
| Family | Properties | Cases | Sweep gate |
|---|---|---|---|
| Whole-document, default opts | idempotent, html_preserving, lint_preserving, reference_resolver_round_trips | 256 | *_sweep at 4096, #[ignore] |
| Per-construct, default opts | <construct>_fragments_idempotent, <construct>_fragments_html_preserving for emphasis, strong, link-inline, link-reference, autolink, code-span, heading, fenced-code, quote, list, table, thematic, footnote | 256 each | none |
| Canonicalisation, 15 modes | canonicalise_<construct>_semantic_equivalence, canonicalise_<construct>_idempotent, canonicalise_document_*. Each iterates canon_opts() (preserve + per-knob × variants + 2 all-knobs-together). | 256 × 15 modes | canonicalise_document_*_sweep at 4096, #[ignore] |
| Rewrite-law interactions | *_interactions_are_profile_idempotent for nested lists, nested inline slots, tables with inline content, wrapped paragraphs with atomics, link destinations, math, and frontmatter. Each iterates preserve, mdformat, known fuzz profiles, and an all-family profile. | 96 × 5 profiles | none |
Invariants tested:
- Idempotence: format(format(s)) == format(s): strict byte equality.
- Rewrite-law completion: the second pass over generated rewrite-interaction inputs commits no rewrites; family planning must reach its normal form in the first public format call.
- HTML preservation / semantic equivalence: semantically_equivalent(s, format(s)): canonical pulldown event streams agree.
- Lint preservation: format does not introduce new default-on diagnostics (modulo bare-url, which the formatter is allowed to fix into <...> autolinks).
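The idempotence and equivalence invariants above can be sketched as plain predicates over a formatter. This is a minimal illustration with a toy formatter, not mdwright's property suite: toy_format, toy_signature, and first_violation are hypothetical stand-ins for format, the semantic signature, and a generator-driven test loop.

```rust
/// Toy formatter (hypothetical, not mdwright's): normalises line endings to
/// LF and leaves exactly one trailing newline.
fn toy_format(src: &str) -> String {
    let mut out = src.replace("\r\n", "\n").replace('\r', "\n");
    while out.ends_with("\n\n") {
        out.pop();
    }
    if !out.ends_with('\n') {
        out.push('\n');
    }
    out
}

/// Toy semantic signature: the whitespace-separated words of the text.
fn toy_signature(text: &str) -> Vec<String> {
    text.split_whitespace().map(str::to_string).collect()
}

/// Idempotence: format(format(s)) == format(s), compared byte for byte.
fn is_idempotent(src: &str) -> bool {
    let once = toy_format(src);
    toy_format(&once) == once
}

/// Semantic equivalence: the signature of s matches the signature of format(s).
fn preserves_signature(src: &str) -> bool {
    let formatted = toy_format(src);
    toy_signature(src) == toy_signature(&formatted)
}

/// Returns the first input that violates either property, if any.
fn first_violation<'a>(inputs: &[&'a str]) -> Option<&'a str> {
    inputs
        .iter()
        .copied()
        .find(|s| !is_idempotent(s) || !preserves_signature(s))
}
```

A lone carriage return at end of input is the kind of case such a check catches: a formatter that only rewrote CRLF pairs would append a newline after the stray CR on the first pass and then rewrite the resulting pair on the second.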
Does NOT cover: option combinations beyond canon_opts(). The two "all-knobs" modes (opts_all_asterisk,
opts_all_underscore_or_dash) are the cross-knob coverage; a full Cartesian product would be 4·3·4·3·2·3 = 864 modes
and is not pulled in here.
Regression suite
Location: crates/mdwright/tests/regressions/, driver at crates/mdwright/tests/regressions.rs.
Each *.in file is a minimal failing input committed in the same change as its fix. Two gates per fixture:
- regression_inputs_preserve_html: format_validated must succeed (HTML equivalent to source). Skipped for fixtures whose stem ends in .idem.
- regression_inputs_are_idempotent: byte equality across two format passes. Applied to every fixture.
Invariant: previously-broken shapes do not re-regress.
Does NOT cover: anything not in the file list. Adding a fixture is the way to lock in a new invariant.
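As a rough sketch of the two-gate rule above, the function below decides which gates apply to a fixture from its file name. The Gates struct, gates_for, and the fixture names used with it are hypothetical; only the ".idem" stem convention comes from the text.

```rust
use std::path::Path;

/// Which regression gates run for one fixture (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Gates {
    preserve_html: bool,
    idempotent: bool,
}

/// The idempotence gate always runs; the HTML-preservation gate is skipped
/// when the file stem (the name without its final extension) ends in ".idem".
fn gates_for(fixture: &Path) -> Gates {
    let stem = fixture
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    Gates {
        preserve_html: !stem.ends_with(".idem"),
        idempotent: true,
    }
}
```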
GFM spec snapshot
Location: crates/mdwright/tests/gfm_spec.rs, vendored spec at crates/mdwright/tests/gfm-spec/spec.txt, snapshot
at crates/mdwright/tests/gfm-spec/snapshot.txt.
Two tests:
- gfm_spec_snapshot: runs every spec case and compares the residual allowlist against snapshot.txt. Update with MDWRIGHT_UPDATE_SNAPSHOT=1.
- gfm_spec_coverage: asserts the bucketing (fully matching / intentional dev / tracked regression / unexpected) and refuses any unexpected count.
Invariant: the formatter's GFM conformance is stable; the snapshot only changes when intentionally rebaselined.
Does NOT cover: behaviour outside the GFM-spec cases. Project-specific extensions (admonitions, frontmatter, math regions) live in their own golden suites.
Parser backend audit
Location: cargo xtask parser-audit, classifications in docs/architecture/parser-backend-audit.md.
The audit compares mdwright's pulldown-cmark backend against the vendored cmark-gfm expected HTML and a pinned
cmark-gfm binary. It renders mdwright through the cmark-gfm render profile so parser drift is not hidden by HTML
serializer spelling. Optional comrak output is reported as diagnostic evidence, not as a release gate. The audit also
performs risk-gated source-position checks for constructs that mdwright uses as formatter or lint facts.
Invariant: parser-backend differences are explicit. Unclassified pulldown HTML mismatches, unclassified
source-position risks, uncontained parser panics, rows marked fixed, and rows marked needs-mdwright-mitigation fail
the command.
Does NOT cover: formatter idempotence or rewrite safety; those remain covered by the GFM snapshot, property tests, fuzz, and production soak.
Fuzz oracles
Location: fuzz/fuzz_targets/.
| Target | Oracle | Option byte |
|---|---|---|
| fuzz_idempotence | format(format(s)) == format(s) | Yes; drives wrap × mode × math × canonicalisation |
| fuzz_parse_format | semantically_equivalent(s, format(s)) | Yes; same allocation as fuzz_idempotence |
| fuzz_structured_idempotence | Structured-document idempotence over generated Markdown | Yes |
| fuzz_verbatim_identity | Default options are identity modulo document-boundary normalisations | No |
| fuzz_lint | Standard lint rules do not panic and diagnostics are deterministic/in-bounds | No |
| fuzz_latex_render | TeX math-body parse plus Unicode render never panics; malformed or unsupported input returns typed errors | No |
| fuzz_latex_translate | LaTeX-to-Unicode and Unicode-to-LaTeX source translation never panic; diagnostic/loss spans stay in bounds | No |
| fuzz_markdown_math_translate | Markdown math-span scanning plus body-only translation never panics and preserves valid span accounting | No |
| fuzz_unicode_latex_roundtrip | Supported Unicode math source reaches the public translation fixed point L(U(L(y))) == L(y) | No |
Option byte allocation (fuzz_idempotence and fuzz_parse_format, identical):
| Bits | Field |
|---|---|
| 0–1 | wrap (Keep, No, At(80), At(120)) |
| 2 | math.normalise |
| 3 | reserved for corpus continuity |
| 4–7 | Canonicalisation mode (16 enumerated: preserve, one per style knob, two combined) |
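The bit layout in the table can be read as the following decoder. This is a sketch under assumptions: bit 0 is taken as the least significant bit, the two wrap bits are assumed to index the variants in the order the table lists them, and Wrap, FuzzOptions, and decode_option_byte are illustrative names rather than the fuzz harness's own.

```rust
/// Wrap setting selected by bits 0-1 (variant order assumed from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Wrap {
    Keep,
    No,
    At(u16),
}

/// Options decoded from one fuzz option byte (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
struct FuzzOptions {
    wrap: Wrap,
    math_normalise: bool,
    /// Index into the 16 enumerated canonicalisation modes.
    canon_mode: u8,
}

fn decode_option_byte(byte: u8) -> FuzzOptions {
    let wrap = match byte & 0b0000_0011 {
        0 => Wrap::Keep,
        1 => Wrap::No,
        2 => Wrap::At(80),
        _ => Wrap::At(120),
    };
    // Bit 2 toggles math.normalise.
    let math_normalise = (byte & 0b0000_0100) != 0;
    // Bit 3 is reserved for corpus continuity and deliberately ignored.
    // Bits 4-7 select the canonicalisation mode.
    let canon_mode = byte >> 4;
    FuzzOptions { wrap, math_normalise, canon_mode }
}
```

Ignoring the reserved bit means inputs that differ only in bit 3 decode to the same options, which is consistent with keeping older corpus entries meaningful.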
Invariant: no input causes a panic or property violation in 10 minutes. Parser implementation panics are converted
to ParseError at the mdwright-document boundary, so fuzz targets discard parse errors through normal Result
handling rather than wrapping product calls in catch_unwind. TeX math-body failures return LatexError or translation
diagnostics through mdwright-latex; fuzz treats those as normal product output and checks that reported spans are
valid. Unicode-to-LaTeX fuzzing exercises the parser-backed public translator rather than private lexer or AST APIs.
Findings are committed to crates/mdwright/tests/regressions/ or to mdwright-latex coverage fixtures as appropriate.
Production soak
cargo xtask production-soak --corpus-root <path> runs parser, lint, format-validation, idempotence, and fmt-check
checks over the corpus enumerated by crates/mdwright/benches/corpus.list plus representative external Markdown
fixtures. The command reports parse errors, validation failures, idempotence failures, fmt-check disagreements, rewrite
candidate totals, maximum file size, and slowest files.
Does NOT cover: behaviour beyond MAX_INPUT = 65 536 bytes; the libFuzzer harness skips bigger inputs. The CLI
enforces the same shape via --max-input-bytes.
mdformat parity
cargo xtask mdformat-parity --corpus-root <path> --corpus-name <name> --mdwright-config <path> --mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml copies a corpus into isolated temp roots, runs mdwright and mdformat, and
writes JSON / Markdown reports under target/mdwright/parity/. The command compares changed file sets, line-diff stats,
idempotence, mdBook buildability when applicable, and semantic equivalence of each formatter output to the original.
The mdformat config is checked in as an xtask fixture so mdformat does not look like the repository's own formatter.
The parity gate is intentionally not byte-equality with mdformat. Differences are allowed only when
docs/architecture/mdformat-parity.md classifies them as configured, intentional, or upstream-owned. The command fails
on unclassified differences, mdwright semantic drift, parser errors, idempotence failures, mdBook failures, rows marked
fixed that still appear, and rows marked open-bug.
Release evidence
cargo xtask release-evidence --output target/mdwright/release aggregates local release-candidate evidence into
release-evidence.json and release-evidence.md. The command records git state and tool versions, reads existing
parser-audit, mdformat-parity, production-soak, and package/install reports, and points at manual notes for fast checks,
fuzz rounds, and benchmarks.
Invariant: the release candidate has one inspectable evidence bundle that states the current claim, lists accepted divergences, and names missing evidence as blockers.
Does NOT cover: running expensive gates. The command summarizes evidence; it does not replace parser-audit, mdformat-parity, production-soak, fuzzing, packaging, or Criterion.
How to choose what to add when
| Symptom | Right surface |
|---|---|
| One specific fixture or shape misbehaves | Golden suite (add an *.in / *.out pair) |
| A bug class spans many inputs of one construct | Per-construct property (a new <construct>_fragments_* pair, or strengthen the existing one) |
| A canonicalisation mode misbehaves | Canonicalisation property (extend canon_opts()) |
| A minimal counterexample of a property failure surfaces | Regression suite (*.in next to the fix commit) |
| GFM conformance shifts | Audit gfm_spec_coverage first, then rebaseline the snapshot with a comment line above each new entry |
| Pathological inputs reach a panic / property violation | Add the input as a regression fixture; libFuzzer will not re-find it once it round-trips |
What this matrix does NOT include
Lint-rule coverage lives with each rule under crates/mdwright-lint/src/stdlib/* and its tests/; that's a parallel
matrix and isn't summarised here. CLI-surface tests live at crates/mdwright/tests/cli_*.rs. The diagnostic JSON v2
schema is gated by crates/mdwright/tests/diagnostic_json_v2.rs.
Stability charter
Invariant. Formatting a parsed document preserves Markdown meaning, or refuses the rewrite that would change it. Default formatting is identity emit modulo document-boundary normalisation; opt-in style and wrap changes are transactional byte rewrites, each verified against document-owned parser facts.
mdwright's correctness rests on three deep modules in mdwright-document and mdwright-format, not on layered
agreements between consumers:
- One pulldown chokepoint in mdwright-document. Every production pulldown_cmark::Parser invocation goes through private helpers in crates/mdwright-document/src/parse.rs that take the private CanonicalSource<'_> newtype. Construction routes through source canonicalisation, so the type system enforces the chokepoint. Upstream parser panics convert to ParseError at this boundary.
- Structural emit is identity. format_document starts from the parsed document's canonical source bytes; default formatting reaches only document-boundary normalisation.
- Style canonicalisation and wrapping are rewrite-family operations. Opt-in rewrites run as ordered families. Each family builds a locally non-overlapping normal-form plan, verifies the whole plan, and commits all edits or none.
The bug class that motivated this design—formatter mutations that perturb their own parse context—survives only as private rewrite-family edits. A family cannot commit unless the document-level verification predicate accepts it.
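The chokepoint idea can be illustrated with a private newtype whose only constructor performs canonicalisation. The sketch below is not mdwright's code: the module layout, toy_backend, and the choice of line-ending normalisation as the canonicalisation step are assumptions, and the toy parser stands in for pulldown-cmark.

```rust
mod document {
    use std::borrow::Cow;
    use std::panic;

    /// Source text that has passed canonicalisation. The type is private to
    /// this module, so no other code can hand raw text to the backend.
    struct CanonicalSource<'a>(Cow<'a, str>);

    #[derive(Debug, PartialEq, Eq)]
    pub struct ParseError(pub String);

    impl<'a> CanonicalSource<'a> {
        /// Assumed canonicalisation: CRLF and lone CR become LF.
        fn new(raw: &'a str) -> Self {
            if raw.contains('\r') {
                CanonicalSource(Cow::Owned(
                    raw.replace("\r\n", "\n").replace('\r', "\n"),
                ))
            } else {
                CanonicalSource(Cow::Borrowed(raw))
            }
        }
    }

    /// Toy stand-in for the parser backend: counts non-blank lines and
    /// panics on NUL to simulate an upstream parser bug.
    fn toy_backend(src: &str) -> usize {
        assert!(!src.contains('\0'), "toy backend cannot handle NUL");
        src.lines().filter(|l| !l.trim().is_empty()).count()
    }

    /// The single call site of the backend; panics become ParseError here.
    fn parse_canonical(src: &CanonicalSource<'_>) -> Result<usize, ParseError> {
        let text: &str = &src.0;
        panic::catch_unwind(|| toy_backend(text))
            .map_err(|_| ParseError("parser backend panicked".to_string()))
    }

    /// Public entry point: raw text can only reach the backend through
    /// CanonicalSource::new.
    pub fn parse(raw: &str) -> Result<usize, ParseError> {
        parse_canonical(&CanonicalSource::new(raw))
    }
}
```

Because parse_canonical only accepts CanonicalSource, adding a second backend call site that skips canonicalisation would fail to compile rather than fail at review time.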
The bug class
As long as any emit site reads source bytes to choose its representation, perturbation is possible. The bugs that drove this design all share one shape: a downstream pass predicted what pulldown would do, instead of asking pulldown what it does. Two examples:
- _*/*_ (5 bytes). Pulldown sees nested emphasis; a predictive formatter emitted *\*/\**, which re-parses to a single emphasis.
- **u*~***~. Pulldown sees one Strong wrapping Emphasis-and-text plus trailing literals; a predictive formatter oscillated between **u*~*\*\*~ and **u*~~\*\*\*~~ on successive passes.
Removing the read site, that is, preserving source representation byte-for-byte, removes the bug class. Style canonicalisations that do need to choose a representation move into a separate pass where each rewrite family verifies before committing.
The pipeline
source → CanonicalSource → pulldown::Parser → typed IR
→ structural emit (source-preserving)
→ normalize_line_endings_lf
→ [if opts enables rewrites: rewrite-family pipeline]
→ normalize_trailing_newline → apply_end_of_line → out
Only document-owned canonicalisation can produce a CanonicalSource; only mdwright-document invokes pulldown-cmark.
Parser panics become ParseError at that boundary. The rewrite-family pipeline reparses after each committed family so
later families see current document facts. Success means a full pass over enabled families commits nothing. If the guard
pass count trips first, mdwright leaves the original source bytes unchanged rather than returning a partially normalized
buffer as success.
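The convergence rule in the paragraph above can be sketched as a bounded fixpoint loop. This is an illustration under stated assumptions, not mdwright's implementation: Family, MAX_PASSES, run_families, and the toy families are hypothetical, and the real pipeline reparses and verifies at the point the comment marks.

```rust
/// A rewrite family: returns Some(new_text) when it commits a rewrite, or
/// None when the text is already in this family's normal form.
type Family = fn(&str) -> Option<String>;

/// Hypothetical guard value; the real pass limit is not stated in the text.
const MAX_PASSES: usize = 8;

fn run_families(source: &str, families: &[Family]) -> String {
    let mut current = source.to_string();
    for _ in 0..MAX_PASSES {
        let mut committed = false;
        for family in families {
            if let Some(next) = family(&current) {
                // A real pipeline would reparse here so that later families
                // see facts about the updated document.
                current = next;
                committed = true;
            }
        }
        if !committed {
            // A full pass committed nothing: normal form reached.
            return current;
        }
    }
    // Guard tripped: return the original bytes, never a partial result.
    source.to_string()
}

/// Toy family: replaces tabs with single spaces.
fn tabs_to_spaces(s: &str) -> Option<String> {
    if s.contains('\t') { Some(s.replace('\t', " ")) } else { None }
}

/// Toy family: halves runs of double spaces.
fn collapse_spaces(s: &str) -> Option<String> {
    if s.contains("  ") { Some(s.replace("  ", " ")) } else { None }
}

/// Toy family that never reaches a normal form; used to exercise the guard.
fn never_settles(s: &str) -> Option<String> {
    Some(format!("{}!", s))
}
```

Returning the untouched source when the guard trips trades a missed formatting opportunity for the guarantee that callers never observe a half-normalised buffer.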
Public API
| Symbol | Behaviour |
|---|---|
| Document::parse(&str) -> Result<Document, ParseError> | Fallible at the parser trust boundary. |
| format_document(&doc, opts) -> String | Infallible over an already-parsed document. |
| format_validated(&doc, opts) -> Result<String, FormatError> | Carries parse failures and semantic divergence. |
| semantically_equivalent(a, b) -> Result<bool, ParseError> | Reparses both inputs to build semantic signatures. |
FmtOptions style knobs default to Preserve. Fluent setters (with_italic, with_strong, with_list_marker,
with_ordered_list, with_thematic_break, with_link_def_style) cover programmatic callers; the TOML keys are
[fmt] strong, [fmt] thematic-break, and the existing per-knob spellings. User-facing surfaces are documented in
docs/src/format/policy.md and docs/src/format/style.md.
Risk register
| Risk | Bound | Evidence |
|---|---|---|
| A rewrite family contains overlapping local edits. | The family plan rejects before verification; no individual edit is selected out of the overlap. | Unit tests in mdwright-format cover local-overlap rejection. |
| The rewrite-family pipeline never reaches a no-commit pass. | The guard pass count logs tracing::warn! and returns the original source bytes unchanged. | Idempotence regressions and fuzz replay cover known sustained-fuzz failures. |
| Verification misses a cross-paragraph effect. | Families verify the whole document and skip if the document or math signature diverges. | Skips are logged; high-skip-rate documents surface in production traces. |
| Structural emit edge cases the 4096-case sweep doesn't reach. | Two accepted FmtOptions::default() regressions: an empty list item at EOF, and an ATX heading with a trailing hash. | Both reproduce as pre-existing structural-emit bugs surfaced by broader option-space fuzz coverage. |
| Pulldown behaviour drifts between releases. | docs/architecture/pulldown-model.md documents the invariants; tests/pulldown_model.rs fails when pulldown disagrees. | One chokepoint at crates/mdwright-document/src/parse.rs is the single site any drift mitigation lands. |
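The first and third rows of the register describe an all-or-none commit: reject a plan with overlapping edits outright, then apply the whole plan only if verification accepts the result. A minimal sketch follows; Edit, apply_plan, and the same_words predicate are hypothetical, and the predicate is a toy stand-in for the document-level verification the text describes.

```rust
/// One byte-range replacement within the source (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Edit {
    start: usize,
    end: usize,
    replacement: String,
}

/// Applies every edit or none. Returns None when the plan overlaps, is out
/// of bounds, or fails the verification predicate.
fn apply_plan(
    source: &str,
    plan: &[Edit],
    verify: impl Fn(&str, &str) -> bool,
) -> Option<String> {
    let mut sorted: Vec<&Edit> = plan.iter().collect();
    sorted.sort_by_key(|e| e.start);

    // Reject the whole plan on any overlap; no edit is picked out of it.
    let mut prev_end = 0;
    for e in &sorted {
        if e.start < prev_end || e.end < e.start || e.end > source.len() {
            return None;
        }
        if !source.is_char_boundary(e.start) || !source.is_char_boundary(e.end) {
            return None;
        }
        prev_end = e.end;
    }

    // Build the candidate output from untouched gaps plus replacements.
    let mut out = String::with_capacity(source.len());
    let mut cursor = 0;
    for e in &sorted {
        out.push_str(&source[cursor..e.start]);
        out.push_str(&e.replacement);
        cursor = e.end;
    }
    out.push_str(&source[cursor..]);

    // Commit only if the whole candidate verifies against the source.
    if verify(source, &out) { Some(out) } else { None }
}

/// Toy verification: text with emphasis delimiters removed must be unchanged.
fn same_words(before: &str, after: &str) -> bool {
    let strip = |s: &str| -> String {
        s.chars().filter(|c| *c != '_' && *c != '*').collect()
    };
    strip(before) == strip(after)
}
```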
Out of scope
- Replacing pulldown-cmark. The bug class is about agreement with pulldown; a different parser trades one disagreement surface for another.
- AST-level structural diff in the verification gate. Event-stream equivalence is sufficient and cheap; AST diff amplifies position-noise into false divergence.
- A custom emphasis tokeniser. CM §6.2 is correct; mdwright's job is to produce output that lets pulldown's tokeniser reach the same answer as it did on the source.
- Cross-knob canonicalisation modes beyond what FmtOptions exposes. For aggressive cross-knob normalisation, use mdformat; see the README.
What the bar is now
Two rg invariants guard against regression of the design above:
- rg 'opts\.(italic|strong|list_marker|thematic|link_def|ordered_list)' crates/mdwright-format/src/ returns only the style-policy call sites in crates/mdwright-format/src/format/canonicalise.rs. Structural emit does not read style knobs.
- Every production pulldown_cmark::Parser invocation routes through the document parse boundary; #[cfg(test)] exceptions carry an inline justification.
The normalize_* post-passes (normalize_trailing_newline, source_has_effective_trailing_newline,
normalize_line_endings_lf, apply_end_of_line) live in crates/mdwright-format/src/format/mod.rs and are wired
through the public formatting entry points. They are boundary-policy transforms, not perturbation sources:
normalize_trailing_newline reads source bytes to decide whether the output ends with \n; the LF normaliser checks
the invariant carried by document construction.
mdformat parity
cargo xtask mdformat-parity compares mdwright against mdformat (with the GFM, frontmatter, footnote, and MkDocs
plugins) over an isolated corpus copy. The goal is classified compatibility, not byte identity. Every mdwright/mdformat
output difference is either fixed, configured, or recorded below as intentional; otherwise the command fails as a
release gate.
Use [fmt] profile = "mdformat" to ask "how close can mdwright get to mdformat while keeping verified rewrites?" The
profile keeps mdformat's default wrap = keep; a project that wants mdformat with a column limit must set wrap
explicitly. When wrap is an integer, mdwright enforces that line budget for breakable prose in every profile. The
default stable wrap strategy uses mdformat-compatible soft-break reflow. The mdformat profile also defaults list
continuation indentation to four spaces.
Status values
- open-bug: known unresolved gap; reported as a failing release gate.
- intentional-divergence: mdwright deliberately keeps a different byte style while preserving semantics.
- upstream-parser-limitation: difference pinned to parser behaviour outside mdwright.
- configured: caused by mdwright project configuration, usually generated-doc excludes.
- fixed: should no longer appear; the xtask fails if it does.
Class is free-text and groups rows by root cause. style-option-mismatch covers remaining wrap or indentation policy
differences; mdformat-semantic-drift covers cases where mdformat's output is not semantically equivalent to the
source; intentional-policy covers generated files excluded by configuration.
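The gate semantics of the status values can be summarised in a few lines. The sketch below is an assumption-labelled illustration, not the xtask's code: the Status enum, Status::parse, and fails_gate are hypothetical names, and the rule they encode is the one stated in this section (unclassified differences, open-bug rows, and fixed rows that still appear all fail).

```rust
/// Status cell of a classification row (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    OpenBug,
    IntentionalDivergence,
    UpstreamParserLimitation,
    Configured,
    Fixed,
}

impl Status {
    /// Parses the status spelling used in the classification table.
    fn parse(cell: &str) -> Option<Status> {
        match cell.trim() {
            "open-bug" => Some(Status::OpenBug),
            "intentional-divergence" => Some(Status::IntentionalDivergence),
            "upstream-parser-limitation" => Some(Status::UpstreamParserLimitation),
            "configured" => Some(Status::Configured),
            "fixed" => Some(Status::Fixed),
            _ => None,
        }
    }
}

/// Decides whether an observed difference fails the release gate, given the
/// status of its matching classification row (None = no row matched).
fn fails_gate(classification: Option<Status>) -> bool {
    match classification {
        // Unclassified differences always fail.
        None => true,
        // Known open gaps fail; a "fixed" row that still matches an observed
        // difference means the fix regressed.
        Some(Status::OpenBug) | Some(Status::Fixed) => true,
        // Intentional, upstream-owned, or configured differences pass.
        Some(_) => false,
    }
}
```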
Classifications
The table below is parsed by xtask::mdformat_parity::load_classifications: each row must have exactly seven cells.
Path patterns support *, **, and prefix/** globs; find_classification returns the first matching row, so
specific paths come first and catch-all ** rows last. Formatter divergences are owned by the formatter team;
generated-doc exclusions are owned by docs.
| Corpus | Path | Construct | Class | Status | Owner | Resolution |
|---|---|---|---|---|---|---|
| external | jupyter_book_minimal/admonitions.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves MyST directive structure; mdformat with --no-validate rewrites this fixture in a way mdwright's semantic oracle rejects. |
| external | jupyter_book_minimal/asides.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | Same shape as admonitions.md. |
| external | jupyter_book_minimal/directives.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | Same shape as admonitions.md. |
| external | jupyter_book_minimal/blocks.md | MyST and Pandoc blocks | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves MyST and Pandoc block structure; mdformat with --no-validate rewrites this fixture in a way mdwright's semantic oracle rejects. |
| mdwright-docs | src/SUMMARY.md | nested list indentation | style-option-mismatch | intentional-divergence | formatter | mdwright preserves the existing two-space mdBook summary nesting; mdformat rewrites nested bullets to four spaces. |
| mdwright-docs | src/extending/lint-rules.md | list continuation indentation | style-option-mismatch | intentional-divergence | formatter | The repository policy keeps marker-width continuation; fmt.lists.continuation-indent = "four-space" provides the mdformat spelling when requested. |
| mdwright-docs | src/configuration.md | generated docs | intentional-policy | configured | docs | Generated by cargo xtask doc-config; excluded so source docs and generator drift checks do not fight. |
| mdwright-docs | src/reference/cli.md | generated docs | intentional-policy | configured | docs | Generated by cargo xtask doc-cli. |
| mdwright-docs | src/reference/diagnostic-schema.md | generated docs | intentional-policy | configured | docs | Generated from diagnostic schema tests. |
| mdwright-docs | src/rules/** | generated rule docs | intentional-policy | configured | docs | Generated by cargo xtask doc-rules; rule pages intentionally contain lint violations. |
| mdwright-docs | ** | prose wrap | style-option-mismatch | intentional-divergence | formatter | Integer wrap now enforces a line budget for breakable prose in every profile. mdwright may wrap lines that mdformat leaves above the configured width. |
| release-prose-corpus | ** | prose wrap line budget | style-option-mismatch | intentional-divergence | formatter | mdwright enforces wrap = 120 for breakable prose lines. mdformat 1.0.0 leaves the observed over-budget source lines unchanged. |
| release-math-corpus | **/*-template.md | mdformat semantic drift | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves the source semantics; mdformat changes the rendered HTML on some template files in this corpus. |
| release-math-corpus | ** | math-heavy prose and list reflow | style-option-mismatch | intentional-divergence | formatter | mdwright treats ordinary paragraph newlines as soft breaks and enforces over-budget breakable lines. mdformat leaves some over-budget lines unchanged. |
| external | ** | prose wrap | style-option-mismatch | intentional-divergence | formatter | Same as the mdwright-docs catch-all. Single oversized atomics may still exceed the target by policy. |
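The seven-cell rule and first-match lookup described above the table can be sketched as follows. This is a simplified illustration, not xtask::mdformat_parity itself: parse_row, glob_match, and this find_classification are hypothetical reimplementations, cells containing a literal pipe are not handled, and the toy glob does not let a leading "**/" match zero directories.

```rust
/// Splits one Markdown table row into trimmed cells; accepts only rows with
/// exactly seven cells.
fn parse_row(line: &str) -> Option<Vec<String>> {
    let inner = line.trim().strip_prefix('|')?.strip_suffix('|')?;
    let cells: Vec<String> = inner.split('|').map(|c| c.trim().to_string()).collect();
    if cells.len() == 7 { Some(cells) } else { None }
}

/// Toy glob: `*` matches within one path segment, `**` matches across
/// segments, every other byte matches itself.
fn glob_match(pattern: &[u8], path: &[u8]) -> bool {
    match pattern {
        [] => path.is_empty(),
        [b'*', b'*', rest @ ..] => {
            (0..=path.len()).any(|i| glob_match(rest, &path[i..]))
        }
        [b'*', rest @ ..] => {
            let seg_end = path.iter().position(|&b| b == b'/').unwrap_or(path.len());
            (0..=seg_end).any(|i| glob_match(rest, &path[i..]))
        }
        [p, rest @ ..] => path.first() == Some(p) && glob_match(rest, &path[1..]),
    }
}

/// Returns the first row whose corpus (cell 0) equals `corpus` and whose
/// path pattern (cell 1) matches `path`. Row order therefore matters.
fn find_classification<'a>(
    rows: &'a [Vec<String>],
    corpus: &str,
    path: &str,
) -> Option<&'a Vec<String>> {
    rows.iter()
        .find(|r| r[0] == corpus && glob_match(r[1].as_bytes(), path.as_bytes()))
}
```

With first-match semantics, placing a catch-all row above a specific one would silently shadow it, which is why the table keeps specific paths first.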
Release use
Run against the pinned mdformat baseline:
cargo xtask mdformat-parity \
--corpus-root docs \
--corpus-name mdwright-docs \
--mdwright-config .mdwright.toml \
--mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml
The mdformat config lives under xtask/fixtures/ because it is an oracle fixture, not the repository's own formatter.
Output lands at target/mdwright/parity/mdformat-parity.{json,md}. A clean release run has no unclassified differences,
no semantic drift, no parse errors, no idempotence failures, and no rows marked open-bug.
Parser Backend Audit
cargo xtask parser-audit compares mdwright's production pulldown-cmark backend with cmark-gfm, using the vendored
GFM spec expected HTML as the primary oracle. The audit renders mdwright through the opt-in cmark-gfm render profile
so renderer spelling drift is separated from parser-tree drift. It does not replace mdwright-document as the
production parser boundary.
cmark-gfm is the primary oracle because crates/mdwright/tests/gfm-spec/spec.txt is vendored from cmark-gfm and the
GFM ecosystem treats its rendered HTML as the reference. comrak is optional diagnostic evidence for rendered HTML and
source-position behaviour; it is not a release gate unless a future audit shows it catches mdwright-relevant risks that
cmark-gfm cannot expose.
Running
cargo xtask parser-audit \
--case-set all \
--output target/mdwright/parser-audit \
--ensure-tools \
--include-comrak
The command builds a pinned cmark-gfm under target/mdwright/tools/ when --ensure-tools is passed. To use an
already-built binary explicitly, pass --cmark-gfm-bin <path>.
Reports are written to:
- target/mdwright/parser-audit/parser-audit.json
- target/mdwright/parser-audit/parser-audit.md
Examples marked disabled in the vendored GFM spec are still reported, but cmark-gfm binary drift from the expected
HTML for those cases is not a command failure because the upstream spec does not treat the rendered checkbox spelling as
a strict conformance assertion.
The audit also checks source-position envelopes for constructs mdwright uses as formatter/linter facts. It maps
cmark-gfm data-sourcepos line/column ranges back to source bytes and compares them against mdwright document facts by
construct kind. This is a risk gate, not exact AST equality: a difference is reported only when mdwright has no
overlapping fact for a rewrite/lint-owned construct.
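The coordinate mapping described above can be sketched as a conversion from a sourcepos attribute to a byte range, followed by an overlap test. The sketch assumes the "line:col-line:col" attribute shape with 1-based lines and columns, an inclusive end column, and columns counted in bytes; parse_sourcepos, to_byte_range, and overlaps are hypothetical helpers, not the audit's own functions.

```rust
use std::ops::Range;

/// Parses "L1:C1-L2:C2" into ((line, col), (line, col)).
fn parse_sourcepos(attr: &str) -> Option<((usize, usize), (usize, usize))> {
    let (start, end) = attr.split_once('-')?;
    let point = |p: &str| -> Option<(usize, usize)> {
        let (line, col) = p.split_once(':')?;
        Some((line.parse().ok()?, col.parse().ok()?))
    };
    Some((point(start)?, point(end)?))
}

/// Byte offset at which each line of `src` begins.
fn line_starts(src: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(
        src.bytes()
            .enumerate()
            .filter(|(_, b)| *b == b'\n')
            .map(|(i, _)| i + 1),
    );
    starts
}

/// Converts a sourcepos attribute into a half-open byte range in `src`.
fn to_byte_range(src: &str, attr: &str) -> Option<Range<usize>> {
    let ((l1, c1), (l2, c2)) = parse_sourcepos(attr)?;
    let starts = line_starts(src);
    let start = starts.get(l1.checked_sub(1)?)? + c1.checked_sub(1)?;
    // The end column is inclusive, so it equals the exclusive byte offset
    // within the line once converted from 1-based to 0-based.
    let end = starts.get(l2.checked_sub(1)?)? + c2;
    if start <= end && end <= src.len() { Some(start..end) } else { None }
}

/// True when two half-open ranges share at least one byte.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}
```

Under the risk-gate rule in the text, a difference would be reported only when no document fact of the same construct kind overlaps the mapped range.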
Status Values
- pulldown-html-mismatch: mdwright's pulldown-backed HTML differs from cmark-gfm expected HTML.
- mdwright-policy: mdwright intentionally differs from the cmark-gfm oracle for a documented parser policy.
- extension-gap: the compared parser does not implement the construct.
- sourcepos-risk: rendered output matches, but coordinate facts may affect formatter/lint safety.
- event-only: internal event/AST shape differs while rendered HTML and semantic signatures match.
- upstream-panic: parser panic or crash contained by mdwright-document.
- needs-mdwright-mitigation: upstream behaviour is unsafe for mdwright and still needs a fix.
- fixed: the difference should no longer appear; the audit fails if it does.
Classifications
Current gfm-spec audit snapshot with mdwright's cmark-gfm render profile:
| Metric | Count |
|---|---|
| Cases | 673 |
| HTML mismatches | 15 |
| Sourcepos envelopes checked | 1071 |
| Sourcepos differences | 0 |
| Unclassified differences | 0 |
Observed difference classes:
| Observed | Count |
|---|---|
| pulldown-html-mismatch:emphasis-resolution | 9 |
| pulldown-html-mismatch:html-block-rendering | 3 |
| pulldown-html-mismatch:tasklist-rendering | 2 |
| pulldown-html-mismatch:table-rendering | 1 |
| upstream-panic | 1 |
| Case Set | Key | Observed | Status | Owner | Resolution |
|---|---|---|---|---|---|
| * | * | mdwright-policy:gfm-bare-autolinks-enabled | fixed | document | Parser-audit now mirrors the cmark-gfm extension set per spec case, so default production GFM policy no longer creates non-extension CommonMark audit drift. |
| * | * | mdwright-policy:gfm-email-autolinks-disabled | fixed | document | GFM email autolinks are recognised by mdwright-document's source-positioned GFM overlay. |
| * | * | mdwright-policy:gfm-tagfilter-disabled | fixed | document | GFM tagfiltering is enabled by default in mdwright-document's render/signature policy. |
| * | * | pulldown-html-mismatch:gfm-autolink | fixed | document | GFM URL and email autolink mismatches should be handled by mdwright-document's GFM autolink overlay. |
| * | * | pulldown-html-mismatch:gfm-tagfilter | fixed | document | GFM tagfilter mismatches should be handled by mdwright-document's GFM tagfilter overlay. |
| * | * | pulldown-html-mismatch:quote-escaping | fixed | document | The cmark-gfm render profile escapes double quotes in text/code contexts where cmark-gfm emits &quot;. |
| * | * | pulldown-html-mismatch:href-escaping | fixed | document | The cmark-gfm render profile percent-encodes link destinations where cmark-gfm percent-encodes them. |
| gfm-spec | Tables (extension) | pulldown-html-mismatch:table-rendering | fixed | document | The cmark-gfm render profile spells ordinary GFM table markup with cmark-gfm row, alignment, and body layout. |
| gfm-spec | case-160 | pulldown-html-mismatch:table-rendering | pulldown-html-mismatch | document | This is a raw HTML table containing indented code, not a GFM table. The remaining drift is parser/backend handling of blank raw-HTML text around child blocks, not formatter rewrite risk. |
| gfm-spec | case-279, case-280 | pulldown-html-mismatch:tasklist-rendering | pulldown-html-mismatch | document | These spec examples are marked disabled; cmark-gfm's binary output and mdwright's cmark-gfm profile match, while the vendored expected HTML intentionally does not assert the checkbox spelling. |
| * | * | extension-gap:myst-definition-list | extension-gap | document | cmark-gfm does not own MyST directive syntax; mdwright's default definition-list recognition can make directive-heavy fixtures render differently through pulldown HTML, while formatter preservation is handled by mdwright document facts. |
| corpus | external:jupyter_book_minimal/admonitions.md, external:jupyter_book_minimal/asides.md, external:jupyter_book_minimal/blocks.md, external:jupyter_book_minimal/directives.md | sourcepos-risk:paragraph | extension-gap | document | cmark-gfm reports MyST directive/admonition syntax as ordinary paragraph source ranges, while mdwright treats the same bytes as extension-owned containers or preservation facts. The corpus rows pin that non-GFM coordinate drift so it cannot silently expand. |
| gfm-spec | case-120, case-152, case-153 | pulldown-html-mismatch:html-block-rendering | pulldown-html-mismatch | document | pulldown's event stream omits leading indentation on raw HTML blocks that cmark-gfm preserves in rendered HTML. mdwright accepts this as backend render drift because source-coordinate facts remain stable. |
| gfm-spec | case-144 | pulldown-html-mismatch:html-block-rendering | fixed | document | The cmark-gfm render profile now matches cmark-gfm's newline placement for this list/raw-HTML case. |
| gfm-spec | case-398, case-426, case-434, case-435, case-436, case-473, case-474, case-475, case-477 | pulldown-html-mismatch:emphasis-resolution | pulldown-html-mismatch | document | pulldown's emphasis resolution differs from cmark-gfm on these delimiter-stack edge cases; mdwright currently treats this as a parser-backend conformance gap, not a formatter-local bug. |
| operational | known-pulldown-link-ref-tab-panic | upstream-panic | upstream-panic | document | pulldown-cmark issue 1095 is contained by mdwright-document::ParseError; product paths do not panic. |
The cmark-gfm render profile is an HTML spelling profile. It fixes quote escaping, link-destination escaping, ordinary
GFM table spelling, task-list checkbox spelling, and one newline-placement case where the parser already exposes enough
structure. It does not change emphasis resolution or source-position semantics. Full cmark-gfm parser equivalence would
require upstream pulldown changes, a maintained fork, or a backend switch.
Replacement Criteria
Do not replace pulldown-cmark based on event-shape differences alone. A replacement candidate must improve at least
one release-relevant axis without regressing the others:
- fewer unclassified or policy-relevant HTML mismatches against cmark-gfm;
- safer behaviour on malformed/user input;
- stable byte/source coordinates sufficient for formatter rewrite ownership;
- extension coverage at least as good as the current document facts;
- acceptable runtime and dependency footprint.
LaTeX boundary
mdwright needs MathJax-scale TeX math support, Unicode terminal layout, and bidirectional source translation. That language machinery is larger and more volatile than Markdown math-span recognition, so it belongs behind a separate component boundary.
The boundary
mdwright-latex hides the TeX body language: lexer, parser, command registry, Unicode layout, and source translation.
mdwright-math keeps Markdown delimiter and environment recognition and delegates the body string to mdwright-latex
when callers need rendering or translation.
mdwright-latex is not a facade: its public API stays narrower than its implementation. Callers receive
parsed/rendered/translated results and typed errors, not lexer tokens, parser cursors, AST variants, or MathJax table
internals.
- mdwright-latex owns TeX math-body lexing, parsing, command vocabulary, Unicode layout, and source translation.
- mdwright-math owns Markdown math-span recognition, delimiter policy, and extraction of math body strings.
- mdwright-lint consumes vocabulary through narrow lookup APIs.
- crates/mdwright owns CLI commands such as preview and the math translation surface.
- Unsupported TeX is a typed error or visible fallback, never a panic.
Dependency comparison
MathJax is the coverage target because it documents both TeX input behavior and the supported macro table; it is not treated as a TeX-engine equivalence claim. The comparison axes are licence, signal, API fit at the mdwright boundary, and outcome.
| Crate | Version | License | Signal | API fit | Decision |
|---|---|---|---|---|---|
| logos | 0.16.1 | MIT OR Apache-2.0 | Mature lexer crate; high crates.io usage; active docs and repository. | Good fit for byte-span tokenisation when the lexer stays policy-free and parser recovery remains separate. | Accept for the lexer spike and later lexer work. |
| pulldown-latex | 0.7.1 | MIT | Reachable repository and docs; moderate use. | Pull parser for LaTeX-to-MathML. It does not expose the TeX AST/control needed for Unicode layout and bidirectional source translation. | Reject as a core dependency; keep as a reference. |
| tex2math | 1.2.1 | LGPL-3.0-only | Recent crate, but very low crates.io adoption. | LaTeX-to-MathML conversion and CLI/wasm features. License and output-center do not match mdwright's component boundary. | Reject. |
| latex2mathml | 0.2.3 | MIT | Older release; moderate total downloads; reachable repository. | Converts equations to MathML. It does not hide the source-translation or Unicode-layout decisions mdwright needs. | Reject as a core dependency; keep as a reference if fixtures are useful. |
| math-core | 0.6.1 | MIT | Recent crate with low adoption; Rust 1.91. | Converts LaTeX equations to MathML Core. The crate center is MathML Core, not Unicode layout or source translation. | Reject as a core dependency; revisit only for conformance fixture ideas. |
| mathml-latex | 0.0.3 | MPL-2.0 | Early version, low recent usage, reachable repository. | Converts between MathML and LaTeX, but would put MathML at mdwright's internal boundary. | Reject. |
Low-adoption terminal math rendering crates such as term-maths and tui-math remain rejected. Terminal delivery code
belongs in crates/mdwright; TeX body structure belongs in mdwright-latex.
Rejected boundary shapes
- Keep TeX bodies in mdwright-math. Braids Markdown span recognition (CommonMark + GFM + math-resilience rules) with TeX body support (MathJax input vocabulary, Unicode coverage, layout, translation). The two change for different reasons.
- Wrap an existing LaTeX-to-MathML crate. The current Rust crates target MathML output. Wrapping one would either leak MathML as an unwanted intermediate interface or force mdwright to reconstruct TeX structure from MathML.
CLI reference
Auto-generated from clap's --help output by cargo xtask doc-cli. Edit the CLI definition in
crates/mdwright/src/cli.rs (or the rule registry for list-rules); never edit this file by hand.
mdwright
Lints Markdown for stylistic and structural issues, with a public rule trait so projects can extend the standard library, plus a verified round-trip formatter.
Usage: mdwright [OPTIONS] <COMMAND>
Commands:
check Lint Markdown files and report diagnostics
fix Lint and apply safe autofixes in place
fmt Reformat Markdown files
fmt-check Verify formatting without writing
list-rules Print the rule catalogue
explain Print the long-form explanation of one lint rule
render Format the input and emit the rendered HTML to stdout
preview Format the input and render a static terminal Markdown preview
math Translate math source between LaTeX commands and Unicode
config Create mdwright configuration files
lsp Run as a Language Server Protocol server over stdio
help Print this message or the help of the given subcommand(s)
Options:
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
mdwright check
Lint Markdown files and report diagnostics
Usage: mdwright check [OPTIONS] [PATHS]...
Arguments:
[PATHS]...
Files and directories to scan. Directories are searched recursively. If omitted, `.` is scanned. A literal `-` reads stdin as `<stdin>`
Options:
--check
Exit with status 1 if any non-advisory diagnostic is found
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--rules <RULES>
Rule-selection spec. If omitted, `[lint] preset`, `select`, `extend-select`, and `ignore` from the config file apply. See module docs for syntax
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--format <FORMAT>
Output format
Possible values:
- pretty: Human-readable, optionally coloured
- compact: `file:line:col: rule: message` per line
- json: JSON Lines, v2 schema. See `docs/src/reference/diagnostic-schema.md`
- json-v1: JSON Lines, v1 schema. Deprecated; emits a deprecation warning on stderr. Will be removed in a future release
[default: pretty]
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--color <COLOR>
When to colour pretty output. `auto` (default) colours when stdout is a TTY; `always` forces colour; `never` disables it. Compact and JSON output are never coloured regardless
[default: auto]
[possible values: auto, always, never]
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-j, --jobs <JOBS>
Worker threads; 0 = rayon default (one per logical CPU)
[default: 0]
--no-suppress
Ignore `<!-- mdwright: allow ... -->` suppression comments. Use to audit which diagnostics are silenced and where
-h, --help
Print help (see a summary with '-h')
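The two input guards, `--max-input-bytes` and `--reject-control-chars`, amount to simple byte-level checks. The sketch below illustrates the documented behaviour in Python; the function names are hypothetical and the error handling is an assumption, since the help text does not specify how a rejection is reported.

```python
from typing import Optional

# C0 control bytes are 0x00..0x1F; TAB, LF, FF, and CR are the allowed exceptions.
ALLOWED_C0 = {0x09, 0x0A, 0x0C, 0x0D}


def first_rejected_control_byte(data: bytes) -> Optional[int]:
    """Return the offset of the first disallowed C0 control byte, or None."""
    for offset, byte in enumerate(data):
        if byte < 0x20 and byte not in ALLOWED_C0:
            return offset
    return None


def check_input(
    data: bytes,
    max_input_bytes: int = 10_000_000,
    reject_control_chars: bool = False,
) -> None:
    """Raise ValueError when the payload violates either guard."""
    # A cap of 0 disables the size check entirely.
    if max_input_bytes and len(data) > max_input_bytes:
        raise ValueError(f"input is {len(data)} bytes; cap is {max_input_bytes}")
    if reject_control_chars:
        offset = first_rejected_control_byte(data)
        if offset is not None:
            raise ValueError(f"control byte 0x{data[offset]:02X} at offset {offset}")
```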
mdwright fix
Lint and apply safe autofixes in place
Usage: mdwright fix [OPTIONS] [PATHS]...
Arguments:
[PATHS]...
Files and directories to scan. Directories are searched recursively. If omitted, `.` is scanned. A literal `-` reads stdin as `<stdin>`
Options:
--check
Exit with status 1 if any non-advisory diagnostic is found
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--rules <RULES>
Rule-selection spec. If omitted, `[lint] preset`, `select`, `extend-select`, and `ignore` from the config file apply. See module docs for syntax
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--format <FORMAT>
Output format
Possible values:
- pretty: Human-readable, optionally coloured
- compact: `file:line:col: rule: message` per line
- json: JSON Lines, v2 schema. See `docs/src/reference/diagnostic-schema.md`
- json-v1: JSON Lines, v1 schema. Deprecated; emits a deprecation warning on stderr. Will be removed in a future release
[default: pretty]
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--color <COLOR>
When to colour pretty output. `auto` (default) colours when stdout is a TTY; `always` forces colour; `never` disables it. Compact and JSON output are never coloured regardless
[default: auto]
[possible values: auto, always, never]
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-j, --jobs <JOBS>
Worker threads; 0 = rayon default (one per logical CPU)
[default: 0]
--no-suppress
Ignore `<!-- mdwright: allow ... -->` suppression comments. Use to audit which diagnostics are silenced and where
-h, --help
Print help (see a summary with '-h')
mdwright fmt
Reformat Markdown files
Usage: mdwright fmt [OPTIONS] [PATHS]...
Arguments:
[PATHS]...
Files and directories to reformat. If omitted, `.` is used. A literal `-` reads from stdin and writes to stdout
Options:
--check
Exit 1 if any file would change; never write. Same shape as `prettier --check` / `rustfmt --check`
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--diff
Write a unified diff to stdout instead of editing files. Mutually exclusive with `--check`
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--stdin-filename <STDIN_FILENAME>
File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through
--no-validate
Skip the HTML-equivalence safety check that runs by default. The check parses both source and formatted output to HTML and refuses to write when they differ. Use this only if you have independent verification that the formatter is safe for the input, for example, a CI pipeline that already runs the check elsewhere
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
--explain-divergence
When the HTML-equivalence gate rejects a file, print a unified diff of the source's HTML against the formatted output's HTML to stderr. Diagnostic surface for triaging gate failures; does not change the gate's pass/fail decision
--explain-format
Explain formatter decisions on stderr. Does not change write, check, diff, or validation behavior
--range <LINE:COL-LINE:COL>
Format only the smallest set of whole top-level blocks covering `LINE:COL-LINE:COL` (start inclusive, end exclusive; 0-based LSP convention). Reads from stdin only; writes the covering blocks to stdout. Mutually exclusive with `--check` and `--diff`.
Example: `--range 2:0-2:5` formats the block containing columns 0..5 of line 2.
--math-render <MATH_RENDER>
Delimiter rewrite policy for math regions at emit time. `none` (default) passes math through verbatim: today's behaviour. `commonmark-katex` is the same emission as `none` but greppable as an intent signal in build logs. `dollar` rewrites `\[…\]` to `$$ … $$` and `\(…\)` to `$ … $` for downstream renderers that prefer dollar delimiters; LaTeX environments are not rewritten. Overrides `[fmt.math] render` in the config file
[possible values: none, commonmark-katex, dollar]
-h, --help
Print help (see a summary with '-h')
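For editor integrations that build a `--range` argument, it can help to see the half-open, 0-based convention spelled out. The following Python sketch parses a `LINE:COL-LINE:COL` spec and selects the blocks that overlap it. `Position`, `parse_range`, `covering_blocks`, and the representation of blocks as half-open position pairs are all hypothetical; mdwright's own block model is not documented here.

```python
from typing import NamedTuple


class Position(NamedTuple):
    line: int  # 0-based
    col: int   # 0-based


def parse_range(spec: str) -> tuple[Position, Position]:
    """Parse 'LINE:COL-LINE:COL' into (start, end); end is exclusive."""
    def parse_pos(text: str) -> Position:
        line, sep, col = text.partition(":")
        if not sep or not line.isdigit() or not col.isdigit():
            raise ValueError(f"bad position {text!r} in range {spec!r}")
        return Position(int(line), int(col))

    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"range {spec!r} is missing '-'")
    start, end = parse_pos(start_text), parse_pos(end_text)
    if end < start:
        raise ValueError(f"range {spec!r} ends before it starts")
    return start, end


def covering_blocks(blocks, start: Position, end: Position) -> list[int]:
    """Indices of blocks, given as half-open (start, end) pairs, that overlap [start, end)."""
    return [
        i for i, (b_start, b_end) in enumerate(blocks)
        if b_start < end and start < b_end
    ]
```

With the documented example `2:0-2:5` and a block spanning lines 2 to 3, only that block is selected; neighbouring blocks are left out.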
mdwright fmt-check
Verify formatting without writing
Usage: mdwright fmt-check [OPTIONS] [PATHS]...
Arguments:
[PATHS]... Files and directories to check. If omitted, `.` is used. A literal `-` reads stdin and checks whether it would change
Options:
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--diff
Write a unified diff to stdout for files that would change
--stdin-filename <STDIN_FILENAME>
File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--no-validate
Skip the HTML-equivalence safety check that runs by default. The check parses both source and formatted output to HTML and refuses to write when they differ. Use this only if you have independent verification that the formatter is safe for the input, for example, a CI pipeline that already runs the check elsewhere
--explain-divergence
When the HTML-equivalence gate rejects a file, print a unified diff of the source's HTML against the formatted output's HTML to stderr. Diagnostic surface for triaging gate failures; does not change the gate's pass/fail decision
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
--explain-format
Explain formatter decisions on stderr. Does not change check, diff, or validation behavior
--math-render <MATH_RENDER>
Delimiter rewrite policy for math regions at emit time. Overrides `[fmt.math] render` in the config file [possible values: none, commonmark-katex, dollar]
-h, --help
Print help
mdwright render
Format the input and emit the rendered HTML to stdout.
Pipes the formatted output through the same HTML renderer the `format_validated` gate uses. Captured stdout is raw HTML by default; terminals may request ANSI-highlighted HTML with `--color`, and `--open` writes the HTML to a temporary file before opening it in the system browser.
Usage: mdwright render [OPTIONS] [PATHS]...
Arguments:
[PATHS]...
File to render. A literal `-` (or an empty list) reads from stdin. Multiple paths are concatenated in argument order with a single newline between, then rendered as one document
Options:
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--stdin-filename <STDIN_FILENAME>
File name to report when reading from stdin. Defaults to `<stdin>`. Cosmetic; surfaced in error messages only
--math-render <MATH_RENDER>
Delimiter rewrite policy for math regions. See the corresponding flag on `mdwright fmt` for the modes
[possible values: none, commonmark-katex, dollar]
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--render-profile <RENDER_PROFILE>
HTML spelling profile. `pulldown` preserves the default renderer; `cmark-gfm` matches cmark-gfm spelling for renderer differences that do not require changing parser semantics. Overrides `[render] profile` in the config file
[possible values: pulldown, cmark-gfm]
--color <COLOR>
When to colour HTML output. Captured stdout remains raw HTML under `auto`; `always` forces ANSI syntax highlighting and `never` disables it
[default: auto]
[possible values: auto, always, never]
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
--open
Write rendered HTML to a temporary `.html` file and open it in the system browser. Stdout is left empty; stderr reports the file path
-h, --help
Print help (see a summary with '-h')
mdwright preview
Format the input and render a static terminal Markdown preview
Usage: mdwright preview [OPTIONS] [PATHS]...
Arguments:
[PATHS]...
Files to preview. A literal `-` (or an empty list) reads from stdin. Multiple paths are concatenated in argument order with a single newline between, then previewed as one document
Options:
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--stdin-filename <STDIN_FILENAME>
File name to report when reading from stdin. Defaults to `<stdin>`. Cosmetic; surfaced in error messages only
--color <COLOR>
When to colour terminal output. `auto` colours when stdout is a TTY; `always` forces colour; `never` disables it
[default: auto]
[possible values: auto, always, never]
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--math <MATH>
How terminal preview handles math regions
Possible values:
- unicode: Render the conservative supported LaTeX subset as Unicode, falling back to source when unsupported
- source: Preserve math source bytes
- off: Disable special terminal math rendering
[default: unicode]
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely
[default: 10000000]
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help
Print help (see a summary with '-h')
mdwright math
Translate math source between LaTeX commands and Unicode
Usage: mdwright math [OPTIONS] [PATHS]...
Arguments:
[PATHS]... Markdown files/directories to translate. Directories are searched recursively. If omitted, stdin is translated. A literal `-` reads stdin
Options:
--config <CONFIG>
Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--to-unicode
Translate LaTeX math source to editable Unicode math source
--to-latex
Translate Unicode math source to preferred LaTeX math source
-v, --verbose...
Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--check
Exit 1 if any file or stdin payload would change; never write
--max-input-bytes <BYTES>
Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--diff
Write a unified diff to stdout; never write files
--reject-control-chars
Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
--write
Rewrite Markdown files in place. This is required for file mutation; stdin always writes translated text to stdout
--stdin-filename <STDIN_FILENAME>
File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through
-h, --help
Print help
mdwright config
Create mdwright configuration files
Usage: mdwright config [OPTIONS] <COMMAND>
Commands:
init Write a documented `.mdwright.toml` with every option set to its default
help Print this message or the help of the given subcommand(s)
Options:
--config <CONFIG> Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
-v, --verbose... Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES> Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--reject-control-chars Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help Print help
mdwright config init
Write a documented `.mdwright.toml` with every option set to its default
Usage: mdwright config init [OPTIONS]
Options:
--config <CONFIG> Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
--path <PATH> Path to write. Defaults to `.mdwright.toml` in the current directory [default: .mdwright.toml]
--force Overwrite an existing file
-v, --verbose... Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES> Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--reject-control-chars Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help Print help
mdwright list-rules
Print the rule catalogue
Usage: mdwright list-rules [OPTIONS]
Options:
--config <CONFIG> Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
-v, --verbose... Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES> Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--reject-control-chars Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help Print help
mdwright explain
Print the long-form explanation of one lint rule
Usage: mdwright explain [OPTIONS] <RULE>
Arguments:
<RULE> Kebab-case rule name (e.g. `bare-url`, `math/unbalanced-delim`)
Options:
--config <CONFIG> Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
-v, --verbose... Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES> Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--reject-control-chars Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help Print help
mdwright lsp
Run as a Language Server Protocol server over stdio
Usage: mdwright lsp [OPTIONS]
Options:
--config <CONFIG> Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
-v, --verbose... Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
--max-input-bytes <BYTES> Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
--reject-control-chars Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
-h, --help Print help
Diagnostic schema
mdwright check --format=json emits one JSON object per line (JSON Lines), one object per
diagnostic. The current schema is version 2, defined formally by
diagnostic-schema.json (JSON Schema draft 2020-12).
The v1 schema remains available under --format=json-v1 for one release cycle and emits a
deprecation warning to stderr.
Example record (pretty-printed)
{
  "schema_version": 2,
  "path": "docs/note.md",
  "severity": "error",
  "rule": {
    "name": "math/unbalanced-delim",
    "description": "TeX-style math open delimiter (`\\[`, `\\(`, `$$`, `$`) with no matching close.",
    "url": "docs/rules/math/unbalanced-delim.md"
  },
  "source": {
    "line": 42,
    "column": 10,
    "span_start": 1037,
    "span_end": 1039,
    "snippet": "text with \\[ unmatched math"
  },
  "message": "no matching `\\]` before end of document",
  "fix": null
}
In the wire format each record is a single line terminated by \n; the pretty-printed form
above is for human reading only.
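Because each record is one line, a consumer can stream the output without loading it all at once. The Python sketch below reads JSON Lines records and tallies them by rule name. The helper names `read_diagnostics` and `count_by_rule` are hypothetical, and rejecting records whose `schema_version` is not 2 is one possible policy, not something mdwright requires.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_diagnostics(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one diagnostic dict per non-empty line of JSON Lines input."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("schema_version") != 2:
            raise ValueError(f"line {number}: unsupported schema_version")
        yield record


def count_by_rule(records: Iterable[dict]) -> Counter:
    """Count diagnostics per rule name."""
    return Counter(record["rule"]["name"] for record in records)


if __name__ == "__main__":
    import sys

    for rule, count in count_by_rule(read_diagnostics(sys.stdin)).most_common():
        print(f"{count:5d}  {rule}")
```

Saved under a name of your choosing (say `summarize.py`), the script would sit at the end of a pipe such as `mdwright check --format=json docs/ | python summarize.py`.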
Field reference
Top-level
| Field | Type | Required | Notes |
|---|---|---|---|
| schema_version | integer (2) | yes | Bumped on incompatible schema changes. |
| path | string | yes | File path as given on the CLI; <stdin> for piped input. |
| severity | enum | yes | error, warning, or advisory. warning is reserved for future use. |
| rule | object | yes | See below. |
| source | object | yes | See below. |
| message | string | yes | Single-sentence description of the problem. |
| fix | object \| omitted | no | Present when a replacement is suggested. |
rule
| Field | Type | Notes |
|---|---|---|
| name | string | Kebab-case identifier (e.g. bare-url, math/unbalanced-delim). |
| description | string | One-line summary; same text as mdwright list-rules. |
| url | string | Repository-relative path into docs/rules/. Will become an absolute URL once the mdBook site is published. |
source
| Field | Type | Notes |
|---|---|---|
| line | integer (≥ 1) | 1-indexed line of the diagnostic's first byte. |
| column | integer (≥ 1) | 1-indexed codepoint column. |
| span_start | integer (≥ 0) | Byte offset of the first byte of the offending region. |
| span_end | integer (≥ 0) | Byte offset one past the last byte. |
| snippet | string | The source line, with trailing newline stripped. Multi-line spans are clipped to the first line; the caret region still starts at column. |
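The source fields mix two coordinate systems: span_start and span_end count bytes, while column counts codepoints. The sketch below shows one way a consumer could recompute line and column from a byte offset. `line_col` is a hypothetical helper, not part of mdwright, and it assumes the offset is measured from the start of the file and that lines end in `\n`.

```rust
/// Convert a byte offset into a 1-indexed (line, column) pair,
/// counting columns in Unicode codepoints rather than bytes.
/// Hypothetical consumer-side helper; not an mdwright API.
fn line_col(source: &str, byte_offset: usize) -> Option<(usize, usize)> {
    // Reject offsets past the end of the text or inside a multi-byte codepoint.
    if !source.is_char_boundary(byte_offset) {
        return None;
    }
    let before = &source[..byte_offset];
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts just after the last newline, or at byte 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    // Count codepoints, not bytes, between the line start and the offset.
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

For the two-line source `"α β\nγ \\[ x"`, byte offset 9 maps to line 2, column 3; counting bytes instead of codepoints would give column 4.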
fix
| Field | Type | Notes |
|---|---|---|
| replacement | string | Text to substitute for [span_start, span_end). |
| safe | boolean | mdwright fix only applies fixes with "safe": true. |
Lifecycle
- v2 is the current default. v1 remains available under --format=json-v1 for one release cycle and is then removed.
- New schema versions bump schema_version and ship alongside the previous version for at least one cycle.
Validation
The schema is JSON Schema draft 2020-12 ($schema field in
diagnostic-schema.json). Any draft 2020-12-compatible validator
(jsonschema Python package, ajv for JavaScript, etc.) can validate output records against
it.
Performance
mdwright is parallel by default (rayon over the file walk) and free of per-file
interpreter startup. On a multi-thousand-file corpus those two properties account for most
of the practical speedup over mdformat --check; on small inputs both tools are
sub-second and the comparison is dominated by process startup.
Measurement
| Tool | Wall time | Notes |
|---|---|---|
| mdwright fmt-check | 108 ms ± 4 ms | Release build, rayon over 79 files. |
| mdformat --check | 5.91 s ± 0.59 s | Default install, single-threaded. |
Reproducer:
hyperfine --warmup 2 --runs 7 -N -i \
'./target/release/mdwright fmt-check <corpus>' \
'mdformat --check <corpus>'
- Corpus: 79 Markdown files, ~34.5k lines of math-heavy technical prose (a checkout of gentle-sga).
- Host: Apple M4 Pro, macOS 26.4.1.
- Versions: mdwright from this workspace, release profile; mdformat 0.7 (default plugins).
- Result: 55× ± 6×. The lede claim of "≥ 50× faster" is the floor of this measurement.
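The multiplier and its uncertainty can be checked from the two rows of the table. The sketch below assumes first-order error propagation for a ratio of two independent measurements; `speedup` is a hypothetical helper written for this page, and the benchmarking tool's own calculation may differ in detail.

```rust
/// Ratio of two mean wall times with first-order error propagation,
/// assuming the two measurements are independent.
/// Returns (ratio, absolute uncertainty of the ratio).
fn speedup(slow_mean: f64, slow_sd: f64, fast_mean: f64, fast_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    // Relative uncertainties add in quadrature for a quotient.
    let rel = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}
```

With the table's figures in seconds, `speedup(5.91, 0.59, 0.108, 0.004)` gives roughly 54.7 ± 5.8, which rounds to the reported 55× ± 6×.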
What changes the multiplier
- File count. mdwright's startup is fixed; mdformat re-pays interpreter cost per file when invoked per-file. On directory invocations both tools amortise startup over the walk, but mdformat still single-threads the loop.
- Core count. mdwright scales with rayon's thread pool. On a single-core machine the multiplier drops; on a 16-core CI runner with a large corpus it climbs.
- File size. Per-byte parse cost is closer than the wall-time ratio suggests; on a single very large file, the ratio approaches the per-byte ratio rather than the per-file ratio.
Reproducing locally
The bench harness used in development is Criterion, not hyperfine. See
crates/mdwright/benches/README.md for cargo bench recipes and corpus configuration.
The hyperfine command above is the end-to-end smoke test; the Criterion benches isolate
parse, lint, and format costs separately.
Public API Surface
mdwright is a virtual workspace, not a facade crate. Command users install the mdwright package. Rust library users
depend on the component crate that owns the capability they need.
The API is still pre-1.0. Import paths and operation shapes may change in minor releases under the pre-1.0 caveats.
Use mdwright as a library
A minimal embed that parses Markdown, runs the standard lint catalogue, and formats with
defaults. Add the three crates to Cargo.toml, plus anyhow for the error type used in main:
[dependencies]
anyhow = "1"
mdwright-document = "0.1"
mdwright-format = "0.1"
mdwright-lint = "0.1"
Then:
use mdwright_document::Document;
use mdwright_format::{FmtOptions, format_validated};
use mdwright_lint::{LintOptions, RuleSet};

fn main() -> anyhow::Result<()> {
    let source = "# Hello\n\nSee https://example.com for the spec.\n";

    // Parse once. `Document` holds source coordinates and recognised facts.
    let doc = Document::parse(source)?;

    // Lint with the shipped default rule set.
    let rules = RuleSet::stdlib_defaults();
    for diag in rules.check_with(&doc, LintOptions::default()) {
        println!("{}: {}", diag.rule, diag.message);
    }

    // Format. Returns a verified rewrite or a `FormatError` on safety-gate refusal.
    let formatted = format_validated(&doc, &FmtOptions::default())?;
    print!("{formatted}");
    Ok(())
}
The table below maps every capability to its owning crate. For the surface a particular crate exposes, follow its docs.rs link from the project README.
Common User Surfaces
| Capability | Public surface | Owning crate |
|---|---|---|
| Parse Markdown into stable facts | Document, ParseError, ParseOptions | mdwright-document |
| Configure Markdown recognition | ExtensionOptions, GfmOptions, GfmAutolinkPolicy, MystOptions, PandocOptions | mdwright-document |
| Render Markdown to HTML | RenderOptions, RenderProfile, render_html, render_html_with_options, render_html_with_render_options | mdwright-document |
| Format parsed or source Markdown | FmtOptions, WrapStrategy, FormatError, format_document, format_document_with_report, format_source, format_validated, format_validated_with_report | mdwright-format |
| Format editor ranges | CheckpointTable, format_range, format_range_with_checkpoints | mdwright-format |
| Compare formatter semantics | semantically_equivalent, first_divergence | mdwright-format |
| Represent TeX and Unicode math-body diagnostics, vocabulary, Unicode layout, source translation, and output | LatexError, LatexErrorKind, SourceSpan, CommandInfo, CommandCategory, ArgumentShape, SupportStatus, lookup_command, latex_symbol, unicode_symbol_latex, unicode_super, unicode_sub, RenderedLatex, render_unicode_math, Translation, TranslationStatus, TranslationLoss, translate_latex_to_unicode, translate_unicode_to_latex, translate_latex_ranges_to_unicode, translate_unicode_ranges_to_latex | mdwright-latex |
| Recognise Markdown math regions | scan_math_regions, render::convert_for_dollar, MathBody::source_range | mdwright-math |
| Run lint rules | RuleSet, LintOptions | mdwright-lint |
| Consume lint output | Diagnostic, Fix, Severity, Snippet, DuplicateRuleName | mdwright-lint |
| Apply safe lint fixes | apply_safe_fixes | mdwright-lint |
| Resolve configuration | Config, ConfigError | mdwright-config |
| Start the editor server | serve | mdwright-lsp |
| Build custom command binaries | run_with_rules, discover_markdown | mdwright |
Document is parse/query only. Linting, formatting, safe-fix application, config discovery, command delivery, and
editor delivery stay in their owning crates.
The mdwright-latex surface targets common MathJax-style math bodies where Unicode can represent the source or
terminal output honestly. Unicode-to-LaTeX translation is parser-backed for the supported subset: the crate lexes and
parses Unicode mathematical source before emitting canonical LaTeX. It is not a TeX engine API, a browser layout API, or
a diagram recogniser. Macro expansion, unsupported package commands, layout-heavy source, and unknown Unicode return
typed errors, losses, or visible fallback output rather than hidden approximations.
Extension Surfaces
| Surface | Use |
|---|---|
| LintRule | Implement a downstream lint rule over &Document. |
| RuleSet::{new, add, remove, by_name, contains, iter, names, check, check_with} | Compose standard and downstream rules. |
| mdwright_lint::stdlib::{defaults, all, by_name, names} | Select standard rules for custom binaries or tests. |
| Diagnostic, Fix, Severity, Snippet | Report lint findings and optional safe fixes. |
| rule_doc_url, docs_url, DOCS_URL_DEFAULT | Attach stable documentation links to diagnostics. |
| InfoStringTypo::{new, with_extra} | Extend the standard info-string vocabulary without forking the rule. |
| mdwright::run_with_rules | Reuse the command package with a custom RuleSet. |
| mdwright::discover_markdown | Reuse command file-discovery policy in a custom command. |
The standard rule structs under mdwright_lint::stdlib are public so callers can build precise RuleSets. Helper
functions inside those rules are not public extension points unless listed above.
Advanced Document Facts
These facts are public because formatter, lint, audit, and custom-rule callers need stable source ranges without learning pulldown event shapes.
| Fact family | Public surface |
|---|---|
| Text and blocks | TextSlice, InlineCode, CodeBlock, HtmlBlock, InlineHtml, Heading |
| Lists and references | ListGroup, ListItem, LinkDef |
| Frontmatter | Frontmatter, FrontmatterDelimiter |
| Autolinks | AutolinkFact, AutolinkOrigin |
| Suppressions | Suppression, SuppressionKind, AllowScope |
| Positions | LineIndex, LineIndexError, BlockCheckpointFact |
| Math | MathRegion, MathSpan, MathError |
| Formatter facts | StructuralSpan, StructuralKind, InlineDelimiterSlot, InlineDelimiterKind, UnorderedListMarkerSite, OrderedListMarkerSite, HeadingAttrSite, InlineLinkDestinationSlot, ReferenceDefinitionSite, TableSite, TableRowSite, TableCellSite, TableAlign, WrappableParagraph, ParagraphHardBreak |
Formatter-facing facts expose accessors instead of public fields where practical. That keeps invalid construction out of downstream code while preserving stable ranges for rule authors and diagnostic tooling.
Not Public Surface
- Root facade exports. There is no root package and no mdwright::{Document, FmtOptions, RuleSet} import path.
- Parser internals, pulldown events, source/canonical byte-map internals, and the private document tree.
- Source, CanonicalSource, OffsetMap, ByteSpan, OriginalSpan, NormalisedLabel, and heading trailer scanners.
- Top-level block checkpoint parser helpers. Use mdwright_format::CheckpointTable.
- Formatter rewrite candidates, rewrite snapshots, verification signatures, owner IDs, and byte-application internals.
- mdwright-latex lexer tokens, parser cursors, AST nodes, command-registry storage, and Unicode layout internals.
- Lint suppression maps, diagnostic sorting internals, and stdlib helper functions not listed as extension surfaces.
- TOML raw schema structs and config discovery internals.
- CLI and LSP state machines beyond the documented entry points.
Crates.io release
The release workflow publishes the component crates to crates.io and then lets cargo-dist create the GitHub Release with
binary artifacts. The workflow runs when a v<semver> tag is pushed. A manual dry_run dispatch runs the same gates
but skips crates.io upload and GitHub Release creation.
One-time setup
Create a scoped crates.io token with publish-new, publish-update, and yank permissions. Add it to the GitHub
repository as the Actions secret CARGO_REGISTRY_TOKEN.
Local preflight
Run the gates before tagging:
cargo fmt --check
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace --no-fail-fast
cargo doc --workspace --no-deps
mdbook build docs/
cargo xtask doc-rules --check
cargo xtask doc-cli --check
cargo xtask doc-config --check
python3 scripts/check_package_docsrs.py --allow-dirty
actionlint .github/workflows/*.yml
Check public API drift:
for crate in mdwright-latex mdwright-math mdwright-document mdwright-format mdwright-lint mdwright-config mdwright-lsp mdwright; do
  cargo public-api --simplified -p "$crate" > /tmp/"$crate"-public.txt
  diff -u "docs/api-review/$crate-public.txt" /tmp/"$crate"-public.txt
done
If a public API change is intentional, regenerate the baselines in the same commit:
scripts/update-api-review.sh
Version and changelog
The tag must match the workspace package version exactly. For version 0.1.0, tag v0.1.0.
CHANGELOG.md must contain a release section named ## [0.1.0]. The release workflow extracts that section before any
crate is published.
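Both conditions are simple string checks that can be run before pushing a tag. The sketch below is illustrative only: `tag_matches_version` and `changelog_has_section` are hypothetical helpers, not part of the release workflow, and accepting trailing text after the `## [x.y.z]` heading (such as a date) is an assumption.

```rust
/// True when `tag` is exactly `v` followed by the workspace version.
fn tag_matches_version(tag: &str, version: &str) -> bool {
    tag.strip_prefix('v') == Some(version)
}

/// True when the changelog has a heading line starting with `## [<version>]`.
/// Trailing text on the heading line (for example a date) is tolerated,
/// which is an assumption rather than documented workflow behaviour.
fn changelog_has_section(changelog: &str, version: &str) -> bool {
    let heading = format!("## [{version}]");
    changelog.lines().any(|line| line.starts_with(&heading))
}
```

For version 0.1.0, the tag v0.1.0 passes the first check, while 0.1.0 (no prefix) and v0.1.1 do not.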
Dry run
Before tagging, run the Release workflow manually with dry_run: true. It verifies the workspace, builds the
cargo-dist artifacts, checks package contents, simulates docs.rs from packaged tarballs, and skips live publishing.
Publish
After the release commit is on main, create and push the tag:
git tag -s v0.1.0 -m "mdwright v0.1.0"
git push origin v0.1.0
The workflow publishes crates in dependency order:
mdwright-latex -> mdwright-math -> mdwright-document -> mdwright-format -> mdwright-lint -> mdwright-config -> mdwright-lsp -> mdwright
It waits 90 seconds between crates so crates.io can index each newly published dependency before downstream crates are published.
If publishing fails after a crate has uploaded, stop. Crates.io versions are immutable. Fix the problem, bump the workspace version, update the changelog, and tag a new commit.
Release evidence
mdwright release candidates are judged by local evidence, not by a claim of full cmark-gfm parser equivalence. The release claim is:
mdwright is a round-trip-safe Markdown formatter and linter with classified GFM/parser divergences and an opt-in mdformat-compatible style profile.
The release bundle lives under target/mdwright/release/. It is a local artifact; do not commit it.
Aggregate the evidence
Run:
cargo xtask release-evidence --output target/mdwright/release
The command writes:
- target/mdwright/release/release-evidence.json
- target/mdwright/release/release-evidence.md
The command does not rerun every expensive gate. It records git state and tool versions, reads existing machine reports, points at manual evidence notes, and lists blockers when evidence is missing. This keeps the command narrow: it summarises release evidence instead of duplicating parser audit, mdformat parity, production soak, fuzzing, packaging, or benchmarks.
Refresh machine reports
Run these before aggregating a release candidate:
cargo check --workspace --all-targets
cargo nextest run --workspace --no-fail-fast
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --check
cargo doc --workspace --no-deps
mdbook build docs/
cargo xtask doc-rules --check
cargo xtask doc-cli --check
cargo xtask doc-config --check
python3 scripts/check_package_docsrs.py --allow-dirty
actionlint .github/workflows/*.yml
cargo xtask parser-audit --case-set all --ensure-tools --include-comrak
cargo xtask mdformat-parity \
--corpus-root docs \
--corpus-name mdwright-docs \
--mdwright-config .mdwright.toml \
--mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml
cargo xtask production-soak \
--corpus-root <external-corpus-path> \
--output target/mdwright/production-soak
<external-corpus-path> is a directory of Markdown files used as the production-soak input; set it to the path of the
external corpus you run releases against (or to MDWRIGHT_CORPUS_ROOT if you have one configured). Record the path
in the release notes.
Record the fast-check result in target/mdwright/release/fast-checks.md. The aggregator treats that file as the manual
proof that the local workspace gate was refreshed.
Refresh packaging evidence
The detailed crates.io release checklist lives at Crates.io release. Before tagging, run the
release workflow manually with dry_run: true.
Package every publishable crate:
cargo package --workspace --exclude xtask --exclude mdwright-extra-example --allow-dirty --no-verify
Install the command package into an isolated root:
tmp="$(mktemp -d)"
CARGO_HOME="$tmp/cargo-home" \
CARGO_TARGET_DIR="$tmp/target" \
cargo install --path crates/mdwright --locked --root "$tmp/install"
"$tmp/install/bin/mdwright" --help
"$tmp/install/bin/mdwright" explain bare-url
"$tmp/install/bin/mdwright" check docs/src/introduction.md
"$tmp/install/bin/mdwright" fmt-check docs/src/introduction.md
"$tmp/install/bin/mdwright" render docs/src/introduction.md >/tmp/mdwright-render.html
"$tmp/install/bin/mdwright" lsp --help
Record the dry-run result in target/mdwright/package-dry-run/report.json. The report can be generated by hand; the
aggregator only requires stable JSON with enough fields for a human to inspect alongside the Markdown report.
Refresh fuzz and benchmark evidence
Replay the fuzz corpora:
cargo +nightly fuzz run fuzz_parse_format -- -runs=0
cargo +nightly fuzz run fuzz_idempotence -- -runs=0
cargo +nightly fuzz run fuzz_structured_idempotence -- -runs=0
cargo +nightly fuzz run fuzz_lint -- -runs=0
cargo +nightly fuzz run fuzz_verbatim_identity -- -runs=0
cargo +nightly fuzz run fuzz_latex_render -- -runs=0
cargo +nightly fuzz run fuzz_latex_translate -- -runs=0
cargo +nightly fuzz run fuzz_markdown_math_translate -- -runs=0
cargo +nightly fuzz run fuzz_unicode_latex_roundtrip -- -runs=0
Write the replay result to target/mdwright/release/fuzz-replay.md.
Run sustained fuzz rounds with the helper script. The standard release target is three clean 10-minute rounds for every fuzz target:
scripts/fuzz-round.sh 600 3
The script writes target/mdwright/release/fuzz-sustained.md and per-target logs under
target/mdwright/release/fuzz-sustained/logs/.
Run the Criterion comparison and write the result to target/mdwright/release/benchmarks.md:
cargo bench -p mdwright --bench format_bench --bench lint_bench -- --baseline pre-parser-boundary
Re-capture on the same hardware before declaring a regression.
Interpret the report
release-evidence.md is ready for review when:
- the worktree is clean;
- every required report is present;
- manual fuzz, benchmark, and fast-check notes are present;
- parser audit has no unclassified differences, mitigation rows, or uncontained panics;
- mdformat parity has no unclassified differences, semantic drift, parse errors, idempotence failures, or open bugs;
- production soak has no parse errors, validation errors, idempotence failures, or fmt-check disagreements;
- packaging and isolated install dry runs passed.
Accepted divergences are documented in:
Semver policy
mdwright follows Semantic Versioning. This page enumerates the public API surface that the version number commits to.
Covered
A change to any of the following is a breaking change and requires a major-version bump (or a minor bump while we are pre-1.0; see Pre-1.0 caveats below):
- Every pub item exported from the publishable component crates: mdwright-latex, mdwright-math, mdwright-document, mdwright-format, mdwright-lint, mdwright-config, and mdwright-lsp.
- The command-package helpers exported from mdwright: run_with_rules and discover_markdown.
- CLI subcommands, their flags, and their exit codes. The exit-code mapping appears in reference/cli.md.
- The configuration schema for mdwright.toml, .mdwright.toml, and pyproject.toml [tool.mdwright]. The schema is generated into configuration.md from the mdwright-config schema source.
- The --format=json (v2) diagnostic schema at reference/diagnostic-schema.md and the JSON Schema at docs/diagnostic-schema.json. New optional fields are non-breaking; renaming or removing a field is breaking.
- The mdwright_lint::LintRule trait signature. Adding a method with a default body is non-breaking; adding a method without a default, or changing an existing signature, is breaking.
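The trait rule in the last item follows from how Rust resolves trait methods. The sketch below uses a hypothetical `Rule` trait, deliberately not the real LintRule signature, to show why a new method with a default body leaves existing implementations compiling.

```rust
/// Hypothetical stand-in for a rule trait; not the real `LintRule` signature.
trait Rule {
    /// Required method: every implementation must provide it.
    fn name(&self) -> &'static str;

    /// Imagine this was added in a later release. Because it has a default
    /// body, implementations written earlier keep compiling unchanged.
    fn default_severity(&self) -> &'static str {
        "error"
    }
}

/// A downstream rule written before `default_severity` existed.
struct NoTabs;

impl Rule for NoTabs {
    fn name(&self) -> &'static str {
        "no-tabs"
    }
}
```

Removing the default body from `default_severity` would make `impl Rule for NoTabs` fail to compile, which is why that kind of change counts as breaking.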
Not covered
The following are free to change in any release, including patch releases:
- Internal items (anything pub(crate) or private). Refactors that move modules around are not breaking unless they change a pub export.
- The on-disk layout of build artifacts (target/), cached state, and intermediate files.
- The prose output of mdwright explain <rule>. The rule names and their existence are covered; the wording is not.
- Performance characteristics. We aim not to regress and track this through Criterion benches, but we do not commit to a wall-clock floor.
- The contents of docs/, CHANGELOG.md, and other repo metadata.
- The format of tracing output and log lines.
Pre-1.0 caveats
Until v1.0, minor versions may include breaking API changes. The 0.x sequence is deliberately permissive so the surface
can settle without dragging compatibility shims forward. The discipline still applies: every break appears in
CHANGELOG.md under Breaking changes in the
relevant version's section, with a migration note where the rewrite is non-obvious.
Patch releases (0.x.Y) never introduce breaking changes.
MSRV (minimum supported Rust version)
The MSRV is rust-version = "1.91", declared in Cargo.toml. Bumping the MSRV is treated as a minor-version bump
pre-1.0; post-1.0 it will be a major bump. CI runs the test suite on both stable and the MSRV floor on every push.
mdwright spec deviations
The mdwright formatter targets the GFM 0.29-gfm spec (crates/mdwright/tests/gfm-spec/spec.txt, vendored from cmark-gfm). Every example
is exercised by crates/mdwright/tests/gfm_spec.rs as a parse → format → parse → format round-trip and compared against the source
HTML and the normalised event stream.
This document is the user-facing index of where mdwright currently does not byte-for-byte round-trip the spec. It is split into two parts, mirroring the two mechanisms that track the deviations:
- Editorial deviations: choices we have made and intend to keep. Curated in crates/mdwright/tests/gfm-spec/allowlist.toml. Each entry has a one-line rationale and a pointer to where the decision is documented.
- Tracked regressions: known divergences that we intend to fix. Recorded in crates/mdwright/tests/gfm-spec/snapshot.txt. The snapshot is asserted byte-for-byte, so any drift, whether regression or improvement, fails CI and forces a deliberate update.
The gfm_spec_coverage test prints the live count for both groups; the numbers below are a snapshot of the current main
branch.
Coverage
| Bucket | Examples |
|---|---|
| Spec examples total | 672 |
| Matching | 637 |
| Editorial deviations | 35 |
| Tracked regressions | 0 |
A case may fail more than one comparison kind (semantic, idempotence); the snapshot file is keyed by
(case, kind) and currently lists no tracked regressions.
Parser Backend Drift
The formatter round-trip gate is not the same as cmark-gfm renderer equivalence. cargo xtask parser-audit compares
mdwright's current pulldown-cmark backend with cmark-gfm and renders mdwright through the opt-in cmark-gfm render
profile. The current GFM-spec parser audit has 15 classified HTML differences, 0 source-position differences, and 0
unclassified differences.
The remaining differences are accepted constraints of the current backend:
| Class | Count | Status |
|---|---|---|
| Emphasis delimiter-stack resolution | 9 | accepted parser-backend drift |
| Raw HTML block indentation/newline spelling | 4 | accepted render drift with stable source facts |
| Task-list examples marked disabled by the spec | 2 | accepted spec-fixture drift |
| Contained upstream parser panic | 1 | converted to ParseError |
[render] profile = "cmark-gfm" changes only HTML spelling for mdwright render: quote escaping, link-destination
escaping, ordinary GFM table layout, task-list checkbox spelling, and one raw-HTML newline case where the parser already
exposes enough structure. It does not change emphasis resolution or parser tree semantics. Full cmark-gfm parser
equivalence would require upstream pulldown-cmark changes, a maintained fork, or a parser backend switch.
Editorial deviations
Pulldown text-chunking deviations
35 spec examples currently fail the AST-event comparison only; HTML matches byte-for-byte and round-trip is idempotent.
The mismatch reflects pulldown-cmark's text-run chunking: pulldown splits long runs of text into events at points
cmark-gfm does not, so the normalised Event::Text(…) stream differs even though every other event lines up and every
rendered HTML byte agrees.
The triage rule, applied at the snapshot level, is:
For each (case, kinds) in snapshot.txt:
    if kinds == {"ast"} and case has no other entry:
        -> allowlist.toml (bucket = "pulldown-text-chunking")
    else:
        -> stays in snapshot.txt (tracked regression)
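The triage rule can be expressed as a small grouping step. This is a sketch under stated assumptions: `triage` is a hypothetical function, the `(case, kind)` pairs stand in for parsed snapshot entries, and the actual test harness may organise this differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Split `(case, kind)` failures into allowlist candidates and tracked
/// regressions. A case is an allowlist candidate only when "ast" is the
/// sole comparison kind it fails.
fn triage(entries: &[(u32, &str)]) -> (Vec<u32>, Vec<u32>) {
    // Group the failing kinds by case number.
    let mut kinds: BTreeMap<u32, BTreeSet<&str>> = BTreeMap::new();
    for &(case, kind) in entries {
        kinds.entry(case).or_default().insert(kind);
    }

    let mut allowlist = Vec::new();
    let mut tracked = Vec::new();
    for (case, set) in kinds {
        if set.len() == 1 && set.contains("ast") {
            allowlist.push(case); // bucket = "pulldown-text-chunking"
        } else {
            tracked.push(case); // stays in snapshot.txt
        }
    }
    (allowlist, tracked)
}
```

A case that fails both "ast" and another kind stays a tracked regression, because the chunking explanation only covers failures confined to the event stream.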
Affected cases: 5, 6, 7 (Tabs, CM §2.2); 16, 19 (Thematic breaks, CM §4.1); 61 (Setext headings, CM §4.3); 102, 103 (Fenced code blocks, CM §4.5); 214, 230 (Block quotes, CM §5.1); 232, 242, 248, 249, 251, 252, 256, 264, 265, 266, 268 (List items, CM §5.2); 320 (Backslash escapes, CM §2.4); 321, 324, 330, 333 (Entity refs, CM §2.5); 393, 411 (Emphasis, CM §6.2); 499, 500, 503, 520, 528, 536 (Links, CM §6.3); 640 (Raw HTML, CM §6.8).
The bucket name is load-bearing: if a future per-case investigation disproves the chunking explanation for one of the
cases above, remove its entry from allowlist.toml and let it re-enter the snapshot as a tracked regression.
Tracked regressions
There are currently no tracked GFM-spec formatter regressions. Any future non-allowlisted failure appears in
crates/mdwright/tests/gfm-spec/snapshot.txt and fails the snapshot test until it is fixed or deliberately classified.
mdformat-mkdocs parity deviations
mdwright matches mdformat-mkdocs byte-for-byte for the four Markdown extensions covered in
Markdown extensions. The parity test at crates/mdwright/tests/extension_parity.rs enforces this against five
committed reference fixtures. Known divergences below; each row exists because the upstream pulldown-cmark parser
doesn't surface enough information for mdwright to round-trip the source faithfully.
| Construct | Source pattern that diverges | Why |
|---|---|---|
| Heading attribute, quoted value | # H {title="hello world"} | pulldown-cmark 0.13's heading-attribute parser splits the trailer on whitespace and ignores "…" quoting. Pulldown surfaces two attrs (title="hello and world") instead of one. mdformat-mkdocs (python-markdown's attr_list) handles the quoted form correctly. Tracked upstream; will resolve when pulldown lands the fix. |
The parity test refuses to silently accept new divergences: any byte-for-byte mismatch fails the test and forces a deliberate add to this table (with a rationale and an upstream pointer) or a fix in mdwright's emit path.
MyST + Pandoc directive parity
mdwright preserves MyST directive containers, Pandoc fenced divs, inline roles, MyST substitutions, Pandoc inline
attribute spans, and MyST % line comments byte-verbatim. See MyST + Pandoc directives for
the full scope. The bar is idempotence-on-mode, not byte-equal round-trip with mdformat-mkdocs: mdformat-mkdocs does
not implement these constructs at all, so there is no upstream reference to diff against. The vendored jupyter-book demo
at crates/mdwright/tests/external/jupyter_book_minimal/ plus the per-construct regressions at
crates/mdwright/tests/regressions/{directive_*,inline_role_*,myst_*}.in are the safety net.
| Construct | Source pattern that diverges | Why |
|---|---|---|
| Malformed :::{name} source | Bare :::{warning} Experimental with no closer | Pulldown parses the opener as part of a definition-list or paragraph; mdwright's directive overlay matches on byte-range overlap and emits the union of the tree-node range and the directive region, so the bytes survive, but the surrounding misclassified bytes flow through pulldown's normal path. Fix the source by closing the directive. |
How to read the live numbers
cargo test --release --test gfm_spec gfm_spec_coverage -- --nocapture
prints, at the top of its output:
gfm spec coverage:
total cases: <n>
fully matching: <n>
intentional dev: <n>
tracked regression: <n>
unexpected: <n>
These are the source of truth; the table above is a snapshot for the release notes.
Updating the snapshot
After a deliberate fix (or an accepted editorial deviation):
# A fix that removes (case, kind) entries from snapshot.txt:
MDWRIGHT_UPDATE_SNAPSHOT=1 cargo test --release --test gfm_spec gfm_spec_snapshot
# An editorial deviation: add a row to crates/mdwright/tests/gfm-spec/allowlist.toml
# *before* regenerating the snapshot, then run the same command.
The snapshot test fails on any drift; CI will not silently accept a regression that happens to look like an improvement, and an improvement that isn't reflected in the snapshot fails just as loudly.