mdwright

mdwright is a Markdown linter and round-trip formatter for any Markdown project.

Four commitments shape the tool.

Fast. On a 79-file corpus of math-heavy technical prose, mdwright fmt-check runs ≥ 50× faster than mdformat --check. The multiplier scales with file count and core count; see Performance for the measurement, host, and reproducer. Design choices that buy this are in Architecture.

Round-trip safe. mdwright fmt renders to the same HTML before and after; every change in the rendered DOM is treated as a bug. Whitespace inside a paragraph may shift (the run of whitespace between a and b can change form), but word boundaries and the rendered tree do not. Where the formatter cannot prove equivalence it refuses to rewrite; the deviation table lists every exception with a reproducer.

Configurable, preserve by default. Source style choices — emphasis delimiters, list markers, thematic breaks, link-destination angle brackets — pass through untouched. Canonicalisation is opt-in one knob at a time in .mdwright.toml, or via fmt.profile = "mdformat" for mdformat-compatible spelling where verified rewrites preserve the parsed document.

Math-resilient. \( … \), \[ … \], and \begin{NAME} … \end{NAME} pass through verbatim. The scanner identifies math regions before any other pass touches the document, so the formatter never reflows \frac{a}{b} into \\frac{a}{b} and the linter never flags a backslash inside \begin{align*}. See Math regions for the design.

Who this site is for

  • Users writing Markdown with math, code, or strict formatting requirements: start with Getting started.
  • CI operators wiring mdwright into pre-commit, GitHub Actions, or other automation: Integration.
  • Rule authors extending mdwright with project-specific lints: Extending → Lint rules.

The narrative pages (concepts, extending) explain the why; the reference pages (rules, CLI, public API, diagnostic schema) are the source-of-truth what.

Stability

mdwright is pre-1.0. The release surface, including public Rust API, CLI, configuration schema, diagnostic JSON, and lint-rule trait, is documented descriptively at Public API; minor versions may include breaking changes until 1.0, patch releases never do. See Semver policy.

Installation

mdwright has no runtime dependencies: it ships as a single binary. Pick whichever channel matches your environment.

No Rust toolchain required. The cargo-dist shell installer pulls the prebuilt binary for your platform from the latest release and places it on your $PATH:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/jcreinhold/mdwright/releases/latest/download/mdwright-installer.sh | sh

Supported targets: Linux x86_64, macOS aarch64 (see Platform support below).

From crates.io

cargo install mdwright

Requires Rust 1.91 or later (the MSRV is enforced in CI). The install drops a single binary, mdwright, on your $PATH. Rust integrations depend on the component crates directly; see Public API surface for the roster and what each owns.

Via cargo-binstall

cargo-binstall pulls the GitHub-release tarball for your target and falls back to a source build if no prebuilt binary is available:

cargo binstall mdwright

Release tarball

Download a .tar.xz directly from the GitHub releases page and place the mdwright binary on your $PATH. Useful for air-gapped environments or when you want to pin a specific build artifact.

Building from a clone

git clone https://github.com/jcreinhold/mdwright
cd mdwright
cargo build --release -p mdwright
./target/release/mdwright --help

cargo nextest run exercises the full test suite (golden snapshots, GFM spec runner, property tests). cargo bench runs the Criterion benches; cargo xtask doc-rules --check and cargo xtask doc-cli --check verify that the auto-generated documentation pages are up to date.

Platform support

| Tier | Targets | Coverage |
| --- | --- | --- |
| 1 | x86_64-unknown-linux-gnu, aarch64-apple-darwin | CI on every push; prebuilt binary attached to each release |
| 2 | x86_64-pc-windows-msvc, x86_64-apple-darwin, aarch64-unknown-linux-gnu | CI on every push; source build via cargo install |

Other targets work in principle but are not tested.

Getting started

This walkthrough takes ten minutes. By the end you will have linted a Markdown file, fixed a diagnostic, reformatted the file, and configured one rule.

Set up

Create a directory with one Markdown file:

mkdir mdwright-demo && cd mdwright-demo

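Save the following as README.md, then end the file with an opening sh code fence and an echo hello line, leaving the fence unclosed: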

# Demo

See https://example.com for the spec.

The Euler identity, $e^{i\pi} + 1 = 0$, is famous.

Here is some code:

Lint

mdwright check README.md

You see two diagnostics:

error[bare-url]: bare URL should be wrapped in angle brackets or rendered as a link
  --> README.md:3:5
   |
 3 | See https://example.com for the spec.
   |     ^^^^^^^^^^^^^^^^^^^
   = help: CommonMark autolinks need angle brackets (`<https://example.com>`) to render as a link.
   = fix (safe): <https://example.com>
   = note: see `mdwright explain bare-url`

error[unbalanced-backtick]: unterminated fenced code block
  --> README.md:9:1
...

Every rule has a long-form explanation reachable from the command line:

mdwright explain bare-url

The last line of the output is the documentation URL. Open it for the same content rendered with examples.

Fix the easy one

bare-url carries a safe fix. Apply it:

mdwright fix README.md

Re-run mdwright check; the bare-URL diagnostic is gone. The unbalanced-backtick diagnostic remains because closing a fence cannot be inferred safely.

Fix the hard one by hand

Add the closing fence to README.md:

Here is some code:

```sh
echo hello
```

Re-run mdwright check. Output is empty: the file is clean.

Reformat

mdwright fmt README.md

fmt rewrites the file in place. Run git diff (in a real project) to see what changed. The defaults preserve source style, including emphasis delimiters, list markers, thematic breaks, and line wrap, so the diff is usually small. Display math, inline math, and fenced code blocks pass through verbatim. Opt in to canonicalisation per knob in .mdwright.toml; see Formatter policy and Style knobs.

Configure one rule

mdwright reads configuration from the nearest .mdwright.toml, mdwright.toml, or pyproject.toml with a [tool.mdwright] table, walking up from $PWD until it hits a .git/ directory. Create .mdwright.toml:

[lint]
# `default` enables the curated baseline; `ignore` removes rules.
preset = "default"
ignore = ["bare-url"]

Now mdwright check does not flag bare URLs. See Configuration for the complete schema.

Configuration

mdwright reads configuration from (in precedence order):

  1. The file given via --config PATH.
  2. The nearest ancestor config discovered by walking upward from the current directory. At each ancestor, candidates are tried in this order: .mdwright.toml, mdwright.toml, pyproject.toml containing a [tool.mdwright] table. The walk stops at the filesystem root or at the first directory containing .git/ (the workspace boundary).
  3. Built-in defaults.

A pyproject.toml without [tool.mdwright] does not stop the walk; discovery continues to the parent directory. A .mdwright.toml wins over a pyproject.toml in the same directory (matching ruff's "more-specific-name first" rule).

Run mdwright config init to create a documented .mdwright.toml starter file with every option set to its default.

Single-file integration via pyproject.toml

For projects that already use pyproject.toml, the entire mdwright configuration can live there under [tool.mdwright]:

# pyproject.toml
[tool.mdwright]
lint.preset = "default"
lint.extend-select = ["latex-command"]

[tool.mdwright.fmt]
wrap = 100

CLI overrides

The following knobs accept CLI flags that take precedence over the config file:

  • lint.preset, lint.select, lint.extend-select, lint.ignore: --rules
  • render.profile: mdwright render --render-profile
  • --no-suppress toggles whether <!-- mdwright: allow ... --> comments are honoured; there is no config-file equivalent.

All [fmt] knobs are config-file-only.

Schema reference

[lint] and nested tables

| Key | Type | Default | CLI override | Description |
| --- | --- | --- | --- | --- |
| lint.preset | "default" \| "all" \| "none" | "default" | --rules | Baseline lint rule set. Use default for curated defaults, all for every registered rule, or none with lint.select for an explicit set. |
| lint.select | array of string | [] | --rules | Exact lint rule names to enable when lint.preset = "none". Preset names are not valid rule names here. |
| lint.extend-select | array of string | [] | --rules | Lint rule names to add on top of lint.preset. |
| lint.ignore | array of string | [] | --rules | Lint rule names to remove after applying lint.preset, lint.select, and lint.extend-select. |
| lint.exclude | array of string | [] | none | Gitignore-style patterns. Matching files are dropped from lint runs. Patterns are anchored to the directory containing the config file. |
| lint.info-strings.extra | array of string | [] | none | Project-specific additions to the info-string-typo allowlist. The stdlib default allowlist still applies. |
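
How the four rule-selection keys combine can be sketched as follows. This is a reading of the table above, not mdwright's code: the REGISTRY and CURATED sets are hypothetical stand-ins (the rule names appear in this documentation, but which of them belong to the default preset is an assumption), and the sketch rejects unknown names in every list, which the table only states for lint.select.

```python
# Hypothetical registry and curated baseline, for illustration only.
REGISTRY = {"bare-url", "unbalanced-backtick", "latex-command", "stray-dollar"}
CURATED = {"bare-url", "unbalanced-backtick"}


def resolve_rules(preset="default", select=(), extend_select=(), ignore=()):
    """Return the sorted list of active rule names."""
    requested = set(select) | set(extend_select) | set(ignore)
    unknown = requested - REGISTRY
    if unknown:
        # Preset names such as "default" land here: they are not rule names.
        raise ValueError(f"unknown rule names: {sorted(unknown)}")
    if preset == "all":
        active = set(REGISTRY)
    elif preset == "default":
        active = set(CURATED)
    elif preset == "none":
        active = set(select)
    else:
        raise ValueError(f"unknown preset: {preset!r}")
    active |= set(extend_select)  # additions on top of the preset
    active -= set(ignore)         # removals are applied last
    return sorted(active)
```

Because lint.ignore is applied last, a rule named in both lint.extend-select and lint.ignore ends up disabled.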

[fmt] and nested tables

| Key | Type | Default | CLI override | Description |
| --- | --- | --- | --- | --- |
| fmt.profile | "preserve" \| "mdformat" | "preserve" | none | Formatter style profile. preserve keeps mdwright's identity-oriented defaults; mdformat applies mdformat-compatible defaults where verified rewrites can preserve semantics. Explicit [fmt] keys override profile defaults. |
| fmt.wrap | "keep" \| "no" \| int | "keep" | none | Wrap mode for prose paragraphs. keep leaves existing breaks alone; no forbids new breaks; an integer enforces that display-column budget for breakable lines in every formatter profile. |
| fmt.wrap-strategy | "stable" \| "balanced" | "stable" | none | Reflow strategy used when fmt.wrap is an integer. stable greedily fills soft-break runs and is the default; balanced rebalances paragraphs for more even line lengths. |
| fmt.italic | "asterisk" \| "underscore" \| "preserve" | "preserve" | none | Italic delimiter canonicalisation. preserve leaves source bytes; asterisk or underscore opts into the post-pass rewrite. See Style knobs. |
| fmt.strong | "asterisk" \| "underscore" \| "preserve" | "preserve" | none | Strong-emphasis delimiter canonicalisation. Independent of fmt.italic: *italic* with __strong__ is expressible. |
| fmt.list-marker | "dash" \| "asterisk" \| "plus" \| "preserve" | "preserve" | none | Unordered-list bullet canonicalisation. Each marker is rewritten through a marker-local fact and the family commits only after verification. |
| fmt.ordered-list | "one" \| "consistent" \| "preserve" | "preserve" | none | Ordered-list number canonicalisation. one rewrites markers to 1. only when verification preserves the list start; consistent renumbers each list from the source's first item; preserve keeps source numbering verbatim. |
| fmt.thematic-break | "dash" \| "asterisk" \| "underscore" \| "underscore-70" \| "preserve" | "preserve" | none | Thematic-break canonicalisation. Fixed character modes preserve the source repeat count and spacing; underscore-70 rewrites the whole break line to mdformat's 70 underscores. |
| fmt.trailing-newline | "preserve" \| "strip" \| "ensure" \| bool | "preserve" | none | Trailing-newline policy at the document boundary. true is accepted as a synonym for ensure and false for strip. |
| fmt.end-of-line | "lf" \| "crlf" \| "keep" | "lf" | none | Line-ending normalisation. keep adopts the first newline seen in the source. |
| fmt.exclude | array of string | [] | none | Formatter-specific exclude globs, independent of [lint] exclude. |
| fmt.heading-attrs | "preserve" \| "canonicalise" | "preserve" | none | ATX heading {#id .class key=val} trailer emission. preserve emits the source trailer byte-verbatim. canonicalise emits id first, then classes, then key-value pairs. |
| fmt.refs.placement | "end" \| "preserve" | "end" | none | Where reference-link definitions are emitted: gathered and sorted at the end of the document, or kept in source order. |
| fmt.refs.style | "bare" \| "angle" \| "preserve" | "preserve" | none | Destination style for reference-link and inline-link URLs. preserve keeps each destination's source form; bare strips wrapping <...> where the bare form still parses; angle wraps every destination in <...>. |
| fmt.footnotes.placement | "end" \| "preserve" | "preserve" | none | Where footnote definitions are emitted. Default is preserve because pulldown-cmark's HTML renderer ties footnote position to parse order; moving definitions would change the rendered HTML. |
| fmt.tables.style | "preserve" \| "pad" | "preserve" | none | GFM table spacing policy. preserve keeps source cell spacing; pad aligns cells and delimiter rows to mdformat-compatible widths when verification preserves semantics. |
| fmt.lists.continuation-indent | "marker-width" \| "four-space" | "marker-width" | none | Continuation indentation for wrapped list-item paragraphs. marker-width aligns to the source marker width; four-space matches mdformat's list continuation spelling. |
| fmt.frontmatter.preserve | bool | true | none | Whether to emit document frontmatter byte-verbatim. false strips it. |
| fmt.math.normalise | bool | false | none | Whether whole-block math regions are normalised. Off by default because math bytes are opaque to CommonMark. |
| fmt.math.render | "none" \| "commonmark-katex" \| "dollar" | "none" | none | Math delimiter rendering policy for downstream renderers. none preserves source math regions; commonmark-katex records intent without rewriting; dollar rewrites bracket and paren math to dollar delimiters. |
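
As one reading of how fmt.end-of-line and fmt.trailing-newline interact, here is a small Python sketch. The function names are hypothetical and several details are assumptions the table does not settle: strip removes every trailing newline, ensure appends one only when none is present, and a source with no newline at all falls back to LF under keep.

```python
def detect_eol(source: str) -> str:
    """Return the first newline style seen in the source ("keep" mode)."""
    index = source.find("\n")
    if index > 0 and source[index - 1] == "\r":
        return "\r\n"
    return "\n"  # also the fallback when no newline exists (assumption)


def apply_boundary_policy(source, end_of_line="lf", trailing_newline="preserve"):
    """Apply line-ending and trailing-newline policy to a document string."""
    if end_of_line == "keep":
        eol = detect_eol(source)
    else:
        eol = {"lf": "\n", "crlf": "\r\n"}[end_of_line]
    # Booleans are synonyms: true means ensure, false means strip.
    if trailing_newline is True:
        trailing_newline = "ensure"
    elif trailing_newline is False:
        trailing_newline = "strip"
    text = source.replace("\r\n", "\n")  # work on LF internally
    if trailing_newline == "strip":
        text = text.rstrip("\n")
    elif trailing_newline == "ensure" and not text.endswith("\n"):
        text += "\n"
    return text.replace("\n", eol)
```

With the defaults (lf, preserve) a CRLF file is converted to LF and its final newline, present or absent, is left as found.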

[parse] and nested tables

| Key | Type | Default | CLI override | Description |
| --- | --- | --- | --- | --- |
| parse.math.delimiters | "tex" \| "github" | "tex" | none | Math delimiter recognition policy. tex recognises \(...\), \[...\], and LaTeX environments; github also recognises $...$ and $$...$$. |
| parse.extensions.definition-lists | bool | true | none | Recognise Term\n: definition\n definition lists. Turn off on non-mkdocs corpora to suppress recognition. |
| parse.extensions.abbreviation-lists | bool | true | none | Recognise *[ABBR]: definition abbreviation declarations as a scan-and-preserve overlay. mdwright does not expand occurrences; the downstream renderer does. |
| parse.extensions.heading-attribute-lists | bool | true | none | Recognise # Heading {#id .class} trailers via pulldown's heading-attribute extension. When off, the trailer reads as plain text in the heading body. |
| parse.extensions.block-attribute-lists | bool | true | none | Recognise { .class } on a line by itself after a non-empty block as a scan-and-preserve overlay. Inline attribute lists are out of scope. |
| parse.extensions.gfm.autolinks | "disabled" \| "urls" \| "urls-and-emails" | "urls-and-emails" | none | Recognise GFM bare URL and email autolinks as document facts and render them as links. Use urls to leave bare emails as text or disabled for strict CommonMark-style text treatment. |
| parse.extensions.gfm.tagfilter | bool | true | none | Apply GFM tagfiltering when rendering or building semantic signatures. This escapes the raw HTML tags that cmark-gfm filters, without rewriting source bytes. |
| parse.extensions.myst.directive-containers | bool | true | none | Recognise MyST :::{name} directive containers with :KEY: value options as a scan-and-preserve overlay. mdwright does not expand directives; downstream renderers do. |
| parse.extensions.myst.inline-roles | bool | true | none | Recognise MyST {role}`payload` inline roles as a scan-and-preserve overlay inside paragraph text. |
| parse.extensions.myst.substitution-references | bool | true | none | Recognise MyST {{name}} inline substitution references as a scan-and-preserve overlay. Declarations live in YAML frontmatter and round-trip through the frontmatter path. |
| parse.extensions.myst.comments | bool | true | none | Recognise MyST % line comments at line-start as a scan-and-preserve overlay. |
| parse.extensions.pandoc.fenced-divs | bool | true | none | Recognise Pandoc ::: {.cls} fenced div openers. The closer is a colon-only line of matching count. |
| parse.extensions.pandoc.short-form-divs | bool | true | none | Recognise Pandoc :::name fenced div openers. |
| parse.extensions.pandoc.inline-attribute-spans | bool | true | none | Recognise Pandoc [content]{.cls} inline attribute spans as a scan-and-preserve overlay. |

[render] and nested tables

| Key | Type | Default | CLI override | Description |
| --- | --- | --- | --- | --- |
| render.profile | "pulldown" \| "cmark-gfm" | "pulldown" | --render-profile | HTML spelling profile for mdwright render. pulldown preserves the default renderer; cmark-gfm matches cmark-gfm spelling where parser semantics already agree. |

Round-trip safety

mdwright fmt is a semantic rewriter, not a string-level one. The contract: the rendered HTML of the output matches the rendered HTML of the input, modulo whitespace inside a paragraph that does not change word boundaries. The gfm_spec_snapshot test enforces it on every commit. Any input that fails the gate is either fixed at the root or recorded in the deviation table with a one-line reason.

The HTML-equivalence gate

For every document mdwright formats, the gate runs:

  1. Render the input to HTML.
  2. Format the input, then render the output to HTML.
  3. Assert (1) and (2) match, ignoring whitespace-only differences inside text paragraphs.
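
The comparison in step 3 can be sketched with Python's standard HTML parser. This illustrates the idea only and is not mdwright's gate: the names are hypothetical, the renderer and formatter are passed in as plain callables, and the sketch assumes that whitespace runs may be collapsed everywhere except inside pre elements.

```python
import re
from html.parser import HTMLParser


class _Signature(HTMLParser):
    """Collect tag and text events, collapsing whitespace outside <pre>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.pre_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        self.events.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so chunking never affects the result.
        if self.events and self.events[-1][0] == "text":
            data = self.events.pop()[1] + data
        if not self.pre_depth:
            data = re.sub(r"\s+", " ", data)  # keeps word boundaries intact
        self.events.append(("text", data))


def html_signature(html: str):
    parser = _Signature()
    parser.feed(html)
    parser.close()
    return parser.events


def passes_gate(source, format_fn, render_fn) -> bool:
    """Steps 1 to 3: render, format then render, compare signatures."""
    before = html_signature(render_fn(source))
    after = html_signature(render_fn(format_fn(source)))
    return before == after
```

Collapsing a whitespace run to a single space, rather than deleting it, is what keeps "a b" and "ab" distinguishable.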

"Render" here means parse with pulldown-cmark and emit HTML from the event stream, so a parse divergence is caught in the same comparison. If the assertion fails the formatter has changed semantics; there is no exception path.

What "semantic" buys you

Some syntactically-equivalent rewrites are not applied. The clearest case: mdwright leaves a setext heading as-is rather than converting it to ATX when the conversion would change the HTML id-anchor that external links point at. The cost of round-trip safety is that the formatter sometimes declines a clean-up it could otherwise perform.

Reading deviation errors

When the gate trips during development, the test output names the input file, the formatted output, and the divergent line of HTML. The fix lives in one of three places:

  1. Document recognition misclassified a span: fix the document facts, not the formatter.
  2. A rewrite producer proposed a stale or over-broad byte edit: fix the candidate owner/range or the verification signature, not the caller.
  3. A new spec case the existing rules do not handle: extend the recognised facts, then the formatter or linter.

Math regions

This is mdwright's reason for existing. Generic Markdown formatters mangle LaTeX: they reflow \frac{a}{b} into \\frac{a}{b}, collapse the blank line before \begin{align*}, and apply emphasis rules inside \(\alpha\). mdwright treats math as opaque: recognised before any other pass runs, emitted verbatim.

What counts as math

The default math grammar:

  • Inline. \( … \) (paired backslash-paren, single line).
  • Display. \[ … \] (paired backslash-bracket, may span lines).
  • Environments. \begin{NAME} … \end{NAME} for any NAME matching [A-Za-z][A-Za-z0-9*]*, paired with a non-overlapping \end{NAME}.

$ … $ and $$ … $$ are not math by default. Dollar-delimited math is common in academic prose but collides with literal-dollar use (prices, shell prompts). Opt in via configuration:

[parse.math]
delimiters = "github"

The stray-dollar lint flags lone dollar signs when this option is off, so authors migrating from a dollar-delimited dialect catch the change.

How the scanner runs

The math crate recognises candidate math spans over strings and byte ranges. The document crate supplies Markdown exclusion ranges (code, HTML, other opaque regions), then stores the accepted math regions as document facts with stable coordinates back to the original source. The exact source bytes, including whitespace, casing, comment chars, and trailing backslashes, pass through unchanged. The formatter cannot accidentally apply emphasis, escape, or wrap logic inside a math region: rewrite candidates are verified against the document's math-region signature before they commit. Lint rules that match on text see the same opaque region; latex-command, for instance, only fires outside math.
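
A rough Python sketch of the same division of labour: a regular expression proposes candidate spans for the default grammar, and caller-supplied exclusion ranges veto any candidate that overlaps code or HTML. The regex and function name are illustrative assumptions, offsets are character offsets rather than bytes, and nested environments of the same name are not handled.

```python
import re

# Default grammar: \( ... \) on one line, \[ ... \] across lines,
# and \begin{NAME} ... \end{NAME} with a matching NAME.
MATH = re.compile(
    r"\\\((?P<inline>[^\n]*?)\\\)"
    r"|\\\[(?P<display>.*?)\\\]"
    r"|\\begin\{(?P<env>[A-Za-z][A-Za-z0-9*]*)\}.*?\\end\{(?P=env)\}",
    re.DOTALL,
)


def math_regions(source, excluded=()):
    """Return (start, end) spans of math outside the excluded ranges."""
    regions = []
    pos = 0
    while (match := MATH.search(source, pos)) is not None:
        start, end = match.span()
        if any(start < hi and lo < end for lo, hi in excluded):
            pos = start + 1  # overlaps code or HTML: discard and rescan
            continue
        regions.append((start, end))
        pos = end
    return regions
```

For example, given the source text Inline \(a_b\) and `\(x\)` in code with the backtick span passed as an exclusion range, only the first span is reported.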

Block-level math

A math environment whose start delimiter sits at column 1 of an otherwise-blank line is a block. The formatter emits blocks with one blank line above and below, never indented inside a list item unless the source already indented it. This avoids the canonical bug:

input:                          generic formatter:              mdwright:
A paragraph.                    A paragraph.                    A paragraph.
                                \begin{align*}                  
\begin{align*}                  E &= mc^2                       \begin{align*}
E &= mc^2                       \end{align*}                    E &= mc^2
\end{align*}                                                    \end{align*}

Stripping the blank line above \begin{align*} rolls the environment into the paragraph and breaks the rendered DOM.

Math-adjacent rules

Three rules check math without parsing it:

Each runs on the recognised region as a string; none of them care about the math semantics.

Math inside code blocks

If \( appears inside a fenced code block or inline code (`\(x\)`), mdwright does not treat it as math; code regions are recognised earlier still. The math scanner consults the same exclusion ranges the formatter does, so it never produces false positives inside code or HTML.

Math rendering

mdwright does not try to be TeX. It shapes math regions so a downstream renderer, such as KaTeX, MathJax, mkdocs-material's math plugin, or jupyter-book, can do browser-quality typesetting. --math-render chooses the source delimiter shape for formatter and HTML-render checks.

For terminal inspection, mdwright preview --math=unicode has a first-party Unicode renderer for a large common subset of MathJax-style TeX math input: symbols, Greek letters, scripts, accents, fractions, roots, delimiters, arrows, relations, operators, and matrix-like environments where Unicode terminal text can represent the result honestly. Unsupported math falls back to source text instead of guessing.

For editable source translation, use mdwright math. It translates math bodies between LaTeX commands and Unicode source while preserving Markdown math delimiters. Unicode-to-LaTeX translation is parser-backed for the supported subset, so scripts, styled alphabets, accents, arrows, and direct symbols are recognised as source structure before canonical LaTeX is emitted. Normal mdwright fmt never rewrites math notation silently.

For what mdwright treats as math, see Math regions. This page is about how those regions are emitted.

The two modes

| Mode | Behaviour |
| --- | --- |
| none | Pass math regions through verbatim. Default. |
| dollar | Rewrite \[ … \] to $$ … $$ and \( … \) to $ … $. Environments stay. |

A third value, commonmark-katex, is a documentation alias: the behaviour matches none exactly, but the name leaves a greppable signal in CI logs that the build expects KaTeX downstream.

When to use which

  • none fits most projects. KaTeX (via auto-render), MathJax v3's auto-renderer, mkdocs-material's math plugin, jupyter-book, and Pelican all recognise \[ … \] and \( … \) out of the box.
  • dollar fits Pandoc-style pipelines that expect $ delimiters. The rewrite is one-directional: \[ becomes $$, \( becomes $, source already in dollar form passes through unchanged, and LaTeX environments stay environments (there is no dollar form of \begin{align*}).
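
The one-directional rewrite that dollar mode performs can be sketched as a regex substitution. This is an illustration under stated assumptions, not the formatter's code: the function name is hypothetical, and the sketch ignores the code and HTML exclusion ranges a real pass would have to respect.

```python
import re

# Same shapes as the default math grammar; environments are matched so
# that bracket or paren math nested inside them is left alone.
MATH = re.compile(
    r"\\\((?P<inline>[^\n]*?)\\\)"
    r"|\\\[(?P<display>.*?)\\\]"
    r"|\\begin\{(?P<env>[A-Za-z][A-Za-z0-9*]*)\}.*?\\end\{(?P=env)\}",
    re.DOTALL,
)


def to_dollar(source: str) -> str:
    """Rewrite \\( ... \\) to $ ... $ and \\[ ... \\] to $$ ... $$."""

    def rewrite(match):
        if match.group("inline") is not None:
            return "$" + match.group("inline") + "$"
        if match.group("display") is not None:
            return "$$" + match.group("display") + "$$"
        return match.group(0)  # environments have no dollar form

    return MATH.sub(rewrite, source)
```

Source that is already dollar-delimited never matches the pattern, so it passes through unchanged, and applying the function twice gives the same result as applying it once.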

CLI and config

mdwright fmt --math-render=dollar path/to/notes.md

[fmt.math]
render = "dollar"  # or "none", "commonmark-katex"

The CLI flag overrides the config file; both fall back to MathRender::None.

Inspecting the rendered HTML

mdwright render pipes the formatted output through mdwright's HTML renderer to stdout:

mdwright render notes.md > notes.html
mdwright render --math-render=dollar notes.md
mdwright render --render-profile=cmark-gfm notes.md
mdwright render --open notes.md

Captured stdout is raw HTML by default. --color=always highlights the HTML for terminal reading, and --open writes the HTML to a temporary file and opens it in the system browser.

This is a diagnostic surface, not a production renderer. mdwright's HTML emitter does not enable pulldown-cmark's math extension: math regions land in the HTML as plain text in whatever delimiter form the formatter produced. Feed that HTML through KaTeX, MathJax, or your static-site generator's math plugin to see browser-quality typeset output.

--render-profile=cmark-gfm changes HTML spelling only. It is useful when comparing diagnostic HTML with cmark-gfm-based tools, but it does not change parser semantics or formatter source rewrites.

Terminal preview

mdwright preview renders static terminal text:

mdwright preview notes.md
mdwright preview --color=always notes.md
mdwright preview --math=source notes.md

preview is for fast local inspection. It renders headings, lists, block quotes, links, code blocks, tables, and simple math as terminal text. It does not claim CSS layout, images, browser fonts, KaTeX, or MathJax equivalence. Use render --open when the browser view matters.

Source translation

mdwright math is the explicit command for changing math notation:

printf '$\alpha_i$\n' | mdwright math --to-unicode -
printf '$αᵢ$\n' | mdwright math --to-latex -
mdwright math --to-unicode --diff notes.md
mdwright math --to-unicode --write notes.md

File mode translates recognised Markdown math bodies and preserves their delimiters, so \( \alpha_i \) becomes \( αᵢ \). Use --check for CI, --diff to inspect a patch-compatible diff, and --write to mutate files. Stdin without recognised Markdown math delimiters is treated as one math source body.

Translation is conservative. Direct symbols, scripts, styled alphabets, accents, roots, aliases, and other constructs with honest editable Unicode forms are translated. Unsupported Unicode, ambiguous accent/prime ownership, diagrams, fractions, complex environments, macros, colour/style commands, and other constructs without a plain source form remain visible and are reported on stderr rather than being approximated.

The gate under dollar mode

The HTML-equivalence gate in Round-trip safety compares pre-format HTML against post-format HTML. Under --math-render=dollar that comparison would always diverge, because the formatter intentionally rewrites math. The gate's actual contract is idempotence-on-mode: formatting the output a second time with the same options must produce the same canonical event stream. Divergence between the first and second pass is still a hard failure. See mdwright_format::format_validated for the entry point.

Markdown extensions

mdformat-mkdocs (the formatter most mkdocs-material projects reach for today) recognises a few constructs that plain CommonMark / GFM does not. mdwright matches it for each, so a project can swap one tool for the other without visible churn.

Recognition is preservation, not interpretation: mdwright knows the constructs exist, emits them canonically, and gates each via a per-extension toggle. It does not expand abbreviations, render {...} to HTML, or change semantics. The downstream renderer (Python-Markdown, mkdocs-material, jupyter-book) does that work.

GFM URL and email autolinks are recognised by default. mdwright also applies GFM tagfiltering when rendering or building semantic signatures. These behaviours close the cmark-gfm rendering gap while keeping formatter output byte-preserving.

The four extensions

| Extension | Source shape | Default |
| --- | --- | --- |
| Definition lists | Term\n: definition\n | on |
| Heading attribute lists | # Heading {#id .class key=val} | on |
| Abbreviation lists | *[HTML]: Hyper Text Markup Language\n | on |
| Non-heading attribute lists | Paragraph\n{ .note .important }\n | on |

Defaults are on: each recognises something the source is already doing, not a formatter opinion. Turn them off in .mdwright.toml when running mdwright on non-mkdocs corpora where false positives matter more than coverage:

[parse.extensions]
definition-lists = false
abbreviation-lists = false
heading-attribute-lists = false
block-attribute-lists = false

Definition lists

Source:

Term
:   Single-paragraph definition body. Continuation lines are
    indented four spaces and aligned with the body column.

Operating system
:   The software that manages hardware resources. Notable examples:

    - Linux
    - macOS
    - Windows

    Run `uname -a` to see your kernel version.

Canonical emission matches mdformat-mkdocs:

  • Tight form (Term\n: body) for single-paragraph definitions.
  • Loose form (blank line between term and the : marker) when the definition has multiple block children: a paragraph plus a nested list / code block, or multi-paragraph text. The blank line is the syntactic boundary that makes the multi-block body parse correctly.

Multiple definitions for one term emit on consecutive : lines with no blank between them; blank lines separate term groups.
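
The tight-versus-loose decision can be sketched as a small emitter. The function name, the data shape (each definition is a list of block strings), and the exact spacing (a colon followed by three spaces, four-space continuation) are assumptions drawn from the source example above, not from mdformat-mkdocs itself.

```python
def emit_definition_group(term, definitions):
    """Emit one term and its definitions; each definition is a list of blocks."""
    lines = [term]
    for blocks in definitions:
        if len(blocks) > 1:
            lines.append("")  # loose form: blank line before the ':' marker
        for index, block in enumerate(blocks):
            if index:
                lines.append("")  # blank line between sibling blocks
            for offset, line in enumerate(block.splitlines()):
                first = index == 0 and offset == 0
                prefix = ":   " if first else "    "
                lines.append(prefix + line if line else "")
    return "\n".join(lines) + "\n"
```

A term with two single-paragraph definitions comes out as the term line followed by two consecutive colon lines; a definition holding a paragraph plus a nested list gets the blank line after the term.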

Heading attribute lists

Source:

# Heading {#section-one}

## Multiple classes {.warning .important}

### Mixed shape {#mix .alpha .beta key=val}

The trailer parses through pulldown-cmark's ENABLE_HEADING_ATTRIBUTES flag, lands on the typed Heading, and re-emits based on [fmt] heading-attrs:

| Mode | Behaviour |
| --- | --- |
| preserve (default) | Emit the source trailer byte-verbatim between the inline body and the line break. |
| canonicalise | Emit {#id .class₁ .class₂ k=v}: id first, then classes (source order), then key=value pairs (source order). Values containing whitespace are double-quoted. |

[fmt]
heading-attrs = "preserve"  # or "canonicalise"
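
The canonicalise ordering can be sketched as follows. The function name and argument shape are hypothetical; the sketch only reproduces the ordering and quoting rule stated in the table and does not escape quotes inside values.

```python
def canonical_trailer(ident, classes, pairs):
    """Build a heading trailer: id first, then classes, then key=value pairs."""
    parts = []
    if ident:
        parts.append("#" + ident)
    parts.extend("." + name for name in classes)  # source order
    for key, value in pairs:                       # source order
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'  # values containing whitespace are quoted
        parts.append(f"{key}={value}")
    return "{" + " ".join(parts) + "}" if parts else ""
```

Called with the id mix, the classes alpha and beta, and the pair key=val, it returns {#mix .alpha .beta key=val}, the shape of the third heading in the source example.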

Pulldown limitation. pulldown-cmark 0.13's heading-attribute parser splits the trailer on whitespace and does not honour double-quoted values: # H {title="hello world"} parses as two attributes, title="hello and world", not one. mdformat-mkdocs (which uses python-markdown's attr_list) handles the quoted form correctly. Until a fix lands upstream in pulldown-cmark, mdwright's heading-attribute output for quoted values diverges from mdformat-mkdocs; the divergence is recorded in Deviations from spec.

Abbreviation lists

Source:

The HTML standard is maintained by the W3C.

*[HTML]: Hyper Text Markup Language
*[W3C]: World Wide Web Consortium

mdwright recognises the *[TERM]: definition shape and preserves the declarations verbatim. It does not expand occurrences (the downstream renderer wraps them in <abbr title="…">…</abbr>). Each declaration is one source line; continuation lines are not supported, matching python-markdown's abbr extension.

Consecutive abbreviation lines (no blank line between them) are bundled into one source paragraph by pulldown and emitted as one verbatim block. A blank line above the first declaration is conventional but not required.

Non-heading attribute lists

Source:

This paragraph carries a class trailer used by the renderer to style it.
{ .note .important }

The trailer must:

  • sit on the line immediately after a non-empty block (no blank-line separator), and
  • contain only the brace-delimited attribute list and optional surrounding whitespace.

When mdwright recognises the pattern, the entire block (body + trailer) is emitted as a single verbatim source slice. Other paragraph-level rewrites (line wrap, link normalisation, escape rewrites) are skipped for that paragraph, so preservation narrows the formatter's active surface on annotated blocks.
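
The two trailer conditions can be sketched as a check on a block's final line. The regex and function name are illustrative assumptions; the sketch takes one block of text with no blank lines inside it and does not validate the attribute syntax between the braces.

```python
import re

# A line holding only a brace-delimited list and optional whitespace.
TRAILER = re.compile(r"^[ \t]*\{[^{}\n]*\}[ \t]*$")


def split_block_trailer(block: str):
    """Return (body, trailer) when the last line is a trailer, else (block, None)."""
    lines = block.rstrip("\n").split("\n")
    if len(lines) >= 2 and TRAILER.match(lines[-1]) and lines[-2].strip():
        return "\n".join(lines[:-1]), lines[-1].strip()
    return block, None
```

A brace group in the middle of a line, as in the inline case described next, does not match because the pattern must cover the whole line.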

Inline attribute lists (some *emphasised* { .em } text mid-paragraph) are explicitly out of scope. mdwright's inline formatter has no overlay mechanism today; adding one is a separate design exercise. Inline {...} tokens flow through as plain text.

Round-trip and idempotence

Reformatting under any combination of these extensions still goes through the HTML-equivalence gate. Verbatim overlays satisfy it trivially, and the canonical emission shape for typed-block constructs is a fixed point of its own parser by construction.

Parity with mdformat-mkdocs

The parity goal is concrete: an mkdocs-material site running mdformat-mkdocs swaps in mdwright with no visible diff. The parity test at tests/extension_parity.rs byte-compares mdwright's output against mdformat-mkdocs reference output for the five extension regression fixtures; any divergence is fixed in mdwright or recorded in Deviations from spec.

MyST + Pandoc directives

MyST (Markedly Structured Text) is the substrate for jupyter-book and Sphinx-MyST. Pandoc has overlapping syntax for the same shapes. mdwright recognises the common constructs from both flavours and preserves their bytes verbatim; it does not expand directives, render roles, or resolve substitutions. The downstream renderer (Sphinx, jupyter-book, Pandoc) does that work.

Like Markdown extensions and math rendering, recognition is preservation, not interpretation. Defaults are on because the recognisers reflect what the source already says, not a formatter opinion.

What mdwright recognises

| Construct | Source shape | Default |
| --- | --- | --- |
| MyST directive container | :::{name}\n…\n::: | on |
| Pandoc fenced div (attr form) | ::: {.warning}\n…\n::: | on |
| Pandoc fenced div (short) | :::note\n…\n::: | on |
| MyST inline role | {term}`Vector Space` | on |
| MyST substitution reference | {{name}} | on |
| Pandoc inline attribute span | [content]{.cls} | on |
| MyST line comment | % comment text | on |

Turn individual recognisers off in .mdwright.toml when running mdwright on non-MyST corpora:

[parse.extensions.myst]
directive-containers = false
inline-roles = false
substitution-references = false
comments = false

[parse.extensions.pandoc]
fenced-divs = false
short-form-divs = false
inline-attribute-spans = false

Block directive containers

Source:

:::{note}
This is a MyST note. It can contain *inline* and

multiple paragraphs.
:::

Pandoc variants (attr form and short form) are also recognised:

::: {.warning}
Pandoc fenced div, attribute form.
:::

:::note
Pandoc short form.
:::

Directives with options round-trip verbatim:

:::{figure} ./img.png
:alt: A diagram of the system
:width: 300px
:align: center

The figure caption text.
:::

Nested directives use opener / closer counts that increase outward: :::: outside, ::: inside. mdwright preserves the nesting:

::::{note}
Outer body.

:::{tip}
Inner body.
:::
::::

mdwright records the outermost directive's byte range and emits it verbatim; inner directives sit inside that range and are preserved implicitly. Two directives at the same colon count separated by a blank line are sibling regions, not a nested pair.
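The colon-count rule can be illustrated with a small scanner. This is a sketch under simplifying assumptions, not mdwright's implementation: it returns line-index pairs instead of byte ranges, ignores fenced code and other exclusion zones, and the `OPENER` / `CLOSER` patterns are guesses based on the examples in this section.

```python
import re

# Assumed opener shape: 3+ colons, then {name}, a Pandoc attr list, or a short name.
OPENER = re.compile(r"^(:{3,})\s*(\{[^}]*\}|[A-Za-z0-9_-]+)")
CLOSER = re.compile(r"^(:{3,})\s*$")

def outermost_directives(lines):
    """Return (start, end) line-index pairs for outermost directive containers."""
    ranges = []
    i = 0
    while i < len(lines):
        m = OPENER.match(lines[i])
        if not m:
            i += 1
            continue
        colons = len(m.group(1))
        # Look for the closer with the same colon count; inner directives
        # use fewer colons, so their closers are skipped over.
        for j in range(i + 1, len(lines)):
            c = CLOSER.match(lines[j])
            if c and len(c.group(1)) == colons:
                ranges.append((i, j))
                i = j + 1
                break
        else:
            i += 1  # unclosed opener: leave it as plain text
    return ranges
```

Because the scan resumes after the outer closer, inner directives never produce their own entries; they are covered by the outer range.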

Inline overlays

Inline roles attach a role name to a backtick-delimited payload. The role name is unrestricted: mdwright does not know what {term} or {download} means; that is downstream's job. The bytes round-trip:

The {term}`Vector Space` is a fundamental concept.

Substitution references look similar, but use double braces and no backticks:

Some content with {{my-sub}}.

The declaration lives in YAML frontmatter under myst_substitutions: and round-trips through the same verbatim path mdwright uses for frontmatter:

---
myst_substitutions:
  my-sub: "Replacement text"
  another: "{{my-sub}} again"
---

Body content uses {{my-sub}} and {{another}}.

Pandoc inline attribute spans wrap a fragment in square brackets and follow it with a brace attribute list. mdwright distinguishes them from CommonMark links (where the closing bracket is followed by an opening parenthesis) and preserves the byte sequence:

Highlight a [span of text]{.note} in the middle of a paragraph.
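To make the three inline shapes concrete, here is a rough classifier. It is only a sketch: the regular expressions are inferred from the examples above, the function name `inline_overlays` is hypothetical, and code spans, math, and HTML are not excluded the way a real scanner would exclude them.

```python
import re

# Assumed shapes, taken from the examples in this section.
ROLE = re.compile(r"\{([A-Za-z0-9_:-]+)\}`([^`]+)`")   # {term}`Vector Space`
SUBSTITUTION = re.compile(r"\{\{([^{}]+)\}\}")          # {{my-sub}}
SPAN = re.compile(r"\[([^\]]+)\]\{([^}]*)\}")           # [text]{.cls}

def inline_overlays(text):
    """Return (kind, matched_text) pairs in source order."""
    found = []
    for kind, pattern in (("role", ROLE), ("substitution", SUBSTITUTION), ("span", SPAN)):
        for m in pattern.finditer(text):
            found.append((m.start(), kind, m.group(0)))
    return [(kind, matched) for _, kind, matched in sorted(found)]
```

An ordinary link such as `[text](url)` produces no match, because the span pattern requires a brace directly after the closing bracket.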

Line comments

MyST's % line comment is a line whose first non-whitespace byte is %. mdwright preserves it verbatim:

% This line is dropped by MyST renderers but mdwright keeps it.

Unlike LaTeX, % is only a comment at the start of a line; inline % characters in prose are literal text and survive untouched.

What mdwright does not do

Expansion, role rendering, substitution resolution, and directive-name validation are all the downstream renderer's job:

  • A :::{figure} is emitted as :::{figure}; the image is not inlined and the options are not rendered.
  • {term}`Vector Space` stays as-is.
  • {{my-sub}} is preserved even when the frontmatter declares a replacement.
  • Any directive name matching [a-zA-Z0-9_-]+ is accepted; an unknown name is downstream's problem.

Run mdwright before Sphinx, jupyter-book, or Pandoc: it normalises the surrounding Markdown without touching the MyST / Pandoc constructs the downstream renderer needs.

Round-trip and idempotence

Every MyST / Pandoc construct passes through the same idempotence-on-mode contract as the rest of the formatter; see Round-trip safety. Verbatim preservation overlays satisfy it trivially as long as the recogniser classifies the same bytes the same way on both passes. It does, since the scanner is fully deterministic over source bytes plus the exclusion vectors (fenced code, inline code, HTML, math).

Lint vs. format

mdwright has two pipelines and four subcommands. They share one event walk over pulldown-cmark but otherwise do not interact: a lint diagnostic never blocks a format pass, and the formatter never depends on lint state.

The four subcommands

| Subcommand | Writes | Exit non-zero when |
| --- | --- | --- |
| mdwright check | nothing | --check is set and a non-advisory diagnostic fires |
| mdwright fix | files (safe fixes only) | --check is set and a non-advisory diagnostic still remains |
| mdwright fmt | files (every input) | parse fails or the safety gate refuses the rewrite (exit 2) |
| mdwright fmt-check | nothing | any input would be reformatted (exit 1) |

check is the audit; fix is the audit that may mutate; fmt is the unconditional rewrite; fmt-check is the rewrite-or-fail-CI variant. By default check and fix exit 0 even with diagnostics present; pass --check to make them fail CI.

Why the pipelines are separate

The linter answers a local question: does this Markdown have problems? A bare URL, a mismatched code fence, a duplicate heading id. Diagnostics carry locations and optional fixes. Rules implement the LintRule trait and operate on a flat IR (events with byte spans).

The formatter answers a whole-document question: which verified byte rewrites should apply? Structural emit is identity: default formatting preserves source bytes modulo document-boundary normalisation. Canonicalisation and wrapping are proposed as rewrite candidates and committed only after document-level verification.

The two pipelines share a parse but nothing else.

When you want both

Most projects run both in CI; the two are independent. A project can format with mdwright and disable every default-on lint, or run a tight lint set without ever invoking the formatter.

mdwright check . && mdwright fmt-check .

For pre-commit hooks, see Integration → Pre-commit.

What --check means

--check on mdwright check (or mdwright fix) makes the command exit 1 when any non-advisory diagnostic fires. Without it, check prints diagnostics and exits 0, which is useful for tooling that wants to consume the output without aborting.

mdwright fmt-check has no --check flag; it always exits non-zero when any file would be reformatted, matching rustfmt --check's contract.
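The exit-code rules described on this page can be summarised as a small decision function. This is a toy model for reasoning about CI behaviour, not code from mdwright; the function and parameter names are illustrative.

```python
def exit_code(subcommand, *, check=False, blocking_diagnostics=0,
              would_reformat=False, parse_failed_or_refused=False):
    """Toy model of the documented exit codes; names are illustrative."""
    if subcommand in ("check", "fix"):
        # For fix, blocking_diagnostics counts what remains after safe fixes.
        return 1 if check and blocking_diagnostics > 0 else 0
    if subcommand == "fmt":
        return 2 if parse_failed_or_refused else 0
    if subcommand == "fmt-check":
        return 1 if would_reformat else 0
    raise ValueError(f"unknown subcommand: {subcommand}")
```

Advisory diagnostics are deliberately absent from the model: they are reported but never change the exit code.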

Suppression comments

A suppression comment silences one or more lint rules on the next block, the next line, or a range. Suppression comments are written as HTML comments, so they are invisible in the rendered document.

Forms

Next block. Silence one rule on the block immediately following:

<!-- mdwright: allow bare-url -->

See https://example.com for the spec.

Next line. Silence on the next non-blank line only:

<!-- mdwright: allow-next-line bare-url -->
See https://example.com for the spec.

Range. Open with allow-begin, close with allow-end. Useful for tables, generated content, or vendored sections:

<!-- mdwright: allow-begin bare-url -->

| Source | URL |
| --- | --- |
| Spec | https://spec.commonmark.org/ |
| GFM | https://github.github.com/gfm/ |

<!-- mdwright: allow-end bare-url -->

Separate multiple rules with commas: <!-- mdwright: allow bare-url, latex-command -->. Use the literal all to silence every rule (rarely the right choice): <!-- mdwright: allow all -->.
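For tooling that wants to audit markers outside mdwright, the comment forms above are simple to parse. The following is a sketch based only on the forms shown here; the `MARKER` pattern and `parse_suppression` helper are hypothetical and may not match mdwright's own parser in edge cases.

```python
import re

MARKER = re.compile(
    r"<!--\s*mdwright:\s*(allow-next-line|allow-begin|allow-end|allow)\s+(.*?)\s*-->"
)

def parse_suppression(line):
    """Return (form, [rule names]) for a suppression comment, else None."""
    m = MARKER.search(line)
    if not m:
        return None
    # Rule names are comma-separated; "all" is passed through as a literal.
    rules = [name.strip() for name in m.group(2).split(",") if name.strip()]
    return m.group(1), rules
```

Ordinary HTML comments, such as the explanatory sibling comment recommended later on this page, return `None`.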

Auditing what you have silenced

mdwright check --no-suppress .

ignores every suppression marker and reports the full diagnostic set. Use this to find suppressions that no longer correspond to a real diagnostic.

mdwright check itself reports unused suppressions: a <!-- mdwright: allow bare-url --> whose target block has no bare URLs surfaces as an advisory, so you can delete the marker.

Suppression vs. disabling

Use a suppression marker when a rule is right project-wide but wrong at one location, and add a sibling HTML comment explaining why:

<!-- mdwright: allow bare-url -->
<!-- The renderer in this project linkifies bare URLs itself. -->

See https://example.com for the spec.

When the same suppression appears in dozens of places, disable the rule in configuration instead:

[lint]
preset = "default"
ignore = ["bare-url"]

See Configuration.

See also

  • Lint vs. format: suppression only affects linting; the formatter has no per-document opt-out.
  • Rules catalogue: every rule's kebab-case name (the literal that goes in the suppression comment).

Formatter policy

mdwright's formatter has two responsibilities, in this order:

1. Identity Emit: Preserve

Start with the user's source bytes. With every style knob at its default and wrap = "keep", formatting returns those bytes unchanged except for the document-boundary policies: line endings, trailing newline handling, and end-of-line selection.

This is the load-bearing invariant. Default formatting is idempotent by construction because the formatter does not synthesise Markdown for recognised structures.

You opt out of preservation by setting the rewrite knobs below. There is no "semi-preserve" mode.

2. Verified Rewrite Families: Opt In

The formatter crate runs style-canonicalisation and wrapping through private rewrite families: inline delimiters, list markers, thematic breaks, link destinations, heading attributes, tables, math, frontmatter, and terminal wrap. Each canonical family builds a local normal-form edit plan, proves its edits do not overlap within the family, applies the plan to a scratch buffer, and verifies the result before it can commit.

If verification fails, the whole family skips. The engine never commits half of a family plan. If the family pipeline cannot reach a pass with no commits before its guard trips, mdwright leaves the original source bytes unchanged instead of returning a partial normal form.
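The plan, check, apply, verify, commit sequence can be sketched generically. This is an illustration of the all-or-nothing idea only: the edit triples, the `verify` callback, and the `apply_family` name are stand-ins for mdwright's private types, and offsets here are string indices rather than byte offsets.

```python
def apply_family(source, edits, verify):
    """Apply one family's edit plan atomically, or skip the whole family.

    `edits` is a list of (start, end, replacement) triples and
    `verify(old, new)` is a caller-supplied equivalence check.
    """
    plan = sorted(edits)
    # Refuse the plan if any two edits overlap.
    for (_, prev_end, _), (next_start, _, _) in zip(plan, plan[1:]):
        if next_start < prev_end:
            return source
    # Build the scratch buffer back to front so earlier offsets stay valid.
    scratch = source
    for start, end, replacement in reversed(plan):
        scratch = scratch[:start] + replacement + scratch[end:]
    # Commit only if the whole result verifies; never commit half a plan.
    return scratch if verify(source, scratch) else source
```

In this sketch a failed check returns the original string unchanged, which corresponds to the family skipping.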

Tables are parent normal forms. The table family runs after inline canonicalisers, reads cell contents from the current snapshot, and rewrites each table block only when document-owned table facts account for the full table shape. It does not emit row- or cell-level edits that could race inline rewrites.

Wrap is terminal. It runs only after a full canonical-family scan commits no edits for the current snapshot. If wrap commits paragraph edits, the engine returns to the first canonical family on a fresh parse before wrapping again. Paragraph shapes the wrap pass cannot model stay unchanged and are counted in the formatter report.

An integer wrap setting is a line-budget contract, not a profile-specific preference. With wrap = 120, breakable paragraph lines are kept at or below 120 display columns in both the default formatter profile and the mdformat profile. The only accepted overflow is one indivisible atomic token, such as a code span, URL, math atom, or single long word. The default wrap strategy is stable soft-break reflow: ordinary source newlines inside a paragraph may be joined, hard breaks stay hard boundaries, and overlong breakable runs are wrapped to the configured budget. wrap-strategy = "balanced" opts into a paragraph rebalancer for authors who prefer more even line lengths.

Default: every style knob is Preserve and wrapping is Keep. With the default config the rewrite-family pipeline short-circuits before running. Set per-knob targets in .mdwright.toml to opt in.

Why the separation

Synthesising structural output during canonicalisation creates a bug class where one emit decision perturbs the parse context of another: rewriting _foo_ to *foo* can change an adjacent site's emphasis-flanking class, so the next rewrite reads a different pulldown event stream than the one it planned against.

Identity emit removes that perturbation source. Rewrite families keep the remaining byte changes in formatter-owned normal-form plans, so a stale local string edit cannot commit without reparsing and verification.

How to opt in

In .mdwright.toml:

[fmt]
italic = "asterisk"            # _foo_ → *foo*
strong = "underscore"          # **bar** → __bar__
list-marker = "dash"           # * x   → - x
thematic-break = "dash"        # *** → ---
ordered-list = "consistent"    # 3. a / 5. b / 9. c → 3. a / 4. b / 5. c

[fmt.refs]
style = "angle"                # [ref]: url → [ref]: <url>

Each knob also accepts "preserve" to explicitly disable canonicalisation. See Style knobs for the per-knob reference, including which rewrites verification may skip (e.g. dense delimiter runs whose pairing depends on flanking neighbours).

What the canonicalisation pass does NOT do

  • Does not rewrap prose (wrap is a separate knob; see Configuration).
  • Does not change content semantics: every rewrite must reparse to the same canonical event stream as the bytes it replaces, or it is skipped.
  • Does not expose rewrite families, snapshot ownership, or verification signatures as public API. Those details stay private to mdwright-format.

For mdformat-compatible spelling where verified rewrites preserve the parsed document, use [fmt] profile = "mdformat".

Style knobs

This page documents each style knob in [fmt]. Every knob defaults to "preserve", which means the canonicalisation pass leaves source bytes unchanged for that construct. Set a non-preserve value to opt into rewriting.

See Formatter policy for the overall design (structural emit + opt-in canonicalisation) and Configuration for the full .mdwright.toml schema.

[fmt] italic

| Value | Effect |
| --- | --- |
| "preserve" (default) | Emphasis delimiters round-trip from source. _foo_ stays _foo_; *foo* stays *foo*. |
| "asterisk" | Rewrite _…_ to *…* when verification preserves the parse. |
| "underscore" | Rewrite *…* to _…_ when verification preserves the parse. |

Verification skips when: the rewrite would change the parse of the enclosing paragraph window. Intraword underscore (id_S, Hom_{cart}) is not such a case: pulldown already treats these as plain text under CM §6.2 rules 2 and 4, so no rewrite is proposed and nothing skips. Rewrites do skip in dense multi-delimiter runs (*_*…*_*-style chains) whose pairing depends on flanking neighbours; verification catches these and leaves the source bytes in place.

[fmt]
italic = "asterisk"

[fmt] strong

| Value | Effect |
| --- | --- |
| "preserve" (default) | Strong delimiters round-trip from source. **foo** stays **foo**; __foo__ stays __foo__. |
| "asterisk" | Rewrite __…__ to **…**. |
| "underscore" | Rewrite **…** to __…__. |

Independent of italic. With italic = "asterisk" and strong = "underscore" you get *italic* alongside __strong__.

[fmt]
italic = "asterisk"
strong = "underscore"

[fmt] list-marker

| Value | Effect |
| --- | --- |
| "preserve" (default) | Each unordered list keeps its source bullet character. |
| "dash" | Rewrite each bullet to -. |
| "asterisk" | Rewrite each bullet to *. |
| "plus" | Rewrite each bullet to +. |

Marker-local. The document crate exposes one fact per list-item marker. The formatter rewrites those marker bytes only, then verifies the full document before committing the family plan. Nested list markers are separate facts, so an outer list rewrite cannot cover child markers accidentally.

[fmt]
list-marker = "dash"

[fmt] ordered-list

| Value | Effect |
| --- | --- |
| "preserve" (default) | Each ordered list keeps its source numbering. 3. a / 5. b / 9. c stays. |
| "one" | Rewrite markers to 1. when verification preserves the list start. This matches mdformat's default spelling for ordinary lists that already start at 1. |
| "consistent" | Renumber so item k (0-indexed) becomes start_num + k, where start_num is the source's first item's number. 3. a / 5. b / 9. c → 3. a / 4. b / 5. c. |

Marker-local: each ordered item exposes its digit range, list start, and ordinal. The family plan rewrites those digit ranges and commits only after full-document verification. The starting number is preserved; only the increment is canonicalised.

[fmt]
ordered-list = "consistent"
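The "consistent" arithmetic is small enough to state directly. The helper below is a hypothetical illustration of the start_num + k rule, operating on a plain list of source numbers rather than on marker byte ranges.

```python
def renumber_consistent(numbers):
    """Map source item numbers to start_num + k (k is the 0-indexed position)."""
    if not numbers:
        return []
    start_num = numbers[0]  # the first item's number is preserved
    return [start_num + k for k in range(len(numbers))]
```

For the source numbering 3, 5, 9 this yields 3, 4, 5, matching the table above.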

[fmt] thematic-break

| Value | Effect |
| --- | --- |
| "preserve" (default) | Thematic breaks keep their source character (---, ***, ___). |
| "dash" | Rewrite to ---. |
| "asterisk" | Rewrite to ***. |
| "underscore" | Rewrite to ___. |
| "underscore-70" | Rewrite the whole line to 70 underscores, matching mdformat's default thematic-break spelling. |

The repeat count and internal spacing are preserved; only the character changes. So * * * becomes _ _ _ under "underscore", not ___. Use "underscore-70" when you want the mdformat spelling.

[fmt]
thematic-break = "dash"
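The character-only rewrite can be pictured as a per-character substitution. This is a sketch, assuming the line has already been identified as a thematic break; `swap_break_char` and `UNDERSCORE_70` are illustrative names, not part of mdwright.

```python
UNDERSCORE_70 = "_" * 70  # the whole-line spelling used by "underscore-70"

def swap_break_char(line, target):
    """Replace the break character, keeping repeat count and internal spacing."""
    return "".join(target if ch in "-*_" else ch for ch in line)
```

The spaced form keeps its spaces, so `* * *` maps to `_ _ _` rather than `___`.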

[fmt.refs] style

| Value | Effect |
| --- | --- |
| "preserve" (default) | Each link destination keeps its source form: [ref]: url or [ref]: <url> survives. |
| "bare" | Strip angle brackets where the bare form would still parse. [ref]: <url> → [ref]: url. |
| "angle" | Wrap destinations in angle brackets. [ref]: url → [ref]: <url>. |

Applies to both reference-link definitions ([ref]: dest) and inline link destinations ([text](dest)). Verification skips when the bare form contains whitespace, unbalanced parentheses, or other bytes that would prevent pulldown from parsing it as a bare destination; the angle-wrapped form is kept in those cases.

[fmt.refs]
style = "angle"
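The conditions under which a destination can lose its angle brackets can be approximated with a quick pre-check. This is a rough sketch, not the verification mdwright performs: the `can_be_bare` helper is hypothetical, and it ignores backslash escapes and control characters, which a full CommonMark check would also consider.

```python
def can_be_bare(dest):
    """Rough check that a destination could drop its angle brackets."""
    if not dest or any(ch.isspace() for ch in dest):
        return False
    depth = 0
    for ch in dest:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing parenthesis with no opener
    return depth == 0
```

A destination that fails the check would stay in its angle-wrapped form.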

[fmt.tables] style

| Value | Effect |
| --- | --- |
| "preserve" (default) | GFM table spacing round-trips from source. |
| "pad" | Pad cells and delimiter rows to mdformat-compatible widths when verification preserves the parse. |

Padding is a table-level operation. Inline delimiter and link destination rewrites run first; table padding then reads the current cell bytes and rewrites the table block as one verified replacement. Tables with source cells the document facts cannot account for are left unchanged rather than partially rewritten.

[fmt.tables]
style = "pad"

[fmt] wrap

| Value | Effect |
| --- | --- |
| "keep" (default) | Preserve existing paragraph line breaks. |
| "no" | Collapse soft line breaks inside paragraphs where verification preserves the parse. |
| integer | Wrap breakable prose lines at that display-column width. |

wrap = 120 means breakable output lines should fit within 120 columns in every formatter profile. The accepted exception is an indivisible atomic token, such as a long code span, URL, math atom, or single long word. Those tokens are left intact rather than split into invalid Markdown.

The default wrap strategy is "stable": ordinary source newlines inside a paragraph are soft break positions, hard breaks stay hard boundaries, and each hard-break-bounded run is filled greedily up to the configured column. Use wrap-strategy = "balanced" when you want mdwright to rebalance paragraphs for more even line lengths.

[fmt]
wrap = 120

[fmt]
wrap = 120
wrap-strategy = "balanced"
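Greedy fill with indivisible tokens can be approximated with Python's standard `textwrap` module. This is only an analogy for the "stable" strategy, under two stated simplifications: tokens are split on whitespace (so a code span containing spaces is not treated as atomic), and width is counted in code points rather than display columns. The `fill_run` name is hypothetical.

```python
import textwrap

def fill_run(text, width):
    """Greedy fill of one hard-break-bounded run.

    Whitespace-delimited tokens are never split, so a single token longer
    than `width` overflows on a line of its own.
    """
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # keep long URLs and words intact
        break_on_hyphens=False,   # do not introduce breaks inside tokens
    )
```

Source newlines inside the run are treated as ordinary whitespace, which corresponds to joining soft breaks before refilling.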

[fmt.lists] continuation-indent

| Value | Effect |
| --- | --- |
| "marker-width" (default) | Continuation lines align under the source list marker width. |
| "four-space" | Continuation lines use four spaces after the containing block prefix. |

This setting only affects paragraphs that are wrapped inside list items. It is separate from list-marker because the bullet character and continuation indentation are independent style decisions. The mdformat profile defaults this key to "four-space"; explicit config overrides that default.

[fmt]
wrap = 120

[fmt.lists]
continuation-indent = "four-space"

Combined example

[fmt]
profile = "mdformat"

This keeps mdformat's default wrap = "keep", sets list continuation indentation to four spaces, and applies mdformat spelling for supported style knobs. Explicit keys override the profile:

[fmt]
profile = "mdformat"
wrap = 120

[fmt.lists]
continuation-indent = "marker-width"

A per-knob spelling can also be written without the profile:

[fmt]
list-marker = "dash"
thematic-break = "underscore-70"
ordered-list = "one"

[fmt.refs]
style = "angle"

[fmt.tables]
style = "pad"

This is mdformat-compatible where mdwright has verified rewrite support. It does not move orphan footnotes or copy mdformat behaviours that would change the parsed document.

How verification skips become visible

When a rewrite would change the parse of the enclosing paragraph window, the canonicalisation pass logs a tracing::warn! with the byte span and skipped rewrite. Capture these in production with RUST_LOG=mdwright_format=warn. A high skip rate on one document usually points at a structural-emit edge case worth filing as a regression input.

Lint rules

Every rule shipped by mdwright's standard library, grouped by how they behave on a fresh install. Each link points to the rule's long-form explanation; mdwright explain <name> prints the same text from the command line.

Default rules

On by default. A diagnostic from one of these fails mdwright check --check.

| Rule | Fix | Description |
| --- | --- | --- |
| unbalanced-backtick | no | Backtick in prose that could not be paired with a closing fence. |
| math/unbalanced-delim | no | TeX-style math open delimiter (\[, \(, $$, $) with no matching close. |
| math/unbalanced-env | no | LaTeX \begin{env} with no matching \end{env} at the same nesting depth. |
| math/unbalanced-braces | no | { / } inside a math body do not balance; math body normalisation is skipped for that region. |
| adjacent-code-no-space | no | Inline code span adjacent to a letter without whitespace. |
| heading-punctuation | no | Trailing . or : on a heading. |
| orphan-reference-link | no | Reference-style link with no matching [label]: definition. |
| duplicate-link-label | no | Two [label]: definitions with the same label. |
| bare-url | yes | Bare URL in prose; wrap in <…> for a CommonMark autolink. |
| trailing-whitespace | yes | Trailing whitespace at end of line. |
| inconsistent-list-marker | no | Mixed - / * / + markers in one bullet list. |

Default advisories

On by default but informational: they report but do not fail --check.

| Rule | Fix | Description |
| --- | --- | --- |
| duplicate-heading | no | Two headings at the same level under the same parent with the same text. |
| unicodeable-subscript | yes | Braced super/subscript that has a single-codepoint Unicode form. |
| info-string-typo | no | Fenced code block info string not in the known-languages allowlist. |
Opt-in rules

Off by default. Enable with lint.extend-select = ["name"] in configuration.

| Rule | Fix | Description |
| --- | --- | --- |
| list-tightness-flipped | no | List tightness from the tree disagrees with tightness from source bytes. |
| stray-dollar | yes | Literal $ in prose (opt-in for projects that don't use $…$ math). |
| latex-command | yes | LaTeX control sequence in prose (opt-in for Unicode-math projects). |
| escaped-emphasis | yes | Literal \_, \*, or \` escape in prose (mdformat damage). |
| subscript-damage | yes | Identifier with * where a _ subscript was expected (formatter damage). |

name: adjacent-code-no-space default: true advisory: false fix: false since: 0.1.0

adjacent-code-no-space

Inline code span adjacent to a letter without whitespace.

What it does

Flags inline code spans whose backticks touch an adjacent alphanumeric or backtick character without an intervening space, e.g. `foo`bar or `foo``bar`.

Why

CommonMark renders `foo`bar as <code>foo</code>bar; the code runs into the prose with no visual break, which is almost always a typo for `foo` bar or `foobar`. Two consecutive code spans with no space between them (`foo``bar`) are even more ambiguous: the result depends on backtick counting and renders inconsistently across implementations.

Example (bad)

Call `vec.push(x)`afterwards.

Example (good)

Call `vec.push(x)` afterwards.

Configuration

  • Disable inline: <!-- mdwright: allow adjacent-code-no-space -->.
  • Disable in config: [lint] ignore = ["adjacent-code-no-space"].
  • Severity: non-advisory.

name: bare-url default: true advisory: false fix: true since: 0.1.0

bare-url

Bare URL in prose; wrap in <…> for a CommonMark autolink.

What it does

Flags http:// and https:// URLs that appear in prose without being wrapped in a CommonMark autolink (<https://example.com>) or a [text](url) link.

Why

mdwright recognises GFM bare URL autolinks for rendering, but whether the same source renders as a clickable link still depends on each downstream renderer's extension set. Wrapping the URL in <…> makes the link explicit and portable across CommonMark renderers.

The autofix (safe: true) wraps the URL in angle brackets in place; mdwright fix applies it.
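The shape of the fix can be sketched with a regular expression. This is a simplified illustration, not the rule's implementation: the `BARE_URL` pattern skips URLs already preceded by `<` or `(`, does not exclude code spans, and the choice to leave trailing sentence punctuation outside the brackets is an assumption of this sketch.

```python
import re

# Simplified: an http(s) URL not already preceded by "<" or "(".
BARE_URL = re.compile(r"(?<![<(])\bhttps?://[^\s<>)]+")

def wrap_bare_urls(text):
    def repl(m):
        url = m.group(0)
        # Keep trailing sentence punctuation outside the angle brackets.
        trailing = ""
        while url and url[-1] in ".,;:!?":
            trailing = url[-1] + trailing
            url = url[:-1]
        return f"<{url}>{trailing}"
    return BARE_URL.sub(repl, text)
```

Running the function twice gives the same result as running it once, since wrapped URLs no longer match.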

Example (bad)

See https://example.com for details.

Example (good)

See <https://example.com> for details.

Configuration

  • Disable inline: <!-- mdwright: allow bare-url -->.
  • Disable in config: [lint] ignore = ["bare-url"].
  • Severity: non-advisory. Safe autofix available.

name: duplicate-heading default: true advisory: true fix: false since: 0.2.0

duplicate-heading

Two headings at the same level under the same parent with the same text.

What it does

Flags two or more headings whose slugs (lowercase, hyphenated text) collide within the same document.

Why

Markdown renderers (GitHub, mdBook, GitLab) assign each heading a URL fragment derived from its text. Two headings with the same text collide on the fragment: only one is reachable, and which one depends on whether the renderer disambiguates with a -1 suffix or silently overwrites. External links to the document then drift unpredictably as new sections are added.
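A fragment collision is easy to reproduce with an approximate slug function. The sketch below follows the commonly described GitHub recipe (lowercase, strip non-word characters, hyphenate whitespace); it is an approximation, the helper names are hypothetical, and real renderers differ in details such as how runs of spaces are handled.

```python
import re
from collections import Counter

def slug(text):
    """Approximate GitHub-style slug for a heading."""
    text = text.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)   # strip non-word characters
    return re.sub(r"\s+", "-", text)       # whitespace becomes hyphens

def duplicate_slugs(headings):
    """Return slugs produced by more than one heading, sorted."""
    counts = Counter(slug(h) for h in headings)
    return sorted(s for s, n in counts.items() if n > 1)
```

Headings that differ only in case or punctuation collide under this scheme as well.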

Example (bad)

## Examples

…

## Examples

Example (good)

## Examples

…

## More examples

Configuration

  • Disable inline: <!-- mdwright: allow duplicate-heading -->.
  • Disable in config: [lint] ignore = ["duplicate-heading"].
  • Severity: advisory. Math/theorem documents legitimately repeat ### Proof or ### Corollary under one chapter, so the diagnostic surfaces but does not fail mdwright check --check.

References

  • GitHub's slug algorithm: lowercase, replace whitespace with -, strip non-word characters.

name: duplicate-link-label default: true advisory: false fix: false since: 0.1.0

duplicate-link-label

Two [label]: definitions with the same label.

What it does

Flags [label]: … link definitions that share a label (case-insensitive, normalised) with another definition in the same document.

Why

CommonMark says the first definition wins; later duplicates are silently discarded. The author usually intended for one of them to be a different label, so a duplicate is almost always a copy-paste mistake. Worse, the discarded definition often documents the intended target, so the link still resolves, but to the wrong URL.
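The label comparison can be sketched as follows. This is an illustration rather than the rule's implementation: the `DEFINITION` pattern is a simplification that does not exclude code blocks or footnote definitions, and the helper names are hypothetical.

```python
import re

# Simplified: a definition line with up to three spaces of indent.
DEFINITION = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*\S", re.MULTILINE)

def normalise_label(label):
    # Case-fold and collapse internal whitespace before comparing labels.
    return " ".join(label.casefold().split())

def duplicate_labels(markdown):
    """Return normalised labels that are defined more than once."""
    seen, dupes = set(), []
    for m in DEFINITION.finditer(markdown):
        key = normalise_label(m.group(1))
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```

Labels that differ only in case, such as `[ReadMe]` and `[readme]`, count as the same label.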

Example (bad)

See the [docs][readme] and the [tutorial][readme].

[readme]: https://example.com/readme
[readme]: https://example.com/tutorial

Example (good)

See the [docs][readme] and the [tutorial][tutorial].

[readme]: https://example.com/readme
[tutorial]: https://example.com/tutorial

Configuration

  • Disable inline: <!-- mdwright: allow duplicate-link-label -->.
  • Disable in config: [lint] ignore = ["duplicate-link-label"].
  • Severity: non-advisory.

name: escaped-emphasis default: false advisory: false fix: true since: 0.1.0

escaped-emphasis

Literal \_, \*, or \` escape in prose (mdformat damage).

What it does

Flags \_ and \* escape sequences in prose that look like a writer trying to escape an emphasis marker, but where the surrounding context confirms the writer meant the emphasis to fire, e.g. \_text\_ (two escapes around a word) reading as a damaged italic.

Why

mdformat and a few other roundtrip tools used to defensively escape _ and * in prose, even where CommonMark would not have parsed them as emphasis. After enough roundtrips, documents accumulate \_word\_ patterns that no longer render as italic; they render as literal _word_. The rule finds these and proposes the unescaped form.

The autofix removes the escapes (\_text\_ → _text_); it is safe to apply, but review first if the prose genuinely contains literal underscores (filenames, identifiers).
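A paired-escape rewrite might look like the sketch below. It is not the rule's implementation: the `ESCAPED_PAIR` pattern simply looks for two escapes of the same marker on one line, which also illustrates why review is advised, since two unrelated escaped underscores on the same line would be treated as a pair.

```python
import re

# Two escapes of the same marker wrapping a run of text on one line.
ESCAPED_PAIR = re.compile(r"\\([_*])(\S(?:[^\n]*?\S)?)\\\1")

def unescape_emphasis(text):
    """Rewrite \\_text\\_ to _text_ (and the same for asterisks)."""
    return ESCAPED_PAIR.sub(r"\1\2\1", text)
```

A lone escape, as in a filename with one escaped underscore, is left alone because no closing escape exists to pair with.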

Example (bad)

This is \_actually italic\_, despite the escapes.

Example (good)

This is _actually italic_, despite the escapes.

Configuration

  • This rule is off by default. Enable with [lint] extend-select = ["escaped-emphasis"].
  • Disable inline: <!-- mdwright: allow escaped-emphasis -->.
  • Severity: non-advisory. Safe autofix available.

name: heading-punctuation default: true advisory: false fix: false since: 0.1.0

heading-punctuation

Trailing . or : on a heading.

What it does

Flags ATX or setext headings that end with terminal sentence punctuation: ., !, or ?.

Why

A heading is a title, not a sentence; terminal punctuation on titles reads as a typo, breaks heading-anchor slugs in some renderers, and inflates the table of contents with stray characters. The convention is shared by Microsoft, GitHub, and Google's documentation style guides.

Example (bad)

## Configuring the linter.

Example (good)

## Configuring the linter

If the heading genuinely needs to be a question, this rule still fires; either reword it as a declarative title or suppress on that block.

Configuration

  • Disable inline (for one heading): place <!-- mdwright: allow heading-punctuation --> on the line before the heading.
  • Disable in config: [lint] ignore = ["heading-punctuation"].
  • Severity: non-advisory.

name: inconsistent-list-marker default: true advisory: false fix: false since: 0.1.0

inconsistent-list-marker

Mixed - / * / + markers in one bullet list.

What it does

Flags bulleted lists whose items use a mix of marker characters (-, *, +) within the same list. Ordered lists are not affected.

Why

CommonMark says any of -, *, + is a valid bullet marker; switching markers between items either (a) reads as a typo, or (b) actually starts a new list under CommonMark's rules, which produces a visible gap in the rendered output that the author rarely intended. Either way the fix is to pick one marker per list and stick to it.

With list-marker set, the formatter can normalise markers across the whole document; this rule fires when the source as written would render two adjacent lists where one was meant.
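Detecting the mix is straightforward for a flat run of item lines. The sketch below is illustrative only: it treats the given lines as one intended list, looks at top-level items, and does not parse nesting or continuation lines; `mixed_markers` is a hypothetical name.

```python
import re

BULLET = re.compile(r"^([-*+])\s+\S")

def mixed_markers(lines):
    """Return the markers in use if more than one appears, else an empty set."""
    markers = {m.group(1) for line in lines if (m := BULLET.match(line))}
    return markers if len(markers) > 1 else set()
```

A run that sticks to one marker returns an empty set, which corresponds to no diagnostic.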

Example (bad)

- first
* second
- third

Example (good)

- first
- second
- third

Configuration

  • Disable inline: <!-- mdwright: allow inconsistent-list-marker -->.
  • Disable in config: [lint] ignore = ["inconsistent-list-marker"].
  • Severity: non-advisory.

name: info-string-typo default: true advisory: true fix: false since: 0.2.0

info-string-typo

Fenced code block info string not in the known-languages allowlist.

What it does

Flags fenced code blocks whose info string (the word after the opening fence, e.g. rust in ```rust) is not in mdwright's allowlist of known languages and tools.

Why

Renderers ignore unknown info strings (no syntax highlighting); the most common cause is a typo like ```pyhton or ```jsons. Catching them in the linter is faster than spotting the unhighlighted block in a preview. The rule is advisory; projects use long-tail languages mdwright cannot anticipate, so a flagged unknown info string might be legitimate.

The shipped allowlist covers the languages this project uses (Rust, Python, Lean, …) plus the usual web stack (HTML, CSS, JS/TS, JSON, YAML, …) and common shell/console variants. Extend the allowlist via [lint.info-strings] extra = […] in your config rather than disabling the rule.
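The allowlist check, including the `extra` extension point, can be sketched like this. The `KNOWN` set is a small illustrative subset, not the shipped allowlist, and the scanner only understands three-backtick fences that open and close in alternation.

```python
FENCE = "`" * 3
# Illustrative subset only; the shipped allowlist is larger.
KNOWN = {"rust", "python", "lean", "html", "css", "js", "ts", "json", "yaml", "sh", "console"}

def unknown_info_strings(markdown, extra=()):
    """Return info strings on opening fences that are not in the allowlist."""
    allowed = KNOWN | {name.lower() for name in extra}
    flagged, in_fence = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if not in_fence:
            info = stripped.lstrip("`").split()
            if info and info[0].lower() not in allowed:
                flagged.append(info[0])
        in_fence = not in_fence
    return flagged
```

Passing project-specific names through `extra` mirrors extending the allowlist in configuration instead of disabling the rule.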

Example (bad)

```pyhton
def f(): pass
```

Example (good)

```python
def f(): pass
```

Configuration

  • Extend allowlist: [lint.info-strings] extra = ["mycustomlang", "vendor-dsl"].
  • Disable inline: <!-- mdwright: allow info-string-typo -->.
  • Disable in config: [lint] ignore = ["info-string-typo"].
  • Severity: advisory.

name: latex-command default: false advisory: false fix: true since: 0.1.0

latex-command

LaTeX control sequence in prose (opt-in for Unicode-math projects).

What it does

Flags TeX-style \command{…} invocations in prose (outside math regions), for example \textbf{important} or \emph{see below}.

Why

LaTeX commands in Markdown prose render literally in most renderers, breaking the visual result. Authors who write \textbf{x} almost always wanted Markdown's **x** instead. Projects targeting Pandoc may legitimately use LaTeX commands; for them, this rule is opt-in and stays off.

The autofix is conservative: it replaces the command with the equivalent Markdown construct where one exists (\textbf{x} → **x**, \emph{x} → *x*) and skips otherwise.
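The conservative behaviour can be illustrated with a lookup table: commands with a Markdown equivalent are rewritten, everything else is left alone. This is a sketch; the `MARKDOWN_FOR` mapping covers only the two commands named above, and math regions are not excluded as they would be in the real rule.

```python
import re

# Only commands with a Markdown equivalent are rewritten (illustrative mapping).
MARKDOWN_FOR = {"textbf": "**", "emph": "*"}
COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")

def fix_latex_commands(text):
    def repl(m):
        marker = MARKDOWN_FOR.get(m.group(1))
        # Unknown commands are returned unchanged.
        return f"{marker}{m.group(2)}{marker}" if marker else m.group(0)
    return COMMAND.sub(repl, text)
```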

Example (bad)

This is \textbf{important}.

Example (good)

This is **important**.

Configuration

  • This rule is off by default. Enable with [lint] extend-select = ["latex-command"].
  • Disable inline: <!-- mdwright: allow latex-command -->.
  • Severity: non-advisory. Safe autofix where a Markdown equivalent exists.

References

  • mdwright's command list (src/stdlib/latex_command.rs).

name: list-tightness-flipped default: false advisory: true fix: false since: 0.2.0

list-tightness-flipped

List tightness from the tree disagrees with tightness from source bytes.

What it does

Flags lists whose items mix tight (single-paragraph) and loose (blank-line-separated) shapes across the same list, leaving CommonMark's spec-defined "tightness" algorithm to make a surprising choice.

Why

CommonMark decides a list is "loose" if any item has a blank line between it and the next. That single blank line then re-renders every item with <p> wrappers, which adds vertical padding throughout the list. Authors who write one stray blank line frequently don't notice the cascading effect on items above and below.

This rule is advisory because the "wrong" tightness is rarely a bug per se, but the surprise is consistent enough that flagging it is worth a one-line nudge.
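To make the mixed-shape condition concrete, here is a minimal sketch that classifies a flat `- ` bullet list by the gaps between its items. It assumes single-line items and no nesting; `Shape` and `list_shape` are illustrative names, not part of mdwright.

```rust
/// Separator shape of a flat bullet list.
#[derive(Debug, PartialEq)]
enum Shape {
    Tight, // no blank line between any two items
    Loose, // a blank line between every pair of items
    Mixed, // some gaps blank, some not: CommonMark renders the whole list loose
}

fn list_shape(source: &str) -> Shape {
    let mut gaps_with_blank = 0;
    let mut gaps_without_blank = 0;
    let mut seen_item = false;
    let mut blank_since_item = false;
    for line in source.lines() {
        if line.trim().is_empty() {
            blank_since_item = true;
        } else if line.starts_with("- ") {
            if seen_item {
                if blank_since_item {
                    gaps_with_blank += 1;
                } else {
                    gaps_without_blank += 1;
                }
            }
            seen_item = true;
            blank_since_item = false;
        }
    }
    match (gaps_with_blank, gaps_without_blank) {
        (0, _) => Shape::Tight,
        (_, 0) => Shape::Loose,
        _ => Shape::Mixed,
    }
}
```

Only the `Mixed` case corresponds to the surprise this rule describes.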

Example (bad)

- first
- second

- third

(Loose: every item gets <p> wrappers because of the blank line above third.)

Example (good)

Tight throughout:

- first
- second
- third

Or loose throughout:

- first

- second

- third

Configuration

  • This rule is off by default (opt-in). Enable with [lint] extend-select = ["list-tightness-flipped"].
  • Disable inline: <!-- mdwright: allow list-tightness-flipped -->.
  • Severity: advisory (does not fail --check).

name: orphan-reference-link default: true advisory: false fix: false since: 0.1.0

orphan-reference-link

Reference-style link with no matching [label]: definition.

What it does

Flags reference-style links of the form [text][label] or shortcut form [label] where label has no matching [label]: … definition anywhere in the document.

Why

CommonMark renders an unresolved reference link as literal text; [text][label] shows up in the output as [text][label] rather than as a clickable link. This silently breaks navigation, usually because the author renamed a link definition without updating its references (or vice versa).
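The matching step can be sketched as below, assuming the reference labels and definition labels have already been extracted from the document. `normalize_label` and `orphan_labels` are hypothetical names, and the normalisation uses `to_lowercase` as a simplification of the Unicode case folding that CommonMark specifies.

```rust
use std::collections::HashSet;

/// Normalises a link label for matching: trims it, collapses internal
/// whitespace runs to one space, and lowercases it.
fn normalize_label(label: &str) -> String {
    label.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

/// Returns the reference labels that have no matching definition.
fn orphan_labels<'a>(references: &[&'a str], definitions: &[&str]) -> Vec<&'a str> {
    let defined: HashSet<String> = definitions.iter().map(|d| normalize_label(d)).collect();
    references
        .iter()
        .copied()
        .filter(|r| !defined.contains(&normalize_label(r)))
        .collect()
}
```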

Example (bad)

See the [installation guide][install] for details.

[setup]: docs/setup.md

Example (good)

See the [installation guide][install] for details.

[install]: docs/install.md

Configuration

  • Disable inline: <!-- mdwright: allow orphan-reference-link -->.
  • Disable in config: [lint] ignore = ["orphan-reference-link"].
  • Severity: non-advisory.

name: stray-dollar default: false advisory: false fix: true since: 0.1.0

stray-dollar

Literal $ in prose (opt-in for projects that don't use $…$ math).

What it does

Flags $ characters in prose that are not part of a recognised math region.

Why

Some Markdown renderers (Pandoc with --mathjax, KaTeX-aware mdBook, GitHub) treat $…$ as inline math. Authors who rely on TeX-style \( … \) instead can accidentally produce math output where they wanted a literal $5, and authors who rely on $…$ for math can produce prose where they meant math. Either way, a stray single $ in prose is almost always a typo.

This rule is opt-in because projects that consistently use $…$ for math have no use for it: the linter would flood them with false positives. Turn it on in projects that standardise on \( … \) or use no inline math at all.

The autofix escapes the dollar (\$); review before applying.
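As a rough sketch of the escaping step, assuming the input is prose that has already been separated from math regions (the name `escape_dollars` is illustrative):

```rust
/// Escapes every `$` that is not already preceded by an unescaped backslash.
fn escape_dollars(prose: &str) -> String {
    let mut out = String::with_capacity(prose.len());
    let mut prev_backslash = false;
    for c in prose.chars() {
        if c == '$' && !prev_backslash {
            out.push('\\');
        }
        out.push(c);
        // A backslash escapes the next character only if it is not itself escaped.
        prev_backslash = c == '\\' && !prev_backslash;
    }
    out
}
```

Tracking whether the previous backslash was itself escaped keeps the function idempotent: running it twice does not double-escape.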

Example (bad)

That costs $5.

Example (good)

That costs \$5.

Configuration

  • This rule is off by default. Enable with [lint] extend-select = ["stray-dollar"].
  • Disable inline: <!-- mdwright: allow stray-dollar -->.
  • Severity: non-advisory. Safe autofix available.

References

  • Pandoc Markdown: math extension.
  • mdwright treats \( … \), \[ … \], and named environments as math regions; $…$ is excluded by default because it conflicts with literal-dollar prose.

name: subscript-damage default: false advisory: false fix: true since: 0.2.0

subscript-damage

Identifier with * where a _ subscript was expected (formatter damage).

What it does

Flags damaged subscript notation produced by older roundtrip Markdown tools: patterns like x\_i (escaped underscore that was supposed to be a TeX subscript) where the surrounding context confirms a math intent (digits, single-letter identifiers, sign-and-digit pairs).

Why

Older mdformat versions and a few other tools defensively escaped _ inside what looked like prose, even when the underscore was a math subscript. The result is x\_i in the source, which renders as x_i literally, not as a subscript. The rule finds these and proposes either the TeX form (x_i inside math) or the Unicode form (xᵢ) depending on context.

The autofix is conservative: it removes the backslash if the surrounding context is unambiguously math; otherwise it leaves the source untouched and prints the diagnostic only.

Example (bad)

Take the i-th element x\_i.

Example (good)

Take the i-th element xᵢ.

(Or, inside math: Take the i-th element $x_i$.)

Configuration

  • This rule is off by default. Enable with [lint] extend-select = ["subscript-damage"].
  • Disable inline: <!-- mdwright: allow subscript-damage -->.
  • Severity: non-advisory. Safe autofix where context is unambiguous.

name: trailing-whitespace default: true advisory: false fix: true since: 0.1.0

trailing-whitespace

Trailing whitespace at end of line.

What it does

Flags lines that end with one or more trailing space or tab characters. The exception is the CommonMark hard-break convention: exactly two trailing spaces followed by a newline introduces a <br> inside a paragraph, and that case is left alone.

Why

Trailing whitespace is invisible noise that survives copy-paste, complicates diffs (one-byte changes that touch every line), and frequently triggers spurious changes when collaborators have different editor settings. The autofix strips the trailing run while preserving the two-space hard break form.
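The strip-but-keep-hard-breaks behaviour can be sketched per line. This is an illustration under two assumptions: the line is passed without its newline, and any non-empty line ending in exactly two spaces counts as a hard break. The function name is hypothetical.

```rust
/// Strips trailing spaces and tabs from one line, leaving the line untouched
/// when it ends in exactly two spaces after some content (a hard break).
fn strip_trailing_whitespace(line: &str) -> &str {
    let trimmed = line.trim_end_matches(|c: char| c == ' ' || c == '\t');
    let trailing = &line[trimmed.len()..];
    if trailing == "  " && !trimmed.is_empty() {
        line
    } else {
        trimmed
    }
}
```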

Example (bad)

A paragraph.···
Another line.·

(· represents a stray trailing space.)

Example (good)

A paragraph.
Another line.

Configuration

  • Disable inline: <!-- mdwright: allow trailing-whitespace -->.
  • Disable in config: [lint] ignore = ["trailing-whitespace"].
  • Severity: non-advisory. Safe autofix available.

name: unbalanced-backtick default: true advisory: false fix: false since: 0.1.0

unbalanced-backtick

Backtick in prose that could not be paired with a closing fence.

What it does

Flags inline code spans whose backtick fence was not closed before the end of a paragraph or the end of the document, e.g. `foo with no matching `.

Why

CommonMark's inline code rule requires the same number of opening and closing backticks. An unclosed opener silently turns the rest of the paragraph into prose (it is not rendered as a code span), so the visual result drifts from what the author meant. Worse, the unclosed run often eats markup intended for later constructs (links, emphasis), producing a cascade of silently broken rendering.
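The equal-length pairing rule can be sketched as a scan over backtick runs. This is a simplified illustration that ignores backslash escapes; `first_unclosed_backtick` is a hypothetical name, not mdwright's API.

```rust
/// Returns the byte offset of the first backtick run that has no closing run
/// of the same length later in the paragraph, or None if every run is paired.
fn first_unclosed_backtick(paragraph: &str) -> Option<usize> {
    // Collect (offset, length) for every maximal run of backticks.
    let bytes = paragraph.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let start = i;
            while i < bytes.len() && bytes[i] == b'`' {
                i += 1;
            }
            runs.push((start, i - start));
        } else {
            i += 1;
        }
    }
    // Pair each opener with the next run of equal length.
    let mut k = 0;
    while k < runs.len() {
        let (start, len) = runs[k];
        match runs[k + 1..].iter().position(|&(_, l)| l == len) {
            Some(rel) => k += rel + 2, // resume after the closer
            None => return Some(start),
        }
    }
    None
}
```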

Example (bad)

Run `cargo build to compile.

Example (good)

Run `cargo build` to compile.

Configuration

  • Disable inline: <!-- mdwright: allow unbalanced-backtick -->.
  • Disable in config: [lint] ignore = ["unbalanced-backtick"].
  • Severity: non-advisory.

name: unicodeable-subscript default: true advisory: true fix: true since: 0.2.0

unicodeable-subscript

Braced super/subscript that has a single-codepoint Unicode form.

What it does

Flags math-region subscripts whose contents are simple enough to express as Unicode subscript characters: single digits, single letters that have a Unicode subscript codepoint, and short sign/digit pairs, when the surrounding context is prose rather than display math.

Why

In running prose like the i-th element x_i, the TeX form x_i renders as a code-styled fragment in most renderers; the Unicode form xᵢ is cleaner, screen-reader-friendly, and copyable. The rule fires only when the substitution is unambiguous and the surrounding context does not already use TeX-style display math.

The autofix substitutes the Unicode form (safe: false); review the change before applying.
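A substitution of this shape can be sketched with a small lookup. The table below is a partial, illustrative subset and not mdwright's actual substitution table; the function names are hypothetical.

```rust
/// Maps one character to its Unicode subscript form, if this sketch knows one.
fn subscript_char(c: char) -> Option<char> {
    match c {
        // Subscript digits are contiguous from U+2080.
        '0'..='9' => char::from_u32(0x2080 + (c as u32 - '0' as u32)),
        '+' => Some('\u{208A}'),
        '-' => Some('\u{208B}'),
        'a' => Some('\u{2090}'),
        'e' => Some('\u{2091}'),
        'o' => Some('\u{2092}'),
        'x' => Some('\u{2093}'),
        'i' => Some('\u{1D62}'), // lives outside the U+2080 range
        _ => None,
    }
}

/// Converts a whole subscript body, or returns None if any character has no
/// subscript form (in which case the TeX source should be left alone).
fn to_unicode_subscript(body: &str) -> Option<String> {
    if body.is_empty() {
        return None;
    }
    body.chars().map(subscript_char).collect()
}
```

Returning `None` for the whole body when any single character lacks a subscript form is one way to keep the substitution all-or-nothing.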

Example (bad)

Take the i-th component, x_i.

Example (good)

Take the i-th component, xᵢ.

Configuration

  • Disable inline: <!-- mdwright: allow unicodeable-subscript -->.
  • Disable in config: [lint] ignore = ["unicodeable-subscript"].
  • Severity: advisory.

References

  • Unicode subscript block: U+2080–U+209F.
  • mdwright's substitution table (mdwright-math/src/unicode.rs).

name: math/unbalanced-braces default: true advisory: false fix: false since: 0.1.0

math/unbalanced-braces

{ / } inside a math body do not balance; math body normalisation is skipped for that region.

What it does

Inside a math region, flags { and } whose depths do not balance: either an extra opening brace with no matching close, or a stray }.

Why

Math renderers (KaTeX, MathJax, pdflatex) all reject unbalanced braces, but they fail with opaque messages far from the source location. Catching the imbalance in the linter pinpoints the offending region in the markdown source, before the math reaches the renderer.

The check only runs inside math regions identified by math/unbalanced-delim and math/unbalanced-env; balanced-brace checking in prose would be noise, since prose { and } are not paired.
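A depth check over a math body can be sketched as follows. It assumes the body has already been extracted from its delimiters and treats `\{` and `\}` as literal braces; `first_unbalanced_brace` is an illustrative name rather than the scanner mdwright ships.

```rust
/// Returns the byte offset of the first brace that breaks the balance:
/// a `}` with no open `{`, or the earliest `{` still open at the end.
fn first_unbalanced_brace(body: &str) -> Option<usize> {
    let mut open = Vec::new(); // offsets of currently open `{`
    let mut escaped = false;
    for (i, c) in body.char_indices() {
        if escaped {
            escaped = false; // the character after a backslash is never a grouping brace
            continue;
        }
        match c {
            '\\' => escaped = true,
            '{' => open.push(i),
            '}' => {
                if open.pop().is_none() {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    open.first().copied()
}
```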

Example (bad)

$$\sum_{i=1^n i$$

Example (good)

$$\sum_{i=1}^n i$$

Configuration

  • Disable inline: <!-- mdwright: allow math/unbalanced-braces -->.
  • Disable in config: [lint] ignore = ["math/unbalanced-braces"].
  • Severity: non-advisory.

References

  • mdwright's brace scanner (src/stdlib/math_unbalanced_braces.rs).
  • TeX: braces group arguments to commands such as \frac{a}{b} and \sum_{i}.

name: math/unbalanced-delim default: true advisory: false fix: false since: 0.1.0

math/unbalanced-delim

TeX-style math open delimiter (\[, \(, $$, $) with no matching close.

What it does

Flags display-math openers (\[) and inline-math openers (\() that have no matching closer (\] / \)) before the end of the document.

Why

\[ … \] and \( … \) are TeX-style math delimiters. mdwright treats the region between an opener and its closer as math: it suspends prose lint rules inside, and the formatter passes the bytes through verbatim. An unbalanced opener means we cannot tell where math ends; every following prose rule misreads the rest of the document, and the formatter might break the content's rendering.

The check runs before any prose rule fires, so this is the first diagnostic you should fix in a document.

Example (bad)

The Laplacian is \[ \Delta f = \sum_i \partial_i^2 f

Example (good)

The Laplacian is \[ \Delta f = \sum_i \partial_i^2 f \].

If you wanted a literal \[ in prose, escape it: \\[.

Configuration

  • Disable inline: <!-- mdwright: allow math/unbalanced-delim -->.
  • Disable in config: [lint] ignore = ["math/unbalanced-delim"].
  • Severity: non-advisory (fails mdwright check --check).

References

  • mdwright's math-region recogniser (src/stdlib/math_unbalanced_delim.rs).
  • LaTeX: \[ … \] is the unnumbered display-math environment; \( … \) is the inline form.

name: math/unbalanced-env default: true advisory: false fix: false since: 0.1.0

math/unbalanced-env

LaTeX \begin{env} with no matching \end{env} at the same nesting depth.

What it does

Flags TeX \begin{env} blocks that have no matching \end{env} (or vice versa), where env is one of the math environments mdwright tracks (equation, align, aligned, cases, matrix, pmatrix, bmatrix, vmatrix, Vmatrix, gather, multline, split, and their starred variants).

Why

\begin{align} … \end{align} and friends are math regions. Like \[ … \], they suspend prose lint rules and are passed through the formatter verbatim. An unmatched \begin leaves the parser unable to tell where math ends; an unmatched \end is almost always a copy-paste error that will silently break rendering in any math-aware renderer.
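The begin/end matching can be sketched with a stack. This simplified version tracks every environment name rather than only the math environments listed above, and `first_unmatched_env` is a hypothetical name.

```rust
/// Returns the name of the first environment that is closed without being
/// open (or out of order), or else the outermost one still open at the end.
fn first_unmatched_env(source: &str) -> Option<String> {
    let mut stack: Vec<&str> = Vec::new();
    let mut rest = source;
    while let Some(pos) = rest.find('\\') {
        rest = &rest[pos + 1..];
        let (is_begin, after) = if let Some(a) = rest.strip_prefix("begin{") {
            (true, a)
        } else if let Some(a) = rest.strip_prefix("end{") {
            (false, a)
        } else {
            continue; // some other command, e.g. the `\\` row separator
        };
        let close = match after.find('}') {
            Some(c) => c,
            None => continue,
        };
        let name = &after[..close];
        if is_begin {
            stack.push(name);
        } else if stack.pop() != Some(name) {
            return Some(name.to_string());
        }
        rest = &after[close + 1..];
    }
    stack.first().map(|n| n.to_string())
}
```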

Example (bad)

\begin{align}
  a + b &= c \\
  d - e &= f

Example (good)

\begin{align}
  a + b &= c \\
  d - e &= f
\end{align}

Configuration

  • Disable inline: <!-- mdwright: allow math/unbalanced-env -->.
  • Disable in config: [lint] ignore = ["math/unbalanced-env"].
  • Severity: non-advisory.

References

  • mdwright's environment recogniser (src/stdlib/math_unbalanced_env.rs).
  • amsmath user's guide for the canonical environment list.

Pre-commit

mdwright ships a .pre-commit-hooks.yaml at its repo root, so adding it to a project that uses the pre-commit framework is a single repos: entry.

Quickest path: prebuilt binary

If contributors already have mdwright on their $PATH (e.g. via cargo binstall mdwright or a GitHub release tarball), the -system variants avoid any toolchain dance:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/jcreinhold/mdwright
    rev: v0.1.0
    hooks:
      - id: mdwright-check-system
      - id: mdwright-fmt-check-system

mdwright-check-system runs mdwright check --check; mdwright-fmt-check-system runs mdwright fmt-check. Both exit non-zero on issues, blocking the commit.

Letting pre-commit build mdwright

If you don't want to require an out-of-band install, the source-build hooks invoke cargo run -p mdwright from the checked-out repository. First commit after a clean cache takes ~30 s; subsequent runs reuse Cargo's cache.

repos:
  - repo: https://github.com/jcreinhold/mdwright
    rev: v0.1.0
    hooks:
      - id: mdwright-check
      - id: mdwright-fmt-check

Each contributor needs a Rust toolchain on the machine running the hook.

Available hook IDs

| ID | Equivalent CLI | Language |
|---|---|---|
| mdwright-check | mdwright check --check | system via Cargo |
| mdwright-fmt | mdwright fmt | system via Cargo |
| mdwright-fmt-check | mdwright fmt-check | system via Cargo |
| mdwright-check-system | mdwright check --check | system |
| mdwright-fmt-system | mdwright fmt | system |
| mdwright-fmt-check-system | mdwright fmt-check | system |

The mdwright-fmt / mdwright-fmt-system hooks rewrite files in place; combine with git add in a post-formatting workflow, or prefer mdwright-fmt-check in CI gates that should never auto-commit.

Performance notes

pre-commit invokes hooks once per batch of matching files, not once per file, so per-invocation startup cost is paid once per git commit (not once per changed file). The binary's cold-start is well under 50 ms on Linux release builds.

GitHub Actions

Lint and format-check Markdown on every push and pull request.

Quickest path: the bundled composite action

mdwright publishes a composite action at the repo root (action.yml). It fetches the prebuilt binary from the matching GitHub release and runs whatever mdwright command you pass:

name: markdown
on:
  push:
    branches: [main]
  pull_request:

jobs:
  mdwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: jcreinhold/mdwright@v0.1.0
        with:
          args: check --check .
      - uses: jcreinhold/mdwright@v0.1.0
        with:
          args: fmt-check .

args defaults to check --check .. Pin the version to a tag (@v0.1.0) rather than @main so upstream releases don't silently break your CI.

The action ships prebuilt binaries for ubuntu-latest (x86_64-unknown-linux-gnu) and macos-latest (aarch64-apple-darwin). Other targets fall back to the source-build recipe below.

Source-build fallback

For Windows runners or any platform we don't ship a prebuilt for, install from source:

jobs:
  mdwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - run: cargo install mdwright --locked
      - run: mdwright check --check .
      - run: mdwright fmt-check .

The Swatinem/rust-cache@v2 step keeps subsequent runs around 5 s once the cache is warm; cold builds take a couple of minutes.

cargo-binstall

If you want the binary speed of the composite action without depending on the action's action.yml contract, run cargo-binstall directly:

      - uses: cargo-bins/cargo-binstall@main
      - run: cargo binstall --no-confirm mdwright
      - run: mdwright check --check .

This resolves the same release artifacts and skips the compile step.

Reading the output in PR annotations

mdwright's pretty output is human-readable in the Actions log. For PR annotations (squiggles in the GitHub UI), pipe JSON v2 through a converter; there is no first-class action yet, but the schema is documented at Diagnostic schema and stable across 0.x.

Editor integration

mdwright ships a built-in language server. Point your editor at mdwright lsp and you get diagnostics, hover docs, quick-fixes, and on-save / on-type formatting without configuring an external formatter command.

The smallest possible config is Helix:

[language-server.mdwright]
command = "mdwright"
args = ["lsp"]

[[language]]
name = "markdown"
language-servers = ["mdwright"]

Position encoding gotcha

mdwright advertises UTF-8 position encoding. Clients that negotiate UTF-8 (VS Code 1.74+, Helix, Zed, neovim 0.10+) get the full surface: diagnostics, hover, formatting, range formatting, on-type formatting, and code actions. Clients that only support UTF-16 get diagnostics and hover; formatting and code-action providers are withdrawn rather than risk corrupting non-ASCII sources, and a warning is logged via window/logMessage. Check your editor's LSP log if formatting unexpectedly does nothing; it usually means the client never negotiated UTF-8.
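The reason mixed encodings are risky is that the same position has different column numbers once a line contains non-ASCII text. The sketch below converts a UTF-8 byte column to the UTF-16 code-unit column; it is illustrative only and `utf8_col_to_utf16` is a hypothetical helper, not part of mdwright.

```rust
/// Converts a UTF-8 byte column within `line` to the UTF-16 code-unit column
/// for the same position. Returns None if `byte_col` is out of range or
/// falls inside a multi-byte character.
fn utf8_col_to_utf16(line: &str, byte_col: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_col) {
        return None;
    }
    Some(line[..byte_col].chars().map(char::len_utf16).sum())
}
```

For the line `xᵢ = 1`, the position just after `ᵢ` is byte column 4 in UTF-8 but column 2 in UTF-16, so an edit computed in one encoding and applied in the other would land in the wrong place.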

VS Code

mdwright does not publish a dedicated VS Code extension. Install a generic LSP-client extension that lets you point at an arbitrary LSP binary, then configure it to launch mdwright lsp:

{
  "[markdown]": {
    "editor.defaultFormatter": "<your-lsp-client-extension-id>"
  },
  "yourLspClient.servers": [
    {
      "command": "mdwright",
      "args": ["lsp"],
      "languages": ["markdown"]
    }
  ]
}

Helix

Add to ~/.config/helix/languages.toml (or the workspace .helix/languages.toml):

[language-server.mdwright]
command = "mdwright"
args = ["lsp"]

[[language]]
name = "markdown"
language-servers = ["mdwright"]
auto-format = true

Run :lsp-restart after editing. Helix's space-a opens the code-action menu; pick Fix `bare-url`: … for a single diagnostic or Apply all mdwright safe fixes to run every safe quick-fix at once.

Zed

Add to ~/.config/zed/settings.json:

{
  "lsp": {
    "mdwright": {
      "binary": {
        "path": "mdwright",
        "arguments": ["lsp"]
      }
    }
  },
  "languages": {
    "Markdown": {
      "language_servers": ["mdwright"],
      "format_on_save": "on"
    }
  }
}

Neovim

Using nvim-lspconfig on neovim 0.10+:

local lspconfig = require("lspconfig")
local configs = require("lspconfig.configs")

if not configs.mdwright then
  configs.mdwright = {
    default_config = {
      cmd = { "mdwright", "lsp" },
      filetypes = { "markdown" },
      root_dir = lspconfig.util.find_git_ancestor,
      settings = {},
    },
  }
end

lspconfig.mdwright.setup({
  on_attach = function(_, bufnr)
    vim.api.nvim_create_autocmd("BufWritePre", {
      buffer = bufnr,
      callback = function() vim.lsp.buf.format({ async = false }) end,
    })
  end,
})

Configuration

The server discovers .mdwright.toml, mdwright.toml, or pyproject.toml's [tool.mdwright] table by walking up from the workspace root, exactly like the CLI. Edit one of those files and the server re-lints every open buffer on the next file-watcher event; workspace/didChangeConfiguration triggers the same refresh.
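The upward walk can be sketched as below. Two things here are assumptions for illustration: the precedence of the three file names within a single directory, and the `is_config` predicate, which stands in for "the file exists, and a pyproject.toml actually contains a [tool.mdwright] table". `discover_config` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Candidate file names, checked in this order in each directory (assumed order).
const CANDIDATES: [&str; 3] = [".mdwright.toml", "mdwright.toml", "pyproject.toml"];

/// Walks from `start` towards the filesystem root and returns the first
/// candidate path for which `is_config` returns true.
fn discover_config(start: &Path, is_config: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    for dir in start.ancestors() {
        for name in CANDIDATES.iter() {
            let candidate = dir.join(name);
            if is_config(&candidate) {
                return Some(candidate);
            }
        }
    }
    None
}
```

Taking the predicate as a parameter keeps the walk itself free of filesystem access, which makes it easy to exercise in a test.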

The LSP server keeps the same default input-size boundary as the CLI: a single open buffer above 10 MB stays open, but mdwright publishes one document-level diagnostic and suppresses linting, formatting, range formatting, and code actions for that version.

Range-formatting caveats

textDocument/rangeFormatting and textDocument/onTypeFormatting snap the requested range out to the nearest whole top-level block before formatting. For sources without document-scope reorderable constructs the snapped output is a verbatim substring of the whole-document format; link definitions ([label]: dest) and footnote definitions ([^label]: …) are document-scope, so a range format may leave them in place where a whole-document format would have moved them to the canonical location. Save the file (or invoke whole-document formatting) periodically to reconcile.
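The snapping step can be sketched as an interval expansion, assuming the byte ranges of the top-level blocks are already known, sorted, and non-overlapping. In this simplified version a request that falls entirely between two blocks yields `None` instead of snapping to a neighbour; `snap_to_blocks` is a hypothetical name.

```rust
use std::ops::Range;

/// Expands `requested` to cover every top-level block it touches.
fn snap_to_blocks(blocks: &[Range<usize>], requested: Range<usize>) -> Option<Range<usize>> {
    // Treat an empty range (a bare cursor position) as covering one byte.
    let end = requested.end.max(requested.start + 1);
    let mut touched = blocks
        .iter()
        .filter(|b| b.start < end && requested.start < b.end);
    let first = touched.next()?;
    let last = touched.last().unwrap_or(first);
    Some(first.start..last.end)
}
```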

Smoke test

Before publishing an editor integration, run this manual check:

  1. Start the server with mdwright lsp.
  2. Open a Markdown file that contains https://example.com and confirm the bare-url diagnostic appears.
  3. Insert - [n]:Z followed by a carriage return, newline, and two tabs. The server should publish one parser diagnostic at the start of the file and keep running.
  4. Replace the file contents with valid Markdown. Normal diagnostics should return without restarting the server.
  5. Run whole-document formatting and range formatting on a paragraph that mdwright changes.
  6. Edit .mdwright.toml and trigger your editor's LSP config reload or file-watcher refresh. Open buffers should be re-linted with the new policy.
  7. Check the editor's LSP log if formatting is unavailable; the common cause is a client that did not negotiate UTF-8 positions.

CI recipes

Snippets for CI providers other than GitHub Actions. The container recipes install mdwright with cargo install; the bare-metal script assumes mdwright is already on $PATH.

GitLab CI

mdwright:
  image: rust:1.91
  cache:
    paths:
      - .cargo/
  script:
    - cargo install --root .cargo mdwright --locked
    - ./.cargo/bin/mdwright check --check .
    - ./.cargo/bin/mdwright fmt-check .
  rules:
    - changes:
        - "**/*.md"

CircleCI

version: 2.1
jobs:
  mdwright:
    docker:
      - image: cimg/rust:1.91
    steps:
      - checkout
      - run: cargo install mdwright --locked
      - run: mdwright check --check .
      - run: mdwright fmt-check .
workflows:
  docs:
    jobs:
      - mdwright

Buildkite

steps:
  - label: ":memo: mdwright"
    command: |
      cargo install mdwright --locked
      mdwright check --check .
      mdwright fmt-check .
    plugins:
      - docker#v5.10.0:
          image: rust:1.91

Drone

kind: pipeline
name: mdwright

steps:
  - name: mdwright
    image: rust:1.91
    commands:
      - cargo install mdwright --locked
      - mdwright check --check .
      - mdwright fmt-check .

Bare-metal / cron

A nightly job that lints a docs corpus and posts a report:

#!/usr/bin/env bash
set -euo pipefail
cd "$DOCS_REPO"
git pull --quiet
mdwright check --format=json . > /tmp/mdwright-report.jsonl
jq -s 'length' /tmp/mdwright-report.jsonl | xargs -I {} \
  echo "mdwright: {} diagnostics in $DOCS_REPO"

The JSON v2 schema is stable; consume it programmatically (see Diagnostic schema).

Writing a lint rule

A lint rule is a type that implements LintRule. Rules see the parsed document via a curated query surface and emit Diagnostic values. mdwright ships nineteen stdlib rules; this page shows how to add an external twentieth without forking the binary.

The trait

pub trait LintRule: Send + Sync {
    // Required methods.
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn check(&self, doc: &Document, out: &mut Vec<Diagnostic>);

    // Optional methods with defaults.
    fn is_default(&self) -> bool { true }
    fn is_advisory(&self) -> bool { false }
    fn produces_fix(&self) -> bool { false }
    fn explain(&self) -> &str { "" }
}

  • name is the kebab-case identifier ("no-todo-in-prose"); the dispatcher stamps it onto each emitted diagnostic.
  • description is the one-line summary shown by mdwright list-rules.
  • check reads the Document and pushes Diagnostic values.
  • is_default controls whether the rule fires under --rules default and under the config default lint.preset = "default" when --rules is omitted.
  • is_advisory makes diagnostics informational; they do not fail mdwright check --check.
  • produces_fix claims that at least one diagnostic carries a Fix.
  • explain is the long-form markdown shown by mdwright explain <name>.

Worked example: no-todo-in-prose

A rule that flags TODO (case-sensitive) inside paragraph text but not inside code blocks, inline code, math regions, or HTML blocks: Document::prose_chunks() handles every skip for you.

use mdwright_document::Document;
use mdwright_lint::{Diagnostic, LintRule};

pub struct NoTodoInProse;

impl LintRule for NoTodoInProse {
    fn name(&self) -> &str {
        "no-todo-in-prose"
    }

    fn description(&self) -> &str {
        "Literal TODO in paragraph text"
    }

    fn explain(&self) -> &str {
        "TODOs in user-facing documentation are usually accidents. \
         Track pending work in an issue tracker, or suppress this \
         rule with `<!-- mdwright: allow no-todo-in-prose -->`."
    }

    fn check(&self, doc: &Document, out: &mut Vec<Diagnostic>) {
        // prose_chunks() already skips code, inline code, math, and HTML blocks.
        for slice in doc.prose_chunks() {
            // Offsets from match_indices are relative to the chunk text.
            for (offset, _) in slice.text.match_indices("TODO") {
                if let Some(d) = Diagnostic::at(
                    doc,
                    slice.byte_offset,
                    offset..offset + "TODO".len(),
                    "literal `TODO` in prose".to_owned(),
                    None, // no autofix
                ) {
                    out.push(d);
                }
            }
        }
    }
}

Diagnostic::at performs the byte-offset arithmetic and line-index lookup. It returns Option because pathological offsets could fall outside the source; on failure the diagnostic is dropped rather than the rule panicking.

Registering the rule

Add it to a RuleSet and lint:

use mdwright_document::Document;
use mdwright_lint::RuleSet;

// NoTodoInProse is the rule defined in the worked example above.
let mut rules = RuleSet::stdlib_defaults();
rules.add(Box::new(NoTodoInProse)).expect("unique name");

// `?` assumes an enclosing function that returns a compatible Result.
let doc = Document::parse("My TODO: write the docs.")?;
let diagnostics = rules.check(&doc);

RuleSet::add returns Result<&mut Self, DuplicateRuleName> so two rules with the same name fail fast.

Shipping a custom binary

The CLI crate entry point mdwright::run_with_rules takes your assembled RuleSet and runs the whole CLI on top of it: clap parsing, config discovery, every output format, the LSP server, and the suppression machinery. Your main is ten lines:

use mdwright_lint::stdlib;
struct NoTodoInProse;
impl mdwright_lint::LintRule for NoTodoInProse {
    fn name(&self) -> &str { "no-todo-in-prose" }
    fn description(&self) -> &str { "" }
    fn check(&self, _: &mdwright_document::Document, _: &mut Vec<mdwright_lint::Diagnostic>) {}
}

fn main() -> std::process::ExitCode {
    let mut rules = stdlib::all();
    rules.add(Box::new(NoTodoInProse)).expect("unique name");
    mdwright::run_with_rules(rules)
}

Pass stdlib::all() (not stdlib::defaults()) so every opt-in stdlib rule remains selectable via --rules. run_with_rules filters down from this pool based on the user's --rules argument and the active configuration file.

A complete working sample lives at examples/extending/ in the mdwright repository. The same crate has integration tests that prove the rule fires end-to-end.

Publishing your custom binary

Your binary is just a crate. Push to crates.io with cargo publish (we recommend a name like <org>-mdwright so users distinguish it from the official binary), or distribute the compiled artifact directly. Downstream users install your binary and run it in place of mdwright; the command-line interface is identical.

Caveats

  • Config-driven rule reconfiguration applies to stdlib rules only. The [lint.info-strings] extra option, for example, mutates the stdlib info-string-typo rule even when a downstream binary has registered its own implementation under that name. Downstream rules read their own configuration; mdwright does not route config keys into them.
  • mdwright does not load lint rules at runtime. See Plugin loading for the rationale and the comparison of dynamic-loading alternatives we considered.

Plugin loading

mdwright does not load lint rules at runtime. The supported extension path is the one Writing a lint rule describes: depend on mdwright-document and mdwright-lint, implement LintRule, call mdwright::run_with_rules, and ship your own binary. This page explains why: what dynamic loading would buy, what it would cost, and what would have to change for the decision to flip.

The decision

| Architecture | Verdict | Available in |
|---|---|---|
| A. Component crates + custom binary | Supported | today |
| B. Dynamic cdylib loading via libloading | Rejected | never |
| C. WASM plugins via wasmtime | Not planned | |

The same trio shipped in ruff, which is mdwright's closest analogue in spirit. ruff thrives without a plugin runtime; the trait surface plus a documented "ship your own binary" path covers every adopter who hits the limits of the stdlib.

Architecture A: Supported

A user writes a rule in their own crate, depends on mdwright-document, mdwright-lint, and mdwright from crates.io, and ships a small binary:

use mdwright_lint::stdlib;
struct MyRule;
impl mdwright_lint::LintRule for MyRule {
    fn name(&self) -> &str { "my-rule" }
    fn description(&self) -> &str { "" }
    fn check(&self, _: &mdwright_document::Document, _: &mut Vec<mdwright_lint::Diagnostic>) {}
}

fn main() -> std::process::ExitCode {
    let mut rules = stdlib::all();
    rules.add(Box::new(MyRule)).expect("unique name");
    mdwright::run_with_rules(rules)
}

| | |
|---|---|
| Capability | Full library access. Any rule the stdlib could write, an external rule can write. |
| Complexity | One CLI-crate function (run_with_rules). The rest is the trait that already shipped. |
| Cost to user | They ship a Rust binary. CI needs cargo. They pin a major version of mdwright. |
| Cost to maintainer | None new. The LintRule trait and the mdwright::run_with_rules signature are the surface; semver protects them. |
| Semver implications | LintRule is 1.0-grade. cli::run_with_rules is a fn(RuleSet) -> ExitCode; that signature is stable. |

This is what mdwright ships, in the mdwright crate and the examples/extending/ workspace member.

Architecture B: Dynamic Loading via libloading (Rejected)

.mdwright.toml:

[plugins]
my_rules = "./target/release/libmy_rules.dylib"

mdwright would load each cdylib at startup and look up an extern "Rust" fn mdwright_register(&mut Registry) symbol.

| | |
|---|---|
| Capability | Anything Rust can express. |
| Complexity | A libloading integration, a Registry shim, a plugin ABI versioning story. |
| Cost to user | They build a cdylib and put it in a path. First-run UX is opaque when the path is wrong. |
| Cost to maintainer | Substantial. The ABI surface is every type a plugin touches, including Diagnostic, Document, and every accessor. Rust has no stable ABI. Every Rust release risks breaking every plugin. |
| Semver implications | repr(Rust) types cross the boundary; layout is unspecified. Every release becomes an ABI compatibility check. |

Verdict: rejected. The maintenance burden is high, the gain over Architecture A is small (a cargo build versus an in-process load), and Rust's lack of a stable ABI makes the contract perpetually fragile. Linking a single cdylib into the official binary buys nothing a custom binary doesn't already give you.

Architecture C: WASM via wasmtime (Not planned)

.mdwright.toml:

[plugins]
my_rules = "./my-rules.wasm"

The plugin compiles to WebAssembly; mdwright runs it in a wasmtime sandbox, serialising documents and diagnostics across the boundary.

| | |
|---|---|
| Capability | Restricted to whatever API mdwright exposes through the host bindings. |
| Complexity | Define and document a sandbox API; write host bindings; serialise Document and Diagnostic (no zero-copy across the boundary); manage WASM startup cost per file. |
| Cost to user | Plugin authors learn wasm-bindgen-style discipline; the trait is harder to use than the native one. |
| Cost to maintainer | Maintain the WASM API forever, plus a reference implementation, plus a performance story (parsing each file twice, once natively and once through the boundary, is not free). |
| Semver implications | The WASM API is its own semver surface, parallel to the native LintRule trait. |

Verdict: not planned. The cost is real and the demand is hypothetical. Revisit only when a concrete adopter has tried Architecture A, hit a specific limit (sandbox isolation, language diversity, hot reload), and articulated what the WASM contract would need to address.

What would change this decision

Architecture B is unlikely to ever become attractive: Rust would have to grow a stable ABI, which is not on any horizon, and even then the cost-benefit against custom binaries would barely move.

Architecture C is more interesting in principle. If you have a use case where:

  • the rule body is in a language other than Rust, or
  • you need to load and unload rules without restarting mdwright, or
  • a sandboxed evaluation model is a hard requirement (e.g. running rules submitted by untrusted contributors),

please open an issue describing the concrete adoption story. A motivated maintainer behind a real need is the precondition for revisiting this.

Until then: depend on the library, write the rule, ship a binary. The example at examples/extending/ in the repository is ready to fork.

Architecture

The design intent. Read this before you change document recognition, linting, or formatting.

Workspace boundaries

Each crate hides a different kind of knowledge. Read the layers top-down as a dependency stack; each layer depends only on the layers below it:

Surfaces      mdwright (CLI)        mdwright-lsp
Engines       mdwright-format       mdwright-lint
Glue          mdwright-config
Document      mdwright-document
Math spans    mdwright-math
TeX bodies    mdwright-latex
  • mdwright-latex: TeX math-body parsing, Unicode layout, command vocabulary, and source translation.
  • mdwright-math: Markdown math-span scanning and normalisation.
  • mdwright-document: source coordinates, pulldown invocation, parse options, recognised Markdown facts.
  • mdwright-config: interprets user config files into document, format, and lint policy.
  • mdwright-format: formatting options, rewrite-family planning, and verification.
  • mdwright-lint: diagnostics, rule execution, suppression, safe fixes.
  • mdwright: the command-line binary (file discovery, terminal output, process exit policy).
  • mdwright-lsp: editor-state delivery over LSP.

The repository root is a virtual workspace. There is no facade crate; library users depend directly on the crate that owns the capability they need.

Document facts

Document is parse/query only. It wraps the original source, canonical source mapping, line index, pulldown-derived events, references, lists, code and HTML exclusion ranges, heading attributes, frontmatter, and math regions. Lint rules and formatter rewrite producers consume these immutable facts instead of invoking pulldown independently.

Recognition policy lives in ParseOptions. Formatting policy lives in FmtOptions.

Math regions

The Markdown math scanner lives in mdwright-math and knows only about strings and byte ranges. The document crate supplies Markdown exclusion ranges, stores accepted math regions, and gives downstream crates one stable inventory to query. TeX body parsing and Unicode rendering belong in mdwright-latex, so Markdown delimiter policy does not leak into the TeX parser.

This is the design choice that makes mdwright math-resilient. See Math regions for the user-facing view.

Formatting

Default formatting is identity emit: source bytes survive unchanged except for document-boundary policies. Opt-in style canonicalisation and wrapping run through private rewrite families owned by the current document snapshot.

Only private rewrite-family code in mdwright-format may apply formatter byte edits. It runs families in a fixed order, rejects local overlaps within each family, applies a family plan to a scratch buffer, and verifies Markdown and math signatures before committing the whole plan. It does not expose partial family progress as successful formatting.
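The all-or-nothing behaviour of a single family plan can be sketched as follows. This is a minimal illustration, not the private code in mdwright-format: the Edit type, apply_family_plan, and the verify callback are hypothetical names, and the callback stands in for the Markdown and math signature checks.

```rust
use std::ops::Range;

/// One byte-range replacement proposed by a rewrite family (hypothetical type).
#[derive(Debug, Clone)]
struct Edit {
    range: Range<usize>,
    replacement: String,
}

/// Applies a whole family plan to a scratch buffer and returns it only if
/// `verify` accepts the result. Any failure means "nothing committed".
fn apply_family_plan(
    source: &str,
    edits: &[Edit],
    verify: impl Fn(&str, &str) -> bool,
) -> Option<String> {
    let mut sorted: Vec<&Edit> = edits.iter().collect();
    sorted.sort_by_key(|e| e.range.start);

    // A local overlap (or an inverted range) rejects the whole plan;
    // the plan never drops one edit and keeps another.
    let inverted = sorted.iter().any(|e| e.range.start > e.range.end);
    let overlapping = sorted.windows(2).any(|w| w[0].range.end > w[1].range.start);
    if sorted.is_empty() || inverted || overlapping {
        return None;
    }

    // Build the candidate in a scratch buffer; the source is never mutated.
    let mut scratch = String::with_capacity(source.len());
    let mut cursor = 0;
    for edit in sorted {
        scratch.push_str(source.get(cursor..edit.range.start)?);
        scratch.push_str(&edit.replacement);
        cursor = edit.range.end;
    }
    scratch.push_str(source.get(cursor..)?);

    // Verification gates the commit; it does not repair the plan.
    verify(source, &scratch).then_some(scratch)
}
```

A caller that receives None keeps the bytes it started with, so partial family progress is never observable.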

Linting

RuleSet owns rule execution. Callers parse a Document, then call rules.check(&doc) or rules.check_with(&doc, opts). Suppressions, diagnostic sorting, standard-rule registry construction, and safe-fix application are lint-crate details.

Doc tests

The crates/mdwright/tests/docs_examples.rs suite walks docs/src/**/*.md and validates every fenced code block:

  • ```markdown / ```md → must parse with pulldown-cmark (no panic; non-empty event stream for non-empty input).
  • ```toml → must parse with Config::load_explicit.
  • ```toml,no-check → skipped. Use this fence for non-config TOML (e.g. book.toml, pyproject.toml excerpts that show structure but are not valid config payloads).

A PR that introduces a broken example fails CI. The convention is invisible to mdBook (which treats the language tag as a CSS class) but the test sees it.
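As a rough sketch of the fence convention above, the info string of each block can be mapped to the check it receives. FenceCheck and classify_fence are hypothetical names for illustration; the real suite may structure this differently, and tags this page does not describe are kept in a separate Unlisted bucket rather than guessed at.

```rust
/// Which check a fenced block receives, per the convention described above.
#[derive(Debug, PartialEq)]
enum FenceCheck {
    /// ```markdown or ```md: must parse with pulldown-cmark.
    ParseMarkdown,
    /// ```toml: must load as mdwright configuration.
    LoadConfig,
    /// ```toml,no-check: deliberately skipped.
    Skip,
    /// Any tag this page does not describe (handling is not specified here).
    Unlisted,
}

/// Maps a fence info string to its check (hypothetical helper).
fn classify_fence(info: &str) -> FenceCheck {
    match info.trim() {
        "markdown" | "md" => FenceCheck::ParseMarkdown,
        "toml" => FenceCheck::LoadConfig,
        "toml,no-check" => FenceCheck::Skip,
        _ => FenceCheck::Unlisted,
    }
}
```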

Where to look

| Want to change… | Edit… |
| --- | --- |
| A lint rule | crates/mdwright-lint/src/stdlib/<rule>.rs + its explanation |
| Document recognition | crates/mdwright-document/src/ |
| Math body language | crates/mdwright-latex/src/ |
| Math span recognition | crates/mdwright-math/src/ |
| Formatter rewrites | crates/mdwright-format/src/format/ |
| Wrap algorithm | crates/mdwright-format/src/format/wrap_pass.rs |
| Config schema | crates/mdwright-config/src/config.rs + xtask/src/config_docs.rs |
| CLI surface | crates/mdwright/src/cli.rs |
| LSP surface | crates/mdwright-lsp/src/lsp.rs |

Crate boundaries

mdwright is a virtual workspace. Each crate hides a different volatile decision; library users depend directly on the component crate they need. The repository root is a workspace manifest, not a package: there is no facade library and no root binary.

Cargo.toml               # virtual workspace root, no package targets
crates/mdwright          # command-line package and `mdwright` binary
crates/mdwright-document # parsed Markdown facts with stable source coords
crates/mdwright-latex    # TeX and Unicode math-body lexing, parsing, layout, translation
crates/mdwright-math     # Markdown math-span recognition and normalisation
crates/mdwright-format   # formatter policy, rewrite-family planning, oracles
crates/mdwright-lint     # diagnostics, rule execution, suppression, safe fixes
crates/mdwright-config   # TOML schema, discovery, resolved option construction
crates/mdwright-lsp      # tower-lsp server and editor-state bridge

Dependency direction:

       mdwright-latex
          │  │
          │  └─────────────┐
          │                │
       mdwright-math
            │
       mdwright-document
         │         │
mdwright-format   mdwright-lint
         │         │
         mdwright-config
            │
   mdwright / mdwright-lsp

What each crate owns

| Crate | Hides |
| --- | --- |
| mdwright-latex | TeX/LaTeX and Unicode math-body lexing, parsing, command vocabulary, Unicode layout, and source translation. |
| mdwright-math | Markdown math delimiter and environment recognition; extraction of math bodies from source. |
| mdwright-document | CommonMark/pulldown quirks, GFM extension overlays, source-coordinate invariants, parser-panic containment. Owns the only production pulldown-cmark chokepoint. |
| mdwright-format | Formatter style policy, rewrite-family planning, local ownership checks, semantic verification. |
| mdwright-lint | Rule dispatch, suppressions, diagnostic shape, safe-fix edit ordering, standard-rule registry. |
| mdwright-config | TOML schema and discovery rules; resolves into the per-crate option types. |
| mdwright | File discovery, argument parsing, terminal output, parallel execution, exit policy. |
| mdwright-lsp | Editor-state delivery over LSP. |

The document crate is a parse/query abstraction; formatting and linting are operations owned by the crates that hide their algorithms. Other crates consume document facts as domain records (structural spans, paragraphs, list-marker sites, inline delimiter slots, heading attribute trailers, link destination slots, math regions, frontmatter, code/HTML exclusions, top-level checkpoints) and do not couple to pulldown's event vocabulary, offset iterator, panic payloads, or backtraces.

Markdown math-region recognition and TeX math-body parsing are separate boundaries: mdwright-math recognises where math lives in Markdown, while mdwright-latex owns the language inside those regions. That ownership includes parser-backed Unicode-to-LaTeX translation: unsupported Unicode remains visible and records diagnostics or losses instead of being silently guessed. Lint rules that need LaTeX vocabulary facts depend on mdwright-latex directly rather than copying command tables or asking mdwright-math to pass them through.

pulldown_model tests may import pulldown directly because they deliberately probe upstream drift.

Public API entry points

  • mdwright_document::Document::parse_with_options(source, ParseOptions) -> Result<Document, ParseError> parses fallibly at the parser trust boundary. The returned Document stores its ParseOptions; formatter entry points read that policy from the Document so every rewrite, semantic signature, verification reparse, and range-format checkpoint uses the same recognition.
  • mdwright_format::{format_document, format_validated} over a parsed Document.
  • format_source(source, opts) is the convenience path for default parse policy.
  • mdwright_lint::RuleSet::{check, check_with} for lint dispatch; mdwright_lint::apply_safe_fixes for safe-fix application.
  • The mdwright package exposes command-extension helpers such as run_with_rules but is otherwise a binary, not a library.

ExtensionOptions, MystOptions, and PandocOptions are document parse policy under ParseOptions. GFM extension policy is parse.extensions.gfm.autolinks and parse.extensions.gfm.tagfilter; the document crate exposes autolinks as general AutolinkFact values rather than URL-specific GFM facts. HTML render spelling is document-owned policy exposed as [render] profile.

Dependency fences

Enforced by crates/mdwright/tests/dependency_fences.rs via cargo tree:

  • mdwright-latex depends on no other mdwright-* crate and has no terminal, browser, Markdown parser, formatter, lint, config, CLI, or LSP dependencies.
  • mdwright-math may depend on mdwright-latex; it must not depend on document, format, lint, config, CLI, or LSP.
  • mdwright-document may depend on mdwright-math; it must not depend on latex body parsing directly, format, lint, config, CLI, LSP, clap, ignore, rayon, serde, toml, tokio, tower-lsp, owo-colors, or anyhow.
  • mdwright-format may depend on mdwright-document and mdwright-math; it must not depend on lint, CLI, LSP, clap, tokio, or tower-lsp. It does not import pulldown-cmark or mdwright_document::parse in production code.
  • mdwright-lint depends on mdwright-document and may depend directly on mdwright-latex for command vocabulary; it must not depend on format, CLI, LSP, clap, tokio, or tower-lsp, and it must not depend directly on mdwright-math for vocabulary.
  • mdwright-config may depend on document/format/lint option types and on the lint rule registry for resolving configured rule selection; it must not depend on CLI or LSP.
  • mdwright and mdwright-lsp are delivery crates; heavy delivery dependencies belong there.
  • mdwright-document does not publicly export parser helpers.
  • mdwright-math does not publicly re-export mdwright-latex as a pass-through facade.
  • Config schema and docs hold recognition keys under [parse.extensions], not formatter policy.
  • Workspace-internal dependencies carry both path and version.

Packaging

Every publishable crate uses versioned internal dependencies with local path entries for development. Cargo strips paths during packaging while local workspace builds keep using the checked-out crates. The repository .cargo/config.toml patches internal packages to local paths so local cargo package --no-verify checks run before those packages exist on crates.io. Publishing order:

  1. mdwright-latex
  2. mdwright-math
  3. mdwright-document
  4. mdwright-format, mdwright-lint
  5. mdwright-config
  6. mdwright-lsp
  7. mdwright
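The publishing order can be sanity-checked against the dependency fences: every crate must appear after the crates it is allowed to depend on. The sketch below encodes only the edges named under Dependency fences; the dependencies of mdwright and mdwright-lsp are not enumerated on this page, so they carry no edges here. first_violation, PUBLISH_ORDER, and DEPS are hypothetical names for illustration.

```rust
/// Publishing order from the list above (format and lint share step 4).
const PUBLISH_ORDER: &[&str] = &[
    "mdwright-latex",
    "mdwright-math",
    "mdwright-document",
    "mdwright-format",
    "mdwright-lint",
    "mdwright-config",
    "mdwright-lsp",
    "mdwright",
];

/// (crate, dependency) pairs permitted by the dependency fences.
const DEPS: &[(&str, &str)] = &[
    ("mdwright-math", "mdwright-latex"),
    ("mdwright-document", "mdwright-math"),
    ("mdwright-format", "mdwright-document"),
    ("mdwright-format", "mdwright-math"),
    ("mdwright-lint", "mdwright-document"),
    ("mdwright-lint", "mdwright-latex"),
    ("mdwright-config", "mdwright-document"),
    ("mdwright-config", "mdwright-format"),
    ("mdwright-config", "mdwright-lint"),
];

/// Returns the first (crate, dependency) pair whose dependency is missing
/// from the order or is not published strictly before the crate.
fn first_violation<'a>(
    order: &[&'a str],
    deps: &[(&'a str, &'a str)],
) -> Option<(&'a str, &'a str)> {
    let pos = |name: &str| order.iter().position(|c| *c == name);
    deps.iter()
        .copied()
        .find(|&(krate, dep)| match (pos(krate), pos(dep)) {
            (Some(k), Some(d)) => d >= k,
            _ => true,
        })
}
```

With the order and edges above, first_violation(PUBLISH_ORDER, DEPS) finds nothing to report.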

Why these crates, not others

  • No mdwright-source / mdwright-source-map / mdwright-text: source canonicalisation and byte mapping are part of the document abstraction. Callers want a recognised document whose spans map back to user bytes, not a separate coordinate package.
  • mdwright-latex is a real boundary, not a facade. TeX math-body parsing, Unicode math-source parsing, command vocabulary, Unicode layout, and source translation share grammar knowledge and should change behind one narrow API. mdwright-math remains separate because Markdown delimiter recognition changes for different reasons and has different callers. See latex-boundary-and-dependency-audit.md for the design comparison and dependency audit. The release claim for this crate is evidence-backed common MathJax-style coverage where Unicode has honest representations, plus parser-backed Unicode-to-LaTeX source translation for the supported subset. It is not TeX macro expansion, browser-grade MathJax layout, or diagram interpretation.
  • No mdwright-util: a utility crate has no domain responsibility and becomes a junk drawer.
  • No mdwright-rules: standard rules and rule dispatch share suppression, diagnostic, and registry semantics; separating them would mirror an old directory layout shallowly.
  • No root facade package and no root Document newtype to preserve doc.format() / doc.lint(): either wrapper would expose the same abstraction as mdwright-document::Document and add no hiding.

The alternative was a larger mdwright-engine crate that owned Document, lint, format, and safe-fix operations. That keeps formatter and linter complected with document recognition: one central crate would know parser byte ranges, lint suppression semantics, formatter transactional verification, standard-rule registration, and safe-fix edit ordering. With the current split, a new formatter rewrite touches mdwright-format plus tests; a new lint rule touches mdwright-lint plus docs; a new config key touches mdwright-config plus the option type it resolves; and a new CLI flag does not drag parser, formatter, or lint internals into the CLI surface.

Parser Boundary

mdwright-document is the only production crate that invokes pulldown-cmark.

Markdown is semantically total after mdwright canonicalises source bytes, but the parser implementation can still panic on malformed edge cases. The document crate contains that implementation risk:

  • source bytes are canonicalised into one parser input;
  • parser iteration is collected eagerly inside one private catch_unwind boundary;
  • parser panics become mdwright_document::ParseError;
  • document facts, signatures, HTML rendering, and block checkpoints are built only after successful collection.
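The containment pattern in the list above can be illustrated with std::panic::catch_unwind. This is a sketch under stated assumptions: fragile_parse is a stand-in for a parser that may panic, and this ParseError is a local placeholder, not the mdwright_document::ParseError type.

```rust
use std::panic;

/// Local placeholder for a parse error carrying the panic message.
#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Stand-in for a parser that panics on some malformed input (hypothetical).
fn fragile_parse(source: &str) -> Vec<String> {
    if source.contains("<<boom>>") {
        panic!("simulated parser panic");
    }
    source.lines().map(str::to_owned).collect()
}

/// Runs the parser inside one unwind boundary and collects eagerly, so no
/// lazy iterator can panic after the boundary has been left.
fn parse_contained(source: &str) -> Result<Vec<String>, ParseError> {
    panic::catch_unwind(|| fragile_parse(source)).map_err(|payload| {
        // Panic payloads are usually &str or String; anything else is opaque.
        let message = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown parser panic".to_owned());
        ParseError(message)
    })
}
```

The default panic hook still prints to stderr; the point of the boundary is that the caller sees an ordinary Err value instead of an unwinding stack.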

Callers do not catch parser panics. They parse source with:

let doc = mdwright_document::Document::parse(source)?;

Operations over an existing Document stay pure over recognised facts. Formatting a parsed document remains infallible:

let formatted = mdwright_format::format_document(&doc, &opts);

Source-convenience APIs are fallible because they cross the parser boundary:

let formatted = mdwright_format::format_source(source, &opts)?;
let html = mdwright_document::render_html(source)?;

The transactional formatter uses the same policy for verification reparses. If a candidate output cannot be parsed, the candidate is rejected and the unverified bytes are not committed. CLI and LSP delivery turn ParseError into controlled file/editor diagnostics; they do not install parser-specific panic handlers.

Formatter rewrite boundary

The formatter starts from identity emit. Opt-in style and wrap changes run through private rewrite families in mdwright-format; each family builds a locally non-overlapping plan, verifies the resulting document, and commits the whole plan or none of it. Verification is a safety gate. It is not a convergence strategy.

Parser facts stay in mdwright-document. Rewrite policy stays in mdwright-format. The document crate tells the formatter where the syntactic slots are; the formatter decides whether a configured style should rewrite those slots.

Adopted design

The rewrite subsystem uses ordered families:

  1. inline delimiters;
  2. list markers;
  3. thematic breaks;
  4. link destinations;
  5. heading attributes;
  6. table normal forms;
  7. math;
  8. frontmatter;
  9. terminal wrap.

Each canonical family sees a parsed snapshot of the current bytes. If it produces edits, the family plan checks that those edits do not overlap within the family. A local overlap rejects the family; it does not drop one edit and keep another. If the plan verifies, the whole plan commits and the pipeline starts again from the first canonical family on a fresh parse. If verification fails, the family skips; verification never repairs an incomplete plan.

Terminal wrap is not a peer canonical family. It runs only after a full canonical-family scan commits nothing for the current snapshot. If wrap commits paragraph edits, the pipeline starts again from the first canonical family so any newly exposed syntactic slots are normalized before wrap runs again.

The successful terminal state is a full pass with no family commits. If the guard pass count trips before that state, the formatter leaves the original source bytes unchanged. It does not return the last verified partial output as successful formatting.
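The control flow described above (restart after every commit, wrap only after a quiet canonical scan, original bytes on guard trip) can be sketched as a small loop. The Family alias, run_pipeline, and the way the guard counts passes are assumptions made for illustration; each family here returns Some only when it has a verified, committed result.

```rust
/// A family inspects the current bytes and returns the new bytes if it
/// committed a verified plan, or None if it had nothing to do (hypothetical).
type Family = fn(&str) -> Option<String>;

fn run_pipeline(source: &str, canonical: &[Family], wrap: Family, max_passes: usize) -> String {
    let mut current = source.to_owned();
    for _ in 0..max_passes {
        // Scan canonical families in order; the first commit restarts the scan.
        let committed = canonical.iter().find_map(|family| family(&current));
        if let Some(next) = committed {
            current = next;
            continue;
        }
        // Wrap runs only after a full canonical scan committed nothing.
        match wrap(&current) {
            Some(next) => current = next,
            // Full pass with no commits: the successful terminal state.
            None => return current,
        }
    }
    // Guard tripped: hand back the original bytes, not the partial output.
    source.to_owned()
}

/// Toy canonical family: rewrites "* " list markers to "- ".
fn stars_to_dashes(s: &str) -> Option<String> {
    let out = s.replace("* ", "- ");
    (out != s).then_some(out)
}

/// Toy family that never converges, used to show the guard.
fn always_grows(s: &str) -> Option<String> {
    Some(format!("{s}!"))
}

/// Toy wrap family that never commits.
fn no_wrap(_: &str) -> Option<String> {
    None
}
```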

Design comparison

| Design | Result |
| --- | --- |
| Typed candidates in one global list | Rejected. Enriching the old candidate type still leaves one shared selector that has to compare unrelated edits. It can express "keep this parent edit, drop that child edit" even when neither producer meant to own that relationship. |
| Ordered rewrite families | Chosen. Each family owns one style decision and must prove local non-overlap before commit. Cross-family order is explicit, and a family cannot silently steal ownership from another family through a range sort. |

The old global model was shallow: callers supplied a phase, owner, byte range, replacement, verification mode, and label, then relied on a common engine to interpret those fields correctly. The family pipeline hides that coordination in the formatter implementation. Producers no longer compete in one phase/range list.

Ownership rules

An edit must be created for the owner kind the producer intends. There is no fallback from a requested owner to the smallest containing owner. A list-marker edit asks for a list item; a thematic-break edit asks for a thematic break; a math edit asks for a math region. If the matching owner does not contain the range, no edit exists.
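A minimal sketch of the no-fallback rule, using hypothetical OwnerKind, Owner, Edit, and make_edit names: the edit exists only when an owner of exactly the requested kind contains the range, even if some other owner would.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
enum OwnerKind {
    ListItem,
    ThematicBreak,
    MathRegion,
}

/// A recognised construct and the bytes it owns (hypothetical type).
#[derive(Debug)]
struct Owner {
    kind: OwnerKind,
    range: Range<usize>,
}

#[derive(Debug, PartialEq)]
struct Edit {
    owner: OwnerKind,
    range: Range<usize>,
    replacement: String,
}

/// Creates an edit only if an owner of the requested kind contains the range.
/// There is no fallback to a different, enclosing owner.
fn make_edit(
    owners: &[Owner],
    kind: OwnerKind,
    range: Range<usize>,
    replacement: &str,
) -> Option<Edit> {
    owners
        .iter()
        .any(|o| o.kind == kind && o.range.start <= range.start && range.end <= o.range.end)
        .then(|| Edit { owner: kind, range, replacement: replacement.to_owned() })
}
```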

This follows the pattern established by list marker and inline slot facts. mdwright-document exposes marker-local facts, delimiter slots, and link destination slots, so nested constructs cannot be represented as one enclosing rewrite that accidentally covers child bytes.

Table Normal Forms

Table padding is a parent operation. It runs after inline delimiter and link-destination families, reads cell bytes from the current snapshot, and rewrites the whole table block as one verified operation. Row and cell edits are not exposed as candidates.

The table family uses document-owned table facts: source ranges for the table, rows, cells, and alignments. If a row has source cells beyond the recognised table column count, or a cell range is not contained in its row, the table family skips that table instead of dropping bytes it cannot model.
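The skip condition can be written as a single predicate over the table facts. The Table and Row shapes below are hypothetical stand-ins for the document-owned facts, kept to the two conditions named above.

```rust
use std::ops::Range;

/// Source ranges for one table row and its cells (hypothetical shape).
struct Row {
    range: Range<usize>,
    cells: Vec<Range<usize>>,
}

/// Recognised column count plus the rows of one table (hypothetical shape).
struct Table {
    columns: usize,
    rows: Vec<Row>,
}

/// True when every row fits the recognised column count and every cell lies
/// inside its row. When this is false the table is left untouched.
fn table_is_modelled(table: &Table) -> bool {
    table.rows.iter().all(|row| {
        row.cells.len() <= table.columns
            && row
                .cells
                .iter()
                .all(|cell| row.range.start <= cell.start && cell.end <= row.range.end)
    })
}
```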

Terminal Wrap

Paragraph wrapping is a terminal operation. It reads document-owned paragraph facts from the current snapshot: line ranges, content ranges, prefixes, hard breaks, and inline atomics. It computes paragraph replacements after all earlier canonicalizers have reached local normal form, verifies the paragraph batch, and commits the batch or none of it.

Unsupported paragraph shapes stay unchanged. They are counted in the formatter report rather than widened into paragraph edits whose safety depends on later passes.

Pulldown-cmark model

Reference for the per-construct behaviours of pulldown-cmark 0.13 that mdwright depends on. Every emit-site decision in crates/mdwright-format either matches a rule on this page or contradicts pulldown. A contradiction is a bug.

This file is paired with crates/mdwright/tests/pulldown_model.rs. Each rule below has one test in that file that feeds the documented example to pulldown and asserts the documented event-stream shape. When pulldown changes upstream (a release bump, a bug fix on their side), the test fails and this document must be updated before any mdwright code is changed in response.

Every production parse flows through private helpers in crates/mdwright-document/src/parse.rs, which take a private CanonicalSource<'_>. Construction routes through the document crate's source canonicalisation, so pulldown's input is always CR-free and NUL-free in production. Rules below assume that pre-condition.

§1 Line endings

Source::canonicalise rewrites CR and CRLF to LF and NUL to U+FFFD before pulldown sees the buffer (CM §2.1, §2.3). Inside HTML blocks, code blocks, math regions, and inline code, pulldown preserves the (now-LF) bytes verbatim in the CowStr payload. In prose, a single \n between non-blank content lines becomes Event::SoftBreak; two consecutive \ns end the current block.

Consequence: no CowStr produced by Event::Text, Event::Code, Event::Html, Event::InlineHtml, Event::InlineMath, or Event::DisplayMath can ever contain a CR byte in production. The semantic-equivalence walker in crates/mdwright-format relies on this; there is no per-event CR scrub.
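The precondition can be illustrated with a small standalone function. This is a sketch of the stated mapping, not the document crate's implementation; it assumes that a lone CR is treated the same way as CRLF.

```rust
/// Rewrites CRLF and lone CR to LF, and NUL to U+FFFD.
/// Assumption: a CR not followed by LF also becomes a single LF.
fn canonicalise(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Swallow the LF of a CRLF pair so it yields one line ending.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push('\n');
            }
            '\0' => out.push('\u{FFFD}'),
            other => out.push(other),
        }
    }
    out
}
```

After this step the buffer contains no CR and no NUL, which is the property the later rules rely on.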

Test: line_endings_softbreak_between_lines.

§2 Trailing blank lines in containers

Pulldown strips trailing blank lines from indented code blocks before emitting the final Event::Text. A whitespace-only line is "blank."

The source "\t|\n\t" produces a single Event::Text("|\n") inside the indented code block: the trailing tab-only line is consumed as a blank line, but the terminating \n of the content line stays in the payload. The formatter's normalize_trailing_newline consumes that trailing LF when re-emitting; without it the formatter would emit one trailing LF too many.

Cite: regression fixture crates/mdwright/tests/regressions/fuzz_indented_code_trailing_ws_drop.in.

Test: indented_code_keeps_content_terminating_newline.

§3 Emphasis pairing scope

CM §6.2 / §6.3: emphasis delimiters pair within their enclosing pairing container. The set of pairing containers pulldown observes: paragraph, heading, table cell, link body, image body, footnote definition.

Strikethrough (~~…~~) is not a pairing container: emphasis delimiters can open inside one strikethrough run and close inside another, or across a strikethrough boundary entirely. The canonicalisation pass's per-rewrite verification window includes surrounding bytes so a candidate that would re-pair across a strikethrough boundary is rejected.

Link bodies are a pairing boundary because CM §6.5 gives link text grouping higher precedence than emphasis grouping. The two are not symmetric: *[foo*](bar) parses with the * not pairing (it's outside the link, the link doesn't enclose it), but the link text [foo*] does not contribute to an outer *…* pair either.

Tests: emphasis_pairs_within_paragraph, emphasis_pairs_across_strikethrough, and link_body_breaks_emphasis_pairing.

§4 Reference label normalisation

CM §4.7: trim leading and trailing whitespace; collapse internal runs of whitespace to a single U+0020; case-fold via Unicode default case folding. Two labels resolve to the same definition iff their normalised forms agree.

Pulldown 0.13 does not emit a LinkReferenceDefinition event. Definitions are resolved internally during parse, and reference uses surface as Tag::Link { id: ".." } where id is the raw label bytes the source used (not the normalised form). The mdwright-side authoritative scan for definitions lives in crates/mdwright-document/src/refs.rs::build_reference_table; that module is the sole site that runs CM §4.7 normalisation.
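An approximation of the label normalisation, for intuition only. It uses to_lowercase as a stand-in for Unicode default case folding, which differs for some characters, and Rust's split_whitespace, whose whitespace set is the Unicode one. normalise_label is a hypothetical name, not the function in refs.rs.

```rust
/// Trims, collapses internal whitespace runs to one space, and lowercases.
/// Approximation: to_lowercase is not full Unicode case folding.
fn normalise_label(label: &str) -> String {
    label
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}
```

Two labels then refer to the same definition when normalise_label returns equal strings for both.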

Test: reference_label_normalisation_matches.

§5 HTML block boundaries

CM §4.6 defines seven HTML block types, each with its own start / end conditions. Two of the important asymmetries:

  • Type 2 (<!-- … --> or <?…?> style with a multi-char end marker): the block ends at the line containing the matching end marker (or EOF). The block's events are a sequence of Event::Html(line) per source line, each payload including its trailing newline, except possibly the last, which can omit the newline if the source did.
  • Type 6 (recognised tag names like <table>): the block ends at the first blank line after the start (or EOF). Recognition is by tag name, not by close-tag matching: <table> opens a type-6 block; the close </table> does not by itself end it. A blank line does.

The block's payload bytes round-trip verbatim (modulo §1 canonicalisation), so the formatter emits HTML blocks by stamping the captured source slice rather than reconstructing from events.

Test: html_block_type2_emits_per_line_events.

§6 Emphasis-event range semantics

Event::Start(Tag::Emphasis) and Event::End(TagEnd::Emphasis) ranges in the offset iterator cover the entire run, from the byte position of the first character of the opening delimiter, to the byte position after the last character of the closing delimiter.

  • range.start of Start(Emphasis): index of the first * or _ of the opening run.
  • range.end of End(Emphasis): index after the last * or _ of the closing run.
  • The body bytes occupy [start_range.end, end_range.start).

Same convention for Strong. mdwright-document turns these ranges into inline delimiter-slot facts that name only the opening and closing delimiter bytes. A pulldown change to either range convention would silently change those facts; the model test catches the drift first.

Test: emphasis_event_range_spans_delimiters.

§7 Strong vs nested emphasis disambiguation

CM §6.5 disambiguates runs of two through six * / _ characters:

  • **foo** → Start(Strong), Text("foo"), End(Strong). Not emphasis-of-emphasis.
  • ***foo*** → Start(Strong), Start(Emphasis), Text("foo"), End(Emphasis), End(Strong) (the nesting order depends on pairing direction; pulldown's left-flank rule decides).
  • *_foo_* → Start(Emphasis), Start(Emphasis), Text("foo"), End(Emphasis), End(Emphasis). Two distinct delimiter characters pair independently.

Canonicalisation must keep these distinct. Inline delimiter families edit only delimiter slots and verify the resulting document before commit; a rewrite that would let pulldown re-segment the construct differently is skipped.

Test: strong_distinct_from_nested_emphasis.

§8 Definition-list event shape

With Options::ENABLE_DEFINITION_LIST set on the parser, the source

Term
: defn

emits the nested triple Start(DefinitionList) → Start(DefinitionListTitle) → … → End(DefinitionListTitle) → Start(DefinitionListDefinition) → … → End(DefinitionListDefinition) → End(DefinitionList). Each definition's body is opened/closed independently, so a definition containing multiple paragraphs emits multiple Start(Paragraph) / End(Paragraph) pairs inside one DefinitionListDefinition.

The private document tree relies on this nesting shape to construct definition-list nodes in crates/mdwright-document/src/tree.rs. Public callers consume document facts and signatures; they do not see pulldown's event nesting directly.

Test: definition_list_emits_tag_triple.

§9 Heading attribute fields

With Options::ENABLE_HEADING_ATTRIBUTES set, the trailing { #id .class₁ .class₂ key=val } on an ATX heading populates the id: Option<CowStr>, classes: Vec<CowStr>, and attrs: Vec<(CowStr, Option<CowStr>)> fields on Tag::Heading. With the flag unset, those fields are None / empty regardless of source content (the trailer remains in the heading text).

mdwright-document records the parsed trailer as a HeadingAttrSite. The mdwright-format heading-attribute family emits the canonical trailer (#id first, then classes in source order, then key=val pairs in source order) when FmtOptions::heading_attrs is Canonicalise. Under Preserve (the default), the source bytes round-trip unchanged.
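The ordering rule for the canonical trailer can be sketched as below. Only the ordering (id, then classes, then attributes, each in source order) comes from the text; the exact spacing inside the braces and the canonical_trailer name are assumptions for illustration.

```rust
/// Builds a trailer with the id first, then classes, then key=val pairs.
/// Assumption: one space after "{" and before "}", as in the example shape.
fn canonical_trailer(
    id: Option<&str>,
    classes: &[&str],
    attrs: &[(&str, Option<&str>)],
) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(id) = id {
        parts.push(format!("#{id}"));
    }
    // Classes keep their source order.
    parts.extend(classes.iter().map(|class| format!(".{class}")));
    // Attributes keep their source order; a key may have no value.
    parts.extend(attrs.iter().map(|(key, value)| match value {
        Some(value) => format!("{key}={value}"),
        None => (*key).to_owned(),
    }));
    format!("{{ {} }}", parts.join(" "))
}
```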

Test: heading_attributes_populate_tag_fields.

§10 MyST / Pandoc directives, roles, substitutions, comments

pulldown-cmark v0.13.3 emits no dedicated events for any of the following constructs; mdwright treats them as source-owned extension regions under document parse policy:

| Construct | Owning policy |
| --- | --- |
| MyST / Pandoc directive containers | ParseOptions::extensions.myst.directive_containers |
| MyST % line comments | ParseOptions::extensions.myst.comments |
| MyST inline roles | ParseOptions::extensions.myst.inline_roles |
| MyST substitution references | ParseOptions::extensions.myst.substitution_references |
| Pandoc inline attribute spans | ParseOptions::extensions.pandoc.inline_attribute_spans |

Pulldown sees these as plain paragraph / text events. mdwright therefore treats their source bytes as opaque unless a document-owned fact proves a narrower rewrite slot nearby.

For directive containers, an opener whose colon count is n matches the next colon-only line of count ≥ n. Nested directive bytes are preserved by source identity.
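The closing rule can be sketched over a slice of lines. The helper names are hypothetical, and trimming surrounding whitespace before counting colons is an assumption the text does not state.

```rust
/// Colon count if the trimmed line consists only of colons, else None.
fn colon_only(line: &str) -> Option<usize> {
    let trimmed = line.trim();
    (!trimmed.is_empty() && trimmed.chars().all(|c| c == ':')).then(|| trimmed.len())
}

/// Number of leading colons on an opener line such as ":::: note".
fn opener_colons(line: &str) -> usize {
    line.trim_start().chars().take_while(|c| *c == ':').count()
}

/// Index of the line that closes the opener at `opener`: the next
/// colon-only line whose colon count is at least the opener's.
fn find_close(lines: &[&str], opener: usize) -> Option<usize> {
    let n = opener_colons(lines.get(opener)?);
    lines
        .iter()
        .enumerate()
        .skip(opener + 1)
        .find(|(_, line)| colon_only(line).is_some_and(|count| count >= n))
        .map(|(index, _)| index)
}
```

For the lines ":::: note", "::: inner", "text", ":::", "::::", the four-colon opener skips the three-colon line and closes at the final line, while the inner three-colon opener closes at ":::".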

The formatter starts from source bytes, so unknown extension syntax is preserved by default. Opt-in rewrite families must use document-owned facts and exclusion regions before touching bytes near these constructs.

There is no drift test for these constructs because pulldown emits nothing to drift on. Per-fixture regression coverage in crates/mdwright/tests/regressions/{directive_*,inline_role_*,myst_*}.in plus the vendored jupyter-book round trip at crates/mdwright/tests/external_corpora.rs is the safety net.

Test matrix

mdwright's correctness sits on these test surfaces. For each: the invariant it defends, where it lives, and what it does NOT cover. Use this to decide which gate(s) a change to the formatter or canonicalisation pass needs to clear.

Per-construct golden suites

Location: crates/mdwright/tests/golden_inline/, crates/mdwright/tests/golden_block/, crates/mdwright/tests/golden_frontmatter/.

Each fixture is an *.in / *.out pair. Optional *.config.toml overrides FmtOptions::default(). The driver tests live at crates/mdwright/tests/golden_inline.rs, crates/mdwright/tests/golden_block.rs, crates/mdwright/tests/golden_frontmatter.rs and assert byte equality of the formatted input against .out.

Invariant: structural emit and canonicalisation produce the expected bytes for the exact shapes the project cares about. This is where new features and bugfixes land their single load-bearing example.

Does NOT cover: behaviour on random inputs (property tests do that), behaviour under options not represented by a *.config.toml (the matrix is per-fixture, not per-mode).

Property tests

Location: crates/mdwright/tests/properties.rs, generators at crates/mdwright/tests/common/proptest_gen.rs.

Four families:

| Family | Properties | Cases | Sweep gate |
| --- | --- | --- | --- |
| Whole-document, default opts | idempotent, html_preserving, lint_preserving, reference_resolver_round_trips | 256 | *_sweep at 4096, #[ignore] |
| Per-construct, default opts | <construct>_fragments_idempotent, <construct>_fragments_html_preserving for emphasis, strong, link-inline, link-reference, autolink, code-span, heading, fenced-code, quote, list, table, thematic, footnote | 256 each | none |
| Canonicalisation, 15 modes | canonicalise_<construct>_semantic_equivalence, canonicalise_<construct>_idempotent, canonicalise_document_*. Each iterates canon_opts() (preserve + per-knob × variants + 2 all-knobs-together). | 256 × 15 modes | canonicalise_document_*_sweep at 4096, #[ignore] |
| Rewrite-law interactions | *_interactions_are_profile_idempotent for nested lists, nested inline slots, tables with inline content, wrapped paragraphs with atomics, link destinations, math, and frontmatter. Each iterates preserve, mdformat, known fuzz profiles, and an all-family profile. | 96 × 5 profiles | none |

Invariants tested:

  • Idempotence: format(format(s)) == format(s): strict byte equality.
  • Rewrite-law completion: the second pass over generated rewrite-interaction inputs commits no rewrites; family planning must reach its normal form in the first public format call.
  • HTML preservation / semantic equivalence: semantically_equivalent(s, format(s)): canonical pulldown event streams agree.
  • Lint preservation: format does not introduce new default-on diagnostics (modulo bare-url, which the formatter is allowed to fix into <...> autolinks).
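
As a minimal sketch of the idempotence invariant, the check below runs a formatter twice and compares bytes. The toy_format function is a hypothetical stand-in (it only trims trailing whitespace and fixes the final newline); it is not mdwright's formatter, and the real property tests drive generated documents rather than hand-picked strings.

```rust
/// Toy stand-in for a formatter (hypothetical, not mdwright's):
/// strips trailing whitespace per line and ends with exactly one newline.
fn toy_format(src: &str) -> String {
    let mut out: String = src
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n");
    // Drop trailing blank lines, then restore a single final newline.
    let kept = out.trim_end_matches('\n').len();
    out.truncate(kept);
    if !out.is_empty() {
        out.push('\n');
    }
    out
}

/// Idempotence: formatting the formatted output must be a byte-for-byte no-op.
fn is_idempotent(format: impl Fn(&str) -> String, src: &str) -> bool {
    let once = format(src);
    let twice = format(&once);
    once == twice
}
```

A formatter that keeps changing its own output, such as one that appends a space on every pass, fails this check on the first input tried.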

Does NOT cover: option combinations beyond canon_opts(). The two "all-knobs" modes (opts_all_asterisk, opts_all_underscore_or_dash) are the cross-knob coverage; a full Cartesian product would be 4·3·4·3·2·3 = 864 modes and is not pulled in here.

Regression suite

Location: crates/mdwright/tests/regressions/, driver at crates/mdwright/tests/regressions.rs.

Each *.in file is a minimal failing input committed in the same change as its fix. Two gates per fixture:

  • regression_inputs_preserve_html: format_validated must succeed (HTML equivalent to source). Skipped for fixtures whose stem ends in .idem.
  • regression_inputs_are_idempotent: byte equality across two format passes. Applied to every fixture.
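
The gate selection above could be sketched as follows. The Gate enum and gates_for function are hypothetical names for illustration, and the fixture name shape (name.idem.in) is an assumption derived from "stem ends in .idem"; the actual driver in regressions.rs may be organised differently.

```rust
use std::path::Path;

/// Hypothetical labels for the two regression gates.
#[derive(Debug, PartialEq)]
enum Gate {
    PreserveHtml,
    Idempotent,
}

/// Decide which gates apply to one fixture.
/// Assumes fixtures are named `<name>.in` or `<name>.idem.in`.
fn gates_for(fixture: &Path) -> Vec<Gate> {
    // file_stem() drops only the final extension: "x.idem.in" -> "x.idem".
    let stem = fixture.file_stem().and_then(|s| s.to_str()).unwrap_or("");
    let mut gates = Vec::new();
    if !stem.ends_with(".idem") {
        gates.push(Gate::PreserveHtml);
    }
    // Idempotence is checked for every fixture.
    gates.push(Gate::Idempotent);
    gates
}
```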

Invariant: previously-broken shapes do not regress again.

Does NOT cover: anything not in the file list. Adding a fixture is the way to lock in a new invariant.

GFM spec snapshot

Location: crates/mdwright/tests/gfm_spec.rs, vendored spec at crates/mdwright/tests/gfm-spec/spec.txt, snapshot at crates/mdwright/tests/gfm-spec/snapshot.txt.

Two tests:

  • gfm_spec_snapshot: runs every spec case and compares the residual allowlist against snapshot.txt. Update with MDWRIGHT_UPDATE_SNAPSHOT=1.
  • gfm_spec_coverage: asserts the bucketing (fully matching / intentional dev / tracked regression / unexpected) and refuses any unexpected count.

Invariant: the formatter's GFM conformance is stable; the snapshot only changes when intentionally rebaselined.

Does NOT cover: behaviour outside the GFM-spec cases. Project-specific extensions (admonitions, frontmatter, math regions) live in their own golden suites.

Parser backend audit

Location: cargo xtask parser-audit, classifications in docs/architecture/parser-backend-audit.md.

The audit compares mdwright's pulldown-cmark backend against the vendored cmark-gfm expected HTML and a pinned cmark-gfm binary. It renders mdwright through the cmark-gfm render profile so parser drift is not hidden by HTML serializer spelling. Optional comrak output is reported as diagnostic evidence, not as a release gate. The audit also performs risk-gated source-position checks for constructs that mdwright uses as formatter or lint facts.

Invariant: parser-backend differences are explicit. Unclassified pulldown HTML mismatches, unclassified source-position risks, uncontained parser panics, rows marked fixed, and rows marked needs-mdwright-mitigation fail the command.

Does NOT cover: formatter idempotence or rewrite safety; those remain covered by the GFM snapshot, property tests, fuzz, and production soak.

Fuzz oracles

Location: fuzz/fuzz_targets/.

| Target | Oracle | Option byte |
| --- | --- | --- |
| fuzz_idempotence | format(format(s)) == format(s) | Yes; drives wrap × mode × math × canonicalisation |
| fuzz_parse_format | semantically_equivalent(s, format(s)) | Yes; same allocation as fuzz_idempotence |
| fuzz_structured_idempotence | Structured-document idempotence over generated Markdown | Yes |
| fuzz_verbatim_identity | Default options are identity modulo document-boundary normalisations | No |
| fuzz_lint | Standard lint rules do not panic and diagnostics are deterministic/in-bounds | No |
| fuzz_latex_render | TeX math-body parse plus Unicode render never panics; malformed or unsupported input returns typed errors | No |
| fuzz_latex_translate | LaTeX-to-Unicode and Unicode-to-LaTeX source translation never panic; diagnostic/loss spans stay in bounds | No |
| fuzz_markdown_math_translate | Markdown math-span scanning plus body-only translation never panics and preserves valid span accounting | No |
| fuzz_unicode_latex_roundtrip | Supported Unicode math source reaches the public translation fixed point L(U(L(y))) == L(y) | No |

Option byte allocation (fuzz_idempotence and fuzz_parse_format, identical):

| Bits | Field |
| --- | --- |
| 0–1 | wrap (Keep, No, At(80), At(120)) |
| 2 | math.normalise |
| 3 | reserved for corpus continuity |
| 4–7 | Canonicalisation mode (16 enumerated: preserve, one per style knob, two combined) |
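
The bit layout above can be decoded with a few masks and shifts. This is a sketch only: the Wrap and FuzzOptions types are hypothetical, and it assumes the wrap values 0 to 3 follow the order in which the table lists them. The canonicalisation mode is left as a raw index because the mapping from index to mode is not given here.

```rust
/// Hypothetical mirror of the wrap setting named in the table.
#[derive(Debug, PartialEq)]
enum Wrap {
    Keep,
    No,
    At(u16),
}

/// Hypothetical container for the decoded fuzz options.
#[derive(Debug, PartialEq)]
struct FuzzOptions {
    wrap: Wrap,
    math_normalise: bool,
    canon_mode: u8, // raw index 0..=15 taken from bits 4-7
}

fn decode_option_byte(byte: u8) -> FuzzOptions {
    // Bits 0-1: wrap. The value order is an assumption (table order).
    let wrap = match byte & 0b11 {
        0 => Wrap::Keep,
        1 => Wrap::No,
        2 => Wrap::At(80),
        _ => Wrap::At(120),
    };
    FuzzOptions {
        wrap,
        // Bit 2: math.normalise.
        math_normalise: (byte & 0b100) != 0,
        // Bit 3 is reserved and deliberately ignored.
        // Bits 4-7: canonicalisation mode index.
        canon_mode: byte >> 4,
    }
}
```

Ignoring the reserved bit keeps existing corpus entries decoding to the same options.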

Invariant: no input causes a panic or property violation in 10 minutes. Parser implementation panics are converted to ParseError at the mdwright-document boundary, so fuzz targets discard parse errors through normal Result handling rather than wrapping product calls in catch_unwind. TeX math-body failures return LatexError or translation diagnostics through mdwright-latex; fuzz treats those as normal product output and checks that reported spans are valid. Unicode-to-LaTeX fuzzing exercises the parser-backed public translator rather than private lexer or AST APIs. Findings are committed to crates/mdwright/tests/regressions/ or to mdwright-latex coverage fixtures as appropriate.

Does NOT cover: behaviour beyond MAX_INPUT = 65 536 bytes; the libFuzzer harness skips bigger inputs. The CLI enforces the same shape via --max-input-bytes.

Production soak

cargo xtask production-soak --corpus-root <path> runs parser, lint, format-validation, idempotence, and fmt-check checks over the corpus enumerated by crates/mdwright/benches/corpus.list plus representative external Markdown fixtures. The command reports parse errors, validation failures, idempotence failures, fmt-check disagreements, rewrite candidate totals, maximum file size, and slowest files.

mdformat parity

cargo xtask mdformat-parity --corpus-root <path> --corpus-name <name> --mdwright-config <path> --mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml copies a corpus into isolated temp roots, runs mdwright and mdformat, and writes JSON / Markdown reports under target/mdwright/parity/. The command compares changed file sets, line-diff stats, idempotence, mdBook buildability when applicable, and semantic equivalence of each formatter output to the original.

The mdformat config is checked in as an xtask fixture so mdformat does not look like the repository's own formatter.

The parity gate is intentionally not byte-equality with mdformat. Differences are allowed only when docs/architecture/mdformat-parity.md classifies them as configured, intentional, or upstream-owned. The command fails on unclassified differences, mdwright semantic drift, parser errors, idempotence failures, mdBook failures, rows marked fixed that still appear, and rows marked open-bug.

Release evidence

cargo xtask release-evidence --output target/mdwright/release aggregates local release-candidate evidence into release-evidence.json and release-evidence.md. The command records git state and tool versions, reads existing parser-audit, mdformat-parity, production-soak, and package/install reports, and points at manual notes for fast checks, fuzz rounds, and benchmarks.

Invariant: the release candidate has one inspectable evidence bundle that states the current claim, lists accepted divergences, and names missing evidence as blockers.

Does NOT cover: running expensive gates. The command summarizes evidence; it does not replace parser-audit, mdformat-parity, production-soak, fuzzing, packaging, or Criterion.

How to choose what to add when

| Symptom | Right surface |
| --- | --- |
| One specific fixture or shape misbehaves | Golden suite (add an *.in / *.out pair) |
| A bug class spans many inputs of one construct | Per-construct property (a new <construct>_fragments_* pair, or strengthen the existing one) |
| A canonicalisation mode misbehaves | Canonicalisation property (extend canon_opts()) |
| A minimal counterexample of a property failure surfaces | Regression suite (*.in next to the fix commit) |
| GFM conformance shifts | Audit gfm_spec_coverage first, then rebaseline the snapshot with a comment line above each new entry |
| Pathological inputs reach a panic / property violation | Add the input as a regression fixture; libFuzzer will not re-find it once it round-trips |

What this matrix does NOT include

Lint-rule coverage lives with each rule under crates/mdwright-lint/src/stdlib/* and its tests/; that's a parallel matrix and isn't summarised here. CLI-surface tests live at crates/mdwright/tests/cli_*.rs. The diagnostic JSON v2 schema is gated by crates/mdwright/tests/diagnostic_json_v2.rs.

Stability charter

Invariant. Formatting a parsed document preserves Markdown meaning, or refuses the rewrite that would change it. Default formatting is identity emit modulo document-boundary normalisation; opt-in style and wrap changes are transactional byte rewrites, each verified against document-owned parser facts.

mdwright's correctness rests on three deep modules in mdwright-document and mdwright-format, not on layered agreements between consumers:

  1. One pulldown chokepoint in mdwright-document. Every production pulldown_cmark::Parser invocation goes through private helpers in crates/mdwright-document/src/parse.rs that take the private CanonicalSource<'_> newtype. Construction routes through source canonicalisation, so the type system enforces the chokepoint. Upstream parser panics convert to ParseError at this boundary.
  2. Structural emit is identity. format_document starts from the parsed document's canonical source bytes; default formatting reaches only document-boundary normalisation.
  3. Style canonicalisation and wrapping are rewrite-family operations. Opt-in rewrites run as ordered families. Each family builds a locally non-overlapping normal-form plan, verifies the whole plan, and commits all edits or none.
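
The chokepoint in item 1 relies on a newtype that can only be built by canonicalising. The sketch below shows the pattern in isolation. Everything in it is a stand-in: upstream_parse is a toy that panics on NUL bytes, and treating canonicalisation as line-ending normalisation is an assumption. In a real crate the tuple field would be private to the parse module, which is what lets the type system enforce the route.

```rust
use std::borrow::Cow;
use std::panic::{self, AssertUnwindSafe};

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Newtype whose only constructor performs canonicalisation.
pub struct CanonicalSource<'a>(Cow<'a, str>);

impl<'a> CanonicalSource<'a> {
    /// Assumption for this sketch: canonicalisation normalises line endings to LF.
    pub fn new(raw: &'a str) -> Self {
        if raw.contains('\r') {
            CanonicalSource(Cow::Owned(raw.replace("\r\n", "\n").replace('\r', "\n")))
        } else {
            CanonicalSource(Cow::Borrowed(raw))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Toy stand-in for a third-party parser: counts lines, panics on NUL.
fn upstream_parse(src: &str) -> usize {
    assert!(!src.contains('\u{0}'), "toy parser cannot handle NUL");
    src.lines().count()
}

/// The single entry point: accepts only canonical input and converts
/// upstream panics into a typed error at this boundary.
pub fn parse(src: &CanonicalSource<'_>) -> Result<usize, ParseError> {
    panic::catch_unwind(AssertUnwindSafe(|| upstream_parse(src.as_str())))
        .map_err(|_| ParseError("upstream parser panicked".to_string()))
}
```

Because parse takes CanonicalSource rather than &str, a caller cannot reach the parser with unnormalised bytes without going through the constructor.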

The bug class that motivated this design—formatter mutations that perturb their own parse context—survives only as private rewrite-family edits. A family cannot commit unless the document-level verification predicate accepts it.
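
A rough sketch of the all-or-nothing commit, under stated assumptions: Edit and commit_family are hypothetical names, edits are byte ranges into the source, and verify stands in for the document-level verification predicate (which in mdwright compares parser facts, not the trivial closures used to exercise this code).

```rust
/// Hypothetical planned edit: replace src[start..end] with `replacement`.
#[derive(Clone, Debug)]
struct Edit {
    start: usize,
    end: usize,
    replacement: String,
}

/// Apply a whole plan or nothing. Returns None when the plan overlaps,
/// is out of bounds, or fails verification.
fn commit_family(
    src: &str,
    mut plan: Vec<Edit>,
    verify: impl Fn(&str, &str) -> bool,
) -> Option<String> {
    plan.sort_by_key(|e| e.start);

    // Reject the entire plan on any overlap; no edit is selected out of it.
    let mut prev_end = 0;
    for e in &plan {
        let in_bounds = src.is_char_boundary(e.start) && src.is_char_boundary(e.end);
        if !in_bounds || e.start > e.end || e.start < prev_end {
            return None;
        }
        prev_end = e.end;
    }

    // Build the candidate buffer from untouched spans and replacements.
    let mut out = String::with_capacity(src.len());
    let mut cursor = 0;
    for e in &plan {
        out.push_str(&src[cursor..e.start]);
        out.push_str(&e.replacement);
        cursor = e.end;
    }
    out.push_str(&src[cursor..]);

    // Commit only if the whole candidate verifies against the source.
    verify(src, &out).then_some(out)
}
```

Returning None leaves the caller holding the original bytes, so a rejected family has no observable effect.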

The bug class

As long as any emit site reads source bytes to choose its representation, perturbation is possible. The bugs that drove this design all share one shape: a downstream pass predicted what pulldown would do, instead of asking pulldown what it does. Two examples:

  • _*/*_ (5 bytes). Pulldown sees nested emphasis; a predictive formatter emitted *\*/\**, which re-parses to a single emphasis.
  • **u*~***~. Pulldown sees one Strong wrapping Emphasis-and-text plus trailing literals; a predictive formatter oscillated between **u*~*\*\*~ and **u*~~\*\*\*~~ on successive passes.

Removing the read site (preserving source representation byte-for-byte) removes the bug class. Style canonicalisations that do need to choose a representation move into a separate pass where each rewrite family verifies before committing.

The pipeline

source → CanonicalSource → pulldown::Parser → typed IR
       → structural emit (source-preserving)
       → normalize_line_endings_lf
       → [if opts enables rewrites: rewrite-family pipeline]
       → normalize_trailing_newline → apply_end_of_line → out

Only document-owned canonicalisation can produce a CanonicalSource; only mdwright-document invokes pulldown-cmark. Parser panics become ParseError at that boundary. The rewrite-family pipeline reparses after each committed family so later families see current document facts. Success means a full pass over enabled families commits nothing. If the guard pass count trips first, mdwright leaves the original source bytes unchanged rather than returning a partially normalized buffer as success.
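
The termination rule for the rewrite-family pipeline can be sketched as a bounded fixed-point loop. The names below (Family, run_families, and the three toy families) are illustrative assumptions; real families reparse and verify, which this sketch omits.

```rust
/// Hypothetical family: returns Some(new_text) when it commits a rewrite.
type Family = fn(&str) -> Option<String>;

/// Run families until a full pass commits nothing. If the guard trips first,
/// return the original source rather than a partially normalised buffer.
fn run_families(src: &str, families: &[Family], max_passes: usize) -> String {
    let mut current = src.to_string();
    for _ in 0..max_passes {
        let mut committed = false;
        for family in families {
            if let Some(next) = family(&current) {
                if next != current {
                    current = next; // later families see the updated text
                    committed = true;
                }
            }
        }
        if !committed {
            return current; // normal form reached
        }
    }
    src.to_string() // guard pass count tripped
}

// Toy families used only to exercise the loop.
fn trim_trailing(s: &str) -> Option<String> {
    let t = s.trim_end();
    (t.len() != s.len()).then(|| t.to_string())
}

fn dash_to_star(s: &str) -> Option<String> {
    s.starts_with("- ").then(|| format!("* {}", &s[2..]))
}

fn star_to_dash(s: &str) -> Option<String> {
    s.starts_with("* ").then(|| format!("- {}", &s[2..]))
}
```

With dash_to_star alone the loop converges after one committing pass. Pairing it with star_to_dash oscillates forever, so the guard trips and the input comes back byte-for-byte, including any trailing whitespace that trim_trailing had already removed from the working buffer.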

Public API

| Symbol | Behaviour |
| --- | --- |
| Document::parse(&str) -> Result<Document, ParseError> | Fallible at the parser trust boundary. |
| format_document(&doc, opts) -> String | Infallible over an already-parsed document. |
| format_validated(&doc, opts) -> Result<String, FormatError> | Carries parse failures and semantic divergence. |
| semantically_equivalent(a, b) -> Result<bool, ParseError> | Reparses both inputs to build semantic signatures. |

FmtOptions style knobs default to Preserve. Fluent setters (with_italic, with_strong, with_list_marker, with_ordered_list, with_thematic_break, with_link_def_style) cover programmatic callers; the TOML keys are [fmt] strong, [fmt] thematic-break, and the existing per-knob spellings. User-facing surfaces are documented in docs/src/format/policy.md and docs/src/format/style.md.

Risk register

| Risk | Bound | Evidence |
| --- | --- | --- |
| A rewrite family contains overlapping local edits. | The family plan rejects before verification; no individual edit is selected out of the overlap. | Unit tests in mdwright-format cover local-overlap rejection. |
| The rewrite-family pipeline never reaches a no-commit pass. | The guard pass count logs tracing::warn! and returns the original source bytes unchanged. | Idempotence regressions and fuzz replay cover known sustained-fuzz failures. |
| Verification misses a cross-paragraph effect. | Families verify the whole document and skip if the document or math signature diverges. | Skips are logged; high-skip-rate documents surface in production traces. |
| Structural emit edge cases the 4096-case sweep doesn't reach. | Two accepted FmtOptions::default() regressions: an empty list item at EOF, and an ATX heading with a trailing hash. | Both reproduce as pre-existing structural-emit bugs surfaced by broader option-space fuzz coverage. |
| Pulldown behaviour drifts between releases. | docs/architecture/pulldown-model.md documents the invariants; tests/pulldown_model.rs fails when pulldown disagrees. | One chokepoint at crates/mdwright-document/src/parse.rs is the single site any drift mitigation lands. |

Out of scope

  • Replacing pulldown-cmark. The bug class is about agreement with pulldown; a different parser trades one disagreement surface for another.
  • AST-level structural diff in the verification gate. Event-stream equivalence is sufficient and cheap; AST diff amplifies position-noise into false divergence.
  • A custom emphasis tokeniser. CM §6.2 is correct; mdwright's job is to produce output that lets pulldown's tokeniser reach the same answer as it did on the source.
  • Cross-knob canonicalisation modes beyond what FmtOptions exposes. For aggressive cross-knob normalisation, use mdformat; see the README.

What the bar is now

Two rg invariants guard against regression of the design above:

  • rg 'opts\.(italic|strong|list_marker|thematic|link_def|ordered_list)' crates/mdwright-format/src/ returns only the style-policy call sites in crates/mdwright-format/src/format/canonicalise.rs. Structural emit does not read style knobs.
  • Every production pulldown_cmark::Parser invocation routes through the document parse boundary; #[cfg(test)] exceptions carry an inline justification.

The normalize_* post-passes (normalize_trailing_newline, source_has_effective_trailing_newline, normalize_line_endings_lf, apply_end_of_line) live in crates/mdwright-format/src/format/mod.rs and are wired through the public formatting entry points. They are boundary-policy transforms, not perturbation sources: normalize_trailing_newline reads source bytes to decide whether the output ends with \n; the LF normaliser checks the invariant carried by document construction.

mdformat parity

cargo xtask mdformat-parity compares mdwright against mdformat (with the GFM, frontmatter, footnote, and MkDocs plugins) over an isolated corpus copy. The goal is classified compatibility, not byte identity. Every mdwright/mdformat output difference is either fixed, configured, or recorded below as intentional; otherwise the command fails as a release gate.

Use [fmt] profile = "mdformat" to ask "how close can mdwright get to mdformat while keeping verified rewrites?" The profile keeps mdformat's default wrap = keep; a project that wants mdformat with a column limit must set wrap explicitly. When wrap is an integer, mdwright enforces that line budget for breakable prose in every profile. The default stable wrap strategy uses mdformat-compatible soft-break reflow. The mdformat profile also defaults list continuation indentation to four spaces.

Status values

  • open-bug: known unresolved gap; reported as a failing release gate.
  • intentional-divergence: mdwright deliberately keeps a different byte style while preserving semantics.
  • upstream-parser-limitation: difference pinned to parser behaviour outside mdwright.
  • configured: caused by mdwright project configuration, usually generated-doc excludes.
  • fixed: should no longer appear; the xtask fails if it does.

Class is free-text and groups rows by root cause. style-option-mismatch covers remaining wrap or indentation policy differences; mdformat-semantic-drift covers cases where mdformat's output is not semantically equivalent to the source; intentional-policy covers generated files excluded by configuration.
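
Read as a gate, the status values reduce to a small decision function. This is a sketch under assumptions: the Status enum and difference_fails_gate are hypothetical names, and "no matching row" is modelled as None. The failing set (unclassified, open-bug, and fixed rows that still appear) follows the description of the parity gate; the real xtask checks more conditions, such as semantic drift and idempotence.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Status {
    OpenBug,
    IntentionalDivergence,
    UpstreamParserLimitation,
    Configured,
    Fixed,
}

impl Status {
    /// Parse the Status cell of a classification row.
    fn parse(cell: &str) -> Option<Status> {
        match cell.trim() {
            "open-bug" => Some(Status::OpenBug),
            "intentional-divergence" => Some(Status::IntentionalDivergence),
            "upstream-parser-limitation" => Some(Status::UpstreamParserLimitation),
            "configured" => Some(Status::Configured),
            "fixed" => Some(Status::Fixed),
            _ => None,
        }
    }
}

/// `classification` is the Status cell of the matching row,
/// or None when no row matched the observed difference.
fn difference_fails_gate(classification: Option<&str>) -> bool {
    match classification.and_then(Status::parse) {
        // Unclassified differences and unknown status spellings fail.
        None => true,
        // Open bugs fail; a "fixed" row fails because the difference reappeared.
        Some(Status::OpenBug) | Some(Status::Fixed) => true,
        // Accepted, documented divergences pass.
        Some(_) => false,
    }
}
```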

Classifications

The table below is parsed by xtask::mdformat_parity::load_classifications: each row must have exactly seven cells. Path patterns support *, **, and prefix/** globs; find_classification returns the first matching row, so specific paths come first and catch-all ** rows last. Formatter divergences are owned by the formatter team; generated-doc exclusions are owned by docs.

| Corpus | Path | Construct | Class | Status | Owner | Resolution |
| --- | --- | --- | --- | --- | --- | --- |
| external | jupyter_book_minimal/admonitions.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves MyST directive structure; mdformat with --no-validate rewrites this fixture in a way mdwright's semantic oracle rejects. |
| external | jupyter_book_minimal/asides.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | Same shape as admonitions.md. |
| external | jupyter_book_minimal/directives.md | MyST directives | mdformat-semantic-drift | intentional-divergence | formatter | Same shape as admonitions.md. |
| external | jupyter_book_minimal/blocks.md | MyST and Pandoc blocks | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves MyST and Pandoc block structure; mdformat with --no-validate rewrites this fixture in a way mdwright's semantic oracle rejects. |
| mdwright-docs | src/SUMMARY.md | nested list indentation | style-option-mismatch | intentional-divergence | formatter | mdwright preserves the existing two-space mdBook summary nesting; mdformat rewrites nested bullets to four spaces. |
| mdwright-docs | src/extending/lint-rules.md | list continuation indentation | style-option-mismatch | intentional-divergence | formatter | The repository policy keeps marker-width continuation; fmt.lists.continuation-indent = "four-space" provides the mdformat spelling when requested. |
| mdwright-docs | src/configuration.md | generated docs | intentional-policy | configured | docs | Generated by cargo xtask doc-config; excluded so source docs and generator drift checks do not fight. |
| mdwright-docs | src/reference/cli.md | generated docs | intentional-policy | configured | docs | Generated by cargo xtask doc-cli. |
| mdwright-docs | src/reference/diagnostic-schema.md | generated docs | intentional-policy | configured | docs | Generated from diagnostic schema tests. |
| mdwright-docs | src/rules/** | generated rule docs | intentional-policy | configured | docs | Generated by cargo xtask doc-rules; rule pages intentionally contain lint violations. |
| mdwright-docs | ** | prose wrap | style-option-mismatch | intentional-divergence | formatter | Integer wrap now enforces a line budget for breakable prose in every profile. mdwright may wrap lines that mdformat leaves above the configured width. |
| release-prose-corpus | ** | prose wrap line budget | style-option-mismatch | intentional-divergence | formatter | mdwright enforces wrap = 120 for breakable prose lines. mdformat 1.0.0 leaves the observed over-budget source lines unchanged. |
| release-math-corpus | **/*-template.md | mdformat semantic drift | mdformat-semantic-drift | intentional-divergence | formatter | mdwright preserves the source semantics; mdformat changes the rendered HTML on some template files in this corpus. |
| release-math-corpus | ** | math-heavy prose and list reflow | style-option-mismatch | intentional-divergence | formatter | mdwright treats ordinary paragraph newlines as soft breaks and enforces over-budget breakable lines. mdformat leaves some over-budget lines unchanged. |
| external | ** | prose wrap | style-option-mismatch | intentional-divergence | formatter | Same as the mdwright-docs catch-all. Single oversized atomics may still exceed the target by policy. |
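
The first-match rule is why row order matters in the table above. A minimal sketch, with assumed glob semantics: ** matches zero or more whole path segments, and * matches any run of characters inside one segment. Row, first_match, and the helper functions are hypothetical; the actual matcher in xtask may differ in edge cases.

```rust
/// Hypothetical, trimmed-down classification row.
struct Row {
    corpus: &'static str,
    pattern: &'static str,
    status: &'static str,
}

/// `*` matches any (possibly empty) run of characters within one segment.
fn segment_matches(pat: &str, seg: &str) -> bool {
    match pat.split_once('*') {
        None => pat == seg,
        Some((head, rest)) => match seg.strip_prefix(head) {
            None => false,
            Some(tail) => (0..=tail.len())
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| segment_matches(rest, &tail[i..])),
        },
    }
}

/// `**` matches zero or more whole segments (assumed semantics).
fn path_matches(pat: &[&str], path: &[&str]) -> bool {
    match pat.split_first() {
        None => path.is_empty(),
        Some((p, rest)) if *p == "**" => {
            (0..=path.len()).any(|i| path_matches(rest, &path[i..]))
        }
        Some((p, rest)) => match path.split_first() {
            Some((s, tail)) => segment_matches(p, s) && path_matches(rest, tail),
            None => false,
        },
    }
}

/// Return the first row whose corpus and path pattern match.
fn first_match<'a>(rows: &'a [Row], corpus: &str, path: &str) -> Option<&'a Row> {
    let segs: Vec<&str> = path.split('/').collect();
    rows.iter().find(|row| {
        let pat: Vec<&str> = row.pattern.split('/').collect();
        row.corpus == corpus && path_matches(&pat, &segs)
    })
}
```

If the catch-all ** row were listed before src/rules/**, every path in that corpus would resolve to the catch-all and the specific row would never be reached.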

Release use

Run against the pinned mdformat baseline:

cargo xtask mdformat-parity \
  --corpus-root docs \
  --corpus-name mdwright-docs \
  --mdwright-config .mdwright.toml \
  --mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml

The mdformat config lives under xtask/fixtures/ because it is an oracle fixture, not the repository's own formatter. Output lands at target/mdwright/parity/mdformat-parity.{json,md}. A clean release run has no unclassified differences, no semantic drift, no parse errors, no idempotence failures, and no rows marked open-bug.

Parser Backend Audit

cargo xtask parser-audit compares mdwright's production pulldown-cmark backend with cmark-gfm, using the vendored GFM spec expected HTML as the primary oracle. The audit renders mdwright through the opt-in cmark-gfm render profile so renderer spelling drift is separated from parser-tree drift. It does not replace mdwright-document as the production parser boundary.

cmark-gfm is the primary oracle because crates/mdwright/tests/gfm-spec/spec.txt is vendored from cmark-gfm and the GFM ecosystem treats its rendered HTML as the reference. comrak is optional diagnostic evidence for rendered HTML and source-position behaviour; it is not a release gate unless a future audit shows it catches mdwright-relevant risks that cmark-gfm cannot expose.

Running

cargo xtask parser-audit \
  --case-set all \
  --output target/mdwright/parser-audit \
  --ensure-tools \
  --include-comrak

The command builds a pinned cmark-gfm under target/mdwright/tools/ when --ensure-tools is passed. To use an already-built binary explicitly, pass --cmark-gfm-bin <path>.

Reports are written to:

  • target/mdwright/parser-audit/parser-audit.json
  • target/mdwright/parser-audit/parser-audit.md

Examples marked disabled in the vendored GFM spec are still reported. For those cases, cmark-gfm binary drift from the expected HTML is not a command failure, because the upstream spec does not treat the rendered checkbox spelling as a strict conformance assertion.

The audit also checks source-position envelopes for constructs mdwright uses as formatter/linter facts. It maps cmark-gfm data-sourcepos line/column ranges back to source bytes and compares them against mdwright document facts by construct kind. This is a risk gate, not exact AST equality: a difference is reported only when mdwright has no overlapping fact for a rewrite/lint-owned construct.
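
Mapping a line/column envelope back to bytes, and then testing for overlap, might look like the sketch below. It assumes 1-based lines, 1-based byte columns, and an inclusive end column; the function names are hypothetical and the audit's real handling of tabs or multi-byte characters is not shown.

```rust
use std::ops::Range;

/// Byte offset at which each line starts.
fn line_starts(src: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(
        src.bytes()
            .enumerate()
            .filter(|(_, b)| *b == b'\n')
            .map(|(i, _)| i + 1),
    );
    starts
}

/// Convert (line, column) start/end pairs to a half-open byte range.
/// Assumes 1-based lines and byte columns, with an inclusive end column.
fn sourcepos_to_bytes(
    src: &str,
    start: (usize, usize),
    end: (usize, usize),
) -> Option<Range<usize>> {
    let starts = line_starts(src);
    let lo = *starts.get(start.0.checked_sub(1)?)? + start.1.checked_sub(1)?;
    // Inclusive 1-based end column equals the exclusive 0-based byte end.
    let hi = *starts.get(end.0.checked_sub(1)?)? + end.1;
    (lo <= hi && hi <= src.len()).then_some(lo..hi)
}

/// Risk-gate comparison: any overlap counts, exact equality is not required.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}
```

Using overlap rather than equality matches the stated intent: the gate only fires when no document fact covers the construct at all.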

Status Values

  • pulldown-html-mismatch: mdwright's pulldown-backed HTML differs from cmark-gfm expected HTML.
  • mdwright-policy: mdwright intentionally differs from the cmark-gfm oracle for a documented parser policy.
  • extension-gap: the compared parser does not implement the construct.
  • sourcepos-risk: rendered output matches, but coordinate facts may affect formatter/lint safety.
  • event-only: internal event/AST shape differs while rendered HTML and semantic signatures match.
  • upstream-panic: parser panic or crash contained by mdwright-document.
  • needs-mdwright-mitigation: upstream behaviour is unsafe for mdwright and still needs a fix.
  • fixed: the difference should no longer appear; the audit fails if it does.

Classifications

Current gfm-spec audit snapshot with mdwright's cmark-gfm render profile:

| Metric | Count |
| --- | --- |
| Cases | 673 |
| HTML mismatches | 15 |
| Sourcepos envelopes checked | 1071 |
| Sourcepos differences | 0 |
| Unclassified differences | 0 |

Observed difference classes:

| Observed | Count |
| --- | --- |
| pulldown-html-mismatch:emphasis-resolution | 9 |
| pulldown-html-mismatch:html-block-rendering | 3 |
| pulldown-html-mismatch:tasklist-rendering | 2 |
| pulldown-html-mismatch:table-rendering | 1 |
| upstream-panic | 1 |

| Case Set | Key | Observed | Status | Owner | Resolution |
| --- | --- | --- | --- | --- | --- |
| * | * | mdwright-policy:gfm-bare-autolinks-enabled | fixed | document | Parser-audit now mirrors the cmark-gfm extension set per spec case, so default production GFM policy no longer creates non-extension CommonMark audit drift. |
| * | * | mdwright-policy:gfm-email-autolinks-disabled | fixed | document | GFM email autolinks are recognised by mdwright-document's source-positioned GFM overlay. |
| * | * | mdwright-policy:gfm-tagfilter-disabled | fixed | document | GFM tagfiltering is enabled by default in mdwright-document's render/signature policy. |
| * | * | pulldown-html-mismatch:gfm-autolink | fixed | document | GFM URL and email autolink mismatches should be handled by mdwright-document's GFM autolink overlay. |
| * | * | pulldown-html-mismatch:gfm-tagfilter | fixed | document | GFM tagfilter mismatches should be handled by mdwright-document's GFM tagfilter overlay. |
| * | * | pulldown-html-mismatch:quote-escaping | fixed | document | The cmark-gfm render profile escapes double quotes in text/code contexts where cmark-gfm emits &quot;. |
| * | * | pulldown-html-mismatch:href-escaping | fixed | document | The cmark-gfm render profile percent-encodes link destinations where cmark-gfm percent-encodes them. |
| gfm-spec | Tables (extension) | pulldown-html-mismatch:table-rendering | fixed | document | The cmark-gfm render profile spells ordinary GFM table markup with cmark-gfm row, alignment, and body layout. |
| gfm-spec | case-160 | pulldown-html-mismatch:table-rendering | pulldown-html-mismatch | document | This is a raw HTML table containing indented code, not a GFM table. The remaining drift is parser/backend handling of blank raw-HTML text around child blocks, not formatter rewrite risk. |
| gfm-spec | case-279, case-280 | pulldown-html-mismatch:tasklist-rendering | pulldown-html-mismatch | document | These spec examples are marked disabled; cmark-gfm's binary output and mdwright's cmark-gfm profile match, while the vendored expected HTML intentionally does not assert the checkbox spelling. |
| * | * | extension-gap:myst-definition-list | extension-gap | document | cmark-gfm does not own MyST directive syntax; mdwright's default definition-list recognition can make directive-heavy fixtures render differently through pulldown HTML, while formatter preservation is handled by mdwright document facts. |
| corpus | external:jupyter_book_minimal/admonitions.md, external:jupyter_book_minimal/asides.md, external:jupyter_book_minimal/blocks.md, external:jupyter_book_minimal/directives.md | sourcepos-risk:paragraph | extension-gap | document | cmark-gfm reports MyST directive/admonition syntax as ordinary paragraph source ranges, while mdwright treats the same bytes as extension-owned containers or preservation facts. The corpus rows pin that non-GFM coordinate drift so it cannot silently expand. |
| gfm-spec | case-120, case-152, case-153 | pulldown-html-mismatch:html-block-rendering | pulldown-html-mismatch | document | pulldown's event stream omits leading indentation on raw HTML blocks that cmark-gfm preserves in rendered HTML. mdwright accepts this as backend render drift because source-coordinate facts remain stable. |
| gfm-spec | case-144 | pulldown-html-mismatch:html-block-rendering | fixed | document | The cmark-gfm render profile now matches cmark-gfm's newline placement for this list/raw-HTML case. |
| gfm-spec | case-398, case-426, case-434, case-435, case-436, case-473, case-474, case-475, case-477 | pulldown-html-mismatch:emphasis-resolution | pulldown-html-mismatch | document | pulldown's emphasis resolution differs from cmark-gfm on these delimiter-stack edge cases; mdwright currently treats this as a parser-backend conformance gap, not a formatter-local bug. |
| operational | known-pulldown-link-ref-tab-panic | upstream-panic | upstream-panic | document | pulldown-cmark issue 1095 is contained by mdwright-document::ParseError; product paths do not panic. |

The cmark-gfm render profile is an HTML spelling profile. It fixes quote escaping, link-destination escaping, ordinary GFM table spelling, task-list checkbox spelling, and one newline-placement case where the parser already exposes enough structure. It does not change emphasis resolution or source-position semantics. Full cmark-gfm parser equivalence would require upstream pulldown changes, a maintained fork, or a backend switch.

Replacement Criteria

Do not replace pulldown-cmark based on event-shape differences alone. A replacement candidate must improve at least one release-relevant axis without regressing the others:

  • fewer unclassified or policy-relevant HTML mismatches against cmark-gfm;
  • safer behaviour on malformed/user input;
  • stable byte/source coordinates sufficient for formatter rewrite ownership;
  • extension coverage at least as good as the current document facts;
  • acceptable runtime and dependency footprint.

LaTeX boundary

mdwright needs MathJax-scale TeX math support, Unicode terminal layout, and bidirectional source translation. That language machinery is larger and more volatile than Markdown math-span recognition, so it belongs behind a separate component boundary.

The boundary

mdwright-latex hides the TeX body language: lexer, parser, command registry, Unicode layout, and source translation. mdwright-math keeps Markdown delimiter and environment recognition and delegates the body string to mdwright-latex when callers need rendering or translation.

mdwright-latex is not a facade: its public API stays narrower than its implementation. Callers receive parsed/rendered/translated results and typed errors, not lexer tokens, parser cursors, AST variants, or MathJax table internals.

  • mdwright-latex owns TeX math-body lexing, parsing, command vocabulary, Unicode layout, and source translation.
  • mdwright-math owns Markdown math-span recognition, delimiter policy, and extraction of math body strings.
  • mdwright-lint consumes vocabulary through narrow lookup APIs.
  • crates/mdwright owns CLI commands such as preview and the math translation surface.
  • Unsupported TeX is a typed error or visible fallback, never a panic.
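
The last bullet, typed errors instead of panics, can be illustrated with a deliberately tiny translator. Nothing here reflects mdwright-latex's actual API: to_unicode, the three-command vocabulary, and the shape of the UnsupportedCommand variant are all assumptions made for the sketch. The point is only that callers see a result or an error with an in-bounds span, never lexer or parser internals.

```rust
use std::ops::Range;

/// Hypothetical error shape: the offending command and its byte span.
#[derive(Debug, PartialEq)]
pub enum LatexError {
    UnsupportedCommand { name: String, span: Range<usize> },
}

/// Translate a toy subset of TeX commands to Unicode.
/// Unknown commands produce a typed error rather than a panic.
pub fn to_unicode(body: &str) -> Result<String, LatexError> {
    let mut out = String::new();
    let mut rest = body;
    let mut offset = 0; // byte offset of `rest` within `body`
    while let Some(pos) = rest.find('\\') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // A command name is the run of ASCII letters after the backslash.
        let len = after.bytes().take_while(|b| b.is_ascii_alphabetic()).count();
        let name = &after[..len];
        let glyph = match name {
            "alpha" => "α",
            "beta" => "β",
            "to" => "→",
            _ => {
                let start = offset + pos;
                return Err(LatexError::UnsupportedCommand {
                    name: name.to_string(),
                    span: start..start + 1 + len,
                });
            }
        };
        out.push_str(glyph);
        offset += pos + 1 + len;
        rest = &after[len..];
    }
    out.push_str(rest);
    Ok(out)
}
```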

Dependency comparison

MathJax is the coverage target because it documents both TeX input behavior and the supported macro table; it is not treated as a TeX-engine equivalence claim. The comparison axes are licence, signal, API fit at the mdwright boundary, and outcome.

| Crate | Version | License | Signal | API fit | Decision |
| --- | --- | --- | --- | --- | --- |
| logos | 0.16.1 | MIT OR Apache-2.0 | Mature lexer crate; high crates.io usage; active docs and repository. | Good fit for byte-span tokenisation when the lexer stays policy-free and parser recovery remains separate. | Accept for the lexer spike and later lexer work. |
| pulldown-latex | 0.7.1 | MIT | Reachable repository and docs; moderate use. | Pull parser for LaTeX-to-MathML. It does not expose the TeX AST/control needed for Unicode layout and bidirectional source translation. | Reject as a core dependency; keep as a reference. |
| tex2math | 1.2.1 | LGPL-3.0-only | Recent crate, but very low crates.io adoption. | LaTeX-to-MathML conversion and CLI/wasm features. License and output-center do not match mdwright's component boundary. | Reject. |
| latex2mathml | 0.2.3 | MIT | Older release; moderate total downloads; reachable repository. | Converts equations to MathML. It does not hide the source-translation or Unicode-layout decisions mdwright needs. | Reject as a core dependency; keep as a reference if fixtures are useful. |
| math-core | 0.6.1 | MIT | Recent crate with low adoption; Rust 1.91. | Converts LaTeX equations to MathML Core. The crate center is MathML Core, not Unicode layout or source translation. | Reject as a core dependency; revisit only for conformance fixture ideas. |
| mathml-latex | 0.0.3 | MPL-2.0 | Early version, low recent usage, reachable repository. | Converts between MathML and LaTeX, but would put MathML at mdwright's internal boundary. | Reject. |

Low-adoption terminal math rendering crates such as term-maths and tui-math remain rejected. Terminal delivery code belongs in crates/mdwright; TeX body structure belongs in mdwright-latex.

Rejected boundary shapes

  • Keep TeX bodies in mdwright-math. This braids Markdown span recognition (CommonMark + GFM + math-resilience rules) with TeX body support (MathJax input vocabulary, Unicode coverage, layout, translation), and the two change for different reasons.
  • Wrap an existing LaTeX-to-MathML crate. The current Rust crates target MathML output. Wrapping one would either leak MathML as an unwanted intermediate interface or force mdwright to reconstruct TeX structure from MathML.

CLI reference

Auto-generated from clap's --help output by cargo xtask doc-cli. Edit the CLI definition in crates/mdwright/src/cli.rs (or the rule registry for list-rules); never edit this file by hand.

mdwright

Lints Markdown for stylistic and structural issues, with a public rule trait so projects can extend the standard library, plus a verified round-trip formatter.

Usage: mdwright [OPTIONS] <COMMAND>

Commands:
  check       Lint Markdown files and report diagnostics
  fix         Lint and apply safe autofixes in place
  fmt         Reformat Markdown files
  fmt-check   Verify formatting without writing
  list-rules  Print the rule catalogue
  explain     Print the long-form explanation of one lint rule
  render      Format the input and emit the rendered HTML to stdout
  preview     Format the input and render a static terminal Markdown preview
  math        Translate math source between LaTeX commands and Unicode
  config      Create mdwright configuration files
  lsp         Run as a Language Server Protocol server over stdio
  help        Print this message or the help of the given subcommand(s)

Options:
      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
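
The config discovery walk described under `--config` can be sketched as follows. This is a minimal illustration using only the standard library, not mdwright's implementation: the function names are hypothetical, and the `[tool.mdwright]` check is a naive line match where a real tool would parse the TOML.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Candidate file names in precedence order, as listed in the `--config` help text.
const CANDIDATES: [&str; 3] = [".mdwright.toml", "mdwright.toml", "pyproject.toml"];

/// Hypothetical helper: naive check that a `pyproject.toml` declares a
/// `[tool.mdwright]` table. A real implementation would parse the TOML.
fn has_tool_table(path: &Path) -> bool {
    fs::read_to_string(path)
        .map(|text| text.lines().any(|line| line.trim() == "[tool.mdwright]"))
        .unwrap_or(false)
}

/// Walk up from `start` and return the first matching config file.
/// The walk ends after the first directory that contains `.git/`,
/// or at the filesystem root.
fn find_config(start: &Path) -> Option<PathBuf> {
    for dir in start.ancestors() {
        for name in CANDIDATES {
            let candidate = dir.join(name);
            let matches = if name == "pyproject.toml" {
                has_tool_table(&candidate)
            } else {
                candidate.is_file()
            };
            if matches {
                return Some(candidate);
            }
        }
        // Workspace boundary: do not look above the repository root.
        if dir.join(".git").is_dir() {
            return None;
        }
    }
    // Nothing matched: the caller falls back to built-in defaults.
    None
}
```

Returning `None` corresponds to the "built-in defaults apply" case; passing `--config` explicitly would bypass the walk altogether.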

mdwright check

Lint Markdown files and report diagnostics

Usage: mdwright check [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...
          Files and directories to scan. Directories are searched recursively. If omitted, `.` is scanned. A literal `-` reads stdin as `<stdin>`

Options:
      --check
          Exit with status 1 if any non-advisory diagnostic is found

      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

      --rules <RULES>
          Rule-selection spec. If omitted, `[lint] preset`, `select`, `extend-select`, and `ignore` from the config file apply. See module docs for syntax

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --format <FORMAT>
          Output format

          Possible values:
          - pretty:  Human-readable, optionally coloured
          - compact: `file:line:col: rule: message` per line
          - json:    JSON Lines, v2 schema. See `docs/src/reference/diagnostic-schema.md`
          - json-v1: JSON Lines, v1 schema. Deprecated; emits a deprecation warning on stderr. Will be removed in a future release

          [default: pretty]

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --color <COLOR>
          When to colour pretty output. `auto` (default) colours when stdout is a TTY; `always` forces colour; `never` disables it. Compact and JSON output are never coloured regardless

          [default: auto]
          [possible values: auto, always, never]

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

  -j, --jobs <JOBS>
          Worker threads; 0 = rayon default (one per logical CPU)

          [default: 0]

      --no-suppress
          Ignore `<!-- mdwright: allow ... -->` suppression comments. Use to audit which diagnostics are silenced and where

  -h, --help
          Print help (see a summary with '-h')
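
The two input guards, `--max-input-bytes` and `--reject-control-chars`, amount to simple byte-level checks. The sketch below illustrates the rules as the help text states them; `admit` and the error strings are hypothetical and do not reflect mdwright's actual messages.

```rust
/// Offset of the first C0 control byte that `--reject-control-chars` would
/// refuse: any byte in 0x00..=0x1F other than TAB, LF, FF, and CR.
fn first_rejected_control(bytes: &[u8]) -> Option<usize> {
    bytes
        .iter()
        .position(|&b| b < 0x20 && !matches!(b, b'\t' | b'\n' | 0x0C | b'\r'))
}

/// Size cap as described for `--max-input-bytes`; a cap of 0 disables the check.
fn within_size_cap(len: u64, cap: u64) -> bool {
    cap == 0 || len <= cap
}

/// Hypothetical gate that runs both checks before any parsing happens.
fn admit(bytes: &[u8], cap: u64, reject_control: bool) -> Result<(), String> {
    if !within_size_cap(bytes.len() as u64, cap) {
        return Err(format!("input is {} bytes, cap is {}", bytes.len(), cap));
    }
    if reject_control {
        if let Some(offset) = first_rejected_control(bytes) {
            return Err(format!(
                "control byte 0x{:02X} at offset {}",
                bytes[offset], offset
            ));
        }
    }
    Ok(())
}
```

With the default cap of 10000000, an input of exactly that many bytes is accepted and one byte more is refused.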

mdwright fix

Lint and apply safe autofixes in place

Usage: mdwright fix [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...
          Files and directories to scan. Directories are searched recursively. If omitted, `.` is scanned. A literal `-` reads stdin as `<stdin>`

Options:
      --check
          Exit with status 1 if any non-advisory diagnostic is found

      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

      --rules <RULES>
          Rule-selection spec. If omitted, `[lint] preset`, `select`, `extend-select`, and `ignore` from the config file apply. See module docs for syntax

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --format <FORMAT>
          Output format

          Possible values:
          - pretty:  Human-readable, optionally coloured
          - compact: `file:line:col: rule: message` per line
          - json:    JSON Lines, v2 schema. See `docs/src/reference/diagnostic-schema.md`
          - json-v1: JSON Lines, v1 schema. Deprecated; emits a deprecation warning on stderr. Will be removed in a future release

          [default: pretty]

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --color <COLOR>
          When to colour pretty output. `auto` (default) colours when stdout is a TTY; `always` forces colour; `never` disables it. Compact and JSON output are never coloured regardless

          [default: auto]
          [possible values: auto, always, never]

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

  -j, --jobs <JOBS>
          Worker threads; 0 = rayon default (one per logical CPU)

          [default: 0]

      --no-suppress
          Ignore `<!-- mdwright: allow ... -->` suppression comments. Use to audit which diagnostics are silenced and where

  -h, --help
          Print help (see a summary with '-h')

mdwright fmt

Reformat Markdown files

Usage: mdwright fmt [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...
          Files and directories to reformat. If omitted, `.` is used. A literal `-` reads from stdin and writes to stdout

Options:
      --check
          Exit 1 if any file would change; never write. Same shape as `prettier --check` / `rustfmt --check`

      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

      --diff
          Write a unified diff to stdout instead of editing files. Mutually exclusive with `--check`

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --stdin-filename <STDIN_FILENAME>
          File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through

      --no-validate
          Skip the HTML-equivalence safety check that runs by default. The check parses both source and formatted output to HTML and refuses to write when they differ. Use this only if you have independent verification that the formatter is safe for the input, for example, a CI pipeline that already runs the check elsewhere

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

      --explain-divergence
          When the HTML-equivalence gate rejects a file, print a unified diff of the source's HTML against the formatted output's HTML to stderr. Diagnostic surface for triaging gate failures; does not change the gate's pass/fail decision

      --explain-format
          Explain formatter decisions on stderr. Does not change write, check, diff, or validation behavior

      --range <LINE:COL-LINE:COL>
          Format only the smallest set of whole top-level blocks covering `LINE:COL-LINE:COL` (half-open: the start position is inclusive and the end position is exclusive; lines and columns are 0-based, following the LSP convention). Reads from stdin only; writes the covering blocks to stdout. Mutually exclusive with `--check` and `--diff`.

          Example: `--range 2:0-2:5` formats the block containing columns 0..5 of line 2.

      --math-render <MATH_RENDER>
          Delimiter rewrite policy for math regions at emit time. `none` (default) passes math through verbatim: today's behaviour. `commonmark-katex` is the same emission as `none` but greppable as an intent signal in build logs. `dollar` rewrites `\[…\]` to `$$ … $$` and `\(…\)` to `$ … $` for downstream renderers that prefer dollar delimiters; LaTeX environments are not rewritten. Overrides `[fmt.math] render` in the config file

          [possible values: none, commonmark-katex, dollar]

  -h, --help
          Print help (see a summary with '-h')
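
As an illustration of the `--range` argument, the sketch below parses `LINE:COL-LINE:COL` and picks the blocks that a range touches. The `LspRange` type, the representation of blocks as half-open line spans, and the treatment of an end position at column 0 are assumptions made for the example, not mdwright's internals.

```rust
/// A half-open range of 0-based (line, column) positions, LSP style.
#[derive(Debug, PartialEq)]
struct LspRange {
    start: (u32, u32),
    end: (u32, u32),
}

/// Parse `LINE:COL-LINE:COL`, e.g. `2:0-2:5`.
fn parse_range(spec: &str) -> Option<LspRange> {
    let (start, end) = spec.split_once('-')?;
    let parse_pos = |s: &str| -> Option<(u32, u32)> {
        let (line, col) = s.split_once(':')?;
        Some((line.parse().ok()?, col.parse().ok()?))
    };
    let range = LspRange { start: parse_pos(start)?, end: parse_pos(end)? };
    // Reject inverted ranges; tuples compare by line first, then column.
    if range.start <= range.end { Some(range) } else { None }
}

/// Given top-level blocks as half-open line spans `[first, past_last)`
/// (a hypothetical representation), return the indices of the blocks
/// that the range touches.
fn covering_blocks(blocks: &[(u32, u32)], range: &LspRange) -> Vec<usize> {
    // An exclusive end at column 0 does not reach into that line.
    let last_line = if range.end.1 == 0 && range.end.0 > range.start.0 {
        range.end.0 - 1
    } else {
        range.end.0
    };
    blocks
        .iter()
        .enumerate()
        .filter(|(_, span)| span.0 <= last_line && range.start.0 < span.1)
        .map(|(index, _)| index)
        .collect()
}
```

For `--range 2:0-2:5` and blocks spanning lines 0..1, 2..4, and 5..8, only the second block is selected.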

mdwright fmt-check

Verify formatting without writing

Usage: mdwright fmt-check [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...  Files and directories to check. If omitted, `.` is used. A literal `-` reads stdin and checks whether it would change

Options:
      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
      --diff
          Write a unified diff to stdout for files that would change
      --stdin-filename <STDIN_FILENAME>
          File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through
  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --no-validate
          Skip the HTML-equivalence safety check that runs by default. The check parses both source and formatted output to HTML and refuses to write when they differ. Use this only if you have independent verification that the formatter is safe for the input, for example, a CI pipeline that already runs the check elsewhere
      --explain-divergence
          When the HTML-equivalence gate rejects a file, print a unified diff of the source's HTML against the formatted output's HTML to stderr. Diagnostic surface for triaging gate failures; does not change the gate's pass/fail decision
      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
      --explain-format
          Explain formatter decisions on stderr. Does not change check, diff, or validation behavior
      --math-render <MATH_RENDER>
          Delimiter rewrite policy for math regions at emit time. Overrides `[fmt.math] render` in the config file [possible values: none, commonmark-katex, dollar]
  -h, --help
          Print help
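
The `dollar` mode of `--math-render` can be pictured with a deliberately naive rewrite. The sketch assumes the input contains no code spans, no escaped backslashes, and no nested delimiters, and it trims the body before re-wrapping it; mdwright's own pass works on scanned math regions, and its exact spacing is not specified here beyond the `$$ … $$` and `$ … $` shapes. The function name `to_dollar` is hypothetical.

```rust
/// Naive sketch of the `dollar` policy: `\[…\]` becomes `$$ … $$` and
/// `\(…\)` becomes `$ … $`. Openers without a matching close are left
/// untouched, and LaTeX environments are never rewritten.
fn to_dollar(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    loop {
        // Find the earliest opener of either kind.
        let display = rest.find("\\[");
        let inline = rest.find("\\(");
        let (at, close, dollars) = match (display, inline) {
            (Some(d), Some(i)) if d < i => (d, "\\]", "$$"),
            (Some(d), None) => (d, "\\]", "$$"),
            (_, Some(i)) => (i, "\\)", "$"),
            (None, None) => break,
        };
        let body_start = at + 2;
        match rest[body_start..].find(close) {
            Some(len) => {
                out.push_str(&rest[..at]);
                // Assumption: surrounding whitespace in the body is normalised.
                let body = rest[body_start..body_start + len].trim();
                out.push_str(&format!("{dollars} {body} {dollars}"));
                rest = &rest[body_start + len + 2..];
            }
            None => {
                // No matching close: copy the opener verbatim and keep scanning.
                out.push_str(&rest[..body_start]);
                rest = &rest[body_start..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Under these assumptions `\(x\)` becomes `$ x $`, while a `\begin{align*} … \end{align*}` block passes through unchanged.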

mdwright render

Format the input and emit the rendered HTML to stdout.

Pipes the formatted output through the same HTML renderer the `format_validated` gate uses. Captured stdout is raw HTML by default; terminals may request ANSI-highlighted HTML with `--color`, and `--open` writes the HTML to a temporary file before opening it in the system browser.

Usage: mdwright render [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...
          File to render. A literal `-` (or an empty list) reads from stdin. Multiple paths are concatenated in argument order with a single newline between, then rendered as one document

Options:
      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

      --stdin-filename <STDIN_FILENAME>
          File name to report when reading from stdin. Defaults to `<stdin>`. Cosmetic; surfaced in error messages only

      --math-render <MATH_RENDER>
          Delimiter rewrite policy for math regions. See the corresponding flag on `mdwright fmt` for the modes

          [possible values: none, commonmark-katex, dollar]

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --render-profile <RENDER_PROFILE>
          HTML spelling profile. `pulldown` preserves the default renderer; `cmark-gfm` matches cmark-gfm spelling for renderer differences that do not require changing parser semantics. Overrides `[render] profile` in the config file

          [possible values: pulldown, cmark-gfm]

      --color <COLOR>
          When to colour HTML output. Captured stdout remains raw HTML under `auto`; `always` forces ANSI syntax highlighting and `never` disables it

          [default: auto]
          [possible values: auto, always, never]

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

      --open
          Write rendered HTML to a temporary `.html` file and open it in the system browser. Stdout is left empty; stderr reports the file path

  -h, --help
          Print help (see a summary with '-h')

mdwright preview

Format the input and render a static terminal Markdown preview

Usage: mdwright preview [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...
          Files to preview. A literal `-` (or an empty list) reads from stdin. Multiple paths are concatenated in argument order with a single newline between, then previewed as one document

Options:
      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply

      --stdin-filename <STDIN_FILENAME>
          File name to report when reading from stdin. Defaults to `<stdin>`. Cosmetic; surfaced in error messages only

      --color <COLOR>
          When to colour terminal output. `auto` colours when stdout is a TTY; `always` forces colour; `never` disables it

          [default: auto]
          [possible values: auto, always, never]

  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set

      --math <MATH>
          How terminal preview handles math regions

          Possible values:
          - unicode: Render the conservative supported LaTeX subset as Unicode, falling back to source when unsupported
          - source:  Preserve math source bytes
          - off:     Disable special terminal math rendering

          [default: unicode]

      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely

          [default: 10000000]

      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection

  -h, --help
          Print help (see a summary with '-h')

mdwright math

Translate math source between LaTeX commands and Unicode

Usage: mdwright math [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...  Markdown files/directories to translate. Directories are searched recursively. If omitted, stdin is translated. A literal `-` reads stdin

Options:
      --config <CONFIG>
          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
      --to-unicode
          Translate LaTeX math source to editable Unicode math source
      --to-latex
          Translate Unicode math source to preferred LaTeX math source
  -v, --verbose...
          Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --check
          Exit 1 if any file or stdin payload would change; never write
      --max-input-bytes <BYTES>
          Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --diff
          Write a unified diff to stdout; never write files
      --reject-control-chars
          Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
      --write
          Rewrite Markdown files in place. This is required for file mutation; stdin always writes translated text to stdout
      --stdin-filename <STDIN_FILENAME>
          File name to report when reading from stdin. Defaults to `<stdin>`. Useful when integrating with editors that pipe the buffer through
  -h, --help
          Print help

mdwright config

Create mdwright configuration files

Usage: mdwright config [OPTIONS] <COMMAND>

Commands:
  init  Write a documented `.mdwright.toml` with every option set to its default
  help  Print this message or the help of the given subcommand(s)

Options:
      --config <CONFIG>          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
  -v, --verbose...               Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>  Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --reject-control-chars     Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
  -h, --help                     Print help

mdwright config init

Write a documented `.mdwright.toml` with every option set to its default

Usage: mdwright config init [OPTIONS]

Options:
      --config <CONFIG>          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
      --path <PATH>              Path to write. Defaults to `.mdwright.toml` in the current directory [default: .mdwright.toml]
      --force                    Overwrite an existing file
  -v, --verbose...               Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>  Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --reject-control-chars     Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
  -h, --help                     Print help

mdwright list-rules

Print the rule catalogue

Usage: mdwright list-rules [OPTIONS]

Options:
      --config <CONFIG>          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
  -v, --verbose...               Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>  Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --reject-control-chars     Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
  -h, --help                     Print help

mdwright explain

Print the long-form explanation of one lint rule

Usage: mdwright explain [OPTIONS] <RULE>

Arguments:
  <RULE>  Kebab-case rule name (e.g. `bare-url`, `math/unbalanced-delim`)

Options:
      --config <CONFIG>          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
  -v, --verbose...               Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>  Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --reject-control-chars     Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
  -h, --help                     Print help

mdwright lsp

Run as a Language Server Protocol server over stdio

Usage: mdwright lsp [OPTIONS]

Options:
      --config <CONFIG>          Explicit path to a config file. When omitted, mdwright walks up from `$PWD` looking, at each ancestor, for `.mdwright.toml`, `mdwright.toml`, or `pyproject.toml` containing a `[tool.mdwright]` table (in that precedence). The walk stops at the filesystem root or the first directory containing `.git/` (the workspace boundary). If nothing matches, built-in defaults apply
  -v, --verbose...               Increase log verbosity. `-v` = info, `-vv` = debug, `-vvv` = trace. `RUST_LOG` overrides this when set
      --max-input-bytes <BYTES>  Refuse to read any single file (or stdin payload) larger than this many bytes. mdwright treats its input as untrusted; this cap bounds memory use against pathological inputs. Default 10 MB is generous enough that no real Markdown document trips it. Pass `0` to disable the cap entirely [default: 10000000]
      --reject-control-chars     Refuse files (or stdin payloads) that contain C0 control bytes other than TAB, LF, FF, and CR. `CommonMark` accepts these verbatim (it only substitutes NUL with U+FFFD), but their presence is usually evidence the input is not Markdown, and pulldown's silent NUL rewrite makes round-trip idempotence undefined on such inputs. Off by default; opt-in for callers (CI gates, docs pipelines) that prefer hard rejection
  -h, --help                     Print help

Diagnostic schema

mdwright check --format=json emits one JSON object per line (JSON Lines), one object per diagnostic. The current schema is version 2, defined formally by diagnostic-schema.json (JSON Schema draft 2020-12).

The v1 schema remains available under --format=json-v1 for one release cycle and emits a deprecation warning to stderr.

Example record (pretty-printed)

{
  "schema_version": 2,
  "path": "docs/note.md",
  "severity": "error",
  "rule": {
    "name": "math/unbalanced-delim",
    "description": "TeX-style math open delimiter (`\\[`, `\\(`, `$$`, `$`) with no matching close.",
    "url": "docs/rules/math/unbalanced-delim.md"
  },
  "source": {
    "line": 42,
    "column": 10,
    "span_start": 1037,
    "span_end": 1039,
    "snippet": "text with \\[ unmatched math"
  },
  "message": "no matching `\\]` before end of document",
  "fix": null
}

In the wire format each record is a single line terminated by \n; the pretty-printed form above is for human reading only.
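
To make the wire format concrete, the sketch below serialises one record onto a single line using only the standard library. The `Diagnostic` struct and both function names are hypothetical; the field names and the `"fix": null` shape mirror the example record above, and the authoritative definition remains diagnostic-schema.json.

```rust
/// Escape a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical in-memory form of one diagnostic without a fix.
struct Diagnostic<'a> {
    path: &'a str,
    severity: &'a str,
    rule_name: &'a str,
    rule_description: &'a str,
    rule_url: &'a str,
    line: usize,
    column: usize,
    span_start: usize,
    span_end: usize,
    snippet: &'a str,
    message: &'a str,
}

/// Serialise one record as a single line terminated by `\n`.
fn to_json_line(d: &Diagnostic<'_>) -> String {
    let rule = format!(
        r#"{{"name":"{}","description":"{}","url":"{}"}}"#,
        json_escape(d.rule_name),
        json_escape(d.rule_description),
        json_escape(d.rule_url)
    );
    let source = format!(
        r#"{{"line":{},"column":{},"span_start":{},"span_end":{},"snippet":"{}"}}"#,
        d.line, d.column, d.span_start, d.span_end, json_escape(d.snippet)
    );
    format!(
        r#"{{"schema_version":2,"path":"{}","severity":"{}","rule":{},"source":{},"message":"{}","fix":null}}"#,
        json_escape(d.path),
        json_escape(d.severity),
        rule,
        source,
        json_escape(d.message)
    ) + "\n"
}
```

Because newlines inside strings are escaped, the only raw `\n` in the output is the record terminator, which is what lets consumers split the stream line by line.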

Field reference

Top-level

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| schema_version | integer (2) | yes | Bumped on incompatible schema changes. |
| path | string | yes | File path as given on the CLI; <stdin> for piped input. |
| severity | enum | yes | error, warning, or advisory. warning is reserved for future use. |
| rule | object | yes | See below. |
| source | object | yes | See below. |
| message | string | yes | Single-sentence description of the problem. |
| fix | object or omitted | no | Present when a replacement is suggested. |

rule

| Field | Type | Notes |
| --- | --- | --- |
| name | string | Kebab-case identifier (e.g. bare-url, math/unbalanced-delim). |
| description | string | One-line summary; same text as mdwright list-rules. |
| url | string | Repository-relative path into docs/rules/. Will become an absolute URL once the mdBook site is published. |

source

| Field | Type | Notes |
| --- | --- | --- |
| line | integer (≥ 1) | 1-indexed line of the diagnostic's first byte. |
| column | integer (≥ 1) | 1-indexed codepoint column. |
| span_start | integer (≥ 0) | Byte offset of the first byte of the offending region. |
| span_end | integer (≥ 0) | Byte offset one past the last byte. |
| snippet | string | The source line, with trailing newline stripped. Multi-line spans are clipped to the first line; the caret region still starts at column. |
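
Because span_start is a byte offset while column counts codepoints, the two can differ on lines with non-ASCII text. A minimal sketch of deriving line and column from a byte offset, following the definitions in the table; the function name line_and_column is hypothetical and this is not mdwright's LineIndex.

```rust
/// Map a byte offset to a (1-indexed line, 1-indexed codepoint column) pair.
/// Returns `None` if the offset is out of range or splits a UTF-8 sequence.
/// Hypothetical helper for illustration only.
fn line_and_column(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() || !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    // Lines are counted by the newlines that precede the offset.
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    // The current line starts just after the last preceding newline.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    // Columns count codepoints, not bytes.
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}

fn main() {
    let source = "αβ\nx = \\[ y";
    // Byte 2 is the start of 'β': one codepoint precedes it, so column 2.
    assert_eq!(line_and_column(source, 2), Some((1, 2)));
    // Byte 9 is the backslash on line 2, after the four characters "x = ".
    assert_eq!(line_and_column(source, 9), Some((2, 5)));
    println!("ok");
}
```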

fix

| Field | Type | Notes |
| --- | --- | --- |
| replacement | string | Text to substitute for [span_start, span_end). |
| safe | boolean | mdwright fix only applies fixes with "safe": true. |
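
A consumer that applies a fix on its own needs only the half-open byte range and the replacement text. The sketch below assumes a hypothetical Fix struct that mirrors the JSON object (it is not the mdwright_lint type), and the bare-URL replacement is an invented example.

```rust
/// Hypothetical mirror of the JSON `fix` object; not the mdwright_lint type.
struct Fix<'a> {
    replacement: &'a str,
    safe: bool,
}

/// Substitute `fix.replacement` for the bytes in [span_start, span_end).
/// Returns `None` for unsafe fixes, out-of-range spans, or spans that do
/// not fall on UTF-8 character boundaries.
fn apply_fix(source: &str, span_start: usize, span_end: usize, fix: &Fix<'_>) -> Option<String> {
    if !fix.safe || span_start > span_end || span_end > source.len() {
        return None;
    }
    if !source.is_char_boundary(span_start) || !source.is_char_boundary(span_end) {
        return None;
    }
    let mut out = String::with_capacity(source.len() + fix.replacement.len());
    out.push_str(&source[..span_start]);
    out.push_str(fix.replacement);
    out.push_str(&source[span_end..]);
    Some(out)
}

fn main() {
    let source = "See https://example.com now.\n";
    let fix = Fix { replacement: "<https://example.com>", safe: true };
    // Bytes 4..23 cover the bare URL.
    let fixed = apply_fix(source, 4, 23, &fix);
    assert_eq!(fixed.as_deref(), Some("See <https://example.com> now.\n"));
    println!("ok");
}
```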

Lifecycle

  • v2 is the current default. v1 remains available under --format=json-v1 for one release cycle and is then removed.
  • New schema versions bump schema_version and ship alongside the previous version for at least one cycle.

Validation

The schema is JSON Schema draft 2020-12 ($schema field in diagnostic-schema.json). Any draft 2020-12-compatible validator (jsonschema Python package, ajv for JavaScript, etc.) can validate output records against it.

Performance

mdwright is parallel by default (rayon over the file walk) and pays no per-file interpreter startup. On a multi-thousand-file corpus these two properties account for most of the wall-time gap to mdformat --check; on small inputs both tools are sub-second and the comparison is dominated by process startup.

Measurement

| Tool | Wall time | Notes |
| --- | --- | --- |
| mdwright fmt-check | 108 ms ± 4 ms | Release build, rayon over 79 files. |
| mdformat --check | 5.91 s ± 0.59 s | Default install, single-threaded. |

Reproducer:

hyperfine --warmup 2 --runs 7 -N -i \
  './target/release/mdwright fmt-check <corpus>' \
  'mdformat --check <corpus>'

  • Corpus: 79 Markdown files, ~34.5k lines of math-heavy technical prose (a checkout of gentle-sga).
  • Host: Apple M4 Pro, macOS 26.4.1.
  • Versions: mdwright from this workspace, release profile; mdformat 0.7 (default plugins).
  • Result: 55× ± 6×. The lede claim of "≥ 50× faster" is the floor of this measurement.
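
The multiplier and its spread follow from the two wall-time rows in the measurement table. The working below assumes the two timings are independent and uses first-order error propagation; whether the reported ± 6× was computed exactly this way is an assumption.

```latex
\[
r = \frac{t_{\text{mdformat}}}{t_{\text{mdwright}}}
  = \frac{5.91\ \text{s}}{0.108\ \text{s}} \approx 54.7
\]
\[
\frac{\sigma_r}{r} \approx
  \sqrt{\left(\frac{0.59}{5.91}\right)^{2} + \left(\frac{0.004}{0.108}\right)^{2}}
  \approx \sqrt{0.00997 + 0.00137} \approx 0.106
\]
\[
\sigma_r \approx 54.7 \times 0.106 \approx 5.8
  \quad\Longrightarrow\quad r \approx 55 \pm 6
\]
```

Almost all of the spread comes from the mdformat timing, whose relative deviation (about 10%) is much larger than mdwright's (about 4%).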

What changes the multiplier

  • File count. mdwright's startup is fixed; mdformat re-pays interpreter cost per file when invoked per-file. On directory invocations both tools amortise startup over the walk, but mdformat still single-threads the loop.
  • Core count. mdwright scales with rayon's thread pool. On a single-core machine the multiplier drops; on a 16-core CI runner with a large corpus it climbs.
  • File size. Per-byte parse cost is closer than the wall-time ratio suggests; on a single very large file, the ratio approaches the per-byte ratio rather than the per-file ratio.

Reproducing locally

The bench harness used in development is Criterion, not hyperfine. See crates/mdwright/benches/README.md for cargo bench recipes and corpus configuration. The hyperfine command above is the end-to-end smoke test; the Criterion benches isolate parse, lint, and format costs separately.

Public API Surface

mdwright is a virtual workspace, not a facade crate. Command users install the mdwright package. Rust library users depend on the component crate that owns the capability they need.

The API is still pre-1.0. Import paths and operation shapes may change in minor releases under the pre-1.0 caveats.

Use mdwright as a library

A minimal embed that parses Markdown, runs the standard lint catalogue, and formats with defaults. Add the three crates to Cargo.toml, plus anyhow, which the example uses for error propagation:

[dependencies]
anyhow = "1"
mdwright-document = "0.1"
mdwright-format = "0.1"
mdwright-lint = "0.1"

Then:

use mdwright_document::Document;
use mdwright_format::{FmtOptions, format_validated};
use mdwright_lint::{LintOptions, RuleSet};

fn main() -> anyhow::Result<()> {
    let source = "# Hello\n\nSee https://example.com for the spec.\n";

    // Parse once. `Document` holds source coordinates and recognised facts.
    let doc = Document::parse(source)?;

    // Lint with the shipped default rule set.
    let rules = RuleSet::stdlib_defaults();
    for diag in rules.check_with(&doc, LintOptions::default()) {
        println!("{}: {}", diag.rule, diag.message);
    }

    // Format. Returns a verified rewrite or a `FormatError` on safety-gate refusal.
    let formatted = format_validated(&doc, &FmtOptions::default())?;
    print!("{formatted}");
    Ok(())
}

The table below maps every capability to its owning crate. For the surface a particular crate exposes, follow its docs.rs link from the project README.

Common User Surfaces

| Capability | Public surface | Owning crate |
| --- | --- | --- |
| Parse Markdown into stable facts | Document, ParseError, ParseOptions | mdwright-document |
| Configure Markdown recognition | ExtensionOptions, GfmOptions, GfmAutolinkPolicy, MystOptions, PandocOptions | mdwright-document |
| Render Markdown to HTML | RenderOptions, RenderProfile, render_html, render_html_with_options, render_html_with_render_options | mdwright-document |
| Format parsed or source Markdown | FmtOptions, WrapStrategy, FormatError, format_document, format_document_with_report, format_source, format_validated, format_validated_with_report | mdwright-format |
| Format editor ranges | CheckpointTable, format_range, format_range_with_checkpoints | mdwright-format |
| Compare formatter semantics | semantically_equivalent, first_divergence | mdwright-format |
| Represent TeX and Unicode math-body diagnostics, vocabulary, Unicode layout, source translation, and output | LatexError, LatexErrorKind, SourceSpan, CommandInfo, CommandCategory, ArgumentShape, SupportStatus, lookup_command, latex_symbol, unicode_symbol_latex, unicode_super, unicode_sub, RenderedLatex, render_unicode_math, Translation, TranslationStatus, TranslationLoss, translate_latex_to_unicode, translate_unicode_to_latex, translate_latex_ranges_to_unicode, translate_unicode_ranges_to_latex | mdwright-latex |
| Recognise Markdown math regions | scan_math_regions, render::convert_for_dollar, MathBody::source_range | mdwright-math |
| Run lint rules | RuleSet, LintOptions | mdwright-lint |
| Consume lint output | Diagnostic, Fix, Severity, Snippet, DuplicateRuleName | mdwright-lint |
| Apply safe lint fixes | apply_safe_fixes | mdwright-lint |
| Resolve configuration | Config, ConfigError | mdwright-config |
| Start the editor server | serve | mdwright-lsp |
| Build custom command binaries | run_with_rules, discover_markdown | mdwright |

Document is parse/query only. Linting, formatting, safe-fix application, config discovery, command delivery, and editor delivery stay in their owning crates.

The mdwright-latex surface targets common MathJax-style math bodies where Unicode can represent the source or terminal output honestly. Unicode-to-LaTeX translation is parser-backed for the supported subset: the crate lexes and parses Unicode mathematical source before emitting canonical LaTeX. It is not a TeX engine API, a browser layout API, or a diagram recogniser. Macro expansion, unsupported package commands, layout-heavy source, and unknown Unicode return typed errors, losses, or visible fallback output rather than hidden approximations.

Extension Surfaces

| Surface | Use |
| --- | --- |
| LintRule | Implement a downstream lint rule over &Document. |
| RuleSet::{new, add, remove, by_name, contains, iter, names, check, check_with} | Compose standard and downstream rules. |
| mdwright_lint::stdlib::{defaults, all, by_name, names} | Select standard rules for custom binaries or tests. |
| Diagnostic, Fix, Severity, Snippet | Report lint findings and optional safe fixes. |
| rule_doc_url, docs_url, DOCS_URL_DEFAULT | Attach stable documentation links to diagnostics. |
| InfoStringTypo::{new, with_extra} | Extend the standard info-string vocabulary without forking the rule. |
| mdwright::run_with_rules | Reuse the command package with a custom RuleSet. |
| mdwright::discover_markdown | Reuse command file-discovery policy in a custom command. |

The standard rule structs under mdwright_lint::stdlib are public so callers can build precise RuleSets. Helper functions inside those rules are not public extension points unless listed above.

Advanced Document Facts

These facts are public because formatter, lint, audit, and custom-rule callers need stable source ranges without learning pulldown event shapes.

| Fact family | Public surface |
| --- | --- |
| Text and blocks | TextSlice, InlineCode, CodeBlock, HtmlBlock, InlineHtml, Heading |
| Lists and references | ListGroup, ListItem, LinkDef |
| Frontmatter | Frontmatter, FrontmatterDelimiter |
| Autolinks | AutolinkFact, AutolinkOrigin |
| Suppressions | Suppression, SuppressionKind, AllowScope |
| Positions | LineIndex, LineIndexError, BlockCheckpointFact |
| Math | MathRegion, MathSpan, MathError |
| Formatter facts | StructuralSpan, StructuralKind, InlineDelimiterSlot, InlineDelimiterKind, UnorderedListMarkerSite, OrderedListMarkerSite, HeadingAttrSite, InlineLinkDestinationSlot, ReferenceDefinitionSite, TableSite, TableRowSite, TableCellSite, TableAlign, WrappableParagraph, ParagraphHardBreak |

Formatter-facing facts expose accessors instead of public fields where practical. That keeps invalid construction out of downstream code while preserving stable ranges for rule authors and diagnostic tooling.

Not Public Surface

  • Root facade exports. There is no root package and no mdwright::{Document, FmtOptions, RuleSet} import path.
  • Parser internals, pulldown events, source/canonical byte-map internals, and the private document tree.
  • Source, CanonicalSource, OffsetMap, ByteSpan, OriginalSpan, NormalisedLabel, and heading trailer scanners.
  • Top-level block checkpoint parser helpers. Use mdwright_format::CheckpointTable.
  • Formatter rewrite candidates, rewrite snapshots, verification signatures, owner IDs, and byte-application internals.
  • mdwright-latex lexer tokens, parser cursors, AST nodes, command-registry storage, and Unicode layout internals.
  • Lint suppression maps, diagnostic sorting internals, and stdlib helper functions not listed as extension surfaces.
  • TOML raw schema structs and config discovery internals.
  • CLI and LSP state machines beyond the documented entry points.

Crates.io release

The release workflow publishes the component crates to crates.io and then lets cargo-dist create the GitHub Release with binary artifacts. The workflow runs when a v<semver> tag is pushed. A manual dry_run dispatch runs the same gates but skips crates.io upload and GitHub Release creation.

One-time setup

Create a scoped crates.io token with publish-new, publish-update, and yank permissions. Add it to the GitHub repository as the Actions secret CARGO_REGISTRY_TOKEN.

Local preflight

Run the gates before tagging:

cargo fmt --check
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace --no-fail-fast
cargo doc --workspace --no-deps
mdbook build docs/
cargo xtask doc-rules --check
cargo xtask doc-cli --check
cargo xtask doc-config --check
python3 scripts/check_package_docsrs.py --allow-dirty
actionlint .github/workflows/*.yml

Check public API drift:

for crate in mdwright-latex mdwright-math mdwright-document mdwright-format mdwright-lint mdwright-config mdwright-lsp mdwright; do
  cargo public-api --simplified -p "$crate" > /tmp/"$crate"-public.txt
  diff -u "docs/api-review/$crate-public.txt" /tmp/"$crate"-public.txt
done

If a public API change is intentional, regenerate the baselines in the same commit:

scripts/update-api-review.sh

Version and changelog

The tag must match the workspace package version exactly. For version 0.1.0, tag v0.1.0.

CHANGELOG.md must contain a release section named ## [0.1.0]. The release workflow extracts that section before any crate is published.
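
The two preconditions above (tag equals v plus the workspace version, and a matching ## [version] changelog section) can be checked locally before tagging. The sketch below is an assumption about how such a check might look, not the release workflow's actual code; both function names are hypothetical.

```rust
/// True when `tag` is exactly `v` followed by the workspace version.
fn tag_matches_version(tag: &str, version: &str) -> bool {
    tag.strip_prefix('v') == Some(version)
}

/// Return the body of the `## [version]` section, up to the next `## `
/// heading, or `None` if the changelog has no such section.
fn changelog_section(changelog: &str, version: &str) -> Option<String> {
    let heading = format!("## [{version}]");
    // Also accept a heading with a trailing suffix, e.g. a release date.
    let heading_with_suffix = format!("{heading} ");
    let mut lines = changelog.lines();
    // Advance past the heading line for this version.
    lines
        .by_ref()
        .find(|l| l.trim_end() == heading || l.starts_with(heading_with_suffix.as_str()))?;
    // Everything up to the next second-level heading belongs to the section.
    let body: Vec<&str> = lines.take_while(|l| !l.starts_with("## ")).collect();
    Some(body.join("\n").trim().to_string())
}

fn main() {
    let log = "# Changelog\n\n## [0.1.0]\n\n- First release.\n\n## [0.0.1]\n\n- Prototype.\n";
    assert!(tag_matches_version("v0.1.0", "0.1.0"));
    assert_eq!(changelog_section(log, "0.1.0").as_deref(), Some("- First release."));
    println!("ok");
}
```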

Dry run

Before tagging, run the Release workflow manually with dry_run: true. It verifies the workspace, builds the cargo-dist artifacts, checks package contents, simulates docs.rs from packaged tarballs, and skips live publishing.

Publish

After the release commit is on main, create and push the tag:

git tag -s v0.1.0 -m "mdwright v0.1.0"
git push origin v0.1.0

The workflow publishes crates in dependency order:

mdwright-latex -> mdwright-math -> mdwright-document -> mdwright-format -> mdwright-lint -> mdwright-config -> mdwright-lsp -> mdwright

It waits 90 seconds between crates so crates.io can index each newly published dependency before downstream crates are published.

If publishing fails after a crate has uploaded, stop. Crates.io versions are immutable. Fix the problem, bump the workspace version, update the changelog, and tag a new commit.

Release evidence

mdwright release candidates are judged by local evidence, not by a claim of full cmark-gfm parser equivalence. The release claim is:

mdwright is a round-trip-safe Markdown formatter and linter with classified GFM/parser divergences and an opt-in mdformat-compatible style profile.

The release bundle lives under target/mdwright/release/. It is a local artifact; do not commit it.

Aggregate the evidence

Run:

cargo xtask release-evidence --output target/mdwright/release

The command writes:

  • target/mdwright/release/release-evidence.json
  • target/mdwright/release/release-evidence.md

The command does not rerun every expensive gate. It records git state and tool versions, reads existing machine reports, points at manual evidence notes, and lists blockers when evidence is missing. This keeps the command narrow: it summarises release evidence instead of duplicating parser audit, mdformat parity, production soak, fuzzing, packaging, or benchmarks.

Refresh machine reports

Run these before aggregating a release candidate:

cargo check --workspace --all-targets
cargo nextest run --workspace --no-fail-fast
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --check
cargo doc --workspace --no-deps
mdbook build docs/
cargo xtask doc-rules --check
cargo xtask doc-cli --check
cargo xtask doc-config --check
python3 scripts/check_package_docsrs.py --allow-dirty
actionlint .github/workflows/*.yml

cargo xtask parser-audit --case-set all --ensure-tools --include-comrak
cargo xtask mdformat-parity \
  --corpus-root docs \
  --corpus-name mdwright-docs \
  --mdwright-config .mdwright.toml \
  --mdformat-config xtask/fixtures/mdformat-parity/mdformat.toml
cargo xtask production-soak \
  --corpus-root <external-corpus-path> \
  --output target/mdwright/production-soak

<external-corpus-path> is a directory of Markdown files used as the production-soak input; set it to the path of the external corpus you run releases against (or to MDWRIGHT_CORPUS_ROOT if you have one configured). Record the path in the release notes.

Record the fast-check result in target/mdwright/release/fast-checks.md. The aggregator treats that file as the manual proof that the local workspace gate was refreshed.

Refresh packaging evidence

The detailed crates.io release checklist lives at Crates.io release. Before tagging, run the release workflow manually with dry_run: true.

Package every publishable crate:

cargo package --workspace --exclude xtask --exclude mdwright-extra-example --allow-dirty --no-verify

Install the command package into an isolated root:

tmp="$(mktemp -d)"
CARGO_HOME="$tmp/cargo-home" \
CARGO_TARGET_DIR="$tmp/target" \
cargo install --path crates/mdwright --locked --root "$tmp/install"
"$tmp/install/bin/mdwright" --help
"$tmp/install/bin/mdwright" explain bare-url
"$tmp/install/bin/mdwright" check docs/src/introduction.md
"$tmp/install/bin/mdwright" fmt-check docs/src/introduction.md
"$tmp/install/bin/mdwright" render docs/src/introduction.md >/tmp/mdwright-render.html
"$tmp/install/bin/mdwright" lsp --help

Record the dry-run result in target/mdwright/package-dry-run/report.json. The report can be generated by hand; the aggregator only requires stable JSON with enough fields for a human to inspect alongside the Markdown report.

Refresh fuzz and benchmark evidence

Replay the fuzz corpora:

cargo +nightly fuzz run fuzz_parse_format -- -runs=0
cargo +nightly fuzz run fuzz_idempotence -- -runs=0
cargo +nightly fuzz run fuzz_structured_idempotence -- -runs=0
cargo +nightly fuzz run fuzz_lint -- -runs=0
cargo +nightly fuzz run fuzz_verbatim_identity -- -runs=0
cargo +nightly fuzz run fuzz_latex_render -- -runs=0
cargo +nightly fuzz run fuzz_latex_translate -- -runs=0
cargo +nightly fuzz run fuzz_markdown_math_translate -- -runs=0
cargo +nightly fuzz run fuzz_unicode_latex_roundtrip -- -runs=0

Write the replay result to target/mdwright/release/fuzz-replay.md.

Run sustained fuzz rounds with the helper script. The standard release target is three clean 10-minute rounds for every fuzz target:

scripts/fuzz-round.sh 600 3

The script writes target/mdwright/release/fuzz-sustained.md and per-target logs under target/mdwright/release/fuzz-sustained/logs/.

Run the Criterion comparison and write the result to target/mdwright/release/benchmarks.md:

cargo bench -p mdwright --bench format_bench --bench lint_bench -- --baseline pre-parser-boundary

Re-capture on the same hardware before declaring a regression.

Interpret the report

release-evidence.md is ready for review when:

  • the worktree is clean;
  • every required report is present;
  • manual fuzz, benchmark, and fast-check notes are present;
  • parser audit has no unclassified differences, mitigation rows, or uncontained panics;
  • mdformat parity has no unclassified differences, semantic drift, parse errors, idempotence failures, or open bugs;
  • production soak has no parse errors, validation errors, idempotence failures, or fmt-check disagreements;
  • packaging and isolated install dry runs passed.

Accepted divergences are documented in:

Semver policy

mdwright follows Semantic Versioning. This page enumerates the public API surface that the version number commits to.

Covered

A change to any of the following is a breaking change and requires a major-version bump (or a minor bump while we are pre-1.0; see Pre-1.0 caveats below):

  • Every pub item exported from the publishable component crates: mdwright-latex, mdwright-math, mdwright-document, mdwright-format, mdwright-lint, mdwright-config, and mdwright-lsp.
  • The command-package helpers exported from mdwright: run_with_rules and discover_markdown.
  • CLI subcommands, their flags, and their exit codes. The exit-code mapping appears in reference/cli.md.
  • The configuration schema for mdwright.toml, .mdwright.toml, and pyproject.toml [tool.mdwright]. The schema is generated into configuration.md from the mdwright-config schema source.
  • The --format=json (v2) diagnostic schema at reference/diagnostic-schema.md and the JSON Schema at docs/diagnostic-schema.json. New optional fields are non-breaking; renaming or removing a field is breaking.
  • The mdwright_lint::LintRule trait signature. Adding a method with a default body is non-breaking; adding a method without a default, or changing an existing signature, is breaking.
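
The distinction drawn for trait methods is a general Rust property and can be shown with a stand-in trait. The Rule trait below is hypothetical and does not reproduce the real mdwright_lint::LintRule signature.

```rust
// Hypothetical stand-in for a lint-rule trait; NOT the real
// `mdwright_lint::LintRule` signature.
trait Rule {
    fn name(&self) -> &str;

    // Imagine this method was added in a later release. Because it has a
    // default body, implementors written against the older trait still
    // compile unchanged. Without the default body, `BareUrl` below would
    // fail to compile, which is why that kind of addition is breaking.
    fn is_advisory(&self) -> bool {
        false
    }
}

// A downstream rule written before `is_advisory` existed.
struct BareUrl;

impl Rule for BareUrl {
    fn name(&self) -> &str {
        "bare-url"
    }
}

fn main() {
    let rule = BareUrl;
    println!("{} advisory={}", rule.name(), rule.is_advisory());
}
```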

Not covered

The following are free to change in any release, including patch releases:

  • Internal items (anything pub(crate) or private). Refactors that move modules around are not breaking unless they change a pub export.
  • The on-disk layout of build artifacts (target/), cached state, and intermediate files.
  • The prose output of mdwright explain <rule>. The rule names and their existence are covered; the wording is not.
  • Performance characteristics. We aim not to regress and track this through Criterion benches, but we do not commit to a wall-clock floor.
  • The contents of docs/, CHANGELOG.md, and other repo metadata.
  • The format of tracing output and log lines.

Pre-1.0 caveats

Until v1.0, minor versions may include breaking API changes. The 0.x sequence is deliberately permissive so the surface can settle without dragging compatibility shims forward. The discipline still applies: every break appears in CHANGELOG.md under Breaking changes in the relevant version's section, with a migration note where the rewrite is non-obvious.

Patch releases (0.x.Y) never introduce breaking changes.
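
The policy reduces to a small predicate over two version numbers. A minimal sketch, assuming plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata; the Version type and both functions are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse a plain `MAJOR.MINOR.PATCH` string; anything else yields `None`.
fn parse(v: &str) -> Option<Version> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// True when the policy permits breaking changes going from `old` to `new`.
fn may_break(old: Version, new: Version) -> bool {
    if new.major > old.major {
        return true;
    }
    // Pre-1.0: a minor bump may break; patch releases never do.
    old.major == 0 && new.major == 0 && new.minor > old.minor
}

fn main() {
    let v = |s| parse(s).expect("valid version");
    assert!(may_break(v("0.1.3"), v("0.2.0")));
    assert!(!may_break(v("0.1.3"), v("0.1.4")));
    assert!(!may_break(v("1.2.0"), v("1.3.0")));
    println!("ok");
}
```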

MSRV (minimum supported Rust version)

The MSRV is rust-version = "1.91", declared in Cargo.toml. Bumping the MSRV is treated as a minor-version bump pre-1.0; post-1.0 it will be a major bump. CI runs the test suite on both stable and the MSRV floor on every push.

mdwright spec deviations

The mdwright formatter targets the GFM 0.29-gfm spec (crates/mdwright/tests/gfm-spec/spec.txt, vendored from cmark-gfm). Every example is exercised by crates/mdwright/tests/gfm_spec.rs as a parse → format → parse → format round-trip and compared against the source HTML and the normalised event stream.
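
The idempotence half of that round-trip can be sketched generically: format once, format the result again, and require the two outputs to be equal. The formatter here is a toy stand-in that strips trailing whitespace, not mdwright's formatter, and the sketch omits the HTML and event-stream comparisons.

```rust
/// True when formatting the output of `format` a second time changes nothing.
fn is_idempotent<F: Fn(&str) -> String>(format: F, source: &str) -> bool {
    let once = format(source);
    let twice = format(&once);
    once == twice
}

/// Toy stand-in formatter: strips trailing whitespace from every line and
/// ends the document with a single newline.
fn strip_trailing_ws(source: &str) -> String {
    source
        .lines()
        .map(|l| l.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
        + "\n"
}

fn main() {
    // A well-behaved formatter reaches a fixed point after one pass.
    assert!(is_idempotent(strip_trailing_ws, "a  \nb\t\n"));
    // A formatter that appends on every pass never does.
    assert!(!is_idempotent(|s: &str| format!("{s}!"), "a"));
    println!("ok");
}
```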

This document is the user-facing index of where mdwright currently does not byte-for-byte round-trip the spec. It is split into two parts because the underlying mechanism is split the same way:

  • Editorial deviations: choices we have made and intend to keep. Curated in crates/mdwright/tests/gfm-spec/allowlist.toml. Each entry has a one-line rationale and a pointer to where the decision is documented.
  • Tracked regressions: known divergences that we intend to fix. Recorded in crates/mdwright/tests/gfm-spec/snapshot.txt. The snapshot is asserted byte-for-byte, so any drift, whether regression or improvement, fails CI and forces a deliberate update.

The gfm_spec_coverage test prints the live count for both groups; the numbers below are a snapshot of the current main branch.

Coverage

| Bucket | Examples |
| --- | --- |
| Spec examples total | 672 |
| Matching | 637 |
| Editorial deviations | 35 |
| Tracked regressions | 0 |

A case may fail more than one comparison kind (semantic, idempotence); the snapshot file is keyed by (case, kind) and currently lists no tracked regressions.

Parser Backend Drift

The formatter round-trip gate is not the same as cmark-gfm renderer equivalence. cargo xtask parser-audit compares mdwright's current pulldown-cmark backend with cmark-gfm and renders mdwright through the opt-in cmark-gfm render profile. The current GFM-spec parser audit has 15 classified HTML differences, 0 source-position differences, and 0 unclassified differences.

The remaining differences are accepted constraints of the current backend:

| Class | Count | Status |
| --- | --- | --- |
| Emphasis delimiter-stack resolution | 9 | accepted parser-backend drift |
| Raw HTML block indentation/newline spelling | 4 | accepted render drift with stable source facts |
| Task-list examples marked disabled by the spec | 2 | accepted spec-fixture drift |
| Contained upstream parser panic | 1 | converted to ParseError |

[render] profile = "cmark-gfm" changes only HTML spelling for mdwright render: quote escaping, link-destination escaping, ordinary GFM table layout, task-list checkbox spelling, and one raw-HTML newline case where the parser already exposes enough structure. It does not change emphasis resolution or parser tree semantics. Full cmark-gfm parser equivalence would require upstream pulldown-cmark changes, a maintained fork, or a parser backend switch.

Editorial deviations

Pulldown text-chunking deviations

35 spec examples currently fail the AST-event comparison only; HTML matches byte-for-byte and round-trip is idempotent. The mismatch reflects pulldown-cmark's text-run chunking: pulldown splits long runs of text into events at points cmark-gfm does not, so the normalised Event::Text(…) stream differs even though every other event lines up and every rendered HTML byte agrees.

The triage rule, applied at the snapshot level, is:

For each (case, kinds) in snapshot.txt:
  if kinds == {"ast"} and case has no other entry:
    -> allowlist.toml (bucket = "pulldown-text-chunking")
  else:
    -> stays in snapshot.txt (tracked regression)
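
The triage rule can be sketched as a function over (case, kind) pairs, the same key the snapshot file uses. The Triage enum and triage function are hypothetical names for illustration; the kind strings "ast", "semantic", and "idempotence" are taken from this page.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
enum Triage {
    /// Goes to allowlist.toml under the "pulldown-text-chunking" bucket.
    Allowlist,
    /// Stays in snapshot.txt.
    TrackedRegression,
}

/// Classify each case from its (case number, comparison kind) entries.
fn triage(entries: &[(u32, &str)]) -> BTreeMap<u32, Triage> {
    // Group the failing comparison kinds by case number.
    let mut kinds_by_case: BTreeMap<u32, BTreeSet<&str>> = BTreeMap::new();
    for &(case, kind) in entries {
        kinds_by_case.entry(case).or_default().insert(kind);
    }
    kinds_by_case
        .into_iter()
        .map(|(case, kinds)| {
            // Only a case whose sole failing kind is "ast" is allowlisted.
            let only_ast = kinds.len() == 1 && kinds.contains("ast");
            let verdict = if only_ast {
                Triage::Allowlist
            } else {
                Triage::TrackedRegression
            };
            (case, verdict)
        })
        .collect()
}

fn main() {
    let verdicts = triage(&[(5, "ast"), (393, "ast"), (393, "semantic")]);
    assert_eq!(verdicts[&5], Triage::Allowlist);
    assert_eq!(verdicts[&393], Triage::TrackedRegression);
    println!("ok");
}
```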

Affected cases: 5, 6, 7 (Tabs, CM §2.2); 16, 19 (Thematic breaks, CM §4.1); 61 (Setext headings, CM §4.3); 102, 103 (Fenced code blocks, CM §4.5); 214, 230 (Block quotes, CM §5.1); 232, 242, 248, 249, 251, 252, 256, 264, 265, 266, 268 (List items, CM §5.2); 320 (Backslash escapes, CM §2.4); 321, 324, 330, 333 (Entity refs, CM §2.5); 393, 411 (Emphasis, CM §6.2); 499, 500, 503, 520, 528, 536 (Links, CM §6.3); 640 (Raw HTML, CM §6.8).

The bucket name is load-bearing: if a future per-case investigation disproves the chunking explanation for one of the cases above, remove its entry from allowlist.toml and let it re-enter the snapshot as a tracked regression.

Tracked regressions

There are currently no tracked GFM-spec formatter regressions. Any future non-allowlisted failure appears in crates/mdwright/tests/gfm-spec/snapshot.txt and fails the snapshot test until it is fixed or deliberately classified.

mdformat-mkdocs parity deviations

mdwright matches mdformat-mkdocs byte-for-byte for the four Markdown extensions covered in Markdown extensions. The parity test at crates/mdwright/tests/extension_parity.rs enforces this against five committed reference fixtures. Known divergences are listed below; each row exists because the upstream pulldown-cmark parser does not surface enough information for mdwright to round-trip the source faithfully.

| Construct | Source pattern that diverges | Why |
| --- | --- | --- |
| Heading attribute, quoted value | # H {title="hello world"} | pulldown-cmark 0.13's heading-attribute parser splits the trailer on whitespace and ignores "…" quoting. Pulldown surfaces two attrs (title="hello and world") instead of one. mdformat-mkdocs (python-markdown's attr_list) handles the quoted form correctly. Tracked upstream; will resolve when pulldown lands the fix. |
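
The failure mode in the row above can be reproduced with two small tokenisers. Both functions are illustrations written for this page, not pulldown-cmark's or python-markdown's code.

```rust
/// Naive split: break the attribute trailer on whitespace, ignoring quotes.
fn split_naive(trailer: &str) -> Vec<String> {
    trailer.split_whitespace().map(str::to_string).collect()
}

/// Quote-aware split: whitespace inside "…" does not end an attribute.
fn split_quote_aware(trailer: &str) -> Vec<String> {
    let mut attrs = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in trailer.chars() {
        match ch {
            '"' => {
                // Toggle quoting and keep the quote in the attribute text.
                in_quotes = !in_quotes;
                current.push(ch);
            }
            c if c.is_whitespace() && !in_quotes => {
                // Unquoted whitespace ends the current attribute, if any.
                if !current.is_empty() {
                    attrs.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        attrs.push(current);
    }
    attrs
}

fn main() {
    let trailer = r#"title="hello world""#;
    // The naive split yields two broken attributes.
    assert_eq!(split_naive(trailer), vec![r#"title="hello"#, r#"world""#]);
    // The quote-aware split keeps the value intact.
    assert_eq!(split_quote_aware(trailer), vec![r#"title="hello world""#]);
    println!("ok");
}
```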

The parity test refuses to silently accept new divergences: any byte-for-byte mismatch fails the test and forces a deliberate add to this table (with a rationale and an upstream pointer) or a fix in mdwright's emit path.

MyST + Pandoc directive parity

mdwright preserves MyST directive containers, Pandoc fenced divs, inline roles, MyST substitutions, Pandoc inline attribute spans, and MyST % line comments byte-verbatim. See MyST + Pandoc directives for the full scope. The bar is idempotence-on-mode, not byte-equal round-trip with mdformat-mkdocs: mdformat-mkdocs does not implement these constructs at all, so there is no upstream reference to diff against. The vendored jupyter-book demo at crates/mdwright/tests/external/jupyter_book_minimal/ plus the per-construct regressions at crates/mdwright/tests/regressions/{directive_*,inline_role_*,myst_*}.in are the safety net.

| Construct | Source pattern that diverges | Why |
| --- | --- | --- |
| Malformed :::{name} source | Bare :::{warning} Experimental with no closer | Pulldown parses the opener as part of a definition-list or paragraph; mdwright's directive overlay matches on byte-range overlap and emits the union of the tree-node range and the directive region, so the bytes survive, but the surrounding misclassified bytes flow through pulldown's normal path. Fix the source by closing the directive. |

How to read the live numbers

cargo test --release --test gfm_spec gfm_spec_coverage -- --nocapture

prints, at the top of its output:

gfm spec coverage:
  total cases:        <n>
  fully matching:     <n>
  intentional dev:    <n>
  tracked regression: <n>
  unexpected:         <n>

These are the source of truth; the Coverage table above is a snapshot for the release notes.

Updating the snapshot

After a deliberate fix (or an accepted editorial deviation):

# A fix that removes (case, kind) entries from snapshot.txt:
MDWRIGHT_UPDATE_SNAPSHOT=1 cargo test --release --test gfm_spec gfm_spec_snapshot

# An editorial deviation: add a row to crates/mdwright/tests/gfm-spec/allowlist.toml
# *before* regenerating the snapshot, then run the same command.

The snapshot test fails on any drift; CI will not silently accept a regression that happens to look like an improvement, and an improvement that isn't reflected in the snapshot fails just as loudly.