chore: add internal markdown link check #21831

Open
Geethapranay1 wants to merge 3 commits into apache:main from Geethapranay1:chore/internal-markdown-link-check

@Geethapranay1
Contributor

Which issue does this PR close?

Rationale for this change

DataFusion did not have a CI check for broken links in Markdown content. The docs workflows build and deploy the docs, and the dev workflow checks formatting and spelling, but none of them validates link targets.
This PR adds a dedicated link check for internal Markdown links so that broken references fail early in PRs.
I kept the scope internal-only to avoid flaky CI failures from external websites and rate limits.
Rust doc comments remain covered by the existing rustdoc CI job.
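
To illustrate what "internal-only" means here, the following is a simplified sketch of the idea, not lychee itself and not the PR's script. It extracts link targets from one Markdown file, skips external schemes and pure anchors, and reports relative targets that do not exist on disk. The demo directory, file names, and the `broken` array are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo tree standing in for a docs directory.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf '%s\n' \
  'See [overview](overview.md) and [gone](missing-page.md).' \
  'External: [site](https://example.com) is skipped.' > "$demo_dir/index.md"
touch "$demo_dir/overview.md"

broken=()
while IFS= read -r target; do
  case "$target" in
    http://*|https://*|mailto:*|'#'*) continue ;;  # external link or pure anchor
  esac
  target="${target%%#*}"                           # drop any #fragment
  [ -e "$demo_dir/$target" ] || broken+=("$target")
done < <(grep -oE '\]\([^)]+\)' "$demo_dir/index.md" | sed -E 's/^\]\((.*)\)$/\1/')

echo "broken internal links: ${broken[*]:-none}"
```

Because external URLs are never fetched, a check of this kind cannot fail on a remote outage or rate limit, which is the flakiness the scope decision above avoids.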

What changes are included in this PR?

  • Added a new Dev workflow job, Check Markdown Links, in dev.yml.
  • Added LYCHEE_VERSION pin in tool_versions.sh.
  • Added markdown_link_check.sh to run lychee on the selected markdown paths.
  • Added lychee.toml with internal-link policy and exclusions.
  • Added check markdown links to required status checks in .asf.yaml.
  • Updated contributor testing docs with the new local command and scope note.
  • Fixed internal markdown links that failed under the new check in:
    • roadmap.md
    • 49.0.0.md
    • overview.md
    • dataframe.md
    • format_options.md

Are these changes tested?

Yes,

  • python3 ci/scripts/check_asf_yaml_status_checks.py passed.
  • bash -n ci/scripts/markdown_link_check.sh passed.
  • bash ci/scripts/markdown_link_check.sh passed with 0 errors.
  • cargo fmt --all --check passed.

OK: All 5 required_status_checks match existing GitHub Actions jobs.
🔍 12824 Total (in 0s) ✅ 490 OK 🚫 0 Errors 👻 12334 Excluded

Are there any user-facing changes?

No,

There is one contributor-facing CI change: PRs now fail when internal markdown links break in the checked markdown files.

@github-actions (bot) added the documentation and development-process labels on Apr 24, 2026
@Geethapranay1
Contributor Author

@comphead PTAL


> Run the internal markdown link check locally:
>
> ```shell

Contributor

this is very nice

Contributor

for local runs we need to document that mapfile is required

Contributor

and also mapfile is not available on macOS

Contributor

and cargo install lychee

Contributor

@comphead left a comment

Thanks @Geethapranay1 for the PR

I tried to check it locally, but on macOS I'm missing some bash commands. Could you attach to the PR an example of what a bad link looks like?

I'm assuming it should be very clear to the user which link needs to be fixed.

@Geethapranay1
Contributor Author

Thanks @comphead for the detailed review and for testing this on macOS.

I will:

  • update the local script to avoid macOS-missing bash commands (mapfile),
  • document local prerequisites including cargo install lychee,
  • add an example of how a broken link error looks in output so it is clear what to fix.
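
One common way to replace `mapfile` is a `while read` loop, which also works on Bash 3.2, the version macOS ships as `/bin/bash` (`mapfile` arrived in Bash 4.0). The following is a minimal sketch of that technique, not the PR's actual script; the demo directory and the `md_files` array are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: fill an array from command output without `mapfile`.
set -euo pipefail

# Hypothetical demo tree standing in for the repository's docs.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/overview.md" "$demo_dir/dataframe.md" "$demo_dir/notes.txt"

# Read one path per line into the array; works on Bash 3.2 and newer.
md_files=()
while IFS= read -r path; do
  md_files+=("$path")
done < <(find "$demo_dir" -type f -name '*.md' | sort)

echo "Found ${#md_files[@]} markdown files"
```

The array can then be passed to the link checker as `"${md_files[@]}"`, the same way a `mapfile`-populated array would be.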

@Geethapranay1
Contributor Author

@comphead PTAL, I have made the changes requested in the review.


> ```text
> [docs/source/user-guide/cli/overview.md]:
> [ERROR] file:///.../docs/source/user-guide/cli/missing-page.md | Cannot find file: File not found. Check if file exists and path is correct
> ```

Contributor

oh, lychee doesn't refer to a specific line in the md file?
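
The sample output quoted above names the source file and the missing target but shows no line number. One possible workaround, sketched here against a throwaway demo file rather than the real docs tree, is to search the reported file for the target with `grep -n`. The file contents and the `matches` variable are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the file named in the error output.
demo_file="$(mktemp)"
trap 'rm -f "$demo_file"' EXIT
printf '%s\n' \
  '# CLI overview' \
  '' \
  'See the [missing page](missing-page.md) for details.' > "$demo_file"

# -n prints line numbers; -F treats the target as a literal string.
matches="$(grep -nF 'missing-page.md' "$demo_file")"
echo "$matches"
```

Each match is printed as `line:content`, so the number before the first colon is the line to fix.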

Contributor

@comphead left a comment

Thanks @Geethapranay1, this PR makes a lot of sense.

Labels

development-process: Related to development process of DataFusion
documentation: Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

chore: Create CI action that validates links in md files

2 participants