Commit 6f9c842

ci: Add script & CI to check dead links (#45)
* fix dead links
* fix dead links
* fix dead links
* fix dead links
* fix dead links
* fix dead links
* fix dead links
* Update exclude_patterns.txt
* Improve broken link checking workflow

  Replace the custom curl-based link checking with the mainstream lychee-based solution.

  - Add PR-triggered workflow using lychee-action
  - Create scheduled workflow that creates issues for broken links
  - Add .lycheeignore file for URL exclusion patterns
  - Update README with link checking documentation

  The lychee tool is more efficient and reliable than the previous solution, while providing additional features like caching and exclusion rules.
* Update link checking workflow and remove README changes
  - Restore README to original state
  - Convert all comments to English in workflow files
  - Update the format of .lycheeignore file
* Remove obsolete exclude_patterns.txt

  The exclusion patterns are now handled by .lycheeignore file in the new lychee-based implementation
* Fix local file link checking issues
  - Add patterns to .lycheeignore for SVG and content directory files
  - Update workflow configurations to handle local file paths better
  - Ensure consistent settings between PR and scheduled workflows
* Add alternative markdown-link-check workflow
  - Create alternative workflow using markdown-link-check tool
  - Configure with equivalent ignore patterns
  - Set as manually triggered workflow for comparison testing
* Remove alternative workflow and optimize GitHub API rate limits
  - Remove markdown-link-check alternative implementation
  - Optimize lychee configuration to handle GitHub API rate limits
  - Add caching to PR workflow and increase cache time to 48h
  - Add specific GitHub patterns to .lycheeignore to reduce API requests
  - Increase timeout, retries and wait time between requests
* Fix Hugo rendering issues in link checking
  - Update workflows to build Hugo site before checking links
  - Split link checks into external and generated site checks
  - Check external links in source files
  - Check local links in the generated Hugo site
  - Combine reports for better analysis
  - Fix the problem with Hugo rendering paths differently than source files
* Simplify link checking to focus on external URLs only
  - Remove Hugo build steps due to template errors
  - Focus on checking external URLs only (HTTP/HTTPS)
  - Add Node.js setup for theme dependencies
  - Specify stable Hugo version to avoid breaking changes
  - Add diagnostic steps to check Hugo configuration
* Improve broken link detection workflows
  1. Update lychee configuration to fix API rate limit issues
  2. Enhance .lycheeignore file to handle Hugo path inconsistencies
  3. Optimize GitHub workflow configurations
  4. Add documentation
* Enhance .lycheeignore with detailed comments and pattern categories
1 parent a661cec commit 6f9c842

4 files changed: 237 additions & 0 deletions

.github/workflows/check-broken-links-schedule.yaml

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+name: Scheduled Broken Links Check
+
+on:
+  schedule:
+    - cron: "0 0 * * 0"
+  workflow_dispatch:
+
+jobs:
+  check-links:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      issues: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Restore lychee cache
+        uses: actions/cache@v4
+        with:
+          path: .lycheecache
+          key: cache-lychee-scheduled-${{ github.sha }}
+          restore-keys: |
+            cache-lychee-scheduled-
+            cache-lychee-
+
+      - name: Check external links
+        id: lychee-external
+        uses: lycheeverse/lychee-action@v2
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        with:
+          args: >-
+            --cache
+            --max-cache-age 72h
+            --verbose
+            --no-progress
+            --exclude-path ".git"
+            --exclude-path "node_modules"
+            --exclude-path "themes"
+            --exclude-path "static/lib"
+            --scheme "https"
+            --scheme "http"
+            --max-retries 6
+            --retry-wait-time 10
+            --timeout 45
+            --max-concurrency 4
+            --github-token "${{ github.token }}"
+            './**/*.md'
+            './**/*.html'
+          fail: false
+          format: markdown
+          output: ./lychee-external-report.md
+
+      - name: Check report content
+        id: check-report
+        run: |
+          if [ -f ./lychee-external-report.md ] && [ -s ./lychee-external-report.md ] && grep -q "Broken links found" ./lychee-external-report.md; then
+            echo "broken_links=true" >> $GITHUB_OUTPUT
+          else
+            echo "broken_links=false" >> $GITHUB_OUTPUT
+          fi
+
+      - name: Create issue
+        if: steps.lychee-external.outputs.exit_code != 0 && steps.check-report.outputs.broken_links == 'true'
+        uses: peter-evans/create-issue-from-file@v5
+        with:
+          title: 🔍 External Broken Links Report
+          content-filepath: ./lychee-external-report.md
+          labels: bug, documentation
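
The condition in the "Check report content" step can be tried outside CI. The following is a minimal sketch, assuming a POSIX shell; the function name `has_broken_links` and the sample report files are hypothetical and not part of the commit. It prints the value the step would write to `broken_links`.

```shell
# Mirrors the step's condition: the report must exist, be non-empty,
# and contain the phrase "Broken links found".
# has_broken_links is a hypothetical helper name, not part of the commit.
has_broken_links() {
  report="$1"
  if [ -f "$report" ] && [ -s "$report" ] && grep -q "Broken links found" "$report"; then
    echo "true"
  else
    echo "false"
  fi
}

# Throwaway sample reports (made-up content, for illustration only).
sample_dir="$(mktemp -d)"
printf '## Summary\n\nBroken links found\n' > "$sample_dir/broken.md"
printf '## Summary\n\nNo problems\n' > "$sample_dir/clean.md"
: > "$sample_dir/empty.md"

has_broken_links "$sample_dir/broken.md"
has_broken_links "$sample_dir/clean.md"
has_broken_links "$sample_dir/empty.md"
has_broken_links "$sample_dir/missing.md"
```

Only the first sample satisfies all three tests; a clean, empty, or missing report yields `false`, so the "Create issue" step would be skipped in those cases.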

.github/workflows/check-broken-links.yaml

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+name: Check Broken Links
+
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]
+  # Optional: Add scheduled checks
+  # schedule:
+  #   - cron: "0 0 * * 0" # Runs once every Sunday
+
+jobs:
+  check-links:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      # Optimize caching strategy to reduce API requests
+      - name: Restore lychee cache
+        uses: actions/cache@v4
+        with:
+          path: .lycheecache
+          key: cache-lychee-${{ github.sha }}
+          restore-keys: |
+            cache-lychee-${{ github.event.pull_request.base.sha }}
+            cache-lychee-
+
+      # Check external links only
+      - name: Check external links
+        id: lychee-external
+        uses: lycheeverse/lychee-action@v2
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        with:
+          args: >-
+            --cache
+            --max-cache-age 72h
+            --verbose
+            --no-progress
+            --exclude-path ".git"
+            --exclude-path "node_modules"
+            --exclude-path "themes"
+            --exclude-path "static/lib"
+            --scheme "https"
+            --scheme "http"
+            --max-retries 6
+            --retry-wait-time 10
+            --timeout 45
+            --max-concurrency 4
+            --github-token "${{ github.token }}"
+            './**/*.md'
+            './**/*.html'
+          fail: true
+          format: markdown
+          output: ./lychee-external-report.md
+
+      # Add check results as PR comment
+      - name: Create PR comment
+        uses: peter-evans/create-or-update-comment@v3
+        if: github.event_name == 'pull_request' && steps.lychee-external.outputs.exit_code != 0
+        with:
+          issue-number: ${{ github.event.pull_request.number }}
+          body-file: ./lychee-external-report.md

.lycheeignore

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# .lycheeignore - Link Checker Ignore Patterns
+#
+# This file contains regex patterns for URLs that should be ignored by the lychee link checker
+# during GitHub Actions workflows. One pattern per line.
+#
+# Pattern Categories:
+# - Example domains: Standard placeholder URLs
+# - Development URLs: Local development links (localhost, etc.)
+# - Social media links: Often have anti-scraping measures
+# - Restricted files: Files that may require authentication
+# - Local file paths: Paths that exist in production but not in CI
+# - GitHub patterns: To reduce API rate limiting issues
+# - Hugo-specific patterns: To handle Hugo path differences
+# - Protocol-specific links: Special protocols like mailto, slack, etc.
+#
+# Common Pattern Syntax:
+# - ^ = Start of the URL
+# - $ = End of the URL
+# - \. = Literal dot (escaped)
+# - .* = Any character, any number of times
+# - [^/]+ = One or more characters that are not a slash
+
+# Example domains
+^https?://example\.com
+^https?://example\.org
+
+# Common temporary URLs or local development URLs
+^https?://localhost
+^https?://127\.0\.0\.1
+^https?://0\.0\.0\.0
+
+# Social media links (often have anti-scraping measures that may cause checks to fail)
+^https?://(www\.)?linkedin\.com
+^https?://(www\.)?twitter\.com
+^https?://(www\.)?facebook\.com
+^https?://(www\.)?t\.co
+
+# Files that may have restricted access
+\.pdf$
+
+# Local file paths that exist in production but not in CI environment
+file:///home/runner/work/eclipse-edc.github.io/eclipse-edc.github.io/content/en/images/edc.schematic.svg
+# Exclude all local SVG files as they may be processed during build
+file://.*\.svg$
+# Exclude content directory files which may be generated during build
+file://.*?/content/.*
+
+# GitHub specific patterns to reduce API rate limiting
+# These patterns are specifically for repositories that frequently cause 429 errors
+^https?://github\.com/git/git/blob/
+^https?://raw\.githubusercontent\.com/git/
+^https?://api\.github\.com/
+
+# Hugo site specific patterns
+# Relative path references - resolved after Hugo build
+^/en/
+^/images/
+^/#.*
+
+# Patterns to handle Hugo path inconsistencies
+# Internal file references that are correctly resolved after Hugo build
+^\.\.\/
+^\.\/
+^(/[^/]+)+/$
+^(content|static|assets)/.*
+
+# Email addresses
+^mailto:.*
+
+# Common special protocol links
+^slack://.*
+^vscode://.*
+^ssh://.*
+^git://.*
+
+# Project specific URL patterns to exclude
+^https?://eclipse-edc.*\.local/
+^https?://connector\.[^/]+/
+^https?://api\.[^/]+/
+
+# Add project-specific URL patterns to exclude here
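
To get a feel for how these patterns behave, they can be tried against sample URLs before a CI run. The sketch below is an approximation under stated assumptions: it uses `grep -E` (POSIX extended regex), whereas lychee uses its own regex engine, so edge cases may differ. The helper name `is_ignored` and all sample URLs are hypothetical; only a subset of the patterns above is copied in.

```shell
# is_ignored URL FILE: succeeds if URL matches any pattern in FILE.
# Blank lines and lines starting with '#' are skipped.
# grep -E only approximates lychee's regex matching (assumption).
is_ignored() {
  url="$1"
  ignore_file="$2"
  while IFS= read -r pattern; do
    case "$pattern" in
      ''|'#'*) continue ;;
    esac
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      return 0
    fi
  done < "$ignore_file"
  return 1
}

# A small subset of the patterns from .lycheeignore above.
ignore_file="$(mktemp)"
cat > "$ignore_file" <<'EOF'
# Example domains
^https?://example\.com
# Social media links
^https?://(www\.)?linkedin\.com
# Files that may have restricted access
\.pdf$
# Email addresses
^mailto:.*
EOF

# Made-up sample URLs, for illustration only.
for url in \
  "https://example.com/docs" \
  "https://www.linkedin.com/company/example" \
  "https://docs.example.net/paper.pdf" \
  "https://docs.example.net/guide"; do
  if is_ignored "$url" "$ignore_file"; then
    echo "ignored  $url"
  else
    echo "checked  $url"
  fi
done
```

Of the four samples, only the last one matches no pattern and would still be checked. Note that `\.pdf$` is unanchored at the start, so it excludes every URL ending in `.pdf`, regardless of host.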

README.md

Lines changed: 15 additions & 0 deletions
@@ -39,3 +39,18 @@ To remove the produced images run:
 docker compose rm
 ```
 For more information see the [Docker Compose documentation][].
+
+## Broken Link Checking
+
+This repository includes GitHub Actions workflows to check for broken links:
+
+- PR checks: `.github/workflows/check-broken-links.yaml` (runs on every PR)
+- Scheduled checks: `.github/workflows/check-broken-links-schedule.yaml` (runs weekly)
+
+Both workflows use [lychee](https://github.com/lycheeverse/lychee) to detect broken links.
+
+### Configuration
+
+- URL patterns to ignore are specified in `.lycheeignore`
+- See `.lycheeignore.md` for documentation on the ignore patterns
+- See `.github/workflows/README.md` for workflow documentation
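
For a rough local equivalent of the workflow's lychee call, the flags from the `args:` block can be assembled in a small script. This is a sketch, not part of the commit: the helper `lychee_args` and the `RUN_LYCHEE` switch are hypothetical, `--github-token` is omitted because a local run may not have a token, and it assumes lychee is installed and picks up `.lycheeignore` from the working directory. By default it only prints the command.

```shell
# Flags copied from the workflows' args block, one token per line.
# lychee_args is a hypothetical helper; --github-token is left out.
lychee_args() {
  printf '%s\n' \
    --cache \
    --max-cache-age 72h \
    --verbose \
    --no-progress \
    --exclude-path .git \
    --exclude-path node_modules \
    --exclude-path themes \
    --exclude-path static/lib \
    --scheme https \
    --scheme http \
    --max-retries 6 \
    --retry-wait-time 10 \
    --timeout 45 \
    --max-concurrency 4
}

# RUN_LYCHEE=1 is a made-up opt-in switch so the default is a dry run.
if [ "${RUN_LYCHEE:-0}" = "1" ] && command -v lychee >/dev/null 2>&1; then
  # Unquoted on purpose: each line of lychee_args becomes one argument.
  lychee $(lychee_args) './**/*.md' './**/*.html'
else
  echo "Dry run; lychee would be invoked with:"
  lychee_args | tr '\n' ' '
  echo "'./**/*.md' './**/*.html'"
fi
```

Running it from the repository root with `RUN_LYCHEE=1` should approximate the external link check, although results can differ from CI because of caching and rate limits.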
