AI Markdown Mirror

Nano Banana's visual interpretation of this workflow.

Create an AI-Ready Markdown Mirror of Your Website using GitHub Actions

A Dead Simple, No-Build-Tools Guide

AI systems understand Markdown far better than HTML. This guide shows you how to automatically generate clean Markdown versions of your HTML pages every time you push to GitHub, resulting in more accurate AI responses and fewer hallucinations.

No build system, no local scripts, no technical overhead. Set it and forget it.

How it works


Why this matters

AI systems struggle with raw HTML because most webpages include:

AI doesn't need any of that. It needs:

The solution: automatically generate a clean Markdown mirror of each page.


Step 1. Add the GitHub Actions workflow

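This workflow runs automatically on GitHub's servers every time you push HTML changes. It does three things: it adds a <link rel="alternate" type="text/markdown"> tag to each HTML page that lacks one, it writes a Markdown version of each page to the ai/ folder, and it commits the results back to your repository.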

You don't need to manually edit your HTML files; the GitHub Action handles everything.
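
For reference, the tag the workflow inserts into each page's head is a link element with rel="alternate" and type="text/markdown", pointing at the generated file. The sketch below uses only Python's standard library to check whether a page already carries that tag. The `find_markdown_alternate` name and the sample HTML are illustrative assumptions, not part of the workflow.

```python
from html.parser import HTMLParser


class _AlternateFinder(HTMLParser):
    """Collects the href of the first <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and (attr.get("type") or "").lower() == "text/markdown":
            self.href = attr.get("href")


def find_markdown_alternate(html):
    """Return the Markdown mirror's href, or None if the page has no such tag."""
    finder = _AlternateFinder()
    finder.feed(html)
    return finder.href


# Hypothetical page, shaped like the output of the workflow
sample = """<html><head>
<title>About</title>
<!-- Markdown version for AI bots -->
<link rel="alternate" type="text/markdown" href="/ai/about.md">
</head><body><p>Hello</p></body></html>"""

print(find_markdown_alternate(sample))
```

A check like this can be handy after the first run, to confirm that every page you care about points at its Markdown counterpart.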

[Figure: a run of html-to-md.yml, triggered on push to main for *.html paths. Job html_to_md steps: Check out repo (5s), Set up Python (10s), Install dependencies (15s), Generate Markdown files (30s), Commit and push (5s).]

GitHub Actions handles the conversion automatically.

Create this folder if it doesn't already exist:

.github/workflows/

Create a file named:

html-to-md.yml

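To skip additional folders, add their names to EXCLUDE_FOLDERS in the script below. Common ones to consider: drafts, archive, vendor, dist, build. The workflow excludes node_modules by default and always skips hidden folders (names starting with a dot).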
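
To preview which files a given exclude list would skip, the filter can be tried on its own. This is a minimal sketch of the same rule the workflow applies; the `is_skipped` helper and the example set are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed example set; node_modules is the only entry the workflow ships with
EXCLUDE_FOLDERS = {"node_modules", "drafts", "archive"}


def is_skipped(path, exclude=EXCLUDE_FOLDERS):
    """Skip a file if any path part is hidden (starts with a dot) or is excluded."""
    parts = PurePosixPath(path).parts
    return any(part.startswith(".") or part in exclude for part in parts)


for candidate in ["index.html", "blog/post.html", "drafts/wip.html", ".cache/page.html"]:
    print(candidate, "skipped" if is_skipped(candidate) else "converted")
```

Note that the rule matches folder names anywhere in the path, so excluding drafts also skips blog/drafts/.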

Paste in the code block below. (IMPORTANT: after pasting, change the line BASE_URL = "https://yourdomain.com" to use your own domain.)

name: Generate Markdown from HTML

on:
  push:
    branches:
      - main
    paths:
      - "*.html"
      - "**/*.html"

permissions:
  contents: write

jobs:
  html_to_md:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          pip install beautifulsoup4 lxml

      - name: Generate Markdown files
        run: |
          mkdir -p ai
          python - << 'PY'
          from bs4 import BeautifulSoup, NavigableString, Tag, Comment
          from pathlib import Path

          BASE_URL = "https://yourdomain.com"  # ← YOU MUST CHANGE THIS
          EXCLUDE_FOLDERS = {'node_modules'}  # ← Folders to skip

          def get_md_path(html_path):
              """Determine the markdown file path for an HTML file."""
              parts = html_path.parts
              if html_path.name == "index.html":
                  if len(parts) == 1:
                      return "ai/index.md"
                  else:
                      return f"ai/{parts[-2]}.md"
              else:
                  stem = html_path.stem
                  if len(parts) > 1:
                      prefix = "-".join(parts[:-1])
                      return f"ai/{prefix}-{stem}.md"
                  return f"ai/{stem}.md"

          def add_link_tag_if_missing(html_path, md_path):
              """Add the alternate link tag to HTML if missing."""
              content = html_path.read_text(encoding="utf-8")
              soup = BeautifulSoup(content, "lxml")

              existing = soup.find("link", {"rel": "alternate", "type": "text/markdown"})
              if existing:
                  return False

              head = soup.find("head")
              if not head:
                  return False

              new_link = soup.new_tag("link")
              new_link["rel"] = "alternate"
              new_link["type"] = "text/markdown"
              new_link["href"] = f"/{md_path}"

              comment = Comment(" Markdown version for AI bots ")
              title = head.find("title")
              if title:
                  title.insert_after("\n    ")
                  title.insert_after(new_link)
                  title.insert_after("\n    ")
                  title.insert_after(comment)
                  title.insert_after("\n\n    ")
              else:
                  head.append("\n    ")
                  head.append(comment)
                  head.append("\n    ")
                  head.append(new_link)
                  head.append("\n")

              html_path.write_text(str(soup), encoding="utf-8")
              return True

          # Find all HTML files and process them
          files = []
          for html_path in Path(".").rglob("*.html"):
              if any(part.startswith('.') or part in EXCLUDE_FOLDERS for part in html_path.parts):
                  continue
              try:
                  md_path = get_md_path(html_path)
                  add_link_tag_if_missing(html_path, md_path)
                  files.append((str(html_path), md_path))
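              except Exception as exc:
                  # Report the failure in the Actions log, then move on to the next file
                  print(f"Skipping {html_path}: {exc}")
                  continue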

          def normalize_space(text):
              return " ".join(text.split())

          def inline_to_md(node):
              pieces = []
              for child in getattr(node, "children", []):
                  if isinstance(child, NavigableString):
                      pieces.append(str(child))
                  elif isinstance(child, Tag):
                      name = child.name.lower()
                      if name == "a":
                          text = normalize_space(child.get_text(" ", strip=True))
                          href = child.get("href", "").strip()
                          if not text:
                              continue
                          if not href:
                              pieces.append(text)
                              continue
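                          # Absolute, protocol-relative, mail, phone, and in-page links are kept as-is
                          if href.startswith(("http://", "https://", "//", "mailto:", "tel:", "#")):
                              resolved = href
                          else:
                              # Everything else is resolved against the site root
                              resolved = f"{BASE_URL}{href}" if href.startswith("/") else f"{BASE_URL}/{href}"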
                          pieces.append(f"[{text}]({resolved})")
                          continue
                      if name in ("strong", "b"):
                          pieces.append(f"**{normalize_space(inline_to_md(child))}**")
                          continue
                      if name in ("em", "i"):
                          pieces.append(f"*{normalize_space(inline_to_md(child))}*")
                          continue
                      pieces.append(inline_to_md(child))
              return "".join(pieces)

          def html_to_markdown(html_path, md_path):
              path = Path(html_path)
              if not path.exists():
                  return
              soup = BeautifulSoup(path.read_text(encoding="utf-8"), "lxml")
              title_tag = soup.find("title")
              title = title_tag.get_text(strip=True) if title_tag else ""
              root = soup.find("main") or soup.body or soup
              allowed = ["h1","h2","h3","h4","h5","h6","p","li"]
              elements = [tag for tag in root.find_all(allowed) if not tag.find_parent("nav")]
              lines = []
              if title:
                  lines.append(f"# {title}")
                  lines.append("")
              for el in elements:
                  name = el.name.lower()
                  text = normalize_space(inline_to_md(el))
                  if not text:
                      continue
                  if name.startswith("h"):
                      lines.append(f"{'#' * int(name[1])} {text}")
                      lines.append("")
                  elif name == "p":
                      lines.append(text)
                      lines.append("")
                  elif name == "li":
                      parent = el.find_parent(["ol", "ul"])
                      if parent and parent.name == "ol":
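                          siblings = parent.find_all("li", recursive=False)
                          # Match by identity: bs4 Tags with identical markup compare equal,
                          # so list.index() would give repeated items the same number
                          idx = next((i for i, s in enumerate(siblings, 1) if s is el), 1)
                          lines.append(f"{idx}. {text}")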
                      else:
                          lines.append(f"- {text}")
                      lines.append("")
              Path(md_path).write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")

          for src, dst in files:
              html_to_markdown(src, dst)
          PY

      - name: Commit and push changes
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add -A
          if git diff --staged --quiet; then
            echo "No changes to commit."
          else
            git commit -m "Auto-generate Markdown from HTML"
            git push
          fi
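
The reason BASE_URL matters is that links in the generated Markdown are written as absolute URLs. The following is a simplified sketch of that idea rather than the workflow's exact code; the `resolve_href` helper and the `KEEP_AS_IS` tuple are names assumed for illustration.

```python
BASE_URL = "https://yourdomain.com"  # placeholder from the guide; replace with your domain

# Prefixes left untouched in this sketch (an assumption; adjust to taste)
KEEP_AS_IS = ("http://", "https://", "mailto:", "#")


def resolve_href(href, base_url=BASE_URL):
    """Turn a site-relative href into an absolute URL rooted at base_url."""
    href = href.strip()
    if href.startswith(KEEP_AS_IS):
        return href
    if href.startswith("/"):
        return f"{base_url}{href}"
    return f"{base_url}/{href}"


for link in ["/about.html", "contact.html", "https://example.org/", "#top"]:
    print(link, "->", resolve_href(link))
```

Relative links are joined to the site root, not to the folder of the page they appear on, so root-relative hrefs (starting with a slash) are the safest choice for pages in subfolders.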

Step 2. Push to GitHub

Done. Your Markdown files now stay in sync with your HTML automatically.

Any time you push updates to your HTML:

Your repo will now include:

ai/
  index.md
  about.md
  contact.md
  etc...

To get the generated Markdown files and the updated HTML on your local machine, run git pull once the workflow has finished.
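
To predict the file name a given page will receive in ai/, the naming rule can be run on its own. The sketch below restates the logic of the workflow's get_md_path using only the standard library; the sample page paths are hypothetical.

```python
from pathlib import PurePosixPath


def get_md_path(html_path):
    """Same naming rule as the workflow's get_md_path, for previewing results."""
    path = PurePosixPath(html_path)
    parts = path.parts
    if path.name == "index.html":
        # Root index keeps its name; a folder index is named after its folder
        return "ai/index.md" if len(parts) == 1 else f"ai/{parts[-2]}.md"
    if len(parts) > 1:
        # Nested pages are flattened: folder names joined with hyphens
        return f"ai/{'-'.join(parts[:-1])}-{path.stem}.md"
    return f"ai/{path.stem}.md"


for page in ["index.html", "about.html", "blog/index.html", "blog/first-post.html"]:
    print(f"{page} -> {get_md_path(page)}")
```

One thing to watch for: under this rule blog.html and blog/index.html both map to ai/blog.md, so whichever is processed last overwrites the other.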


You now have

This approach works for any static site hosted in any environment, provided the source lives on GitHub.

About the Author

Burton Rast is a designer, a photographer, and a public speaker who loves to make things.