Extracting all external links from a website is a task that comes up during audits, SEO work, or link-rot checks. The right approach depends on whether you have access to the database, the WordPress admin, or just the live site. Here are four methods from simplest to most involved.
Problem: A site owner or SEO auditor needs a complete list of all external links on a WordPress site — including links in post content, widget text, and template files — to audit for broken links or outdated references.
Solution: The four methods below cover different access levels: Screaming Frog SEO Spider for crawling with no server access, the Broken Link Checker plugin when you have WordPress admin access, grep on a SQL dump when you have database access, and a lynx-based shell script that crawls the pages listed in your sitemap when you can only reach the live site.
Method 1 — SEO Spider (no server access needed). Screaming Frog SEO Spider crawls the site and exports all external links in CSV format. The free version is limited to 500 URLs; the paid licence is £149/year with no limit.
Method 2 — Broken Link Checker plugin (WordPress admin access). Install the Broken Link Checker plugin and filter for external links in its admin panel. Useful for smaller sites but can be slow on large ones.
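Method 3 — grep on the SQL dump (database access, Linux). If you have a database export, three shell commands reduce it to a deduplicated list of external hosts. The pattern keeps only the scheme and domain of each URL (for example https://example.com), not the full path: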
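# Step 1: Extract the scheme and host of every URL in the dump
# (matching hostname characters only, so the match stops at spaces, quotes and backslashes)
grep -Eo 'https?://[A-Za-z0-9.-]+' database.sql > all_links.txt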
# Step 2: Remove your own domain
sed '/yourdomain\.com/d' all_links.txt > external_links.txt
# Step 3: Deduplicate
sort -u external_links.txt > unique_external_links.txt
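For an audit it can also help to know which external hosts are linked most often, not just which ones appear. The sketch below is one way to get that from the step 2 output (external_links.txt, before deduplication). The helper name count_hosts is made up for this example; it reads one URL per line on standard input and prints a count, a tab, and the host, most frequent first.

```shell
#!/bin/sh
# Sketch: rank external hosts by how often they appear.
# count_hosts is a hypothetical helper name introduced for this example.
count_hosts() {
    # sort groups identical lines, uniq -c counts each group,
    # sort -rn puts the largest counts first, awk tidies the columns.
    sort | uniq -c | sort -rn | awk '{print $1 "\t" $2}'
}
```

A possible invocation, assuming the file names used above: count_hosts < external_links.txt > host_counts.txt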
Method 4 — shell script using a sitemap (no database access). If you only have access to the live site, generate a sitemap_links.txt file (one URL per line, from your sitemap.xml) and then crawl each page with the text browser lynx to collect outbound links:
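#!/bin/sh
DOMAIN="yourdomain.com"
# Start from an empty results file so reruns do not mix in stale entries
: > external_links.txt
while IFS= read -r url; do
    # -listonly prints only the page's link list; keep scheme and host,
    # then drop our own domain (-F treats the dot in DOMAIN literally).
    # </dev/null stops lynx from reading the URL list on stdin.
    lynx -dump -listonly "$url" </dev/null | grep -Eo 'https?://[^/[:space:]]+' | grep -vF "$DOMAIN" | sort -u >> external_links.txt
    echo "Processed: $url"
done < sitemap_links.txt
# Deduplicate across all pages
sort -u external_links.txt -o external_links.txt
echo "Done. Results in external_links.txt"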
Install lynx if it is not already present: sudo apt install lynx (Debian/Ubuntu) or brew install lynx (macOS).
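The sitemap_links.txt input used by Method 4 could be produced along the following lines. This is a minimal sketch that assumes a flat sitemap, one whose <loc> entries point at pages rather than at other sitemaps. Many WordPress setups serve a sitemap index instead, in which case each child sitemap would need the same treatment. The helper name extract_locs is made up for this example.

```shell
#!/bin/sh
# Sketch: pull page URLs out of sitemap XML.
# extract_locs is a hypothetical helper name; it reads XML on stdin
# and prints the contents of every <loc> element, one per line.
extract_locs() {
    grep -Eo '<loc>[^<]+</loc>' | sed -e 's|<loc>||' -e 's|</loc>||'
}
```

A possible invocation, assuming the sitemap lives at the conventional path: curl -s https://yourdomain.com/sitemap.xml | extract_locs > sitemap_links.txt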
NOTE: Method 4 makes one HTTP request per page in the sitemap. On a large site with thousands of URLs this can take a very long time and may trigger rate-limiting on the server. Add a small delay (sleep 1 inside the loop) if you are crawling your own live site to avoid overwhelming it.
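One way to build that delay into the loop is sketched below. DELAY, FETCH and crawl_politely are names invented for this example, not part of lynx or of the script above. FETCH defaults to the lynx invocation but can be swapped for another command that prints a page's links.

```shell
#!/bin/sh
# Sketch: fetch each page with a pause between requests.
# DELAY and FETCH are hypothetical knobs introduced for this example.
DELAY="${DELAY:-1}"                      # seconds to wait after each page
FETCH="${FETCH:-lynx -dump -listonly}"   # command that prints a page's links

crawl_politely() {
    # Reads page URLs on stdin, one per line.
    while IFS= read -r url; do
        [ -n "$url" ] || continue        # skip blank lines
        # FETCH is unquoted on purpose: it holds a command plus its flags.
        # </dev/null keeps the command from consuming the URL list.
        $FETCH "$url" </dev/null
        sleep "$DELAY"
    done
}
```

A possible invocation: crawl_politely < sitemap_links.txt > raw_links.txt, followed by the same filtering and sort -u steps as in Method 4.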