WordPress 5.5+ generates XML sitemaps automatically via the Sitemaps API, but the default output contains no image entries, has limited lastmod support (older releases omit it entirely), and emits no priority or changefreq values. Extending the Sitemaps API with filters gives you control over which URLs are listed and what metadata accompanies each one.
Problem: By default, a WordPress site's XML sitemap lists every public post, page, and archive URL. Hand-set changefreq and priority values do not help, because Google has stated it largely ignores them, and the low-value URLs in the sitemap waste crawl budget.
Solution: Customise the sitemap with the wp_sitemaps_posts_query_args, wp_sitemaps_post_types, and wp_sitemaps_taxonomies filters to exclude low-value posts, post types, and taxonomies. Use wp_sitemaps_index_entry to adjust entries in the sitemap index, and wp_register_sitemap_provider() to add custom sitemaps for specific content types. For large sites, split sitemaps by content type and date range. Validate the sitemap in Google Search Console, then monitor coverage errors and excluded URLs.
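As a minimal sketch of the taxonomy exclusion described above, the snippet below removes two taxonomies from the list passed to wp_sitemaps_taxonomies. The helper name example_filter_sitemap_taxonomies and the choice of post_tag and post_format are assumptions for illustration, not a recommendation for every site.

```php
<?php
// Hypothetical helper: drop taxonomies whose archives add little search value.
function example_filter_sitemap_taxonomies( array $taxonomies ): array {
    // The slugs below are assumptions; adjust them to the site's own taxonomies.
    foreach ( [ 'post_tag', 'post_format' ] as $slug ) {
        unset( $taxonomies[ $slug ] );
    }
    return $taxonomies;
}

// Register only inside WordPress so the helper can also be run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_taxonomies', 'example_filter_sitemap_taxonomies' );
}
```

Keeping the logic in a named function rather than a closure makes it possible to remove the filter later with remove_filter() and to test it outside WordPress.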
The code below attaches image data to post entries, sets lastmod to the later of the post's modified date and its most recent approved comment, filters thin posts out of the sitemap query, and removes attachment URLs.
```php
<?php
// 1. Add image data and an accurate lastmod to each post entry
add_filter( 'wp_sitemaps_posts_entry', function ( array $entry, WP_Post $post ): array {
    $images = [];

    // Featured image
    $thumb_id  = get_post_thumbnail_id( $post->ID );
    $thumb_url = $thumb_id ? wp_get_attachment_url( $thumb_id ) : false;
    if ( $thumb_url ) {
        $images[] = [
            'loc'     => esc_url( $thumb_url ),
            'title'   => esc_html( get_the_title( $thumb_id ) ?: $post->post_title ),
            'caption' => esc_html( wp_get_attachment_caption( $thumb_id ) ),
        ];
    }

    // Images in content (only those hosted on this site)
    preg_match_all( '/<img[^>]+src=["\']([^"\']+)["\']/i', $post->post_content, $matches );
    foreach ( array_unique( $matches[1] ?? [] ) as $src ) {
        if ( str_starts_with( $src, home_url() ) ) {
            $images[] = [ 'loc' => esc_url( $src ) ];
        }
    }

    // Note: core's default renderer only writes loc, lastmod, changefreq and
    // priority, so outputting this image data as XML needs custom rendering.
    if ( $images ) {
        $entry['images'] = $images;
    }

    // Set lastmod to the post's modified date, or to the latest approved
    // comment if that is more recent
    $lastmod      = strtotime( $post->post_modified_gmt );
    $last_comment = get_comments( [
        'post_id' => $post->ID,
        'number'  => 1,
        'orderby' => 'comment_date_gmt',
        'order'   => 'DESC',
        'status'  => 'approve',
    ] );
    if ( $last_comment ) {
        $lastmod = max( $lastmod, strtotime( $last_comment[0]->comment_date_gmt ) );
    }
    $entry['lastmod'] = gmdate( 'c', $lastmod );

    return $entry;
}, 10, 2 );

// 2. Exclude specific post types and low-value posts from sitemaps
add_filter( 'wp_sitemaps_posts_query_args', function ( array $args, string $post_type ): array {
    if ( 'post' === $post_type ) {
        // Exclude posts with fewer than 300 words (thin content).
        // Assumes a custom '_word_count' meta value is stored for each post.
        $args['meta_query'] = [ [
            'key'     => '_word_count',
            'value'   => 300,
            'compare' => '>=',
            'type'    => 'NUMERIC',
        ] ];
    }
    return $args;
}, 10, 2 );

// 3. Remove attachment URLs from sitemaps entirely
add_filter( 'wp_sitemaps_post_types', function ( array $post_types ): array {
    unset( $post_types['attachment'] );
    return $post_types;
} );
```
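The thin-content filter above queries a _word_count meta key, which WordPress does not maintain on its own. One way to populate it, sketched here under that assumption, is to count words whenever a post is saved. The helper name example_count_words is hypothetical, and str_word_count() is a rough measure that works best for English-language text.

```php
<?php
// Hypothetical helper: count words in post content after removing markup.
function example_count_words( string $html ): int {
    // Replace tags with spaces so adjacent blocks do not merge into one word.
    $text = preg_replace( '/<[^>]+>/', ' ', $html );
    return str_word_count( (string) $text );
}

// Register only inside WordPress so the helper can also be run standalone.
if ( function_exists( 'add_action' ) ) {
    // save_post_post fires when a post of type 'post' is saved.
    add_action( 'save_post_post', function ( int $post_id, $post ): void {
        if ( wp_is_post_revision( $post_id ) ) {
            return;
        }
        update_post_meta( $post_id, '_word_count', example_count_words( $post->post_content ) );
    }, 10, 2 );
}
```

Posts that have no _word_count value yet do not match the meta query, so existing content would need to be re-saved or backfilled before the filter is enabled.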
NOTE: Google's documentation states that priority and changefreq values in XML sitemaps are largely ignored in favour of crawl frequency signals from Google's own crawl history. The most impactful sitemap improvements are accurate lastmod dates and image sitemaps, both of which Google uses.
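Because lastmod accuracy matters most, it can help to isolate the date logic so it can be checked without a running site. The sketch below assumes the inputs are GMT datetimes in MySQL format, as stored in post_modified_gmt and comment_date_gmt; the function name example_sitemap_lastmod is hypothetical.

```php
<?php
// Hypothetical helper: pick the most recent of several GMT datetimes
// ("Y-m-d H:i:s" strings) and format it as a W3C datetime for lastmod.
function example_sitemap_lastmod( string ...$gmt_dates ): ?string {
    $timestamps = [];
    foreach ( $gmt_dates as $date ) {
        // Append "UTC" so parsing does not depend on the server timezone.
        $ts = strtotime( $date . ' UTC' );
        // Skip unparseable values and zero dates such as 0000-00-00 00:00:00.
        if ( false !== $ts && $ts > 0 ) {
            $timestamps[] = $ts;
        }
    }
    return $timestamps ? gmdate( 'c', max( $timestamps ) ) : null;
}
```

For example, example_sitemap_lastmod( '2024-01-10 08:00:00', '2024-02-01 12:30:00' ) returns '2024-02-01T12:30:00+00:00', and a call with no usable dates returns null so the caller can leave lastmod unset.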