Fix WordPress duplicate content with canonical URLs and noindex tags

Duplicate content is one of the most common technical SEO problems on WordPress sites, because WordPress exposes the same content at several URLs by default. A single blog post is reachable at its permalink and is also listed in its category archive, tag archives, author archive, and date archives, and archive pages can additionally be requested with pagination and tracking parameters. If search engines treat these as separate pages with identical or near-identical content, they split ranking signals across the duplicates instead of consolidating them on the canonical URL.

The two main tools for resolving this are the rel="canonical" link tag and the robots meta tag with a noindex value. A canonical tag tells search engines which URL is the preferred version of a piece of content; Google treats it as a strong hint rather than a binding directive. A noindex value tells crawlers not to include a page in the search index at all, which is the right choice for pages with no independent SEO value, such as author archives on a single-author blog, tag archives with fewer than five posts, and internal search result pages.

WordPress core has printed a canonical tag since version 2.9 (version 4.6 added wp_get_canonical_url() and the get_canonical_url filter), but only on singular views such as posts, pages, and attachments. Category, tag, author, date, and paginated archive pages get no canonical tag at all. For paginated archives, Google recommends that each page carry a canonical pointing to itself rather than to page 1, because pages 2, 3, and so on list different posts. Yoast SEO and Rank Math both replace the default canonical logic and add granular noindex controls per post type and taxonomy. When building without a full SEO plugin, remove core's rel_canonical callback from the wp_head action and print your own tag, or adjust core's singular output with the get_canonical_url filter.

Tracking parameters (?utm_source=) and WooCommerce sorting parameters (?orderby=) are best handled by making the canonical tag point to the URL without them; the pagination parameter (?paged=) should stay, so each page remains its own canonical. Google Search Console's URL Parameters tool, once an alternative for this, was retired in 2022. Review the custom sitemap guide to ensure only canonical URLs are listed in your sitemap.
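To make the parameter rule concrete, here is a minimal plain-PHP sketch that runs outside WordPress. ha_clean_canonical_url() is a hypothetical helper, not a WordPress or plugin function. It assumes every tracking parameter starts with utm_ and that orderby is the only sorting parameter to drop; pagination parameters such as paged are deliberately kept.

```php
<?php
// Hypothetical helper, not part of WordPress core: removes tracking (utm_*)
// and sorting (orderby) parameters from an absolute URL. Other parameters,
// including pagination (?paged=N), keep their original order and encoding,
// so page 2 of an archive still canonicalises to page 2.
function ha_clean_canonical_url( string $url, array $drop_keys = array( 'orderby' ) ): string {
    $parts = parse_url( $url );
    if ( $parts === false || ! isset( $parts['scheme'], $parts['host'] ) ) {
        return $url; // Not an absolute URL: leave it untouched.
    }

    $kept = array();
    if ( isset( $parts['query'] ) ) {
        foreach ( explode( '&', $parts['query'] ) as $pair ) {
            if ( $pair === '' ) {
                continue; // Skip empty segments such as "a=1&&b=2".
            }
            $key = urldecode( explode( '=', $pair, 2 )[0] );
            if ( strpos( $key, 'utm_' ) === 0 || in_array( $key, $drop_keys, true ) ) {
                continue; // Tracking or sorting parameter: drop it.
            }
            $kept[] = $pair;
        }
    }

    // Rebuild without the fragment, since a canonical URL should not include "#...".
    $clean = $parts['scheme'] . '://' . $parts['host'];
    if ( isset( $parts['port'] ) ) {
        $clean .= ':' . $parts['port'];
    }
    $clean .= $parts['path'] ?? '/';
    if ( $kept ) {
        $clean .= '?' . implode( '&', $kept );
    }
    return $clean;
}
```

Inside WordPress you could pass the current archive URL through it, for example ha_clean_canonical_url( get_pagenum_link( $paged, false ) ). Core's remove_query_arg() does the same job when you can list every key explicitly; the prefix match here also catches utm_ variants you did not anticipate.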

Problem: WordPress generates duplicate URLs for posts through multiple archives, pagination, and URL parameters, splitting SEO value and causing indexing problems.

Solution: Override the default canonical tag and add selective noindex output using WordPress hooks:

// Remove WordPress default canonical and replace with a controlled version
remove_action( 'wp_head', 'rel_canonical' );
add_action( 'wp_head', 'ha_output_canonical', 1 );

function ha_output_canonical() {
    if ( is_404() ) {
        return; // Error pages should not declare a canonical URL
    }

    if ( is_singular() ) {
        // Single post/page: always point to the clean permalink
        $canonical = get_permalink( get_queried_object_id() );
    } elseif ( is_front_page() && ! is_paged() ) {
        // First page of a blog-listing front page
        $canonical = home_url( '/' );
    } else {
        // Archives, including paginated pages 2, 3, ...: each page is its own
        // canonical. get_pagenum_link() keeps the current query string, so
        // tracking and sorting parameters are removed explicitly.
        $paged     = max( 1, (int) get_query_var( 'paged' ) );
        $canonical = remove_query_arg(
            array( 'utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content', 'orderby' ),
            get_pagenum_link( $paged, false )
        );
    }

    if ( $canonical ) {
        echo '<link rel="canonical" href="' . esc_url( $canonical ) . '" />' . "\n";
    }
}

// Add noindex to low-value archive pages
add_action( 'wp_head', 'ha_noindex_archives', 1 );

function ha_noindex_archives() {
    // noindex: author archives, date archives, internal search results
    if ( is_author() || is_date() || is_search() ) {
        echo '<meta name="robots" content="noindex, follow" />' . "\n";
        return;
    }

    // noindex tag archives with fewer than 5 posts
    if ( is_tag() ) {
        $term = get_queried_object();
        if ( $term instanceof WP_Term && $term->count < 5 ) {
            echo '<meta name="robots" content="noindex, follow" />' . "\n";
        }
    }
}
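Since WordPress 5.7, core prints its own robots meta tag through wp_robots() on public sites, so echoing a second tag as ha_noindex_archives() does can leave two robots tags in the head. Below is a minimal sketch of the same rules through the wp_robots filter, assuming WordPress 5.7 or later. ha_should_noindex() is a hypothetical helper split out so the rule can be tested without WordPress. If you adopt this version, use it instead of ha_noindex_archives(), not alongside it.

```php
<?php
// Hypothetical helper: the noindex rule from ha_noindex_archives() as a pure
// function with no WordPress calls, so it can be tested outside WordPress.
function ha_should_noindex( bool $is_author, bool $is_date, bool $is_search, bool $is_tag, int $tag_post_count, int $min_tag_posts = 5 ): bool {
    if ( $is_author || $is_date || $is_search ) {
        return true; // Author, date, and internal search pages
    }
    return $is_tag && $tag_post_count < $min_tag_posts; // Thin tag archives
}

// WordPress 5.7+ only: add the directives to core's robots meta tag via the
// wp_robots filter instead of echoing a second <meta name="robots"> tag.
// The function_exists() guard lets this file also load under plain PHP.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_robots', function ( array $robots ): array {
        $term  = get_queried_object();
        $count = ( is_tag() && $term instanceof WP_Term ) ? (int) $term->count : 0;
        if ( ha_should_noindex( is_author(), is_date(), is_search(), is_tag(), $count ) ) {
            $robots['noindex'] = true;
            // Leave any nofollow already set by core or another plugin untouched.
            if ( empty( $robots['nofollow'] ) ) {
                $robots['follow'] = true;
            }
        }
        return $robots;
    } );
}
```

Google applies the most restrictive rule when robots directives conflict, so the echo version is not broken; the filter mainly keeps a single, consistent robots tag in the page head.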

NOTE: Do not set noindex on category archives that rank well; use it only for genuinely low-value pages. Before applying noindex, check which paginated and archive pages are already indexed in Google Search Console's Page indexing report (Indexing → Pages, formerly called Coverage), because removing pages from the index takes several crawl cycles. If you use Yoast SEO, most of these canonical and noindex settings are available without custom code in the Yoast SEO settings (the Search Appearance panel in older versions); the PHP approach above is for sites that manage SEO without a dedicated plugin.