WordPress wp_kses() Deep Dive: Custom Allowed HTML Maps, Protocols, and Secure Output

WordPress’s wp_kses() function is the most flexible HTML sanitization tool in the WordPress toolkit. (The name comes from the original kses library, a recursive acronym for "KSES Strips Evil Scripts".) Unlike strip_tags(), which by default removes every HTML tag, wp_kses() performs whitelist-based sanitization: it keeps only the tags and attributes you explicitly permit, can check attribute values against rules you supply, and strips disallowed protocols (javascript:, vbscript:) from URL attributes. JavaScript event attributes (onclick, onload) are removed simply because they are not in the whitelist. Two pre-configured wrappers cover most common cases: wp_kses_post() allows the tags WordPress considers safe for post content, and wp_kses_data() applies the much smaller default list used for comments (links and basic text formatting). But plugins that handle user-submitted rich content (comments with links, custom shortcode output, widget HTML, or content imported from external APIs) often need a custom allowed-tags map that is exactly as permissive as the context requires.

Problem: Your plugin outputs user-submitted HTML from a database field. wp_kses_post() is too permissive for a comment-like context (it allows headings, tables, divs, media tags, and attributes such as class, id, and style), while strip_tags() removes the formatting you want to keep. You need a precise whitelist allowing only basic formatting and links.

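Solution: Pass a custom allowed-tags array to wp_kses() specifying exactly which tags and attributes are permitted. An attribute set to true accepts any value; an attribute set to an array of checks (such as 'maxlen', or 'values' and 'value_callback' in WordPress 5.9 and later) is kept only when its value passes them.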

<?php
// ── Custom allowed HTML map for user-submitted comment-like content ────
function my_plugin_allowed_html(): array {
    return [
        // Inline formatting
        'strong' => [],
        'em'     => [],
        'b'      => [],
        'i'      => [],
        'u'      => [],
        'del'    => [],
        'code'   => [],
        'pre'    => [],

        // Links — restrict protocol and rel
        'a' => [
            'href'   => true,   // wp_kses validates against $allowedprotocols
            'title'  => true,
            'target' => [ 'values' => [ '_blank', '_self' ] ], // WP 5.9+: only these values pass; a bare list is not enforced
            'rel'    => true,
        ],

        // Block-level (if you need them)
        'p'          => [ 'class' => true ],
        'blockquote' => [ 'cite' => true ],
        'ul'         => [],
        'ol'         => [ 'start' => true, 'type' => true ],
        'li'         => [],
        'br'         => [],

        // Images: src is checked against the allowed URL protocols (see below)
        'img' => [
            'src'    => true,
            'alt'    => true,
            'width'  => true,
            'height' => true,
            'class'  => true,
        ],
    ];
}

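// Apply to user content before saving. WordPress adds slashes to $_POST,
// so unslash before filtering, then re-slash because update_post_meta()
// unslashes the value it receives:
$clean_content = wp_kses( wp_unslash( $_POST['user_note'] ?? '' ), my_plugin_allowed_html() );
update_post_meta( $post_id, '_user_note', wp_slash( $clean_content ) );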

// Apply again on output as a second layer:
echo wp_kses( get_post_meta( $post_id, '_user_note', true ), my_plugin_allowed_html() );

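// ── Allowed protocols ──────────────────────────────────────────────────
// wp_kses() validates href/src against a list of allowed URL protocols.
// The third argument restricts that list for a single call
// (http and https only; no ftp:, tel:, mailto:, etc.):
echo wp_kses( get_post_meta( $post_id, '_user_note', true ), my_plugin_allowed_html(), [ 'http', 'https' ] );

// The 'kses_allowed_protocols' filter changes the list site-wide instead.
// It also affects esc_url() and post content, so use it with care:
add_filter( 'kses_allowed_protocols', function ( $protocols ) {
    return [ 'http', 'https' ]; // very strict, and global
} );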

// ── wp_kses_post() — full post-content whitelist ──────────────────────
// Allows: p, div, span, table, ul, ol, img, a, h1-h6, figure, figcaption,
//         blockquote, pre, code, strong, em, br, hr, and more
// Does NOT allow: script, iframe, embed, form, input
//         (unless added through the 'wp_kses_allowed_html' filter)
$safe_content = wp_kses_post( $raw_html_from_external_api );

// ── wp_kses_data(): small default list (links and basic text formatting) ──
$safe_text = wp_kses_data( $user_comment_with_some_html );
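
The attribute value restrictions mentioned in the Solution can be expressed as check arrays on individual attributes. The following is a minimal sketch, assuming WordPress 5.9 or later for the 'values' and 'value_callback' checks. The function names my_plugin_is_safe_rel() and my_plugin_allowed_link_html(), the rel allowlist, and the 120-character title limit are hypothetical choices for illustration, not part of WordPress.

```php
<?php
// Hypothetical callback: accept a rel value only if every
// space-separated token is on a short allowlist.
function my_plugin_is_safe_rel( string $value ): bool {
    $allowed = [ 'nofollow', 'noopener', 'noreferrer', 'ugc' ];

    // Normalize case, split on whitespace, and drop empty tokens.
    $tokens = preg_split( '/\s+/', strtolower( trim( $value ) ), -1, PREG_SPLIT_NO_EMPTY );

    if ( empty( $tokens ) ) {
        return false; // an empty rel attribute is not worth keeping
    }

    // True only when no token falls outside the allowlist.
    return array_diff( $tokens, $allowed ) === [];
}

// Hypothetical link-only map using per-attribute checks.
function my_plugin_allowed_link_html(): array {
    return [
        'a' => [
            'href'   => true,                                         // protocol-checked by wp_kses()
            'title'  => [ 'maxlen' => 120 ],                          // assumed length limit
            'target' => [ 'values' => [ '_blank', '_self' ] ],        // fixed value list
            'rel'    => [ 'value_callback' => 'my_plugin_is_safe_rel' ], // custom check
        ],
    ];
}
```

Passed as the second argument to wp_kses(), a map like this should drop a target or rel attribute whose value fails its check while keeping the <a> tag itself.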

NOTE: wp_kses() is primarily a sanitizer for content on its way into the database, so call it before update_post_meta() or wp_update_post(). Sanitizing at save time is not enough on its own: when echoing, use esc_html() for plain text and run rich content through wp_kses() again. Outputting stored content without escaping is a common mistake, because the stored value may have been sanitized when the allowed-tags list was more permissive, or may have been written directly to the database without any sanitization. The principle is: sanitize on input, escape on output, and treat stored database values as untrusted at output time.
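
As a rough sketch of the output side of that principle, the helper below re-filters the stored rich text and escapes a plain-text field before building markup. It assumes the my_plugin_allowed_html() map defined earlier. The function name my_plugin_render_note(), the '_user_note_author' meta key, and the CSS class names are hypothetical.

```php
<?php
// Hypothetical output helper: treats everything read from the database
// as untrusted, regardless of what was done at save time.
function my_plugin_render_note( int $post_id ): string {
    // Rich text saved earlier under '_user_note'.
    $note = (string) get_post_meta( $post_id, '_user_note', true );

    // Assumed plain-text field; this meta key is illustrative only.
    $author = (string) get_post_meta( $post_id, '_user_note_author', true );

    return sprintf(
        '<div class="user-note"><p class="user-note__author">%s</p>%s</div>',
        esc_html( $author ),                       // plain text: escape everything
        wp_kses( $note, my_plugin_allowed_html() ) // rich text: re-apply the whitelist
    );
}
```

In a template this would be used as echo my_plugin_render_note( get_the_ID() );, since both dynamic parts have already been escaped or filtered inside the helper.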