WordPress's wp_kses() function is the core HTML sanitizer in the WordPress toolkit. (The name comes from the KSES library, a recursive acronym for "KSES Strips Evil Scripts".) Unlike strip_tags(), which by default removes all HTML tags and cannot filter the attributes of any tags it is told to keep, wp_kses() performs whitelist-based sanitization: it keeps only the tags and attributes you explicitly permit, can apply simple checks to attribute values, and strips disallowed protocols (javascript:, vbscript:) from URL attributes. JavaScript event attributes (onclick, onload) are removed simply because they are not on the whitelist. The two pre-configured wrappers, wp_kses_post() (the whitelist used for post content) and wp_kses_data() (the much smaller whitelist used for comments), cover most common cases. But plugins that handle user-submitted rich content, such as comments with links, custom shortcode output, widget HTML, or content imported from external APIs, often need a custom allowed-tags map that permits exactly what the feature requires and nothing more.
Problem: Your plugin outputs user-submitted HTML from a database field. wp_kses_post() is too permissive for a comment-like context (it allows headings, tables, div and span with class and style attributes, and media tags such as <video> and <audio>), and strip_tags() removes valid formatting tags. You need a precise whitelist allowing only safe inline formatting and links.
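Solution: Pass a custom allowed-tags array to wp_kses() specifying exactly which tags and attributes are permitted. Where an attribute should accept only certain values, use the built-in per-attribute checks, such as a list of permitted values (the values check, available since WordPress 5.9). wp_kses() does not support regex patterns in the allowed-tags array.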
<?php
// ── Custom allowed HTML map for user-submitted comment-like content ────
function my_plugin_allowed_html(): array {
    return [
        // Inline formatting (an empty array allows the tag with no attributes)
        'strong'     => [],
        'em'         => [],
        'b'          => [],
        'i'          => [],
        'u'          => [],
        'del'        => [],
        'code'       => [],
        'pre'        => [],
        // Links: href is protocol-checked, target is limited to two values
        'a'          => [
            'href'   => true, // URL protocol is checked against wp_allowed_protocols()
            'title'  => true,
            // A bare list like [ '_blank', '_self' ] is NOT enforced; use the
            // 'values' check (WordPress 5.9+) to restrict the attribute value.
            'target' => [ 'values' => [ '_blank', '_self' ] ],
            'rel'    => true,
        ],
        // Block-level (if you need them)
        'p'          => [ 'class' => true ],
        'blockquote' => [ 'cite' => true ],
        'ul'         => [],
        'ol'         => [ 'start' => true, 'type' => true ],
        'li'         => [],
        'br'         => [],
        // Images: src is checked against the allowed protocols
        'img'        => [
            'src'    => true,
            'alt'    => true,
            'width'  => true,
            'height' => true,
            'class'  => true,
        ],
    ];
}
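// Sanitize user content before saving. $_POST arrives slashed, so unslash it
// first; update_post_meta() expects slashed data, so re-slash the clean value.
$raw_note      = wp_unslash( $_POST['user_note'] ?? '' );
$clean_content = wp_kses( is_string( $raw_note ) ? $raw_note : '', my_plugin_allowed_html() );
update_post_meta( $post_id, '_user_note', wp_slash( $clean_content ) );
// Apply again on output as a second layer:
echo wp_kses( get_post_meta( $post_id, '_user_note', true ), my_plugin_allowed_html() );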
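// ── Allowed protocols filter ───────────────────────────────────────────
// wp_kses() checks URL attributes such as href and src against wp_allowed_protocols().
// This filter narrows that list to http and https (no ftp:, tel:, mailto:, etc.).
// Caution: the filter is global, so it affects every wp_kses() call on the site.
// WordPress stops applying it once wp_loaded has fired, so register it at plugin load.
add_filter( 'kses_allowed_protocols', function ( $protocols ) {
    return [ 'http', 'https' ]; // very strict
} );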
// ── wp_kses_post() — full post-content whitelist ──────────────────────
// Allows: p, div, span, table, ul, ol, img, a, h1-h6, figure, figcaption,
// blockquote, pre, code, strong, em, br, hr, and more
// Does NOT allow: script, style, iframe, embed, form, input (unless added);
// object is limited to PDF embeds since WordPress 5.9
$safe_content = wp_kses_post( $raw_html_from_external_api );
// ── wp_kses_data(): the comment whitelist ──────────────────────────────
// Allows only: a, abbr, acronym, b, blockquote, cite, code, del, em, i, q, s, strike, strong
$safe_text = wp_kses_data( $user_comment_with_some_html );
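Because the kses_allowed_protocols filter changes the protocol list for the whole site, a narrower option is the optional third argument of wp_kses(), which sets the allowed protocols for a single call. The following is a minimal sketch, not a definitive implementation: my_plugin_sanitize_note() is a hypothetical helper name, and it assumes the my_plugin_allowed_html() function defined above.

```php
<?php
/**
 * Sanitize a user note with the plugin's tag whitelist and a
 * per-call protocol list. (Hypothetical helper, not a WordPress API.)
 */
function my_plugin_sanitize_note( string $raw ): string {
    // Third argument: protocols permitted in URL attributes (such as href
    // and src) for this call only. Other wp_kses() callers, including
    // core's handling of post content, keep the site-wide defaults.
    return wp_kses( $raw, my_plugin_allowed_html(), [ 'http', 'https' ] );
}
```

With this approach, mailto: and tel: links elsewhere on the site are unaffected, while links in the plugin's own field are limited to http and https.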
NOTE: wp_kses() is primarily an input sanitizer: call it before update_post_meta() or wp_update_post() so that only permitted markup reaches the database. Do not treat that as your only defense. At output time, escape again: use esc_html() when the value should be plain text, or run wp_kses() a second time when it should remain rich content. Storing sanitized content and then echoing it unescaped is a common mistake, because the stored value might have been sanitized when the allowed-tags list was more permissive, or might have been written directly to the database, bypassing sanitization. The principle is: sanitize on input, escape on output, and treat stored database values as untrusted at output time.
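The output half of that principle might look like the following at render time. This is a sketch under stated assumptions: my_plugin_render_note() is a hypothetical helper, _user_note is the meta key used in the example above, and my_plugin_allowed_html() is the whitelist function defined earlier.

```php
<?php
/**
 * Return echo-safe markup for a stored note. (Hypothetical helper.)
 *
 * @param int  $post_id Post whose `_user_note` meta is rendered.
 * @param bool $rich    True to keep whitelisted HTML, false for plain text.
 */
function my_plugin_render_note( int $post_id, bool $rich = true ): string {
    // Treat the stored value as untrusted, even if it was sanitized on save.
    $stored = get_post_meta( $post_id, '_user_note', true );
    if ( ! is_string( $stored ) ) {
        return '';
    }

    if ( $rich ) {
        // Rich context: re-apply the current whitelist.
        return wp_kses( $stored, my_plugin_allowed_html() );
    }

    // Plain-text context: drop all tags, then encode what is left.
    return esc_html( wp_strip_all_tags( $stored ) );
}
```

A template would then call echo my_plugin_render_note( $post_id ); for the rich view, or pass false as the second argument where the note appears as plain text, for example in a list-table column.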