user
In JavaScript, how do I sanitize a string that contains HTML tags?
alphonsio

In JavaScript, to sanitize a string containing HTML tags and make it safe for display or further processing, you should ensure that any potentially harmful scripts or HTML elements are neutralized. This can prevent Cross-Site Scripting (XSS) attacks, which occur when malicious scripts are injected into webpages viewed by other users.

There are several approaches to sanitize a string with HTML tags in JavaScript:

1. Using the browser's built-in DOM parsing

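You can let the browser do the escaping for you. Assigning the string to an element's textContent makes the browser treat it as plain text, and reading innerHTML back returns that text with the markup characters (&, <, >) converted to entities. Note that this neutralizes every tag, including harmless ones; it does not selectively remove unwanted tags and attributes.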

function sanitizeHTML(str) {
  const temp = document.createElement('div');
  temp.textContent = str;
  return temp.innerHTML;
}
const unsafeString = "<script>alert('Hello')</script><p>Safe content</p>";
const safeString = sanitizeHTML(unsafeString);
console.log(safeString);  // Output: "&lt;script&gt;alert('Hello')&lt;/script&gt;&lt;p&gt;Safe content&lt;/p&gt;"

2. Using a library

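Libraries such as DOMPurify are designed specifically to sanitize HTML and prevent XSS attacks. DOMPurify is widely used: it parses the input, strips anything that can execute script (for example <script> elements and inline event-handler attributes such as onerror), and returns the remaining markup. Use this approach when some HTML must be preserved.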

First, include DOMPurify (the version in this URL is only an example; prefer the latest release):

<script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/2.3.6/purify.min.js"></script>

Then use it:

const dirty = "<script>alert('Hello')</script><p>Safe content</p>";
const clean = DOMPurify.sanitize(dirty);
console.log(clean); // Output: "<p>Safe content</p>"

3. Escaping HTML

If you merely want to escape HTML so that it is displayed as text, you can convert HTML special characters to their respective HTML entities:

function escapeHTML(html) {
  return html.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/"/g, '&quot;')
             .replace(/'/g, '&#039;');
}

const html = "<div>Example</div>";
const escapedHtml = escapeHTML(html);
console.log(escapedHtml); // Output: "&lt;div&gt;Example&lt;/div&gt;"
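
As a variation on the function above, the five replacements can be collapsed into a single pass with a lookup table. This is only a sketch: the names HTML_ESCAPES and escapeHtmlSinglePass are hypothetical, and the String() coercion is an added assumption so that non-string values such as numbers do not throw.

```javascript
// Hypothetical lookup table: each special character maps to its HTML entity.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#039;',
};

// Escape all five characters in one pass over the input.
// Because each character is visited once, an '&' produced by an earlier
// replacement can never be escaped a second time.
function escapeHtmlSinglePass(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtmlSinglePass('5 > 3 && "ok"'));
// Output: 5 &gt; 3 &amp;&amp; &quot;ok&quot;
```

Escaping the quote characters matters when the value ends up inside an HTML attribute, where an unescaped quote could close the attribute early.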

4. Using placeholders with templating engines

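If you're working with a JavaScript framework such as React, Angular, or Vue, its templating layer escapes interpolated values by default, which blocks most XSS without any extra code. For example, React automatically escapes content unless you deliberately use dangerouslySetInnerHTML, and Vue escapes {{ }} interpolations unless you use v-html. Treat those opt-outs as raw HTML injection and sanitize the value first.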
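
To illustrate the idea of escaping by default outside a framework, here is a minimal sketch using a tagged template literal. It is not how React, Angular, or Vue are implemented; safeHtml and escapeValue are hypothetical names chosen for this example. The literal parts of the template are trusted as written, while every interpolated value is escaped.

```javascript
// Hypothetical helper: escape the five HTML special characters.
function escapeValue(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical template tag: static chunks are kept verbatim,
// interpolated values are escaped before being joined in.
function safeHtml(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? escapeValue(values[i]) : ''),
    ''
  );
}

const userName = '<img src=x onerror=alert(1)>';
const markup = safeHtml`<p>Hello, ${userName}!</p>`;
console.log(markup);
// Output: <p>Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```

The author-written markup survives, and the untrusted value is rendered as text rather than interpreted as an element.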

Note on Security:
Handling HTML sanitization manually can be risky because it's challenging to cover all possible XSS vectors. Whenever possible, use well-maintained libraries like DOMPurify or rely on the security features of your JavaScript framework.

Best Practices

  • Always sanitize or escape user input, especially before it is injected into HTML.
  • Use a dedicated library for complex sanitization, such as allowing a subset of HTML through.
  • Keep the libraries you use up to date so that you receive the latest security patches.