3 Different Ways To Convert HTML Into Plain Text

By Sanchitha SR

February 5th, 2021


I was working with a rich text editor the other day and needed to strip the HTML tags from its output before storing the text in the database. In doing so, I learned a few different ways to achieve this, and I wanted to share them, as they could come in handy for anyone trying to do the same.

What we are trying to do is remove the tags from the string and make the string printable as plain text. Let’s dive in and see how it works.

1. Using .replace(/<[^>]+>/g, "")

This method is a simple and quick way to remove the tags from the text. It uses the string method .replace(pattern, replacement) with a regular expression that matches a <, followed by one or more characters that are not >, followed by a closing >, which is what an HTML tag looks like. Each match is replaced with an empty string. The g flag makes the replacement global: every match in the string is replaced, not just the first one.
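To see what the g flag changes, here is a small sketch that runs the same pattern with and without it. The sample string is invented for illustration.

```javascript
// Invented sample input with two pairs of tags
const sample = "<b>bold</b> and <i>italic</i>";

// Without the g flag, only the first match ("<b>") is removed
const firstOnly = sample.replace(/<[^>]+>/, "");

// With the g flag, every tag in the string is removed
const allTags = sample.replace(/<[^>]+>/g, "");

console.log(firstOnly); // bold</b> and <i>italic</i>
console.log(allTags); // bold and italic
```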

The drawback of this method is that HTML entities such as &amp; or &nbsp; are not decoded; they are left in the output as-is. For simple markup it still works well, though.
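If you want to stay with the regex but still handle the most common entities, one option is a second replace that decodes them. This is a minimal sketch under my own assumptions: the helper name stripTagsAndDecode and the small ENTITIES map are hypothetical, and the map covers only a handful of entities, not the full HTML list. Tags are stripped first and entities are decoded afterwards in a single pass, so escaped markup such as &lt;b&gt; survives as literal text and &amp;lt; is not decoded twice.

```javascript
// Hypothetical lookup table: only a few common entities, not the full HTML set
const ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  "#39": "'",
  nbsp: "\u00a0", // a real non-breaking space, as the entity represents
};

// Hypothetical helper: strip tags, then decode the entities listed above
function stripTagsAndDecode(html) {
  return html
    .replace(/<[^>]+>/g, "") // remove tags first, same regex as method 1
    .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (match, name) => ENTITIES[name]); // one pass, so output is never re-scanned
}

console.log(stripTagsAndDecode("<p>Fish &amp; chips &lt;3</p>")); // Fish & chips <3
```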

var myHTML = "<div><h1>Jimbo.</h1>\n<p>That's what she said</p></div>";

var strippedHtml = myHTML.replace(/<[^>]+>/g, "");

// Jimbo.
// That's what she said
console.log(strippedHtml);
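One more limitation worth knowing about: the pattern ends a tag at the first >, so a > inside a quoted attribute value cuts the tag short and leaves part of it in the output. A quick check with a made-up input:

```javascript
// Made-up input: the alt attribute contains a ">" character
const tricky = '<img alt="a > b">Caption';

// The regex stops at the first ">", so only '<img alt="a >' is removed
const result = tricky.replace(/<[^>]+>/g, "");

console.log(JSON.stringify(result)); // " b\">Caption"
```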

2. Create a temporary DOM element and retrieve the text

This approach lets the browser's own HTML parser do the work, so unlike the regex it also decodes HTML entities such as &amp;. Create a dummy div element and assign the HTML string to its innerHTML; the browser parses it, and you can then read the plain text back from the element's textContent (or innerText) property. Because it relies on document, it works in the browser but not in plain Node.js.

function convertToPlain(html) {
  // Create a new div element
  var tempDivElement = document.createElement("div");

  // Set the HTML content with the given value
  tempDivElement.innerHTML = html;

  // Retrieve the text property of the element
  return tempDivElement.textContent || tempDivElement.innerText || "";
}

var htmlString =
  "<div><h1>Bears Beets Battlestar Galactica </h1>\n<p>Quote by Dwight Schrute</p></div>";

console.log(convertToPlain(htmlString));
// Expected Result:
// Bears Beets Battlestar Galactica
// Quote by Dwight Schrute
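A variation on this method, which is my own addition rather than something from the original project: parse the string with DOMParser instead of a div created with document.createElement. As I understand it, the document that DOMParser returns is inert, so event-handler attributes in the input (for example an onerror on an img tag) are not run, whereas assigning untrusted HTML to innerHTML on a created element can still trigger them. The function name convertToPlainSafe is hypothetical, and the sketch needs a browser environment.

```javascript
// Hypothetical variant of convertToPlain.
// Requires a browser: DOMParser is not available in plain Node.js.
function convertToPlainSafe(html) {
  // Parse into a separate document instead of a live element
  const doc = new DOMParser().parseFromString(html, "text/html");

  // For "text/html" the parsed document always has a body
  return doc.body.textContent || "";
}

// Usage in a browser:
// convertToPlainSafe("<p>Fish &amp; chips</p>"); // "Fish & chips"
```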

3. html-to-text npm package

This is a package I discovered recently. It is a converter that parses HTML and returns well-formatted plain text, and it comes with many options to control the conversion, such as wordwrap, tags, whitespaceCharacters and formatters.

Your project needs a package.json to use the package. Install the package first, then require it in your file.

You can find the official docs for the package here. I used it in my Vue project and it worked very well.

Installation:

npm install html-to-text

Usage:

const { htmlToText } = require("html-to-text");

const text = htmlToText(
  "<div>Nope Its not Ashton Kutcher. It is Kevin Malone. <p>Equally Smart and equally handsome</p></div>",
  {
    wordwrap: 130,
  }
);
console.log(text);
// Expected result:
// Nope Its not Ashton Kutcher. It is Kevin Malone.
//
// Equally Smart and equally handsome

Conclusion

And that sums it up! You can find an example of the project here.

Thank you!


