One of the factors on my SEO checklist is the Text / HTML Ratio. Let's be clear: improving this indicator will not make your site suddenly climb the SERP as if there were no tomorrow. But as in Formula 1, in a competitive niche every small refinement helps to shave the lap time. First, let's see what this index represents.
The Text / HTML Ratio is an index that relates the volume of text on a web page visible to the user in the browser to the total volume of text present in the HTML code. It is an indicator of the quality and efficiency of the HTML code. Being a ratio, it is not a value to be judged in absolute terms. It only makes sense when compared with the ratios of other pages (as is usual in SEO).
How to calculate the Text / HTML ratio
I use two methods to calculate this index: by hand or with a tool.
To calculate the index by hand I use Word or Notepad++, because both tools show the total number of words or characters in the document. I open the web page to be analyzed, copy the text visible in the browser (CTRL + A and CTRL + C), paste it into the text editor (CTRL + V) and note the number of words. Next I need the volume of the HTML code, so I view the source of the web page (CTRL + U) and repeat the copy-paste on all the code, noting the value that I will use as the divisor in the ratio.
To find the index quickly I use URLsMatch.eu, a tool for SEO copywriting that compares the keywords present in 3 different URLs. Previously I had written a plugin in PHP for WordPress that calculates the Text / HTML ratio for both words and characters.
Some tools report the number of words and others the number of characters. Which is better? In my opinion it does not matter which one is used for the calculation. The important thing is to use the same unit, words or characters, when comparing with competitor sites. As children we were taught that you cannot add apples to pears.
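The manual procedure can also be approximated in a few lines of code. This is only a minimal sketch using Python's standard `html.parser` module. The names `VisibleTextExtractor` and `text_html_ratio` are made up for illustration, and "visible text" is approximated as every text node outside `script`, `style`, `noscript`, `template` and `title`, which will not match a browser copy-paste exactly (for example, text hidden with CSS is still counted).

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping elements whose content is not rendered."""

    # Assumption: these tags never contribute text the user can read on screen.
    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def text_html_ratio(html: str) -> dict:
    """Return the Text / HTML ratio by characters and by words."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace so indentation in the source does not inflate the text.
    text = " ".join(" ".join(extractor.parts).split())
    html_words = html.split()
    return {
        "chars": len(text) / len(html) if html else 0.0,
        # Words are whitespace-separated tokens, as a word counter would see them.
        "words": len(text.split()) / len(html_words) if html_words else 0.0,
    }


page = (
    "<html><head><title>T</title><style>p{color:red}</style></head>"
    "<body><p>Hello world</p><script>var a=1;</script></body></html>"
)
print(text_html_ratio(page))
```

Whichever unit you pick, use the same one for every page you compare.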
What is a good value of Text / HTML Ratio?
There is no "good value": it depends on the industry and many other circumstances. A 30% Text / HTML ratio may be good for one niche and too low for another. A modern eCommerce site often needs more code to work than a simple blog, so for the same amount of text the eCommerce site will have a lower ratio. A lower ratio is not necessarily bad if the TOP competitors have that ratio on average.
In theory, the higher the ratio, the better the code quality, but the website will presumably be simpler as well. A ratio could collapse simply because critical CSS has been inlined into the HTML code, and that wouldn't be bad at all. If I had to give a recommended threshold below which not to go, I would say 10% for an eCommerce site and 20% for a blog. An excessively low ratio could also be a warning sign of long loading times.
Remember that Google Panda is an algorithm dedicated to content evaluation. A page with 600 kb of HTML and 200 words on-page is not what I would call Panda friendly.
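As a quick illustration, the two floors suggested above can be turned into a simple check. The names `RECOMMENDED_FLOOR` and `is_below_floor` are hypothetical, and the values are only the rules of thumb from this article, not an industry standard.

```python
# Floors suggested in this article: rules of thumb, not a standard.
RECOMMENDED_FLOOR = {"ecommerce": 0.10, "blog": 0.20}


def is_below_floor(ratio: float, site_type: str) -> bool:
    """Return True when the ratio is under the suggested floor for the site type."""
    return ratio < RECOMMENDED_FLOOR[site_type]


print(is_below_floor(0.08, "ecommerce"))  # True: 8% is under the 10% floor
print(is_below_floor(0.22, "blog"))       # False: 22% is above the 20% floor
```

A result of `True` is a prompt to look closer, not a verdict: the comparison with competitors still matters more than the absolute value.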
Which pages to calculate the index for
It is essential to always relate your index to that of the TOP competitors. Do not limit yourself to calculating the index of the homepage alone: compare all the types of pages that your site uses, at least the home page, a category page and a product page.
Note: Monitoring the Text / HTML ratio of some critical site pages on a monthly basis can help identify hacked pages. A sudden drop in the ratio could in fact be caused by malicious code inserted into the pages.
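The comparison by page type could be organized along these lines. It is a sketch under assumptions: the function name `compare_to_competitors`, the dictionary layout and all the sample numbers are invented for illustration, and the benchmark is simply the arithmetic mean of the competitors' ratios.

```python
from statistics import mean


def compare_to_competitors(own: dict, competitors: dict) -> dict:
    """Compare own ratios with the average competitor ratio for each page type.

    own:         {page_type: ratio}
    competitors: {page_type: [ratio, ratio, ...]}
    """
    report = {}
    for page_type, ratio in own.items():
        benchmark = mean(competitors[page_type])
        report[page_type] = {
            "own": ratio,
            "benchmark": benchmark,
            "gap": ratio - benchmark,  # negative means below the competitors
        }
    return report


# Made-up sample values, ratios expressed as fractions (0.12 = 12%).
own = {"home": 0.12, "category": 0.09, "product": 0.15}
competitors = {
    "home": [0.14, 0.16, 0.18],
    "category": [0.08, 0.10, 0.09],
    "product": [0.12, 0.14, 0.13],
}

for page_type, row in compare_to_competitors(own, competitors).items():
    print(f"{page_type}: own {row['own']:.2%}, gap {row['gap']:+.2%}")
```

Keeping one row per page type makes it obvious when, say, the product template is fine but the home page lags behind.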
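A monthly log of the ratio is enough to spot this kind of drop automatically. The sketch below is only an illustration: `sudden_drops` is a hypothetical helper, the sample history is invented, and the 30% relative-drop threshold is an arbitrary assumption to be tuned for each site.

```python
def sudden_drops(history, max_relative_drop=0.30):
    """Return the measurements that fell sharply compared with the previous one.

    history: list of (label, ratio) tuples in chronological order.
    max_relative_drop: assumed alert threshold (0.30 = a 30% relative drop).
    """
    alerts = []
    for (_, previous), (label, current) in zip(history, history[1:]):
        if previous > 0 and (previous - current) / previous > max_relative_drop:
            alerts.append((label, previous, current))
    return alerts


# Invented monthly measurements for one critical page.
history = [("2024-01", 0.25), ("2024-02", 0.24), ("2024-03", 0.12)]
print(sudden_drops(history))  # [('2024-03', 0.24, 0.12)]
```

An alert does not prove a hack, since a redesign or a new script can cause the same drop, but it tells you which page to inspect first.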
How to improve the Text / HTML ratio?
Since Text / HTML is a ratio, you can improve it in two ways: increase the text visible on the screen or decrease the HTML code, for example by moving inline CSS or JS to external files. But at what price? You have to weigh these changes carefully, because the damage could be greater than the gain.
Sometimes developers, for convenience or laziness, insert pieces of CSS or JS into the HTML. If the code is used only on one specific page, then it is okay to add it to the HTML. In fact, it would be inefficient to weigh down the shared CSS or JS file with a function used on a single page.
If the CSS or JS code inserted in the HTML is present on all the pages, then it is better to move it to an external file, so as to take advantage of the browser cache and lighten the HTML.
In summary, these are the activities that can improve the Text / HTML ratio:
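To find out which inline snippets are repeated on every page, and are therefore candidates for an external file, something like the following could be used. It is a sketch with hypothetical names (`InlineBlockCollector`, `shared_inline_blocks`), built on Python's standard `html.parser`, and it only matches blocks that are identical character for character after trimming.

```python
from html.parser import HTMLParser


class InlineBlockCollector(HTMLParser):
    """Collects the content of inline <style> and <script> elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # tag name while inside an inline block
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # A <script> with a src attribute is already external, so skip it.
        if tag == "style" or (tag == "script" and "src" not in dict(attrs)):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            code = "".join(self._buffer).strip()
            if code:
                self.blocks.append((tag, code))
            self._current = None


def shared_inline_blocks(pages: dict) -> set:
    """pages: {url: html}. Return the inline blocks found on every page."""
    per_page = []
    for html in pages.values():
        collector = InlineBlockCollector()
        collector.feed(html)
        collector.close()
        per_page.append(set(collector.blocks))
    return set.intersection(*per_page) if per_page else set()


# Invented sample pages.
pages = {
    "/": "<style>body{margin:0}</style><script>track('home');</script>",
    "/blog": "<style>body{margin:0}</style><script src='app.js'></script>",
}
print(shared_inline_blocks(pages))  # {('style', 'body{margin:0}')}
```

Blocks that show up in the result are loaded again with every page view, while blocks unique to one page can reasonably stay inline.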
- Validate your HTML with W3C Validator to avoid errors and useless code
- Remove unnecessary code
- Remove large blocks of whitespace
- Avoid excessive use of tables
- Remove comments from HTML
- Use external CSS files to format the website
- Use JavaScript only if necessary
- Try to keep the HTML size below 300 kb
- Remove text not visible to the end user
- Your page must have a good volume of quality text, visible and with good information
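Before touching the templates, it can be useful to estimate how much some of these clean-ups would actually save. The sketch below covers only two items from the list, comments and runs of whitespace, and checks the result against the 300 kb guideline. It assumes that "kb" means kilobytes of 1024 bytes, `estimate_savings` is a hypothetical name, and the regular expressions are deliberately naive: they would also alter `<pre>` blocks and conditional comments, so treat this as a measuring aid and not as a minifier.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # naive: matches any HTML comment
BLANK_RUN = re.compile(r"\s{2,}")               # two or more whitespace characters

SIZE_LIMIT_BYTES = 300 * 1024  # assumption: "300 kb" read as 300 kilobytes


def estimate_savings(html: str) -> dict:
    """Estimate the HTML size after stripping comments and collapsing whitespace."""
    slimmed = COMMENT.sub("", html)
    slimmed = BLANK_RUN.sub(" ", slimmed)
    after = len(slimmed.encode("utf-8"))
    return {
        "before_bytes": len(html.encode("utf-8")),
        "after_bytes": after,
        "under_limit": after < SIZE_LIMIT_BYTES,
    }


sample = "<p>Hi</p>   <!-- note -->\n\n<p>There</p>"
print(estimate_savings(sample))
```

If the estimated saving is only a few percent, the effort is probably better spent on adding quality text than on trimming code.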