Hyphenation matters if you want to maintain the aesthetics of your website, especially in responsive designs. But how do you do it, and how does it affect your SEO?
This issue came up while I was developing a responsive multilingual website and realized how ugly the text looked on a mobile device. The customer wanted the text to be justified, and that looked great on a computer screen. Unfortunately, once the layout became responsive we had a major problem: either we shrank the text until it became unreadable, or we used a bigger font and got plenty of empty space because longer words moved to the next line. HTML, unfortunately, does not hyphenate words on its own.
The answer, of course, was to use the HTML soft hyphen character entity reference &shy; (&#173; or &#xAD;). This entity marks a soft hyphen inside a word within the HTML text. For those of you who are unsure what a soft hyphen is, it means that the browser will hyphenate the word at the location of the soft hyphen, and only if it cannot keep the whole word on the same line. Thus, a word like “unfortunately”, written as “unfortu&shy;nately”, would be divided into “unfortu-” on the first line and “nately” on the second if it did not fit completely on the first line. Including the “&shy;” code does clutter the text while you are writing it, so you should add it only after you are satisfied with the text. When viewing the HTML page, of course, you would not see the code at all, only the hyphens where a word actually had to be broken. This is excellent from the responsive point of view, as you maintain an aesthetically appealing website, but at the price of some additional work.
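Since the soft hyphens are best added once the text is final, that step can be scripted. What follows is only a minimal sketch in Python, not the method used on the project: the `BREAKS` table and the `add_soft_hyphens` function are hypothetical names, and the break points are assumed to be chosen by hand. It expects plain text rather than finished HTML, since it would otherwise also touch tag and attribute names.

```python
import re

# Hypothetical, hand-maintained table: lower-case word -> the pieces it may be split into.
BREAKS = {
    "unfortunately": ["unfortu", "nately"],
}


def add_soft_hyphens(text, breaks=None):
    """Insert &shy; into words listed in the table; leave all other text untouched."""
    table = BREAKS if breaks is None else breaks

    def replace(match):
        word = match.group(0)
        parts = table.get(word.lower())
        if parts is None:
            return word
        # Slice the original word (not the table entry) so capitalisation is preserved.
        pieces, pos = [], 0
        for part in parts:
            pieces.append(word[pos:pos + len(part)])
            pos += len(part)
        return "&shy;".join(pieces)

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", replace, text)


print(add_soft_hyphens("Unfortunately, the word did not fit."))
```

Because the table is written by hand, every break point is a deliberate choice, which matters for words that can be split in more than one way.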
I had also thought about using the <wbr /> alternative, but that gave me some problems: <wbr /> (to my knowledge originally introduced by Netscape) was for a long time non-standard and does not work consistently in all browsers. I therefore decided that &shy; (clearly a standard code) would be more appropriate, though some older browsers will not display it properly.
While researching other alternatives I came across a thread on Stack Overflow on the subject of soft hyphens. It did not add anything to what I already knew, but it did include an interesting insight from Paramaeleon that I cannot resist quoting verbatim:
Spargelder (germ. saved money, pl.) may, by syllabification rules, be wrapped in two places (Spar-gel-der). However, wrapping it in the second position, turns the first part to show up as Spargel- (germ. asparagus), activating a completely misleading concept in the head of the reader and therefore should be avoided.
And what about the string Wachstube? It could either mean ‘guardroom’ (Wach-stu-be) or ‘tube of wax’ (Wachs-tu-be). You may probably find other examples in other languages as well. You should aim to provide an environment in which the desk man can be supported in creating a well-syllabified text, proof-reading every critical word.
Nicholas also contributed to the discussion, highlighting that:
The common example in English of Spar-gelder vs. Spargel-der is “re-cord” vs “rec-ord” (homographs but not homophones). The former is a verb, the latter is a noun. Algorithms are usually not smart enough for this, even in English (one of the best supported languages in IT).
Finally, CJ Dennis also mentioned:
I would suggest experts-exchange vs expert-sexchange.
The final question is: does hyphenation affect SEO? Would the “&shy;” code affect our keywords? Some very old articles (2007), like this one on Comgroups, say it does. I very strongly disagree, based on my research into how Search Engines Identify Language. One of the key ways they do so is by identifying the words on a web page, and in many languages these words have special codes embedded, either to display accented characters or characters that are peculiar to one language. For example, the HTML code for “España” (Spain) can be written as “Espa&ntilde;a” or “Espa&#241;a”. Both are legitimate HTML codes. I simply can’t believe that search engines would index the “raw” HTML code (and therefore treat the two codings as different words).
It is more logical to assume that search engines parse a web page, strip the HTML tags, convert special character codes into the corresponding characters, and then store the result in their databases as UTF-8, as this would also occupy less space. And if they parse the page to do this, it is trivial to simply delete the “&shy;” inside a word. Search engines are not stupid, so the impact of using “&shy;” is… nil.
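To illustrate the kind of normalisation assumed above, here is a small Python sketch. It is not how any particular search engine works; the `normalise` function is a hypothetical name, and it relies only on the standard library's `html.unescape`, which decodes named and numeric character references.

```python
import html

SOFT_HYPHEN = "\u00ad"  # the character that &shy;, &#173; and &#xAD; all decode to


def normalise(fragment):
    """Decode character references, then drop soft hyphens (illustrative only)."""
    return html.unescape(fragment).replace(SOFT_HYPHEN, "")


# Both codings of the same word end up as the same string.
print(normalise("Espa&ntilde;a") == normalise("Espa&#241;a"))

# A soft-hyphenated keyword collapses back to the plain keyword.
print(normalise("unfortu&shy;nately"))
```

Under this assumption, a keyword with or without soft hyphens is stored as the same word, which is the basis for expecting no effect on SEO.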