SEO Translator

How to optimize your web site translation for the search engines!

Browsing Posts published by Ramon

According to an article published in New Scientist, artificial intelligence (AI) will evolve so much in the coming years that it will be able to beat us at everything by 2060. Among other things, the article states that machines are predicted to be better than us at translating languages by 2024, writing high-school essays by 2026, driving a truck by 2027, working in retail by 2031, writing a bestselling book by 2049 and performing surgery by 2053. In fact, all human jobs will be automated within the next 120 years, say the respondents.

So, in principle, all translators will be jobless by 2024. I, personally, am a little bit skeptical.

I remember that in 1983, while I was working as a full-time translator at an aerospace company, they brought in all the translators to see a demo of Systran and told us that in a year or so we would probably have to look for a new job. As I was studying Computer Science at the university, I laughed out loud (I believe I was the only one who found it funny). Now, that was 34 years ago, and machine translation is still not ready for prime time. True, impressive progress has been made since the Systran era, but still…

I happen to be an IT guy myself. I am also a translator and I write books, so I know what I am talking about. I keep an eye on the progress of Computer Science in general and of machine translation in particular, and it takes a lot of credulity to believe that in 7 years translators will be replaced by AI. Yes, there will be progress, but not to the extent that this article indicates. Knowing the progress of automated driving, I can believe that a computer will be able to drive a truck in 2027, though personally I think that’s a bit optimistic. But I am *very* skeptical about more creative work.

Yes, machine translation will advance to the point that translators will eventually be replaced by it. My take is that this is still 20-25 years away, so I would not worry too much. Writing high-school essays by 2026 (in 9 years?) looks ludicrous. Machines have not yet passed the Turing test. Basically, in this test the machine plays a question-and-answer game through a terminal and has to convince you at least 1/3 of the time that it is human. It takes a little bit more than that to write an essay (even a high-school one) that can pass as written by a human. I’d put it 25-30 years in the future. Finally, that a machine will be able to write a bestseller in 32 years also stretches my credulity a little bit. I think it will eventually be feasible, but because it needs a proper plot on top of proper grammar and conversation, it might take somewhat longer. My take is that we will not see it for another 50-60 years.

Creativity is, as of today, exclusive to mankind. There are already computer programs that can generate reasonably good music, but music is easier to process by means of algorithms than language, and the machines do not “know” what they are actually creating. With massive computing power and huge translation memories, it will also be possible to provide quite accurate translations, though there are subtleties in language that are likely to be lost in machine translation. Even writing an essay is possible, using massive information sources (like Wikipedia) and paraphrasing the original text. That does not mean that the machine will “understand” what it has written, but, like the current chatbots, it might actually get away with pretending it does. However, I do not see that happening until machines start passing the Turing test. Writing a bestseller is more complicated, and it is an exciting programming challenge. It will eventually be done, but I am skeptical about whether the machine will actually “know” what it is doing.

So will machines replace all human jobs in 120 years? Again, I am skeptical. Remember that in the ‘50s they were expecting that we would all have flying cars and moon colonies all over the place? If history is a guide (and it often is), the AI guys have in the past vastly overrated the progress in artificial intelligence, and the technology has often gone in unexpected directions. Who in the ‘50s or even the early ‘70s would have expected the Internet? Nanotechnology?

But the next 120 years look interesting anyhow.

One question that recently popped up when talking with a customer was whether or not he should keep the same domain for his translated site. This leads to an interesting discussion: Do we keep the translation on the same site or not?

Before discussing the different solutions for where to store or host the translated site, let’s first make clear what you must NOT do. Never (and I mean never) keep the translated pages in the same directory as the non-translated pages. Housekeeping and maintenance become a nightmare, especially if you have many pages, but that is not the main reason. The main reason is the way that search engines recognize the website language, as I pointed out in a previous series of posts. Mix up pages of different languages in the same folder and you may end up being seen as a site in the wrong language, with no translation at all, and possibly even be penalized by search engines for being so sloppy.

Another thing to avoid under all circumstances is using machine translation for your website translations. Yes, I know that it is dirt-cheap, but Matt Cutts has explicitly warned at Google Webmasters against machine translation, as “our guidelines against auto-generated text can also apply against auto-translated text”. If you cannot afford a proper translation, stick to a widget that, as Matt states, “says translate into this language or something like that”, which is an acceptable practice to Google. I repeat, do NOT use machine translation, not even Google Translate. You have been warned!

There are basically four ways to store the translated site:

  1. In a website database, accessing the pages by URL parameters (e.g., ?country=france)
  2. In a subdirectory of your existing domain (e.g., for French, for Spanish, etc)
  3. In a subdomain of your existing domain (e.g., for French, for Spanish, etc)
  4. On a separate domain (e.g., for French, for Spanish, etc).
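To make the four options easier to compare, here is a minimal sketch that builds the address of a translated page under each scheme. Everything in it is an assumption made for illustration: the domain example.com, the parameter name lang, the function name and the idea that the ccTLD is obtained by swapping the last part of the domain.

```python
from urllib.parse import urlencode


def translated_url(scheme: str, domain: str, code: str, path: str = "/") -> str:
    """Build the URL of a translated page under one of the four storage schemes.

    `code` is a language code for the first three schemes and a country code
    for the ccTLD scheme. All names and URL shapes here are hypothetical.
    """
    if scheme == "parameter":
        # Option 1: same page, language selected by a URL parameter
        return f"https://{domain}{path}?{urlencode({'lang': code})}"
    if scheme == "subdirectory":
        # Option 2: one subdirectory per language on the same domain
        return f"https://{domain}/{code}{path}"
    if scheme == "subdomain":
        # Option 3: one subdomain per language
        return f"https://{code}.{domain}{path}"
    if scheme == "cctld":
        # Option 4: replace the top-level domain with a country code
        name = domain.rsplit(".", 1)[0]
        return f"https://{name}.{code}{path}"
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("parameter", "subdirectory", "subdomain", "cctld"):
    print(scheme, translated_url(scheme, "example.com", "fr", "/about"))
```

Seeing the four addresses side by side shows why the parameter variant gives users and search engines the weakest visual clue about the language.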

We will have a look at each of the different options separately.

Website database with URL parameters

This might look like an easy solution if your website is already stored in a database (e.g., because you use a CMS). I think that the only potential benefit is that you do not have to set up anything else, but from my point of view it is a bad idea. First, you cannot segment the pages, so you are in the same situation as if you put a lot of HTML pages in the same directory: the search engines may not be able to solve the riddle of multiple languages at the same location. Similarly, users might have difficulty getting visual clues about where they are, especially if there are many URL parameters. On top of that, with URL parameters it is not possible to perform geotargeting in Webmaster Tools, so you are removing a very powerful weapon from your SEO arsenal.

Use a language subdirectory within your generic top-level domain

If your main site is on a generic top-level domain (such as .com, .net, .org, or one of the newer ones such as .global or .club), there is some merit in creating a subdirectory for each language. All web pages are stored together, which makes them easier to set up and maintain, it is possible to geotarget each subdirectory from Webmaster Tools, and it is cheap, as you use the same host for all translations. Many CMSs also allow you to set up the pages in a “language subdirectory” structure. Though in this case it is a virtual subdirectory, the effect is the same.

Disadvantages? Not many. Again, users might not recognize the geotargeting, but this might not always be very important. The main drawback is that everything is hosted on the same server, which may become an issue if you get a lot of traffic. Moving each language to its own site may then be difficult, especially if you have cross-linked the pages to their translated equivalents. However, if you do not expect millions of hits, this might not be a major problem.

One additional advantage is that, since all pages are on the same site (and assuming that you have cross-linked them to each other), the increase of page rank (PR) on each page increases the overall site page rank. So the pages in French or Spanish would also increase the page rank of your English site. Nice!

Create a subdomain for each language

A different solution while maintaining the same server is to use subdomains for each language, such as for French. Again, this is good for proper site housekeeping and maintenance. No problem either with geotargeting with Webmaster Tools. It is also far easier to move a site if your traffic grows so big that you need two (or more) different servers, possibly even at different locations. After all, even if pages are interlinked, the links will always refer to the site / subsite URL, so there is no problem of a suddenly missing subdirectory and lots of  “404 Page not found” errors. And if eventually you decide to move to a country-specific top-level domain, a permanent redirect of that subdomain to the new ccTLD will solve the issue, while maintaining the page rank (more on that in a later post).

Again, there are not many disadvantages. Users might not recognize the geotargeting (though it’s far easier to see than in the previous option) or might confuse the language with the country (e.g., does “fr” stand for “French” or “France”?). Personally I would not worry much about it, as most users don’t look at the URL anyhow. It means some additional work, but not that much. What might be an issue (if you are on shared hosting) is that there may be a limit on the number of subdomains you are allowed to create on your site. A couple of additional dollars per month usually solves that problem.

One interesting thing is that very often subdomains (on shared hosting) are created as subdirectories of the main site, each with its own URL, so the same pages can be reached at two different addresses. This might become an issue if a search engine decides that there is duplicate content. To prevent this, either choose a preferred version and redirect to it, or use the “rel=canonical” link element pointing to the subdomain.
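One way to picture the “choose a preferred version and redirect to it” option is a small function that computes the redirect target. This is only a sketch: the domain example.com, the language codes and the assumption that the duplicate copy lives under a subdirectory such as /fr/ are all hypothetical, and the real redirect would be configured in your web server or CMS.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "example.com"   # hypothetical main domain
LANGUAGES = ("fr", "es")    # hypothetical language subdomains


def preferred_url(url: str) -> str:
    """Return the subdomain version of a URL that uses the subdirectory form.

    https://example.com/fr/page.html -> https://fr.example.com/page.html
    URLs that are already in the preferred form are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")  # "/fr/page.html" -> ["", "fr", "page.html"]
    if parts.netloc == MAIN_HOST and len(segments) > 1 and segments[1] in LANGUAGES:
        language = segments[1]
        remaining_path = "/" + "/".join(segments[2:])
        return urlunsplit(
            (parts.scheme, f"{language}.{MAIN_HOST}", remaining_path, parts.query, parts.fragment)
        )
    return url


print(preferred_url("https://example.com/fr/page.html"))
print(preferred_url("https://fr.example.com/page.html"))
```

The same target address is what you would put in the “rel=canonical” link element if you prefer that route over a redirect.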

Use a country-code top-level domain (ccTLD) for each language

Though Google prefers this particular solution (as highlighted in the video mentioned above), it is also the most complicated to manage, as well as the most expensive. Housekeeping is easy (one language per site), but cross-domain maintenance is complicated. If you need to implement the same change in five languages, that means accessing five different sites, possibly on up to five different servers. It also means more infrastructure, and the combined availability of five different sites is likely to be lower than that of a single one, not to mention that you might need more staff to handle so many sites. As if that were not enough, there is a risk of brand dilution, as you end up with a proliferation of sites with different names.

Another issue is: where do you stop? You see, the same language can be spoken in many countries. For example, Wikipedia lists 21 countries where Spanish is an official language. But these are by no means the only ones! According to existing statistics, the USA has more Spanish speakers than Spain itself! Will you send all Spanish-language speakers to the ccTLD of one country? Or perhaps to that of another? Or each to their own ccTLD? Apart from the fact that you would have 21 ccTLDs for the same language (remember what I said about duplicate content), what do you do with the USA, which has a de facto official language (English) and an unofficial one (Spanish) with over 50 million native speakers that you simply cannot ignore? Or with countries that have several official languages, such as Canada (English and French) or Belgium (French and Flemish)?

And remember, you will not always be able to get the ccTLD with your brand name in a specific country. I am not talking about site name speculators, I am talking about local companies that have the same name (or a different one, but which matches the acronym that you use) and got the local ccTLD first. Sorry. Dead end. If you did not buy up the ccTLDs of the 196 countries that exist today in the world when you created your company, it is almost certain that you will never be able to grab them all up.

Which approach is best?

The simple answer is exactly what you would expect: It depends. In any case, I feel that using a website database and accessing the pages by URL parameters is the worst possible solution, which I do not recommend to anybody.

For small websites that do not expect to ever hit millions of visitors, I would recommend subdirectories if there are few pages, and subdomains if you expect to have many pages in the future or have hopes for great growth.

For medium websites, my recommendation is to use subdomains. The (small) additional work that this implies is worth it, as it is future-proof and allows for expansion/segregation with a low risk.

For big companies, I would rather go for subdomains, unless local laws require otherwise. For example, you might be legally required to have a local presence, in which case a ccTLD becomes mandatory. But, unless these sites are set up as independent businesses (such as subsidiaries), I would set them up as language-specific subdomains. This would significantly simplify the management and reduce the associated costs. For example, you could redirect all 21 Spanish-speaking countries and their associated ccTLDs to a single Spanish-language subdomain. If you later created a subsidiary in Mexico and another in Spain, it would be a piece of cake to spin off two sites from this subdomain.

In another post we will discuss in detail the problem of multilingual countries, but that is a perfect example of language subdomain implementation.

Now, what do you think?

Hyphenation is an issue if you want to maintain the aesthetics of your website, especially in responsive designs. But how do you do it, and how does it affect your SEO?

This whole issue came up when I was developing a responsive multilingual website and realized how ugly the text looked on a mobile device. The customer wanted the text to be justified, and that looked great on a computer screen. Unfortunately, once the text was made responsive we had a major issue: either we shrank the text until it became unreadable, or we used a bigger font and got plenty of empty space because longer words moved to the next line. HTML, unfortunately, does not know about hyphenation.

The answer, of course, was to use the HTML soft hyphen character entity reference &shy; (&#173; or &#xAD;). This HTML entity sets a soft hyphen inside a word within the HTML text. For those of you who are unsure what a soft hyphen is, it means that the browser will hyphenate the word at the location of the soft hyphen only if it cannot keep the whole word on the same line. Thus, a word like “unfortunately”, written as “unfortu&shy;nately”, would be divided into “unfortu-” on the first line and “nately” on the second line if it did not fit completely on the first line. Including the “&shy;” code does mess up the text while you are writing it, so you should add it only after you’re satisfied with the text. When viewing the HTML page, of course, you would not see it at all, only the hyphens where a word needs to be hyphenated. This is excellent from the responsive point of view, as you maintain an aesthetically appealing website, but at the price of some additional work.
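As a small illustration of the “unfortunately” example, the sketch below inserts the entity at a position chosen by a person and checks that the three spellings of the entity decode to the same character, U+00AD. The helper name add_soft_hyphen is made up for this example; html.unescape is part of the Python standard library.

```python
import html

SOFT_HYPHEN = "\u00ad"  # the invisible soft hyphen character, U+00AD


def add_soft_hyphen(word: str, position: int) -> str:
    """Insert the &shy; entity before the character at `position`."""
    return word[:position] + "&shy;" + word[position:]


marked = add_soft_hyphen("unfortunately", 7)
print(marked)  # unfortu&shy;nately

# The named, decimal and hexadecimal forms all stand for the same character
for entity in ("&shy;", "&#173;", "&#xAD;"):
    print(entity, html.unescape(entity) == SOFT_HYPHEN)
```

The position is passed in by hand on purpose: deciding where a word may break is the part that needs a human.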

I had also thought about using the <wbr /> alternative, but that gave me some problems, as <wbr /> (to my knowledge originally invented by Netscape) was not part of the HTML standard for a long time and does not work consistently on all browsers. I therefore thought that &shy; (obviously a standard code) would be more appropriate, though some older browsers will not be able to display it properly.

Interestingly, while researching other alternatives I stumbled on an interesting discussion on Stack Overflow on the subject of soft hyphens. The discussion did not add anything to what I already knew, but it did include an insight from Paramaeleon that I cannot resist quoting verbatim:

Syllabification isn’t that easy and I cannot recommend leaving it to some Javascript. It’s a language specific topic and may need to be carefully revised by the desk man if you don’t want it to turn your text irritating. Some languages, such as German, form compound words and are likely to lead to decomposition problems. E.g. Spargelder (germ. saved money, pl.) may, by syllabification rules, be wrapped in two places (Spar-gel-der). However, wrapping it in the second position, turns the first part to show up as Spargel- (germ. asparagus), activating a completely misleading concept in the head of the reader and therefore shoud be avoided.

And what about the string Wachstube? It could either mean ‘guardroom’ (Wach-stu-be) or ‘tube of wax’ (Wachs-tu-be). You may probably find other examples in other languages as well. You should aim to provide an environment in which the desk man can be supported in creating a well-syllabified text, proof-reading every critical word.

Nicholas also contributed to the discussion, highlighting that:

The common example in English of Spar-gelder vs. Spargel-der is “re-cord” vs “rec-ord” (homographs but not homophones). The former is a verb, the latter is a noun. Algorithms are usually not smart enough for this, even in English (one of the best supported languages in IT).

Finally, CJ Dennis also mentioned:

I would suggest experts-exchange vs expert-sexchange. 🙂

What does this mean? What these three people basically highlighted is that hyphenation is not something that should be done lightly: you need somebody who knows the language, who knows the meaning of the different words and can hyphenate them not only based on grammar rules, but also based on their meaning and context. So adding &shy; by hand is something you DO want to do to increase the attractiveness of your site, instead of trusting some JavaScript to do it correctly (which it won’t).
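One way to keep a human in charge while still saving typing is a word list that an editor has reviewed, applied mechanically afterwards. The sketch below assumes plain text input (no HTML tags) and an invented two-entry list in which “|” marks the approved break points; words that are not on the list are left untouched instead of being guessed at.

```python
import re

# Break points approved by a human editor (hypothetical entries).
# "|" marks a place where the word may be broken.
CURATED_BREAKS = {
    "unfortunately": "unfortu|nately",
    "Spargelder": "Spar|gelder",  # deliberately not "Spargel|der"
}


def hyphenate(text: str, breaks: dict = CURATED_BREAKS) -> str:
    """Add &shy; only to words that appear in the reviewed list."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        pattern = breaks.get(word)  # exact, case-sensitive lookup
        return pattern.replace("|", "&shy;") if pattern else word

    return re.sub(r"\w+", replace, text)


print(hyphenate("Die Spargelder sind unfortunately weg"))
```

Because the lookup is exact, a capitalized “Unfortunately” at the start of a sentence would need its own entry, which is a limitation of this sketch rather than a recommendation.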

The final question is: Does hyphenation affect SEO? Would the “&shy;” code affect our keywords? Some very old articles (2007), like this one on Comgroups, say it does. I very strongly disagree, based on my research on how Search Engines Identify Language. One of the key ways they do so is by identifying the words on a web page, and these words (in foreign languages) often contain special codes, either to display accented characters or special characters that are peculiar to one language. For example, “España” (Spain) can be written in HTML as “Espa&#241;a” or “Espa&ntilde;a”. Both are legitimate HTML codes. I simply can’t believe that search engines would index the “raw” HTML code (and therefore consider the two codings as different words).

It is more logical to assume that search engines parse a web page, strip the HTML codes, convert special character codes into the corresponding characters and then store the result in their databases as UTF-8, as this would also occupy less space. And if they parse the page to do this, it’s trivial to simply delete the “&shy;” inside a word. Search engines are not stupid, so the impact of using “&shy;” is… nil.
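The normalization step assumed above takes only a couple of lines, which is part of why it seems so plausible. The sketch below is a guess at the kind of clean-up an indexer might do, not a description of how any real search engine works; the function name is invented.

```python
import html


def normalize_word(raw: str) -> str:
    """Decode HTML character references, then drop soft hyphens (U+00AD)."""
    return html.unescape(raw).replace("\u00ad", "")


# Two different codings of the same word end up identical
print(normalize_word("Espa&#241;a") == normalize_word("Espa&ntilde;a"))  # True

# A soft hyphen inside a keyword disappears
print(normalize_word("unfortu&shy;nately"))  # unfortunately
```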

What happens if you have two pages with the same title? Usually, that is not good for SEO. But if you have a multilingual site and your pages in different languages share the same title, is this also bad for SEO?

Let’s be honest: unless you really don’t have a clue about SEO, this situation will usually not occur. You want your titles to be keyword-rich, so they are usually not restricted to a single word. And the probability that a sentence or part of a sentence that you use in your title is the same in two different languages is close to zero. Unfortunately, many CMSs (content management systems) don’t give you enough control of your website to prevent this, as the page titles are often generated automatically.

Now, this becomes a problem when you use certain words. For example, the word “romance” is spelled exactly the same way in English, Spanish, French, Dutch, Basque, Czech, Hausa and probably a lot of additional languages. If your web site sells fiction books, it is very likely that you will have a page for this category, and the title will be repeated over and over. Now, this is likely to be unavoidable. But will it hurt your ranking?

It’s old news that the title of a page is especially scrutinized by the search engines, and that the keywords in the title carry more weight than those in the text. You would assume that the repetition of the same text across different pages is likely to be considered some kind of keyword spamming by Google and other search engines.

However, we should remember that the corresponding pages are in different languages. You should be aware of how search engines recognize language (see also my book “Search Engine Babble” on that subject) to understand this, but the most important thing to remember is that the title is NOT the only criterion for language identification, and not even the most important one. The search engine, assuming you have sufficient text on your page, will eventually recognize the language of the page. The title will then be considered to be in the same language as the page and will therefore NOT be considered repeated content.

An identical page title (in the vast majority of cases, one single word) should therefore not be an issue. Of course, you should make sure that the page can be recognized as being in a specific language. Ensure that each page sends the correct language signals to the search engines. If you need to rework your site to make this happen, then start with the highest-value pages (those that rank highest, provide the most income or receive the most visitors).

All of the above is also true for a page header that is automatically inserted by some CMSs, though usually you do have more control over page headers. Again, if sufficient text is available for language recognition, it should not be an issue.

In any case, even though having pages with the same title or page header in different languages won’t penalize you, it is advisable to change such titles and page headers if possible, not only to prevent duplication but also to add additional keywords to your title. Instead of “Romance” you could write “Romance books” (English), “Livres de romance” (French), etc. The moment you use more than one word, the probability that you will repeat titles is extremely small. And that will not only prevent potential doubt on the part of the search engines, but will also allow them to classify you better and enhance your ranking for other keywords.
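If your CMS generates titles automatically, a quick audit can tell you which ones to rework first. The sketch below is a minimal example that assumes you can export a list of (language, title) pairs from your site; the function name and the sample data are invented.

```python
from collections import defaultdict


def titles_shared_across_languages(pages):
    """Return the titles that appear in more than one language version.

    `pages` is an iterable of (language, title) pairs. Titles are compared
    without regard to case or surrounding whitespace.
    """
    languages_by_title = defaultdict(set)
    for language, title in pages:
        languages_by_title[title.strip().casefold()].add(language)
    return {
        title: sorted(languages)
        for title, languages in languages_by_title.items()
        if len(languages) > 1
    }


pages = [
    ("en", "Romance"),
    ("fr", "Romance"),
    ("es", "Romance"),
    ("en", "Romance books"),
    ("fr", "Livres de romance"),
]
print(titles_shared_across_languages(pages))  # {'romance': ['en', 'es', 'fr']}
```

The multi-word titles do not show up in the result, which matches the point above: as soon as a title has more than one word, collisions between languages become rare.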


Choosing the Right Keywords for Multilingual Websites

No company that has a website can ignore search engine optimization (SEO) strategies. SEO is on almost everyone’s lips, because it is a very important part of making a website attract visitors. And using the right keywords plays a huge part in SEO. This is a complex process that requires expert guidance. First, websites have to contend with the regular changes in the algorithms of search engines such as Google. Now throw a foreign language or two into the process and the task of making the keywords work seems almost insurmountable.

When there is an absolute need to create a multilingual website, there are many things to consider to ensure that the multilingual keywords work and deliver good results.

Get to know multilingual keywords

It does not matter what the main language in your website is. What matters are the keywords used for the site to achieve high search engine ranking, as these keywords are the crucial link between your website content and what people look for on the Internet. Relevance of your website to the search also depends on the right keywords. In short, if you do not have the right keywords in your website, it will not be found by search engines. Keywords should be worked into your website’s copy and using the right keywords is instrumental in increasing your online presence immensely.


Finding multilingual keywords is not difficult. If you have an English-language website, then you already have a bunch of good keywords. What you need is a good translation company to translate these keywords. A professional translator will have the language skills and the cultural insight to ensure that the translated keywords match the searches done locally. If your international website is to be used as a money-making tool, you should also consider the dialects, instead of just the official foreign language. This means treating your target audiences as separate groups. As an example, lunch is déjeuner in the French spoken in France. However, in Belgium, Switzerland and French-speaking Canada it is called dîner, which in France is the term for the meal you eat in the evening.

What is important to remember is that people use day-to-day phrases instead of technical or corporate language when they do their searches. Among those everyday phrases, you have to carefully pick the keywords that have little competition and good search volumes.

You do not need an expensive app to make your website multilingual-friendly. Using Unicode for the text removes the need to encode the website pages differently for different languages, as it covers more than 90 scripts and has more than 100,000 characters already built in. This means that symbols and special characters used in different languages are already available. Unicode works across any language, program or platform.
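A quick way to see what this means in practice: the sketch below puts words from several scripts into one string and stores it with a single encoding, UTF-8. The sample words and the helper name are chosen only for illustration.

```python
def utf8_roundtrip(text: str) -> bool:
    """Encode text as UTF-8 and decode it again; True if nothing was lost."""
    return text.encode("utf-8").decode("utf-8") == text


# French, Spanish, Russian and Japanese in the same piece of text
words = ["déjeuner", "España", "обед", "昼食"]
page_text = " | ".join(words)
print(utf8_roundtrip(page_text))  # True

# Accented characters simply take more than one byte
print(len("España"), len("España".encode("utf-8")))  # 6 7
```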

Optimizing for international market

Your website should be fully translated and localized before you launch it. However, do remember that it is not a good idea to translate your English website, or your keywords and phrases, word for word. Research plays a major part in the success of your multilingual website. You should know what terms consumers actually use when searching for items in their own language, including synonyms, acronyms and abbreviations.

It is also a good idea to make your website bilingual and leave some English elements in it. You have to keep in mind that more people around the globe are now bilingual and most people know at least some simple English terms. Still, you have to do your research, as this should also be based on your target clients’ demographics. It is also a good idea to involve your local country teams in the development stage of the website. Their local knowledge and input will be very beneficial in creating a multilingual website whose keywords are already localized.


This post was written by Mariana Sarceda from Day Translations.