People work hard to get their web pages indexed, and then to build links to improve their sites’ ranking. But imagine for one moment the following scenario: search engine robots crawl your site and decide that it is written in a totally different language. Do you really think that is good? Will it help your potential business or destroy it?
Now, you might think that this scenario cannot happen, that the “smart” search engines should have no problem identifying that your page is in English, German, French, or Spanish. Really? What if mighty Google decides that your English page is in reality in… Chinese? Or that your Turkish page is actually in English? What if Google insists that your pages are not Romanian but English? Or that your English page is Croatian? Can’t happen? Well, you’re in for a heavy disappointment: Google Webmaster Central includes complaints for all four cases (an English page treated as Chinese, a Turkish page identified as English, a Romanian page identified as English, and an English page identified as Croatian). There are other examples, such as a UK site appearing to be written in Japanese, so the problem is relatively frequent.
Last year I wrote a series of posts on how search engines identify language, including Basic Language Recognition Research, Search Engine Difficulties in Recognizing Language, How Search Engines Recognize Languages, and 13 Tricks To Ensure Language Recognition By Search Engines. But I forgot to highlight why this is important. When I was asked to write an e-book on the subject, I realized that this particular post was missing.
So why is language recognition by the search engines important? The guy who complained that Google identified his website as being in English instead of Romanian reported an 80% drop in traffic. People in Romania search on Google.ro, in Romanian, but he was getting his traffic only from Google.com because the pages were “supposed” to be in English. I would assume that his click-through rate dropped as well, as most of the people finding him through Google.com might not speak Romanian in the first place!
The first reason is therefore wrong placement in the local search engines. If your website is in a specific language, you may rank very well in a certain local Google. You may not be able to rank #1 in the global Google because the rest of the world gets diverted to the local Google versions (see “Your #1 ranking is worthless in the rest of the world”), but you can rank really high on the local Google version. Or can you? If your language is detected incorrectly, as happened with this Romanian chap, then your results could plummet, because the local Google will obviously give higher rankings to results that are in the “native” language of that country. If your pages are in Romanian but are identified as being in English, you will rank nowhere in Google.ro! Similarly, your English pages that are identified as being in Romanian might do pretty well in Google.ro, but will disappear from Google.com!
A second reason is the language factor in the keywords. Imagine you want to score high for the keyword “kitchen”, but your pages are classified as being in Spanish. In Google.com, the U.S. site, a page classified as Spanish will score pretty badly when the search is performed in English: you’re handicapping your search results. On the other hand, the page will score pretty well in, say, Google.es (Spain), where everybody in Spain is redirected when they try to access Google. But how many people in Spain will actually search for the English word “kitchen”? Will they not rather search for “cocina”, its Spanish translation? And even if they find you, will they understand your pages? And will they buy from you, when you are supposedly selling only in the US? Now imagine that instead of Spanish, which is pretty common in the US, your English pages are identified as Turkish. Or Chinese.
A third factor is the effect of the Panda algorithm on localized websites. Imagine that your English website is identified as being in Turkish. How would a word like “kitchen” be considered? A misspelling in Turkish? Rubbish text? Not only could you be hit by being in the “wrong” localized Google, but you could ALSO be penalized by Panda, because your site would fail the guidelines on quality sites published on the Google Webmaster Central blog.
Finally, imagine that the visitor does not speak your language, as is likely to happen if he gets to your page from the wrong local search engine. A tool like Google Translate becomes worthless: because the source language is wrongly identified, the translation will be completely meaningless. For example, if your visitor is German and your English site is identified as Japanese, what kind of accurate information do you expect from a Japanese-to-German translation of your site?
But, you may say, identifying the page language is very easy, as HTML has means to indicate it. Well, though some of the search engines DO use it, Google, for example, is quite emphatic that its methods of language detection do NOT depend at all on the HTML markup (e.g. <html lang="en">). Tough luck, buddy, that won’t work. Luckily for you, I already wrote a post on 13 Tricks To Ensure Language Recognition By Search Engines.
How can you tell whether Google has detected the wrong language? There are several ways. An indirect way is to notice in your logs that you get a lot of traffic from a local Google that is not in your “natural” market. But the easiest method by far is to use Google itself: search for your site, and if it offers to translate it to your native language, then it is evident that you have a problem.
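If you want to automate the log check, here is a minimal Python sketch of the idea. It assumes you have already pulled the referrer URLs out of your access log into a list; the function name, the sample referrers, and the host pattern are my own illustration, not something any particular log tool provides.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical pattern: matches hosts such as www.google.ro, google.com,
# www.google.co.uk or www.google.com.tr, and captures the country part.
GOOGLE_HOST = re.compile(r"^(?:www\.)?google\.((?:com?\.)?[a-z]{2,3})$")


def count_google_editions(referrers):
    """Count how many referrals came from each local Google edition."""
    counts = Counter()
    for ref in referrers:
        host = urlparse(ref).hostname  # None for entries like "-"
        if host is None:
            continue
        match = GOOGLE_HOST.match(host)
        if match:
            counts["google." + match.group(1)] += 1
    return counts


# Made-up sample referrers, for illustration only.
sample_referrers = [
    "http://www.google.ro/search?q=bucatarie",
    "http://www.google.com/search?q=kitchen",
    "http://www.google.com/search?q=kitchen+design",
    "http://www.google.co.uk/search?q=kitchen",
    "http://www.bing.com/search?q=kitchen",
    "-",
]

for edition, hits in count_google_editions(sample_referrers).most_common():
    print(edition, hits)
```

If a Romanian site shows nearly all of its Google referrals coming from google.com and almost none from google.ro, that is the warning sign described above.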
And as a bonus, here’s an additional trick: if you want to verify which of your pages Google recognizes as being in a specific language, there is a way to do that. Use Google with a site: query and a language restriction, for example: http://www.google.com/search?q=site:seo-translator.com&meta=lr%3Dlang_en&hl=en&gl=US. Simply replace the “English” codes with those of the language you want to research and you’ll get the pages in that particular language.
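If you need to check several languages, you can build that URL instead of editing it by hand. The sketch below only reproduces the URL format shown above; the function name and its arguments are my own, and whether Google still honors each of these parameters is an assumption you should verify yourself.

```python
from urllib.parse import urlencode


def language_check_url(site, lang="en", interface_lang="en", country="US"):
    """Build a site: query restricted to one language, in the format above."""
    params = {
        "q": f"site:{site}",        # limit results to one site
        "meta": f"lr=lang_{lang}",  # language restriction, e.g. lang_en
        "hl": interface_lang,       # language of the Google interface
        "gl": country,              # country code
    }
    # safe=":" keeps the colon in "site:" readable; "=" becomes %3D.
    return "http://www.google.com/search?" + urlencode(params, safe=":")


print(language_check_url("seo-translator.com"))
print(language_check_url("seo-translator.com", lang="ro", country="RO"))
```

The first call rebuilds the example URL exactly; the second swaps in the Romanian codes to list the pages classified as Romanian.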