Sooner or later, many website owners start wondering whether they should translate their website. It’s a big world out there, and once you have decided to target all of it, a single-language site is not enough to expand your services worldwide. Most of your potential future customers might not even know your language, so it will be difficult to sell them anything. And there are currently no fewer than 6,909 known living languages in the world!
By itself this number does not mean much. There are 133 languages with fewer than 10 speakers and 472 languages with fewer than 100 native speakers, but only 8 languages are the native language of more than 100 million people. Overall, 389 (or nearly 6%) of the world’s languages have at least one million speakers each and together account for 94% of the world’s population. By contrast, the remaining 94% of languages are spoken by only 6% of the world’s people. (Source: Ethnologue)
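The "nearly 6%" figure is easy to check by hand. The short Python sketch below simply redoes the arithmetic with the two Ethnologue counts quoted above; the variable names are illustrative, and the 94% population share is taken from the text, not computed.

```python
# Counts quoted from Ethnologue in the paragraph above.
TOTAL_LANGUAGES = 6909   # known living languages
LARGE_LANGUAGES = 389    # languages with at least one million speakers

# Share of all languages that have a million speakers or more.
large_share = LARGE_LANGUAGES / TOTAL_LANGUAGES

# Everything else: the long tail of small languages.
small_languages = TOTAL_LANGUAGES - LARGE_LANGUAGES
small_share = small_languages / TOTAL_LANGUAGES

if __name__ == "__main__":
    print(f"Large languages: {LARGE_LANGUAGES} ({large_share:.1%} of all languages)")
    print(f"Small languages: {small_languages} ({small_share:.1%} of all languages)")
```

In other words, a translation budget spent on a few hundred languages could in principle reach the vast majority of the world's population.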
Before starting the translation process, you should first check whether it makes economic sense at all. If you have complied with all three golden rules for website translation, you can start. But into which language?
Let’s assume that you do not want to target a particular market, say South America, as in that case the languages would be obvious (Spanish and Portuguese). Let’s say that you want to target the whole world and get the maximum number of users, regardless of where they live.
The obvious choice (if your site is not already in that language) is English. The world population speaking this language in 2009 was estimated at 1,264 million people. Note that this is not the population with English as its native language, which is “only” 328 million people. The gap between native and non-native speakers is astounding, but it makes sense when you consider that English is the language of international communication. In other languages the difference is not so dramatic.
Selection by number of native language speakers?
Once we set aside English, whose international nature distorts the results, one potential selection criterion would be the number of native speakers of a language. The greatest number obviously belongs to Chinese, with some 1,213 million speakers. But this figure is misleading. Strictly speaking, there is no such thing as “Chinese”: there are no fewer than 292 different living languages in China. These are not mutually intelligible, but for sociological and political reasons they are considered a single Chinese language. Mandarin has 845 million speakers, but there are other important variants such as Wu (72 million), Yue (55 million), Jinyu (45 million) and others. On the other hand, and unlike most languages, the main differences are in the spoken forms, while the written form is essentially the same across China. So yes, we could say there are 1.2 billion Chinese “readers”, but do not forget the differences if you want to include audio on your site.
There is a dispute about second place, but the majority of philologists seem to agree that Spanish has already overtaken English. Both have around 329 million native speakers, with Spanish slightly ahead. The fourth language is Arabic (which, like Chinese, has many variants), with some 221 million, followed by Hindi and Bengali (around 182 million each), Portuguese (178 million), Russian (144 million), Japanese (122 million) and German (90 million).
Selection by number of Internet users in that language?
Internet World Stats paints a different picture when you look at the world’s Internet users by language:
Due to its international nature, English is obviously the first language (#3 by native speakers), with Chinese second (#1) and Spanish third (#2). But then it gets interesting: Japanese is fourth (#9), Portuguese is fifth (#7), German sixth (#10), Arabic seventh (#4) and, most curious of all, French is eighth (#16!) and Korean tenth (#17!). The fifth and sixth most spoken languages in the world (Hindi and Bengali) do not even appear in the top ten! Internet penetration is obviously not the same everywhere (e.g., 76% in North America vs. only 32% in South America), so the ranking of languages on the web differs from the ranking in the real world, but this is rapidly changing. Internet World Stats reports 1,162% growth for Chinese, 2,297.7% for Arabic, 1,359.7% for Russian and 669.2% for Spanish, compared with “only” 250% for English. The world is catching up, and in a few years we will start seeing the shift.
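To see at a glance which languages punch above or below their weight online, the two rankings can be compared directly. The Python sketch below uses only the rank pairs quoted in the paragraph above; the names `RANKS`, `rank_shift` and `by_overrepresentation` are illustrative, and French is assumed to sit in eighth place on the Internet list, since Arabic already occupies seventh.

```python
# (Internet rank, native-speaker rank) as quoted in the paragraph above.
# Assumption: French is taken as 8th on the Internet list.
RANKS = {
    "English": (1, 3),
    "Chinese": (2, 1),
    "Spanish": (3, 2),
    "Japanese": (4, 9),
    "Portuguese": (5, 7),
    "German": (6, 10),
    "Arabic": (7, 4),
    "French": (8, 16),
    "Korean": (10, 17),
}


def rank_shift(language):
    """Places gained online: positive means over-represented on the web."""
    internet_rank, native_rank = RANKS[language]
    return native_rank - internet_rank


def by_overrepresentation():
    """Languages sorted from most over-represented to most under-represented."""
    return sorted(RANKS, key=rank_shift, reverse=True)


if __name__ == "__main__":
    for language in by_overrepresentation():
        print(f"{language:<11}{rank_shift(language):+d}")
```

A large positive shift marks a language whose speakers are already well connected; a negative shift marks one whose online presence still lags behind its number of native speakers.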
A strategy for website language selection
So which language(s) should we select? I would suggest that for a short-term growth strategy you should pick the languages most used on the Internet, while for a long-term vision you should target the largest “native” populations, as the main growth will be there. Hindi and Bengali, for example, have not even seriously started. Internet penetration in India is just 7%, and in Bangladesh a meager 0.4%. When these countries start rising from poverty (Bangladesh is one of the poorest countries in the world), there will be a real explosion. Whoever gets into that market early will have a big head start in a market of 360 million people for those two languages alone (India and Bangladesh have a combined population of 1.3 billion).
And the language niche markets…
Yes, I know that it is very tempting to target the 1 billion+ Chinese, but everybody else is tempted too. Your competition will be fierce. On the other hand, do not forget that there is business to be made by treating a whole language as a niche. Ever heard of Godwari, Hunsrik, Piemontese, Hassaniyya, Galician, Lithuanian, Éwé, Shan, Aceh, Kabyle, Tsonga? Yet each of these languages has over 3 million native speakers. OK, so that’s not the same as one billion. But how many web pages do you think there are in these languages? And how many opportunities are you losing because you are not selling to these people in their own language? Now, what are you waiting for?