Sunday, April 10, 2011

Wikiwhat?

As posted on yourStory.in

What most sets humans apart, and what explains our triumph as the alpha species on the planet, is our increasing ability to record and share our collective knowledge. With this ability, each new generation, rather than reinventing the wheel, can stand firmly on the shoulders of those before it to reach further. Today, with the internet, we can do this better, faster and among more people than ever before. Wikipedia is an incredible example of this. Today Wikipedia has over 15 million articles contributed by several hundred thousand people and is one of the largest and most actively accessed public repositories of human knowledge. These articles are in 281 different languages. Yet almost 30% are in a single language: English. No surprise. The top ten languages, all western European with the exception of Japanese and Russian, account for almost 70%. I’m still not surprised.

I scroll down the list of languages on Wikipedia, sorted by article count, searching for Indian languages. Right away I pass Chinese (Mandarin) at number 12. With about 331,000 articles, it has only about 10% as many as English. But 12 is not a bad rank. I keep going, expecting to find Hindi and Tamil in close succession... I see Ukrainian (15), Vietnamese (17), Indonesian (21), Arabic (25), Lithuanian (28)... Volapük (31)... I’ve never even heard of Volapük. I click the link.


Volapük is a constructed language, Wikipedia tells me, created in 1879–1880 by Johann Martin Schleyer, a Roman Catholic priest in Baden, Germany. Schleyer felt that God had told him in a dream to create an international language... In 2000, it was estimated that there were 20–30 Volapük speakers in the world.

How is it even possible that Volapük is ahead of Hindi, a language with some 250 million native speakers? I keep going... Waray-Waray (36), Croatian (37) and then finally, there it is: Hindi at 38 with about 90,000 articles. I breathe a sigh of relief. At least it’s there. Telugu comes in next at 53 with 47,658 articles, Marathi at 61 with 33,270 articles and Tamil at 68 with 30,082. In between is a whole bunch of languages I have never heard of.

I'm not just surprised, I'm shocked! The Hindi Wikipedia is only about 25% the size of the Chinese one. But I suppose I expected the Chinese to be ahead. What troubles me more are comparisons like these: Vietnam has a population comparable to that of Andhra Pradesh (almost 80 million) and only slightly above that of Tamil Nadu (about 65 million), and a per capita GDP comparable to India's, yet there are 5 to 8 times as many Vietnamese Wikipedia articles as Telugu or Tamil ones. And while there are about 200,000 Vietnamese Wikipedia users, there are only roughly 20,000 Telugu Wikipedia users, 10% as many and equivalent to just 0.02% of the Telugu-speaking population. The fraction of Hindi speakers on Hindi Wikipedia is even lower. Yet there are an estimated 100 million internet users in India. That is roughly the estimated number of English speakers. So by my best guess, the two groups, internet users and English speakers, probably overlap almost completely. The other 90% are shut out of an enormous knowledge base. We are a more divided country than I had imagined.

I probe further. For Hindi and Tamil, about half of the content or edits is contributed by a handful of people, four or five to be precise (I didn’t check the other Indian languages). This is a tenuous link between a knowledge repository of enormous value and potentially hundreds of millions of people. It is disconcerting.

The promise of the internet is the ability to deliver relevant knowledge and information fast. So as we get excited about the extraordinary potential of the internet to change this country, we have to ask: what’s on offer on the internet for the 90% of India that doesn’t speak English? When Gulshan from Baganwala and Munnuswamy from Usilampatti log on for the first time, bright-eyed with anticipation of a great new world of knowledge, they will be sorely disappointed.

For the first time it strikes me that to deliver the promise of the internet in India, broadband penetration is not the biggest barrier. The barrier is far greater. It is language.

3 comments:

  1. A very interesting read.. especially the part where a recently constructed language such as Volapuk is ranked higher than any of our Regional languages... insightful

  2. I think you hit the nail right on the head. Rather than using western or international parameters, the local environment should be considered in the case of a multicultural, multi-divided India. What I would like to know is whether such standardized social parameters have been researched by social scientists in India, why this information is not accessible, and whether it can be made accessible and marketed so that the efforts of the people striving to do so are guided in the right direction.

  3. I guess most net-savvy Tamilians know English enough to not worry about Tamil articles. I guess the same goes for most other regional language speakers except for those in the Hindi belt.

    Even if there were more Hindi articles, I doubt they would be useful, for they are illiterate and don't know how to read.