Issue 8

On Ausbausprachen and Abstandsprachen: How do you define a language?

by on September 20, 2016

आज तीन घंटे बारिश हुई, सड़कों पर पानी जमा हो गया

[Image: the same sentence written in the Perso-Arabic script]

The sentences above, the first in the Devanagari script, the second in the Perso-Arabic script, both mean ‘it rained for three hours today, water built up on the roads’. Both are also, incidentally, pronounced in the exact same manner — ‘āj tīn ghanṭe bārish huī, saṛkon par pānī jamā ho gayā’. Puzzlingly, despite being identical in pronunciation and meaning, they are written in two different ‘languages’ — the former in Hindi, the latter in Urdu.

Hindi and Urdu are formal standards of the same dialect; linguists usually hyphenate them as a single language, ‘Hindi-Urdu’, or simply refer to it as Hindustani. The two are not only similar, but virtually identical in grammar, syntax, and informal vocabulary. Visible differences only begin to show in discussions of a decidedly technical nature. In such situations, standard Urdu taps into Persian and Arabic, while Hindi borrows (or re-borrows) words from Sanskrit. In everyday conversations and popular culture, the two are very difficult to tell apart. ‘Hindi’ and ‘Urdu’ were interchangeable terms until the early 20th century; the idea that they are distinct languages emerged along with inter-communal politics in the region. After the British administration made Urdu the sole official language of the northern provinces of India, resentment arose against what was seen as a ‘Muslim’ language, with its Perso-Arabic vocabulary and script. The new standard of Hindi, with its Sanskrit-based formal vocabulary and indigenous script, was the ‘Hindu’ alternative (National Council for Promotion of the Urdu Language, 2007).

Yet, publicly professing the idea of a single ‘Hindi-Urdu’ language can invite very passionate dissent from speakers. An ethnic Punjabi from Lahore who speaks Urdu as a second language will vehemently defend its status as a ‘separate language’ from Hindi, even as she watches Indian Hindi-language films with perfect comprehension. A native Hindi speaker from New Delhi will be similarly resolute in his defence of the separateness of his native language from the national language of Pakistan, even though he faces no language barrier while talking to Pakistanis he may meet while on a trip to London. Language rests on faith here, often literally.

While the example of Hindi-Urdu may be surprising, it is far from unique. Arbitrary, counter-intuitive definitions of ‘language’ and ‘dialect’ are commonplace. Popular ideas of what counts as ‘different languages’, or as varieties of the same language, often completely disregard objective linguistic parameters.

The German linguist Heinz Kloss coined the terms Abstandsprachen and Ausbausprachen, which refer to two different sets of criteria for distinguishing one language from another. The former (literally ‘distance language’) is a form of speech that is distinguished from others on the basis of objective linguistic criteria. Spanish and French are distinguished from each other as abstand languages because they differ in pronunciation, vocabulary, grammar, and orthography: they are largely mutually unintelligible (speakers of one can’t normally understand speakers of the other without formal training), and a layperson will be quick to notice the difference (Kloss, 1967).

The latter (lit: ‘elaboration language’) refers to languages which are differentiated for socio-political reasons despite mutual intelligibility. Ausbau languages are often tools for building nationalist identity — to carve reality into simple constructs of ‘one-nation one-language’. Such languages form a continuum, in terms of how different they actually are. On the one hand, there may be multiple standards that are based on the same dialect (like Hindi-Urdu); on the other, there are ausbau languages that are based on separate dialects (Bulgarian and Macedonian, or German and Luxembourgish), but are not different enough to qualify as abstand languages. Different ausbau languages normally emerge when groups of different ethnicities or different nationalities, who speak the same language, decide to emphasize their difference by claiming a difference in language.

Persian and Turkish are both interesting examples of the use of ausbau languages as a political tool. Both languages were spoken in different regions of the former Soviet Union, and were similarly manipulated to support state agendas. Persian has multiple standardized forms: the eponymous ‘Persian’ or ‘Farsi’ based on the dialect of Tehran, ‘Dari’ in Afghanistan, and ‘Tajiki’ in Tajikistan, where it is written in the Cyrillic script. Azeri, the official language of Azerbaijan, is largely mutually intelligible with Turkish, and until the early 20th century Azerbaijanis would normally refer to their own language as simply ‘Turkish’. The idea of Tajiki and Azeri as separate from Persian and Turkish only gained traction as a result of Soviet language policy specifically aimed at distancing the Tajik Soviet Socialist Republic and the Azerbaijan Soviet Socialist Republic from Iran and Turkey respectively.

The languages of most of Scandinavia (Norwegian, Swedish and Danish, all of which belong to the North Germanic branch of the Germanic family) behave more like a group of dialects than separate languages, with a very high degree of mutual intelligibility. Norwegians, Swedes and Danes can watch each other’s television and talk to each other across the table without any formal training. The de jure difference in language largely serves to match the difference in nationality. In the union of Denmark and Norway (which ceased to exist in 1814), what we know today as dialects of Norwegian were simply considered regional varieties of Danish (Jahr, 2014). National borders are also the major reason we think of Serbian, Croatian and Bosnian as separate languages today, although all are mutually intelligible forms of a pluricentric Serbo-Croatian language (Brazović, 1991).

Political manipulation of the definition of language works in the other direction as well: nationalities or ethnic groups may maintain an artificial ‘sameness of language’ to support their claim to a single identity. German speakers who speak only Hochdeutsch, or Standard German, face a language barrier when they move to ‘German-speaking’ parts of Switzerland. While Standard German is taught in schools almost as a foreign language, the Schwiizertüütsch or Swiss German that’s spoken on the streets is surprisingly opaque to those who only speak Hochdeutsch (MacNamee, 2010).

On a larger scale, the various ‘dialects’ of Arabic vary steadily on a continuum from west to east, much like the Romance languages in Europe, so that Moroccan Arabic is largely incomprehensible to a speaker of Levantine Arabic. Educated Arabs with different native dialects would normally switch to Modern Standard Arabic to communicate with each other. This is a relatively new standardized form based on the Classical Arabic of the Qur’an, taught much like a second language in schools across the Arab world. The Arab situation becomes less puzzling when we realize that it is analogous to France, Spain, Portugal and Italy until a few hundred years ago. Classical, liturgical Latin, taught in schools across the region, was the lingua franca of the educated classes and the language of administration and science, whereas the local varieties of Latin, which by then had already greatly diverged into early forms of French, Spanish, Portuguese and Italian, were dismissed as crude ‘dialects’ (even referred to as “Vulgar Latin”) that shouldn’t be used in any formal situation. The idea of the Chinese ‘dialects’ is similarly deceptive; Mandarin and Cantonese, for example, are at least as different in speech as German and Swedish (Zhang, 1998; The Economist, 2013).

Amid this chaos, what does it even mean to be a language anymore? There is no single definition in practice, and it’s futile to look for rationality in how such distinctions work in real life. Language is a very fluid concept to begin with: languages and dialects vary gradually on a continuum, and it’s difficult from the outset to decide when two forms of speech vary enough to be considered different languages. To add to the complexity, language is a convenient tool for furthering political agendas.

To be a ‘language’ does not always mean what we expect. What eventually gets labelled as a ‘language’ or a ‘dialect’ is very often the result of a series of events in political history, and is not necessarily based on objective linguistic criteria of mutual intelligibility. People who claim to speak the “same language” could struggle to understand each other. Two ‘dialects’ may be much further apart than different ‘languages’ like Spanish and Portuguese (or even Spanish and French). Groups that claim to speak different languages could still be able to readily communicate with little or no formal training (as with Hindi and Urdu).

This could make for a fascinating thought experiment — would Norwegian still have emerged as a separate language if the Denmark–Norway union had survived? Would Hindi and Urdu be considered two different languages today if not for inter-communal tensions and the eventual partition in South Asia? If the province of Guangdong (‘Canton’) were its own nation, would Cantonese still be considered a ‘dialect’ of one Chinese language? It’s useful to always remember that a language is just a “dialect with an army and a navy”, or that a dialect is a language that no one’s trying to market. Cultures and peoples do not vary in abrupt black-and-white categories; national borders are artificial, circumstantial — poor indicators of the complexities of popular identity. ‘Linguistic borders’, often invented to complement such national borders, suffer from the same drawbacks.


Brazović, Dalibor (1991). Serbo-Croatian as a Pluricentric Language. In Michael Clyne (ed.), Pluricentric Languages: Differing Norms in Different Nations (pp. 347-380). Berlin, Germany: Walter de Gruyter.

Jahr, Ernst Håkon (2014). Language Planning as a Sociolinguistic Experiment: The Case of Modern Norwegian. Edinburgh, United Kingdom: Edinburgh University Press.

Kloss, Heinz (1967). ‘Abstand Languages’ and ‘Ausbau Languages’. Anthropological Linguistics 9(7): 29-41. Retrieved from

MacNamee, Terence (2010). German Newcomers Struggle with Swiss German. Retrieved from

National Council for Promotion of the Urdu Language (2007). Government of India. A Historical Perspective on Urdu. Retrieved from

The Economist (2013). Arabic: A Language with too many armies and navies? Retrieved from

Zhang Xiaoheng (1998). Dialect MT: A Case Study Between Cantonese and Mandarin. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics – Volume 2 (pp. 1460-1464). Available at
