Tools and policies for linguistic diversity


Carlo Zoli - CEO Riccardo Capella - CTO


Iacopo Risi Amélie Ngantcha Matteo Rossi Silvia Randaccio

How did the project come about?

The project started as an experiment. Carlo Zoli, the founder, was working in a traditional IT company (he is an electronic engineer), but he had an “insane” passion for languages and linguistics: at university he took a Russian course without telling his father, and as a teenager he enjoyed Sardinian grammars and dictionaries as bedtime reading. So he persuaded his partners in the IT company that computational linguistics and lexicography were good investments for their business (actually, he was quite unsure about it!) and, with the help of some colleagues, he started developing a first draft of the electronic platform that we have maintained and updated over the years and that several working groups use today. Carlo got in touch with some language activists he had found through his research, and some of them agreed to take part in the project.

By the way, it may sound strange that a Florentine (who, by definition, speaks the one variety that can be considered the original “Italian”) is so interested in minority languages. As already said, from an early age Carlo was attracted by the magic of linguistic diversity, by the fact that we can all talk about the world with different sounds, accents, intonations, words and structures. He was also a little disappointed that, as a Florentine, he had only one language at his disposal (even though he also had Romagnol ancestors). Carlo felt that an entire world was in serious danger in our “Italian” culture, which was forgetting minority languages and cultures in favor of a unitary tendency. This tendency has slowly been reversing in recent years, but when he was young the supposed Italian “dialects” (a misleading definition) were mostly denigrated. So he managed to turn this passion into a job, together with his other interest and the subject of his formal studies: IT and communication technologies. Carlo also travels a lot, and everywhere he goes he tries to counter the culture of national-language monolingualism with the rehabilitation of minority languages. This rehabilitation must necessarily pass through a serious scientific approach and through bringing these languages to modern, technologically advanced tools.

When did the project start? What is the origin of the name “Smallcodes”?

It was 2004, and the first languages Carlo and some colleagues dealt with were Ladin, Sardinian, and the Lombard of Italian Switzerland. Carlo worked hard to enlarge our group of contributors, and after a few years he was able to establish a foundation of his own. He called it Smallcodes (because a “code” can be a language, but the word may also refer to computing) and he found other crazy people who agreed to work with him on this mission.

Who does the team consist of? What are our backgrounds?

We are a team of six people (a mixed group of IT specialists, engineers and linguists), and we were able to work on this project only because of the so-called “virtuous snowball effect” that cooperation among minorities has generated. In fact, every single step we were able to add to our system is a goal we achieved through the cooperation of different institutions, associations, bodies, activists and so on. This approach has allowed us to open a small office in Florence and to make a living from our common passion.

Which languages are going to be incorporated into the Smallcodes project in the future? Are we looking at non-European minority languages as well?

We are currently turning to Afroasiatic languages. There are ongoing talks about such future projects, and we are especially interested in Tamazight and Maltese. These languages could really help us test our system and improve it, since non-Indo-European languages will necessarily require new features and fields. At the moment our system offers hundreds of options, and each lexicography group is free to choose those which are useful for its language. But we can say our structure will never be complete: when you talk about a human phenomenon such as language there is always an infinite range of possibilities, and even if we dealt with hundreds of languages there could always be a new one which requires different approaches. It is a continuous challenge, and that is why it is so exciting. Another group of languages we are really interested in is the Mexican Oto-Manguean family: this family has such internal diversity (in a relatively small area) that European languages cannot even be compared with it.
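The idea of a shared catalogue of options from which each lexicography group picks its own subset can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the `LanguageProfile` class and its methods are hypothetical and do not describe the actual Smallcodes platform.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue of optional entry fields (illustrative names only).
FIELD_CATALOGUE = {
    "part_of_speech", "gender", "plural", "verb_class", "root", "pattern", "tone",
}


@dataclass
class LanguageProfile:
    """The subset of catalogue fields that one lexicography group has switched on."""
    name: str
    enabled_fields: set[str] = field(default_factory=set)

    def enable(self, *names: str) -> None:
        # Reject anything that is not in the shared catalogue.
        unknown = set(names) - FIELD_CATALOGUE
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        self.enabled_fields.update(names)

    def validate_entry(self, entry: dict[str, str]) -> list[str]:
        """Return the keys of `entry` that are not enabled for this language."""
        return sorted(
            key for key in entry
            if key != "lemma" and key not in self.enabled_fields
        )


# A group working on a root-and-pattern language might enable these fields.
maltese = LanguageProfile("Maltese")
maltese.enable("part_of_speech", "root", "pattern")

rejected = maltese.validate_entry({"lemma": "kiteb", "root": "k-t-b", "tone": "high"})
print(rejected)  # ['tone']
```

A new language that needs something the catalogue lacks would, in this sketch, mean adding a name to `FIELD_CATALOGUE`, which mirrors the point that the structure is never complete.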

Which minority language is our primary focus right now?

We are dealing with multiple minority languages right now, but there is an ongoing project on the Rromani language: technically speaking this is a European language, but it has some peculiarities which have also helped us to improve our electronic platform. Rromani varies greatly from one region to another (in this case differences among varieties are caused not only by historical evolution but also by geographical dispersion), and it has been a real challenge to demonstrate, in agreement with one of the most important scholars of Rromani, Marcel Courthiade, that there is indeed a fundamental unity! Our Rromani dictionary supports this thesis, and for this purpose we had, for example, to develop a special function: the geo-synonymic linkage between entries and between different meanings of the same entry.
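One way to picture a geo-synonymic linkage is as a symmetric link between senses, where each sense carries a regional label. The sketch below is only an illustration under that assumption: the class names, the region labels and the placeholder lemmas are invented, and none of this is taken from the real Rromani dictionary or from the platform's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning of one entry, tagged with the region where it is used."""
    lemma: str
    sense_no: int
    region: str
    gloss: str


class GeoSynonymIndex:
    """Symmetric links between senses that mean the same thing in different regions."""

    def __init__(self) -> None:
        self._links: dict[Sense, set[Sense]] = defaultdict(set)

    def link(self, a: Sense, b: Sense) -> None:
        # Store the link in both directions so it can be followed from either sense.
        self._links[a].add(b)
        self._links[b].add(a)

    def geo_synonyms(self, sense: Sense) -> dict[str, list[str]]:
        """Group the senses linked to `sense` by region: region -> lemmas."""
        by_region: dict[str, list[str]] = defaultdict(list)
        linked = self._links.get(sense, set())
        for other in sorted(linked, key=lambda s: (s.region, s.lemma)):
            by_region[other.region].append(other.lemma)
        return dict(by_region)


# Placeholder data: not real dictionary content.
a1 = Sense("word_a", 1, "region_north", "house")
a2 = Sense("word_a", 2, "region_south", "home")   # another meaning of the same entry
b1 = Sense("word_b", 1, "region_east", "house")   # a different entry

index = GeoSynonymIndex()
index.link(a1, b1)  # linkage between entries
index.link(a1, a2)  # linkage between meanings of the same entry

result = index.geo_synonyms(a1)
print(result)  # {'region_east': ['word_b'], 'region_south': ['word_a']}
```

Because links attach to senses rather than to whole entries, the same structure covers both cases mentioned above.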

How hard is it to create a dictionary for a minority language? How do we gather our resources regarding a language - do we talk to native speakers, or... ?

In some respects it is harder than making one for a majority language, for two main reasons: the fear of standardization (which is typical of any minority community) and the need for local informants. Starting from the latter, we must be aware that for less-resourced languages, language technologies are useless without adequate language resources. But for minority languages these resources cannot be found, either on the web or in classical sources, to the same extent as for majority languages. What one does find is inconsistent, because of orthographic instability and a scarce presence in official settings. Tool developers such as us must be supported by customers and users who are also co-producers and co-developers and who can supply their large amounts of data to the technology experts. It is therefore crucial to have a federation of users who share this commitment. This federation should be non-profit so that it can access funding for language policies or for scientific research worldwide (our contributors are universities, local bodies and institutions devoted to language defense, but also volunteer cultural associations). In this way, the software is open source within this community but guided by a lead developer. Unlike classic open-source development, the development of language technology depends both on the software code and on linguistic research. The traditional open-source approach based on volunteers and donations is not applicable in this field, because it requires cooperation between linguistic experts and IT experts.

What’s the business approach?

Our working approach is that the software is developed centrally, and partnerships and funding opportunities are established every time a new language group enters the community. Every new group of language experts brings new expertise and new funding and requests new features, but development is pursued in an industrial fashion, with attention to the latest web technologies and with a well-resourced staff, in a “web 2.0 commercial way”. The business itself is therefore essentially non-profit, yet it differs from software development done inside the academic linguistics world, which cannot have the structure and attitude of a software house such as ours.

How can we better explain the need for standardization of these languages?

One of the main obstacles to language rehabilitation is that it must necessarily pass through some kind of standardization (if only because a dictionary, for example, normally requires regularity). A standard written form is necessary for every language to meet today’s technological expectations: only a standardized orthography gives access to modern technological tools (see for example Google Translate and the like). Moreover, writing is the only possible way into the domains of administration, bureaucracy and education. Therefore, language standardization should not be regarded as an enemy of autochthonous languages but as a small price to pay to support minority languages in the real world and in cyberspace. If we want to save and preserve minority languages in the modern world, we must give them access to tools and resources of the same technological level as those of “bigger” languages.

But we must be realistic: it is very common to find people who fear standardization because they are afraid of cultural loss. That is why our dictionaries normally cover the full range of local variants and treat them equally. At the same time, we try to spread the message that languages such as Sardinian, Occitan, Rromani and so on are basically unitary and are NOT an unclear mixture of dialects (as the imperialistic logic of divide et impera has tried to claim over the past centuries).

How do we finance this project?

This question has been partially answered above: we work with local bodies and institutions, universities and local associations of activists that have decided to devote part of their funding to joining our project and gaining access to our platform, to our linguistic consulting and to our IT resources for future software development or improvements.

Our final vision: the creation of a network. What does this mean?

We want to conclude with a metaphor: minority languages, if they want to survive in the modern world, must be like “dwarves sitting on the shoulders of giants”. This means that, in order to avoid fragmentation and a useless dispersion of human and financial resources, what is needed is a collaborative approach and a willingness to benefit from the experience and technological development already achieved by majority languages. Joining forces and building a compact community is essential for a rapid and effective development of language technologies for small languages. Our vision is to bring together experts from research institutions, academia, companies, consortia and associations, as well as individual language activists. This community aims to share expertise (using, in a “green” perspective, what already exists, rather than creating new resources, which can be a waste of time and money) and, once it has obtained some results, to spread them among other language communities that can benefit from them. This is a challenge all minority cultures must take up if they do not want the divide with better-resourced languages to widen and smaller languages to cease to exist in the digital space for good. This is also the ultimate reason for our project and the core of our beliefs.