The PersPred project

PersPred is the first online multilingual syntactic and semantic database of Persian compound verbs (complex predicates), developed by the members of the research unit Mondes iranien et indien (CNRS, Sorbonne Nouvelle, Inalco, EPHE) within the ANR-DFG project PERGRAM (2008-2012) and the LR4.1 work package of Strand 6 of the Labex Empirical Foundations of Linguistics (EFL). Upon its first delivery in June 2012, PersPred included around 650 compound verbs formed with zadan ‘to hit’. The present version includes around 1500 entries formed with various verbs, e.g. dâdan ‘give’, gereftan ‘take’, âvardan ‘bring’, etc. Each entry is encoded with 25 fields that capture its lexical, syntactic and semantic properties. PersPred is an ongoing project whose long-term goal is to provide an extensive and richly annotated lexicon of Persian complex predicates.

Persian has only around 250 simplex verbs, half of which are currently used by the speech community. The morphological lexeme formation process that derives verbs from nouns and adjectives, widely used in languages such as English (e.g. comb (n.) > to comb (v.), short (adj.) > to shorten (v.)) and French (e.g. balai (n.) > balayer (v.), court (adj.) > raccourcir (v.)), is still available in Persian but no longer productive. The verbal lexicon is mainly formed by syntactic combinations of a verb and a non-verbal element, which can be a noun, e.g. harf zadan ‘to talk’ (Lit. ‘talk hit’), an adjective, e.g. bâz kardan ‘to open’ (Lit. ‘open do’), a particle, e.g. bar dâštan ‘to take’ (Lit. ‘PARTICLE have’), or a prepositional phrase, e.g. be kâr bordan ‘to use’ (Lit. ‘to work take’). These combinations are generally referred to as “complex predicates”, “compound verbs” or “light verb constructions”. New “verbal concepts” are regularly coined as complex predicates rather than simplex verbs, for instance yonize kardan ‘to ionize’ (Lit. ‘ionized do’) instead of yonidan.
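
The four types of combination described above can be pictured as simple records. The following Python sketch is only an illustration built from the examples in this paragraph: the class name ComplexPredicate, its field names and the helper group_by_type are hypothetical and do not reflect the actual 25-field encoding used in PersPred.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexPredicate:
    """Hypothetical record for one complex predicate (not the PersPred schema)."""
    nonverbal: str       # non-verbal element, e.g. "harf"
    nonverbal_type: str  # "noun", "adjective", "particle" or "prepositional phrase"
    verb: str            # verb of the combination, e.g. "zadan"
    gloss: str           # English translation

    @property
    def lemma(self) -> str:
        # Citation form: non-verbal element followed by the verb.
        return f"{self.nonverbal} {self.verb}"


# The four examples cited in the text, one per type of non-verbal element.
EXAMPLES = [
    ComplexPredicate("harf", "noun", "zadan", "to talk"),
    ComplexPredicate("bâz", "adjective", "kardan", "to open"),
    ComplexPredicate("bar", "particle", "dâštan", "to take"),
    ComplexPredicate("be kâr", "prepositional phrase", "bordan", "to use"),
]


def group_by_type(predicates):
    """Map each non-verbal element type to the lemmas of that type."""
    groups = defaultdict(list)
    for predicate in predicates:
        groups[predicate.nonverbal_type].append(predicate.lemma)
    return dict(groups)


if __name__ == "__main__":
    for kind, lemmas in group_by_type(EXAMPLES).items():
        print(f"{kind}: {', '.join(lemmas)}")
```

Keeping the verb and the non-verbal element in separate fields makes it possible to list a predicate under either component, which is the choice dictionaries currently make inconsistently.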

Consequently, just as the verbal lexicon of English includes all its simplex verbs, any inventory of the Persian verbal lexicon, such as a dictionary, must include these combinations. However, despite several attempts, this task has not been carried out systematically, and such a resource is sorely lacking. Although dictionaries mention some of the lexicalized combinations, either under the entry for the verb or under the entry for the non-verbal element, the criteria underlying the choice of combinations are far from clear, and the resulting list varies significantly from one dictionary to another.

PersPred aims to help fill this gap by proposing a framework for the storage and description of Persian complex predicates. It provides not only an inventory of Persian complex predicates, but also a rich syntactic and semantic annotation for each of them, along with its translation into English and French. Each complex predicate is illustrated by one or several attested examples from literature, the written press and the Web. A particularly innovative feature of PersPred is that it proposes semantic groupings of predicates. These groupings account for the productivity of the combinations and thus allow newly coined predicates to be integrated.

Its design and the diversity of the information it contains make PersPred not only an appropriate lexicographic tool for numerous applications such as translation, NLP and language teaching, but also a valuable resource for investigations on various theoretical topics.

The online interface allows for a multi-criteria exploration. A detailed description of PersPred is provided in the PersPred Documentation; please refer to it for an efficient exploration.

PersPred is distributed under the LGPL-LR license. You must accept the terms of the LGPL-LR license to use this resource.



We would like to thank Sarra El Ayari for her involvement in the earlier stages of the development of PersPred.


To cite

Pollet Samvelian & Pegah Faghiri (2013), Introducing PersPred, a syntactic and semantic database for Persian Complex Predicates, in Proceedings of The 9th Workshop on Multiword Expressions, NAACL-HLT 2013, June 14-15, Atlanta, Georgia, USA: Association for Computational Linguistics, pp. 11-20.