Post-Doctoral position: Automatic acquisition of lexicon / Automatic acquisition of Persian compound verbs (complex predicates) and their semantic and syntactic properties (with possible extension to other related languages)
Strand 6 - Language Resources
CONTACT: Pollet Samvelian - Mondes Iranien et Indien - Université Sorbonne Nouvelle Paris 3
CONTACT EMAIL: firstname.lastname@example.org
JOB RANK: Post Doc
DURATION: 12 months
SALARY: 2000 to 2400 euros per month net of taxes (depending on the applicant's experience)
KEYWORDS: Computational linguistics or computer science with knowledge of natural language processing
APPLICATION DEADLINE: 2016/03/18
STARTING DATE: From May 1 onward
The project consists of automatically enriching the PersPred database (http://perspred.cnrs.fr/), the first online multilingual syntactic and semantic database dedicated to Persian compound verbs (complex predicates). PersPred currently has 1600 entries. We aim to considerably improve the coverage of the database by tripling the number of entries. We are also interested in applying the methods developed for enriching PersPred to build similar resources for other Iranian languages, such as Kurdish or Pashto, which, like Persian, make extensive use of compound verbs.
Persian has only around 250 simplex verbs, about half of which are in current use in the speech community. The morphological process that derives verbs from nouns or adjectives, widely used in languages such as English (e.g. comb (n.) > to comb (v.), short (adj.) > to shorten (v.)), is still available in Persian but is no longer productive. Instead, the verbal lexicon is mainly formed by syntactic combinations of a verb and a non-verbal element, which can be a noun, e.g. harf zadan ‘to talk’ (Lit. ‘talk hit’), an adjective, e.g. bâz kardan ‘to open’ (Lit. ‘open do’), a particle, e.g. bar dâštan ‘to take’ (Lit. ‘PARTICLE have’), or a prepositional phrase, e.g. be kâr bordan ‘to use’ (Lit. ‘to work take’). These combinations are generally referred to as “complex predicates”, “compound verbs” or “light verb constructions”. New “verbal concepts” are regularly coined as complex predicates rather than as simplex verbs: for instance, yonize kardan ‘to ionize’ (Lit. ‘ionized do’) rather than yonidan.
Although Persian complex predicates have been a focus of interest in theoretical studies, little attention has been paid to the need for a rich lexicon of these combinations. Computational studies have pointed out the lack of large-scale lexical resources for Persian and have developed probabilistic measures to estimate the acceptability of a verb-noun combination as a complex predicate (CP) (Taslimipoor et al., 2012). PersPred aims to help fill this gap by providing a framework for storing and describing Persian CPs. Based on Samvelian’s (2012) theoretical and descriptive survey, PersPred provides a syntactic and semantic classification of Persian complex predicates. Its first version was developed within the PERGRAM project (ANR-DFG) and included around 700 entries with the verb zadan ‘to hit’ (Samvelian & Faghiri 2013, 2014, to appear). PersPred2 was obtained by semi-automatically enriching PersPred1 using the valency information encoded in the database.
Applicants should have a PhD in computational linguistics or computer science with excellent knowledge of natural language processing. Familiarity with methods for the automatic acquisition of lexical knowledge about multiword expressions is desirable.
Knowledge of Persian or at least familiarity with the Arabic script would be highly appreciated.
The project is part of work package LR41 (Morphological and syntactic resources for Iranian languages) within Strand 6 of the Labex EFL. It will be carried out under the joint supervision of Pollet Samvelian (for the linguistic component) and Benoît Crabbé (for the computational component).
Applicants are invited to send the following to Pollet Samvelian (email@example.com), Benoît Crabbé (firstname.lastname@example.org) and Cédric Gendrot (email@example.com):
- A cover letter
- A CV including their list of publications
- The names and contact details of two referees
- A link for downloading their publications
References:
- Bonami, O. and P. Samvelian. 2010. Persian complex predicates: Lexeme formation by itself. Paper presented at the Septièmes Décembrettes Morphology Conference, Toulouse, December 3.
- Samvelian, P. and P. Faghiri. 2014. Persian complex predicates: How compositional are they? Semantics-Syntax Interface 1: 43-75. University of Tehran.
- Samvelian, P., P. Faghiri and S. El Ayari. 2014. Extending the coverage of a MWE database for Persian CPs exploiting valency alternations. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 26-31 May 2014.
- Samvelian, P. and P. Faghiri. 2013. Introducing PersPred, a syntactic and semantic database for Persian complex predicates. In Proceedings of the 9th Workshop on Multiword Expressions (NAACL-HLT 2013), Atlanta, Georgia, USA, June 14-15, 11-20. Association for Computational Linguistics.
- Samvelian, P. and P. Faghiri. To appear. Re-thinking compositionality in Persian complex predicates. In Proceedings of the 39th Berkeley Linguistics Society. Linguistic Society of America, Berkeley.
- Samvelian, P. 2012. Grammaire des prédicats complexes. Les constructions nom-verbe. Lavoisier.
- Taslimipoor, S., A. Fazly and A. Hamzeh. 2012. Using noun similarity to adapt an acceptability measure for Persian light verb constructions. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul.