Building morphological analyzer for Nepali
List of Authors
  • Bhat, Shahid Mushtaq; Rai, Rupesh

Keyword
  • Morphological analyzer, Word and paradigm model, Apertium, LT-Toolbox, Paradigm, Concatenative Morphology, Machine Translation, Devanagari, Transliteration

Abstract
  • A morphological analyzer is a fundamental tool in Natural Language Processing (NLP) that generates the morphological analyses of a given word-form. It can be used to improve the accuracy of POS tagging, chunking, syntactic parsing, Word Sense Disambiguation (WSD), Information Retrieval (IR), and Machine Translation (MT) systems. This paper describes an ongoing effort to develop a Nepali morphological analyzer using the open-source platform Apertium (LT-Toolbox). Since this is the initial stage of the project, we have confined our work to inflectional morphology. So far, we have covered all the categories in the LDC-IL POS tag-set for Nepali. The analyzer currently covers 20,000 words, classified into 219 paradigms.
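
    The word-and-paradigm idea named in the keywords can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Apertium itself. The stems, the paradigm name `noun_regular`, the tag names, and the Latin transliteration are all hypothetical toy data chosen for illustration, and the output notation is only loosely modeled on angle-bracket tag strings.

```python
# Toy word-and-paradigm analyzer (illustrative only; all data is hypothetical).

# A paradigm maps each surface suffix to the tags that suffix contributes.
PARADIGMS = {
    "noun_regular": {
        "": ("n", "sg"),       # bare stem, read here as singular
        "haru": ("n", "pl"),   # plural marker in this toy paradigm
    },
}

# The lexicon assigns each stem to exactly one paradigm.
LEXICON = {
    "kitab": "noun_regular",   # assumed gloss: 'book'
    "ghar": "noun_regular",    # assumed gloss: 'house'
}


def analyze(word_form):
    """Return every stem+tags analysis licensed by the lexicon and paradigms."""
    analyses = []
    for stem, paradigm_name in LEXICON.items():
        if not word_form.startswith(stem):
            continue
        suffix = word_form[len(stem):]
        tags = PARADIGMS[paradigm_name].get(suffix)
        if tags is not None:
            analyses.append(stem + "".join(f"<{tag}>" for tag in tags))
    return analyses


if __name__ == "__main__":
    for form in ("kitabharu", "ghar", "xyz"):
        print(form, analyze(form))
```

    Because each stem only points to a paradigm, adding a word in this sketch means adding one lexicon entry rather than listing all of its inflected forms, which is the property that makes a paradigm count much smaller than a word count plausible.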

Reference
  • Adhikari, H. R. (1993). Samsamayik Nepali vyaakaran. Kathmandu: Kunjal Prakashan.
    Bharati, A., Chaitanya, V., & Sangal, R. (1995). Natural language processing: A Paninian perspective. New Delhi: Prentice Hall.
    Baerman, M., Brown, D., & Corbett, G. G. (2005). The syntax-morphology interface: A study of Syncretism. Cambridge University Press.
    Boye, G. (1999). Nepali verb morphophonology. In Y. P. Yadava & W. W. Glover (Eds.), Topics in Nepalese linguistics (pp. 118-169). Kathmandu: Royal Nepal Academy.
    Forcada, M. L., Bonev, B. I., Rojas, S. O., Ortiz, J. A. P., Sanchez, G. R., Martinez, F. S., Armentano-Oller, C., Montava, M. A., & Tyers, F. (2010). Documentation of the open-source shallow-transfer machine translation platform Apertium. Retrieved on December 3, 2012 from http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf.
    Hussain, S. (2004). Finite-state Morphological analyzer for Urdu. Unpublished MS thesis, Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences, Pakistan.
    Jurafsky, D. & Martin, J. (2005). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Boulder: University of Colorado Boulder.
    LDC-IL transliteration chart for Indian languages. Retrieved on December 3, 2012 from http://www.ldcil.org/download/Transliterationstandards.pdf.
    Stump, G. T. (2001). Inflectional morphology. A theory of paradigm structure. Cambridge: Cambridge University Press.
    Uma Maheshwar Rao, G. & Prameshwari, K. (2010). On the description of morphological data for morphological analyzers and generators: A case study of Telugu, Tamil & Kannada. Knowledge Sharing Events. LDC-IL, CIIL, Mysore.
    Vaidhya, A. & Mishra, D. (2009). Using paradigms for certain morphological phenomena in Marathi. 7th International Conference on NLP (ICON-2009). New Delhi: Macmillan.
    Zwicky, A. M. (1985). How to describe inflection. Berkeley Linguistics Society 11, 372-386.