Persian Named Entity Recognition
Dashtipour K., Gogate M., Adeel A., Algarafi A., Howard N., Hussain A.
© 2017 IEEE. Named Entity Recognition (NER) is an important natural language processing (NLP) tool for information extraction and retrieval from unstructured texts such as newspapers, blogs and emails. NER classifies words or expressions in unstructured text into relevant categories. NER systems have been developed for many languages, but little work has addressed Persian text, owing to the limited resources (such as corpora and lexicons) and tools available for Persian named entities. This paper presents a novel, scalable system for Persian Named Entity Recognition (PNER). The proposed PNER recognizes and extracts the three most important named entities in Persian script: person name, location and date. It combines a grammatical rule-based approach with machine learning, integrating dictionaries of Persian named entities, Persian grammar rules and a Support Vector Machine (SVM). Evaluated in terms of precision, recall and F-measure, PNER achieves results comparable to state-of-the-art NER frameworks for other languages.
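As a rough illustration of the hybrid idea described in the abstract, the sketch below tags tokens with a dictionary lookup followed by a simple pattern rule, then scores the output with precision, recall and F-measure. It is not the authors' implementation: the gazetteer entries (transliterated for readability), the date rule, and the names `tag_tokens` and `precision_recall_f1` are all assumptions made for this example, and the SVM stage is only marked as a placeholder.

```python
import re
from typing import List, Set, Tuple

# Hypothetical toy gazetteers (transliterated); not taken from the paper.
PERSON_DICT: Set[str] = {"saadi", "hafez"}
LOCATION_DICT: Set[str] = {"tehran", "shiraz"}

# Hypothetical rule: treat a four-digit number starting with 1 as a year.
DATE_RULE = re.compile(r"^1[0-9]{3}$")


def tag_tokens(tokens: List[str]) -> List[str]:
    """Label each token using dictionaries first, then the date rule."""
    labels = []
    for tok in tokens:
        key = tok.lower()
        if key in PERSON_DICT:
            labels.append("PERSON")
        elif key in LOCATION_DICT:
            labels.append("LOCATION")
        elif DATE_RULE.match(key):
            labels.append("DATE")
        else:
            # Placeholder: a trained classifier (e.g. an SVM) could
            # decide here for tokens that no dictionary or rule covers.
            labels.append("O")
    return labels


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Token-level scores over entity labels ("O" means no entity)."""
    true_pos = sum(1 for g, p in zip(gold, pred) if p != "O" and p == g)
    predicted = sum(1 for p in pred if p != "O")
    actual = sum(1 for g in gold if g != "O")
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


tokens = ["Saadi", "visited", "Tehran", "and", "Tabriz", "in", "1396"]
gold = ["PERSON", "O", "LOCATION", "O", "LOCATION", "O", "DATE"]
pred = tag_tokens(tokens)
precision, recall, f1 = precision_recall_f1(gold, pred)
```

In this toy run "Tabriz" is missing from the location dictionary, so it is left untagged and recall drops while precision stays high. That gap is the kind of case a learned component is meant to cover.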