নৈঃশব্দ

Stemming of Assamese

Stemming is the process of automatically extracting the base form of a given word of a language. Assamese is a morphologically rich Indo-Aryan language with relatively free word order, spoken in the north-eastern part of India and written in the Assamese-Bengali script. As it is among the less computationally studied languages, our aim is to extract the stem from a given word. We adopt a suffix-stripping approach along with a rule engine that generates all the possible suffix sequences.

Algorithm-I
1. Read a line from the corpus file.
2. Extract the words from the line (from this point on called tokens) and clean each token, that is, remove any punctuation mark attached to it.
3. Look up the manually generated suffix list, matching from the end of the token.
4. If the token ends with a suffix from the list, extract the stem and exit.
5. Go to step 1 until the end of the corpus.

Algorithm-II
1. Read a line from the corpus file.
2. Extract words (from this point on called tokens fro...
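The steps of Algorithm-I can be sketched in Python roughly as follows. This is only a minimal illustration under several assumptions: the suffix list here is a tiny hypothetical sample (the manually compiled list is not shown in this post), the function names are invented for the example, longest-suffix-first matching is one possible lookup order, and the rule engine that generates suffix sequences is not covered.

```python
import string

# Hypothetical sample only; the post's manually compiled suffix list is not shown.
SUFFIXES = ["বোৰ", "সকল", "টো", "ৰ", "ত", "ক"]

# ASCII punctuation plus the danda used as a sentence terminator.
PUNCTUATION = string.punctuation + "।"


def clean(token):
    """Remove punctuation attached to either end of a token."""
    return token.strip(PUNCTUATION)


def stem(token, suffixes=SUFFIXES):
    """Strip at most one suffix, matched from the end of the token.

    Longer suffixes are tried first (an assumption, not stated in the post),
    and a match is ignored if it would leave an empty stem.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def stem_lines(lines):
    """Yield (token, stem) pairs for every token in an iterable of lines."""
    for line in lines:
        for raw in line.split():
            token = clean(raw)
            if token:
                yield token, stem(token)


if __name__ == "__main__":
    for token, base in stem_lines(["মানুহবোৰ ঘৰত।"]):
        print(token, base)
```

Because a single pass strips whatever suffix happens to match, a sketch like this will over-stem words that merely end in a listed character, which is presumably one reason a rule engine over suffix sequences is used alongside it.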

