International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 9
Year of Publication: 2012
Authors: Deepak Sharma, Prakash.r.devale |
10.5120/ijais12-450225 |
Deepak Sharma, Prakash.r.devale. Approach for Transforming Monolingual Text Corpus into XML Corpus. International Journal of Applied Information Systems. 1, 9 (April 2012), 1-5. DOI=10.5120/ijais12-450225
In this paper, we present an approach for converting a text-based monolingual corpus into an XML corpus. The text is first annotated with Part-Of-Speech tags using a standard tagging tool, which produces a tagged file; the tagged file is then converted into XML according to a defined DTD (Document Type Definition). The tagged text document is parsed to generate the XML corpus, which can then be used for Information Retrieval, Text-To-Speech conversion and Word Sense Disambiguation. It is also useful as a preprocessing step for parsing, because assigning a unique tag to each word reduces the number of candidate parses.
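The conversion step from a tagged file to XML can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one sentence per line, tokens written as `word_TAG`, and a hypothetical element layout (`corpus`, `sentence`, `w` with a `pos` attribute); the tagging tool's real output format and the DTD are not given in this abstract.

```python
import xml.etree.ElementTree as ET


def tagged_to_xml(tagged_text, sep="_"):
    """Build an XML tree from POS-tagged text.

    Assumed input: one sentence per line, tokens of the form word<sep>TAG.
    The element and attribute names are illustrative, not the paper's DTD.
    """
    corpus = ET.Element("corpus")
    # Skip blank lines so that sentence ids stay consecutive.
    lines = [line for line in tagged_text.splitlines() if line.strip()]
    for sent_id, line in enumerate(lines, start=1):
        sentence = ET.SubElement(corpus, "sentence", id=str(sent_id))
        for token in line.split():
            # Split on the last separator so words containing it survive.
            word, found, tag = token.rpartition(sep)
            if not found:
                # Token carries no tag: keep the word, mark the tag unknown.
                word, tag = token, "UNK"
            w = ET.SubElement(sentence, "w", pos=tag)
            w.text = word
    return corpus


if __name__ == "__main__":
    root = tagged_to_xml("The_DT dog_NN barks_VBZ")
    print(ET.tostring(root, encoding="unicode"))
```

Storing the tag as an attribute keeps the word itself as the element's text, so downstream tools can read plain tokens or filter by tag with a simple query such as `root.iter("w")`.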