Research Article

Approach for Transforming Monolingual Text Corpus into XML Corpus

by Deepak Sharma, Prakash.r.devale
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 9
Year of Publication: 2012
Authors: Deepak Sharma, Prakash.r.devale
DOI: 10.5120/ijais12-450225

Deepak Sharma, Prakash.r.devale. Approach for Transforming Monolingual Text Corpus into XML Corpus. International Journal of Applied Information Systems. 1, 9 (April 2012), 1-5. DOI=10.5120/ijais12-450225

@article{ 10.5120/ijais12-450225,
author = { Deepak Sharma and Prakash.r.devale },
title = { Approach for Transforming Monolingual Text Corpus into XML Corpus },
journal = { International Journal of Applied Information Systems },
issue_date = { April 2012 },
volume = { 1 },
number = { 9 },
month = { April },
year = { 2012 },
issn = { 2249-0868 },
pages = { 1-5 },
numpages = {9},
url = { https://www.ijais.org/archives/volume1/number9/113-0225/ },
doi = { 10.5120/ijais12-450225 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:41:56.388792+05:30
%A Deepak Sharma
%A Prakash.r.devale
%T Approach for Transforming Monolingual Text Corpus into XML Corpus
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 1
%N 9
%P 1-5
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we present an approach for converting a text-based monolingual corpus into an XML corpus. The text is first annotated with Part-Of-Speech (POS) tags using a standard tagging tool, which produces a tagged file. The tagged file is then parsed by the conversion logic to generate the corpus in XML, following a defined DTD (Document Type Definition). The resulting XML corpus can be used for Information Retrieval, Text-To-Speech conversion, and Word Sense Disambiguation. It is also useful as a preprocessing step for parsing, because assigning a unique tag to each word reduces the number of candidate parses.
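The abstract does not reproduce the DTD or the tagger's output format, so the following is only a minimal sketch of the tagged-file-to-XML step under stated assumptions. It assumes the tagger writes one sentence per line with tokens of the form `word_TAG`, and it uses a hypothetical DTD (`corpus`, `sentence`, `word` with a `pos` attribute) invented for illustration. The keywords point to a Java DOM parser; here Python's standard-library `xml.dom.minidom` stands in for it. All function names are illustrative.

```python
from xml.dom.minidom import Document

# Hypothetical DTD for illustration only; the paper's actual DTD is not
# given in the abstract.
DTD = """<!DOCTYPE corpus [
  <!ELEMENT corpus (sentence*)>
  <!ELEMENT sentence (word*)>
  <!ELEMENT word (#PCDATA)>
  <!ATTLIST sentence id CDATA #REQUIRED>
  <!ATTLIST word pos CDATA #REQUIRED>
]>"""


def parse_tagged_line(line, sep="_"):
    """Split one tagged sentence into (word, tag) pairs.

    Assumes tokens look like word<sep>TAG. A token without a usable
    separator is kept whole and given the placeholder tag "UNK".
    """
    pairs = []
    for token in line.split():
        # Split on the LAST separator so words containing it survive.
        word, found, tag = token.rpartition(sep)
        if not found or not word:
            word, tag = token, "UNK"
        pairs.append((word, tag))
    return pairs


def tagged_to_xml(lines, sep="_"):
    """Build a DOM document with one <sentence> per non-empty line."""
    doc = Document()
    root = doc.createElement("corpus")
    doc.appendChild(root)
    sent_id = 0
    for line in lines:
        pairs = parse_tagged_line(line, sep)
        if not pairs:
            continue  # skip blank lines
        sent_id += 1
        sentence = doc.createElement("sentence")
        sentence.setAttribute("id", str(sent_id))
        root.appendChild(sentence)
        for word, tag in pairs:
            node = doc.createElement("word")
            node.setAttribute("pos", tag)
            node.appendChild(doc.createTextNode(word))
            sentence.appendChild(node)
    return doc


def serialize(doc):
    """Return the XML text with the (hypothetical) DTD as internal subset."""
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return header + DTD + "\n" + doc.documentElement.toxml()


if __name__ == "__main__":
    print(serialize(tagged_to_xml(["The_DT cat_NN sleeps_VBZ ._."])))
```

Note that `xml.dom.minidom` builds and serializes the tree but does not validate it against the DTD; a validating parser would be needed to check conformance.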

Index Terms

Computer Science
Information Sciences

Keywords

Part-of-speech Tagging, Java XML Library, DOM Parser