International Conference and workshop on Advanced Computing 2013
Foundation of Computer Science USA
ICWAC - Number 3
June 2013
Authors: Nazia Ilyas Baig, Gresha Bhatia
Nazia Ilyas Baig, Gresha Bhatia. WSD Tool for Ontology-based Text Document Classification. International Conference and workshop on Advanced Computing 2013. ICWAC, 3 (June 2013), 0-0.
Document classification is needed to extract relevant information from large collections of documents. Traditional approaches are used with satisfactory results, but they are not sufficient on their own. They require a training set of pre-classified documents to train the classifier, and they depend mainly on a 'bag of words' representation, which is unsatisfactory because it ignores possible relations between terms. When no training set is available, an ontology provides knowledge that can be used for classification without training data. An ontology expresses the information in a document as a hierarchical structure. To classify documents using an ontology, we need to define the classes, or concepts, into which the documents are categorized. Here we use WordNet to capture the relations between words. WordNet alone, however, is not sufficient to resolve word sense ambiguity, so our approach uses the Lesk algorithm for Word Sense Disambiguation (WSD). In this paper, we implement a tool that disambiguates a keyword in a text file. The tool is a utility that takes a text file as input and processes it, through several modules, to return the best sense for the most frequently occurring keyword in the file. This keyword is then mapped to concepts to create the ontology, which will have classes/concepts defined for all the files in the corpus. Our approach leverages the strengths of ontologies, WordNet, and the Lesk algorithm to improve text document classification.
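The keyword disambiguation step described in the abstract can be illustrated with a minimal sketch of the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding text. This is not the authors' tool: the sense inventory `SENSES`, the stop-word list, and all function names below are hypothetical stand-ins for illustration, whereas the paper's tool draws senses and glosses from WordNet. The sketch also assumes that the "most occurring keyword" is chosen only among words that have an entry in the sense inventory.

```python
import re
from collections import Counter

# Hypothetical stop-word list and sense inventory, for illustration only.
# The tool described in the paper takes senses and glosses from WordNet.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "or", "is",
              "that", "for", "at", "it", "by", "with", "as", "its"}

SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or lake",
    },
}


def tokenize(text):
    """Lowercase the text and return its words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def most_frequent_keyword(tokens, inventory):
    """Return the most frequent token that has an entry in the inventory."""
    counts = Counter(t for t in tokens if t in inventory)
    return counts.most_common(1)[0][0] if counts else None


def simplified_lesk(keyword, tokens, inventory):
    """Pick the sense whose gloss overlaps most with the context words."""
    context = set(tokens) - {keyword}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[keyword].items():
        overlap = len(context & set(tokenize(gloss)))
        # Strict comparison: on a tie, the sense listed first is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


def disambiguate_text(text, inventory=SENSES):
    """Return (keyword, best sense, overlap count), or None if no keyword."""
    tokens = tokenize(text)
    keyword = most_frequent_keyword(tokens, inventory)
    if keyword is None:
        return None
    sense, overlap = simplified_lesk(keyword, tokens, inventory)
    return keyword, sense, overlap
```

Calling `disambiguate_text` on a text about loans and deposits selects the financial sense of "bank", while a text about a river and a lake selects the river sense. Counting raw word overlap is the simplest Lesk variant; it does no stemming, so "slopes" in the text does not match "sloping" in a gloss.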