International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 7 - Number 10
Year of Publication: 2014
Authors: Barilee Baridam
DOI: 10.5120/ijais14-451243
Barilee Baridam. Biological Sequence Clustering with Symbol Table Data Structure. International Journal of Applied Information Systems. 7, 10 (October 2014), 1-6. DOI=10.5120/ijais14-451243
Clustering is the identification of interesting distribution patterns and similarities (natural groupings, or clusters) within a collection of objects in a dataset, based on some user-defined criteria. As an unsupervised learning problem, clustering can be distance-based or conceptual. In distance-based clustering, the similarity criterion is based on distance: objects belong to the same cluster if they are close according to a given distance. Conceptual clustering instead defines a concept common to all the objects in a cluster; objects are grouped by how well they fit some descriptive concepts, not according to a distance or similarity measure. This paper extends the use of the common symbol table to the clustering of biological sequences. Unlike conceptual clustering, the method does not depend on a concept, and it does not use a distance measure either. Instead, it uses a data structure (a hash table or a list) to detect the occurrence of codons by comparing sequences to one another, pattern element by pattern element, using the codon-based scoring method. The results obtained indicate the usefulness of the symbol table in biological sequence clustering.
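The abstract does not give the exact scoring formula or clustering procedure, so the following is only a minimal sketch of the general idea, under several assumptions. Each sequence gets a symbol table (a Python `Counter`, i.e. a hash table) mapping codons to occurrence counts, with codons read as non-overlapping triplets from the first base. The hypothetical `codon_score` counts codon occurrences shared by two tables and divides by the longer sequence's codon count. The hypothetical `cluster_sequences` performs a greedy single pass with an assumed `threshold`. None of these names, the normalization, or the greedy strategy come from the paper.

```python
from collections import Counter


def codon_table(seq: str) -> Counter:
    """Symbol table (hash table) mapping each codon to its occurrence count.

    Assumption: non-overlapping codons read from the first base (reading
    frame 0); a trailing partial codon is ignored.
    """
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def codon_score(a: str, b: str) -> float:
    """Hypothetical codon-based score in [0, 1].

    Shared codon occurrences (the multiset intersection of the two symbol
    tables) divided by the codon count of the longer sequence.
    """
    ta, tb = codon_table(a), codon_table(b)
    shared = sum((ta & tb).values())  # min count per codon present in both
    total = max(sum(ta.values()), sum(tb.values()))
    return shared / total if total else 0.0


def cluster_sequences(seqs: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single-pass clustering (an assumed strategy, not the paper's).

    Each sequence joins the first cluster whose representative (its first
    member) scores at or above `threshold`; otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for name, seq in seqs.items():
        for cluster in clusters:
            if codon_score(seqs[cluster[0]], seq) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    example = {
        "s1": "ATGAAACCCGGGTTT",
        "s2": "ATGAAACCCGGGTAA",
        "s3": "GCTGCTGCTGCTGCT",
        "s4": "ATGAAACCCTTTGGG",
    }
    print(cluster_sequences(example))  # [['s1', 's2', 's4'], ['s3']]
```

Here `s1` and `s2` share four of five codons (score 0.8), and `s4` contains the same codons as `s1` in a different order (score 1.0), so all three fall into one cluster, while `s3` shares no codons with `s1` and starts its own cluster. Because the symbol table records only which codons occur and how often, this particular score ignores codon order; a position-wise comparison would be a different design choice.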