Advanced graph mining methods for protein analysis

Chen, Yi-Ping, Rong, Jia and Li, Gang 2010, Advanced graph mining methods for protein analysis. In Chen, Jake Y. and Lonardi, Stefano (ed), , Chapman & Hall/CRC, Boca Raton, Fla., pp.111-136.

Title Advanced graph mining methods for protein analysis
Author(s) Chen, Yi-Ping
Rong, Jia
Li, Gang
Editor(s) Chen, Jake Y.
Lonardi, Stefano
Publication date 2010
Series Data mining and knowledge discovery series
Chapter number 6
Total chapters 26
Start page 111
End page 136
Total pages 26
Publisher Chapman & Hall/CRC
Place of Publication Boca Raton, Fla.
Summary As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies in proteomics indicate that the functions of different proteins can be assigned based on protein structures [2,3]. Knowledge of protein structures gives an overview of protein fold space and helps in understanding the evolutionary principles behind structure. By observing the architectures and topologies of protein families, biological processes can be investigated more directly, at much higher resolution and in finer detail. For this reason, the analysis of proteins, their structures, and their interactions with other materials is emerging as an important problem in bioinformatics. However, determining protein structures experimentally is expensive and time-consuming, which at present leaves scientists largely dependent on sequence, rather than more general structure, to infer protein function. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which suffer from a lack of available data, protein structure determination and protein interaction studies can draw on a vast amount of biologically relevant information, such as the Protein Data Bank (PDB) [4], the Structural Classification of Proteins (SCOP) database [5], the CATH database [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins, as shown in Figure 6.1, lies in the computational complexity of the data. Although many approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9], and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we introduce a state-of-the-art data mining technique, graph mining, which is well suited to defining and discovering interesting structural patterns in graph data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). Current graph pattern mining methods are described and typical algorithms are presented, together with their applications in protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources, and the current state of research in protein analysis; Section 6.3 focuses on one state-of-the-art data mining method, graph mining; Section 6.4 surveys existing work from the past decade on protein structure analysis using advanced graph mining methods; finally, Section 6.5 concludes and outlines potential further work.
ISBN 9781420086843
Language eng
Field of Research 080106 Image Processing
Socio Economic Objective 860801 Human Biological Preventatives (e.g. Vaccines)
HERDC Research category B1 Book chapter
Copyright notice ©2010, Chapman & Hall/CRC
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Wed, 27 Apr 2011, 12:30:10 EST by Sandra Dunoon