Review on the Research Progress of Mining of OMIM Data

Journal of Biomedical Engineering

Authors：

LI Jianhua ¹, LI Zheren ¹, KANG Yan ¹, LI Ling ¹,²

1. Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110819, China;
2. State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, China;

Corresponding author：

LI Ling, Email: jliling2000@gmail.com

Keywords：

phenotype-genotype correlation; text mining; similarity comparison; candidate gene; molecular pathway

DOI：

10.7507/1001-5515.20140265

Online Mendelian Inheritance in Man (OMIM) is a knowledge source and database for human genetic diseases and related genes. Each OMIM entry includes a clinical synopsis, linkage analysis for candidate genes, chromosomal localization and animal models, and OMIM has become an authoritative source of information for studying the relationship between genes and diseases. Because overlap of disease symptoms may reflect interactions at the molecular level, comparing phenotypic similarity may point to candidate genes and help discover functional connections between genes and proteins. However, OMIM describes disease phenotypes in free text, which is poorly suited to computer analysis. Standardization of OMIM data therefore has important implications for large-scale comparison of disease phenotypes and prediction of phenotype-genotype correlations. Recently, standard medical language systems, term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity, as used in document classification, have been introduced for mining OMIM data. Combined with Gene Ontology and various comparison methods, these approaches have achieved substantial success. In this article, we review methods for standardization and similarity comparison of OMIM data and discuss likely trends for research in this direction.
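To make the similarity comparison named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting followed by a cosine comparison of phenotype term vectors. It is an illustration only: the three records and their term lists are hypothetical and are not taken from OMIM, the function names are invented for this example, and the weighting variant (term frequency normalized by record length, inverse document frequency as ln(N/df)) is an assumption that may differ from the schemes used in the reviewed studies.

```python
import math
from collections import Counter

# Hypothetical records: each maps a made-up disease label to standardized
# phenotype terms. These are NOT real OMIM entries.
records = {
    "disease_A": ["seizures", "ataxia", "hearing_loss"],
    "disease_B": ["seizures", "ataxia", "short_stature"],
    "disease_C": ["cardiomyopathy", "short_stature", "arrhythmia"],
}


def tfidf_vectors(docs):
    """Return {name: {term: weight}} using tf = count/len and idf = ln(N/df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per record
    vectors = {}
    for name, terms in docs.items():
        if not terms:
            vectors[name] = {}
            continue
        counts = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(records)
sim_ab = cosine(vectors["disease_A"], vectors["disease_B"])
sim_bc = cosine(vectors["disease_B"], vectors["disease_C"])
sim_ac = cosine(vectors["disease_A"], vectors["disease_C"])
```

In this toy setting, records that share more phenotype terms score higher, and records with no shared terms score zero. Rare terms carry more weight than common ones, which is the reason for applying the inverse document frequency before comparing records.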

Citation： LI Jianhua, LI Zheren, KANG Yan, LI Ling. Review on the Research Progress of Mining of OMIM Data. Journal of Biomedical Engineering, 2014, 31(6): 1400-1404. doi: 10.7507/1001-5515.20140265
