Paper   IPM / Biological / 14429
School of Biological Sciences
  Title:   MGP-HMM: detecting genome-wide CNVs using an HMM for modeling mate pair insertion sizes and read counts
  Author(s): 
1.  Seyed Amir Malekpour
2.  Hamid Pezeshk
3.  Mehdi Sadeghi
  Status:   Published
  Journal: Mathematical Biosciences
  Year:  2016
  Pages:   53-62
  Supported by:  IPM
  Abstract:
Motivation
The association of Copy Number Variation (CNV) with schizophrenia, autism, developmental disabilities and fatal diseases such as cancer has been verified. Recent developments in Next Generation Sequencing (NGS) have facilitated CNV studies. However, many current CNV detection tools cannot discriminate tandem duplications from non-tandem duplications.
Results
In this study, we propose MGP-HMM, a tool that, besides detecting genome-wide deletions, discriminates tandem duplications from non-tandem duplications. MGP-HMM takes mate pair abnormalities into account and predicts the digitized number of tandem or non-tandem copies. Abnormalities in mate pair directions and insertion sizes, observed after the mate pairs are mapped to the reference genome, are modeled using a Hidden Markov Model (HMM). For this purpose, a Mixture Gaussian density with time-dependent parameters is used to emit mate pair insertion sizes from the HMM states.
Depending on the abnormalities observed in a mate pair's insertion size or orientation, each component of the mixture density has different parameters. MGP-HMM also applies a Poisson distribution to model read depth data. This parametric modeling of the mate pair reads allows the length of CNVs to be estimated precisely, which is an advantage over methods that rely only on the read depth approach for CNV detection. The hidden state of the proposed HMM is the digitized copy number of a genomic segment, and the states correspond to the multipliers of the mixture Gaussian components. The accuracy of the model is validated on a set of real and simulated next generation sequencing data and is compared with that of other tools.
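
The modeling idea described above can be illustrated with a small sketch. It is not the authors' implementation: it only shows, under simplifying assumptions, how an HMM whose hidden state is a copy number can combine a Poisson emission for the read count with a two-component Gaussian mixture emission for mate pair insertion sizes, and how Viterbi decoding then recovers a copy number per window. The window layout, the parameter values (depth_per_copy, mean_insert, sd_insert, shift, stay_prob), the mixture weights, the direction of the insertion size shift, and all function names are hypothetical choices made for illustration; orientation abnormalities and the tandem versus non-tandem distinction are left out.

```python
import math

COPY_NUMBERS = [0, 1, 2, 3]  # hidden states: digitized copy number (assumed range)


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normal_logpdf(x, mean, sd):
    """Log of the Gaussian density."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def emission_loglik(copy_number, read_count, insert_sizes,
                    depth_per_copy=10.0, mean_insert=3000.0,
                    sd_insert=300.0, shift=2000.0):
    """Log-likelihood of one window's data given a copy number (illustrative)."""
    # Read depth: Poisson mean proportional to copy number.
    # A small floor keeps the mean positive when the copy number is 0.
    lam = max(copy_number, 0.05) * depth_per_copy
    ll = poisson_logpmf(read_count, lam)
    # Insertion sizes: mixture of a "normal" and an "abnormal" Gaussian component.
    # Assumed weights: abnormal pairs are rare in the diploid state.
    w_abn = 0.05 if copy_number == 2 else 0.5
    # Assumed shift direction: losses enlarge the mapped insertion size.
    abn_mean = mean_insert + shift if copy_number < 2 else mean_insert - shift
    for size in insert_sizes:
        normal_part = math.log(1 - w_abn) + normal_logpdf(size, mean_insert, sd_insert)
        abnormal_part = math.log(w_abn) + normal_logpdf(size, abn_mean, sd_insert)
        ll += logsumexp2(normal_part, abnormal_part)
    return ll


def viterbi(read_counts, insert_sizes_per_window, stay_prob=0.9):
    """Most likely copy number per window under a simple sticky transition model."""
    n = len(COPY_NUMBERS)
    log_stay = math.log(stay_prob)
    log_move = math.log((1 - stay_prob) / (n - 1))
    score = [-math.log(n) + emission_loglik(c, read_counts[0], insert_sizes_per_window[0])
             for c in COPY_NUMBERS]
    back = []
    for t in range(1, len(read_counts)):
        new_score, pointers = [], []
        for j, c in enumerate(COPY_NUMBERS):
            # Best predecessor state for state j at window t.
            best_i = max(range(n),
                         key=lambda i: score[i] + (log_stay if i == j else log_move))
            trans = log_stay if best_i == j else log_move
            emit = emission_loglik(c, read_counts[t], insert_sizes_per_window[t])
            new_score.append(score[best_i] + trans + emit)
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [COPY_NUMBERS[i] for i in path]


# Toy data: eight windows, with a simulated single-copy loss in windows 3 to 5.
counts = [20, 21, 19, 10, 9, 11, 20, 22]
inserts = [[3000, 2950]] * 3 + [[5000, 3050]] * 3 + [[3000, 2950]] * 2
calls = viterbi(counts, inserts)
print(calls)
```

In this toy setup the read depth alone already separates the states, but the insertion size term adds independent evidence at the windows containing stretched mate pairs, which is the kind of signal the abstract credits for more precise CNV length estimates.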
