
EMM Sequence Analysis (EMMSA)

The Extensible Markov Model (EMM; see the EMM project) is a spatiotemporal modeling tool that has been successfully applied in many domains, including flood prediction, anomaly detection in VoIP traffic, anomaly detection in automobile traffic, intrusion detection, and miRNA prediction. The objective of the EMMSA project is to create an online tool that facilitates the use of EMM for analyzing DNA/RNA sequences. An EMM models the sequence of subpatterns (of any length) within a longer nucleotide sequence, without first requiring an expensive alignment step. By creating EMMs that model the normal behavior of a class of sequences, new, unknown sequences can be evaluated for membership in that class using a metaclassification approach, MCM.
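The idea of modeling a nucleotide sequence as a chain of subpattern states, without alignment, can be illustrated with a minimal sketch. This is a simplified toy, not the EMMSA implementation: the function names (`pmer_profile`, `build_emm`), the fixed non-overlapping windows, the Manhattan distance, and the `threshold` value are all assumptions chosen for illustration. Each window is summarized by its p-mer counts, assigned to the nearest existing state (or to a new state if none is close enough), and transitions between consecutive states are counted.

```python
from collections import Counter, defaultdict
from itertools import product


def pmer_profile(segment, p=2):
    """Count every p-mer over the alphabet ACGT in a segment (fixed order)."""
    pmers = ["".join(t) for t in product("ACGT", repeat=p)]
    counts = Counter(segment[i:i + p] for i in range(len(segment) - p + 1))
    return [counts[k] for k in pmers]


def manhattan(a, b):
    """Manhattan distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_emm(sequence, window=8, p=2, threshold=4):
    """Toy EMM-like model (illustrative assumption, not the authors' code).

    States are p-mer profiles of previously seen windows; a window that is
    farther than `threshold` from every state creates a new state.
    Transition counts are kept between consecutive windows.
    """
    states = []
    transitions = defaultdict(int)
    previous = None
    for start in range(0, len(sequence) - window + 1, window):
        profile = pmer_profile(sequence[start:start + window], p)
        # Find the nearest existing state, if any exist yet.
        best = min(range(len(states)),
                   key=lambda i: manhattan(states[i], profile),
                   default=None)
        if best is None or manhattan(states[best], profile) > threshold:
            states.append(profile)
            best = len(states) - 1
        if previous is not None:
            transitions[(previous, best)] += 1
        previous = best
    return states, dict(transitions)


states, transitions = build_emm("ACACACAC" * 2 + "GGGGGGGG")
print(len(states))   # 2
print(transitions)   # {(0, 0): 1, (0, 1): 1}
```

In this toy run the two identical AC-rich windows share one state and the G-rich window opens a second one. A model built this way from a class of known sequences could then, in principle, be used to check how well the windows of an unknown sequence fall into existing states and follow observed transitions.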

The approach studied in this project is now used as the basis for QuasiAlign: Position Sensitive P-Mer Frequency Clustering.

Team

R. Kotamarti, M. Dunham, M. Hahsler

Developed Software

Publications

  1. R.M. Kotamarti. "Quasi alignment methods for molecular sequence analysis," PhD Thesis, SMU, 2010.
  2. R.M. Kotamarti, M. Hahsler, D.W. Raiford, M. McGee and M.H. Dunham. "Analyzing Taxonomic Classification Using Extensible Markov Models," Bioinformatics, 26(18):2235-2241, 2010.
  3. R.M. Kotamarti, M. Hahsler, D.W. Raiford and M.H. Dunham. "Sequence Transformation to a Complex Signature Form for Consistent Phylogenetic Tree Using Extensible Markov Model," Proceedings of the 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2010.
  4. R.M. Kotamarti and M.H. Dunham. "Alignment-free Sequence Analysis with Extensible Markov Model," 9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10), 2010.
  5. R.M. Kotamarti, D.W. Raiford, M.L. Raymer and M.H. Dunham. "A Data Mining Approach to Predicting Phylum for Microbial Organisms Using Genome-Wide Sequence Data," 2009 Ninth IEEE International Conference on Bioinformatics and Bioengineering.
  6. R.M. Kotamarti, D.W. Raiford, M. Hahsler, Y. Wang, M. McGee and M.H. Dunham. "Targeted Genomic Signature Profiling with Quasi-alignment Statistics," COBRA preprint series, 2009.

Acknowledgement of Support

This research is partially funded by a research grant from T-System, by the National Science Foundation under Grant No. IIS-0948893, and by the UTSW Quantitative Biomedical Research Initiative.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the supporting organizations.
