## Extensible Markov Model (EMM)

An Extensible Markov Model (EMM) is essentially a time-varying Markov chain. Each node in the chain represents a cluster of real-world states rather than a single state. An EMM learns and adjusts both its structure (the number of states) and its state transition probabilities from the input data it sees. Learning also continues during the application phase as new data arrives.
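To make the idea concrete, here is a minimal sketch of how such a model could be maintained incrementally. It is not the rEMM or JEMM code: the class name `SimpleEMM`, the `threshold` parameter, and the nearest-centroid clustering rule are all assumptions chosen for illustration. Each observation either joins the closest existing cluster or creates a new state, and the transition count between the previous and the current state is incremented.

```python
import math
from collections import defaultdict


class SimpleEMM:
    """Toy EMM (hypothetical): nodes are cluster centroids, edges carry transition counts."""

    def __init__(self, threshold):
        self.threshold = threshold      # assumed rule: max distance to join an existing cluster
        self.centroids = []             # one centroid per state
        self.sizes = []                 # number of observations absorbed by each state
        self.counts = defaultdict(int)  # (from_state, to_state) -> observed transitions
        self.current = None             # state assigned to the previous observation

    def _nearest(self, point):
        """Return (index, distance) of the closest centroid, or (None, inf) if there is none."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def update(self, point):
        """Absorb one observation, growing the chain if no cluster is close enough."""
        state, dist = self._nearest(point)
        if state is None or dist > self.threshold:
            # Structure learning: add a new state for this observation.
            self.centroids.append(tuple(point))
            self.sizes.append(1)
            state = len(self.centroids) - 1
        else:
            # Move the centroid toward the new point (running mean).
            n = self.sizes[state] + 1
            old = self.centroids[state]
            self.centroids[state] = tuple(c + (p - c) / n for c, p in zip(old, point))
            self.sizes[state] = n
        if self.current is not None:
            # Transition learning: count the move from the previous state.
            self.counts[(self.current, state)] += 1
        self.current = state
        return state

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the counts seen so far."""
        total = sum(n for (s, _), n in self.counts.items() if s == src)
        return self.counts.get((src, dst), 0) / total if total else 0.0


# Usage: a short two-dimensional stream that alternates between two regions.
emm = SimpleEMM(threshold=1.0)
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.1, 5.0)]
states = [emm.update(p) for p in stream]
print(states)
print(emm.transition_probability(0, 1))
```

Because `update` both assigns a state and adjusts the model, the same call serves during training and during application, which mirrors the continued learning described above.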

The EMM approach has been applied to several problems. Applications initially examined include prediction of river flow rate and water level, prediction of traffic volumes for both networks and roadways, and identification of rare events in roadway and network traffic.

The size of an EMM grows at a sublinear rate because clustering lets many observed states share a single node. The degree of clustering, and thus the size of the EMM, depends on both the clustering technique and the dataset. Prediction accuracy is at least as good as that of one available neural network approach specifically designed for the dataset studied.
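The effect of clustering on model size can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not a result from the EMM papers: `count_states` is a hypothetical helper that uses simple leader clustering on one-dimensional values and records how many states exist after each observation. The synthetic stream and both threshold values are invented for the example.

```python
def count_states(stream, threshold):
    """Return the number of cluster states after each observation (hypothetical helper)."""
    leaders = []  # one representative value per state (assumed leader clustering)
    growth = []
    for x in stream:
        # Create a new state only when no existing leader is within the threshold.
        if not leaders or min(abs(x - c) for c in leaders) > threshold:
            leaders.append(x)
        growth.append(len(leaders))
    return growth


# Synthetic stream: 1000 observations cycling through the values 0..9.
stream = [float(i % 10) for i in range(1000)]

fine = count_states(stream, threshold=0.5)    # tight clusters, more states
coarse = count_states(stream, threshold=2.5)  # loose clusters, fewer states

print(fine[9], fine[-1])      # states after 10 and after 1000 observations
print(coarse[9], coarse[-1])
```

In this toy stream the state count stops growing once every recurring value has a nearby cluster, and the looser threshold yields a smaller model. Real data streams would show a less clean curve.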

## Team

M. Dunham, M. Hahsler, C. Isaksson

## Developed Software

Our current EMM implementations are in R and Java.

- R implementation rEMM
- R interface to JEMM called RJEMM
- Java implementation JEMM
- Matlab implementation: code and documentation (currently not maintained)

## Publications

- Michael Hahsler and Margaret H. Dunham, "rEMM: Extensible Markov Model for Data Stream Clustering in R," Journal of Statistical Software, 35(5):1-31, 2010.
- Yu Meng and Margaret H. Dunham, "Efficient Mining of Emerging Events in a Dynamic Spatiotemporal," Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
- Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, "Rare Event Detection in a Spatiotemporal Environment," Proceedings of the IEEE Conference on Granular Computing, May 2006.
- Yu Meng and Margaret H. Dunham, "Online Mining of Risk Level of Traffic Anomalies with User's Feedbacks," Proceedings of the IEEE Conference on Granular Computing, May 2006.
- Yu Meng and Margaret H. Dunham, "Mining Developing Trends of Dynamic Spatiotemporal Data Streams," Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
- C. Isaksson, Yu Meng, and Margaret H. Dunham, "Risk Leveling of Network Traffic Anomalies," International Journal of Computer Science and Network Security, Vol 6, No 6, 2006, pp 258-265.
- Lin Lu, Margaret H. Dunham, and Yu Meng, "Discovering Significant Usage Patterns from Clusters of Clickstream Data," Proceedings of the Workshop on Knowledge Discovery in the Web, August 2005.
- Jie Huang, Yu Meng, and Margaret H. Dunham, "Extensible Markov Model," Proceedings IEEE ICDM Conference, November 2004, pp 371-374.