TRACDS: Temporal Relationships Among Clusters for Massive Data Streams

State-of-the-art data stream clustering algorithms developed by the data mining community do not use the temporal order of events, so all temporal information is lost in the resulting clustering. This is surprising, since the temporal ordering of events is one of the salient features of data streams. In this project we develop a technique that efficiently incorporates temporal ordering into the clustering process, and we demonstrate its usefulness on large, high-throughput data streams. Temporal ordering is introduced into the data stream clustering process by dynamically constructing an evolving Markov chain whose states represent clusters. Our approach is based on the previously developed Extensible Markov Model (EMM). The results of this project will provide a framework on which important stream mining applications, such as anomaly detection and prediction of future events, can be easily implemented.
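The idea of a Markov chain whose states are clusters can be illustrated with a small Python sketch. This is not the project's EMM or TRACDS implementation. It assumes a deliberately simple clustering rule (a point joins the nearest cluster center if it lies within a fixed distance threshold, otherwise it starts a new cluster), and the class name, method names, and threshold value are hypothetical. The point is only to show how transition counts between consecutive cluster assignments preserve the temporal order that plain clustering discards.

```python
import math
from collections import defaultdict


class TemporalClusterSketch:
    """Toy stream clusterer that also records cluster-to-cluster transitions."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed rule: max distance to join a cluster
        self.centers = []                     # one center per cluster (Markov chain state)
        self.counts = []                      # number of points absorbed by each cluster
        self.transitions = defaultdict(int)   # (from_state, to_state) -> observed count
        self.current = None                   # state assigned to the previous point

    def _nearest(self, point):
        """Return (index, distance) of the closest center, or (None, inf) if none exist."""
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def update(self, point):
        """Assign one point to a cluster and record the transition from the previous state."""
        state, dist = self._nearest(point)
        if state is None or dist > self.threshold:
            # No cluster is close enough: create a new state on the fly.
            self.centers.append(list(point))
            self.counts.append(1)
            state = len(self.centers) - 1
        else:
            # Move the center toward the new point (running mean).
            self.counts[state] += 1
            n = self.counts[state]
            self.centers[state] = [
                c + (x - c) / n for c, x in zip(self.centers[state], point)
            ]
        if self.current is not None:
            self.transitions[(self.current, state)] += 1
        self.current = state
        return state

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the transition counts seen so far."""
        total = sum(n for (s, _), n in self.transitions.items() if s == src)
        if total == 0:
            return 0.0
        return self.transitions.get((src, dst), 0) / total


# Illustrative stream that alternates between two regions of the plane.
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
model = TemporalClusterSketch(threshold=1.0)
states = [model.update(p) for p in stream]
```

In such a sketch, a transition with a very low estimated probability could be flagged as a candidate anomaly, and the most probable successor of the current state could serve as a simple prediction of the next event.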

Broader Impact. By showing that state-of-the-art data stream clustering algorithms can incorporate temporal order information efficiently, this project will have a broad impact on many areas where temporal order is essential. As examples, NOAA hurricane data and NASA satellite data will be used throughout this project.

This research was supported from 2009 to 2013 in part by the National Science Foundation.

Team

Matt Bolanos, Sudheer Chelluboina, Margaret H. Dunham (Co-PI), John Forrest, Michael Hahsler (Co-PI), Vladimir Jovanovic, Hadil Shaiba, Yu Su

Developed Software

Activities

Media

Publications

  1. Hadil Shaiba and Michael Hahsler. Intensity prediction model for tropical cyclone rapid intensification events. In Proceedings of the IADIS Applied Computing 2013 (AC 2013) Conference, Fort Worth, TX, October 2013.
  2. Matthew Bolanos, John Forrest, and Michael Hahsler. Clustering large datasets using data stream clustering techniques. In Proceedings of the 36th Annual Conference of the Gesellschaft für Klassifikation e.V., Hildesheim, August 1-3, 2012, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, 2013.
  3. Anurag Nagar and Michael Hahsler. Using text and data mining techniques to extract stock market sentiment from live news streams. In 2012 International Conference on Computer Technology and Science (ICCTS 2012), August 2012.
  4. Charlie Isaksson, Margaret H. Dunham, and Michael Hahsler. SOStream: Self organizing density-based clustering over data stream. In International Conference on Machine Learning and Data Mining (MLDM'2012). Springer, July 2012.
  5. Vladimir Jovanovic, Margaret H. Dunham, Michael Hahsler, and Yu Su. Evaluating hurricane intensity prediction techniques in real time. In Third IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Proceedings of the 2011 IEEE International Conference on Data Mining Workshops (ICDMW 2011). IEEE, 2011.
  6. John Forrest. Stream: A Framework for Data Stream Modeling in R. Bachelor Thesis, Department of Computer Science and Engineering, SMU, 2011.
  7. Michael Hahsler and Margaret H. Dunham. Temporal structure learning for clustering massive data streams in real-time. In SIAM Conference on Data Mining (SDM11). SIAM, 2011.
  8. Yu Su, Sudheer Chelluboina, Michael Hahsler, and Margaret H. Dunham. A New Data Mining Model for Hurricane Intensity Prediction. In 2nd IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW 2010). IEEE, 2010.
  9. Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou. Novel data stream pattern mining, Report on the StreamKDD’10 Workshop. SIGKDD Explorations, 12(2):54-55, 2010.
  10. Michael Hahsler and Margaret H. Dunham. rEMM: Extensible Markov Model for Data Stream Clustering in R. Journal of Statistical Software, 35(5):1-31, 2010.
  11. Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou, editors. Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques (StreamKDD'10). ACM Press, New York, NY, USA, 2010.

Acknowledgement of Support

This research was supported from 2009 to 2013 by the National Science Foundation under Grant No. IIS-0948893.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
