Zhu and Shasha [1] addressed the problem of elastic burst detection. While they acknowledged that finding the thresholds for labelling a window as "bursty" is part of the problem, they associated a unique threshold with each possible window size. We make no such assumption, and formulate the problem as a search problem in a two-dimensional space. Our problem formulation and our approach to the solution also differ significantly from those of Kleinberg [2], who introduced a hierarchical model for burst detection, motivated by the observation that a long burst might contain several smaller bursts within itself.
Analysis of CT Machine Logs for Predictive Maintenance (Summer 2010, with Dmitriy Fradkin and Fabian Moerchen): The X-ray tube is one of the most important and expensive components of a Computed Tomography (CT) machine. Tubes have to be serviced and replaced regularly as part of routine maintenance. Predictive maintenance of CT machines becomes easier if the time of the next tube replacement can be predicted. Since the only information available was when past tube replacements had taken place, we had to rely on model-based anomaly detection techniques. We trained a multidimensional Gaussian Mixture Model on data (temperature, current, voltage, etc.) from time windows well before a replacement, and then computed the likelihood of the data points under the trained model as the dates approached the replacement date. For a significant number of machines, we observed a steady decline in the likelihood values as the replacement date approached.
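The likelihood-based scoring described above can be illustrated with a minimal sketch. It is not the code used in the project: it assumes a diagonal-covariance mixture fitted with plain EM in pure Python, and the two features, the cluster centres, and the helper names (`fit_gmm`, `mean_log_likelihood`, `simulate`) are hypothetical and chosen only for illustration on synthetic data.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_logs(x, model):
    """log(weight) + log density of x for every mixture component."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in model]


def fit_gmm(data, k, iters=50, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM with plain EM; returns [(weight, mean, var)]."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    center = [sum(p[d] for p in data) / n for d in range(dim)]
    spread = [max(sum((p[d] - center[d]) ** 2 for p in data) / n, var_floor)
              for d in range(dim)]
    # Initialise: means at k random points, shared global variance, equal weights.
    model = [(1.0 / k, list(p), list(spread)) for p in rng.sample(data, k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, model)
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means and variances.
        model = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, data)) / nj, var_floor)
                   for d in range(dim)]
            model.append((nj / n, mean, var))
    return model


def mean_log_likelihood(points, model):
    """Average log-likelihood of a window of points under the mixture."""
    return sum(logsumexp(component_logs(p, model)) for p in points) / len(points)


rng = random.Random(42)


def simulate(center, count, noise=2.0):
    """Draw synthetic readings around a centre (made-up values, not real logs)."""
    return [[rng.gauss(c, noise) for c in center] for _ in range(count)]


# Hypothetical (temperature, current) readings with two normal operating modes.
training = simulate((60.0, 200.0), 150) + simulate((75.0, 250.0), 150)
early = simulate((60.0, 200.0), 50) + simulate((75.0, 250.0), 50)
late = simulate((85.0, 215.0), 100)  # drifted readings close to a replacement

model = fit_gmm(training, k=2)
avg_early = mean_log_likelihood(early, model)
avg_late = mean_log_likelihood(late, model)
print(f"early window: {avg_early:.1f}, near replacement: {avg_late:.1f}")
```

In this sketch the model is trained only on windows assumed to be healthy, so a window whose readings drift away from the learned operating modes receives a lower average log-likelihood. A full-covariance mixture from a statistics library would be a natural substitute for the hand-written EM loop.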
Design and Implementation of a Text Parser for Processing Maintenance History of CT Machines (Summer 2010, with Dmitriy Fradkin and Fabian Moerchen): We worked with logs obtained from Computed Tomography (CT) machines, where each machine had a number of photomultiplier tubes (PMTs) and two detectors. The service (repair/replacement) history of these PMTs and detectors was available as unstructured, free-format text in the log files. As in the extraction phase of a typical ETL system, the goal was to convert this unstructured textual history into structured data that gives an accurate description of when each PMT was serviced or replaced, and which detector(s) were serviced along with it. We implemented a text analytics tool using the regular expression API of Java to accomplish this.
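The extraction step can be sketched as follows. The original tool was written with Java's regular expression API; this sketch uses Python's `re` module instead, purely for illustration. The log wording, the date format, the detector labels "A" and "B", and the names `ServiceRecord` and `parse_service_log` are all hypothetical, since the actual log format is not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: the real log format is an assumption of this sketch.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
PMT_RE = re.compile(r"\b(replaced|serviced|repaired)\s+PMT\s*#?(\d+)", re.IGNORECASE)
DETECTOR_RE = re.compile(r"\bdetector\s+([AB])\b", re.IGNORECASE)


@dataclass
class ServiceRecord:
    date: str        # date of the log entry, as written in the log
    pmt_id: int      # identifier of the PMT that was worked on
    action: str      # "replaced", "serviced" or "repaired"
    detectors: list  # detectors mentioned in the same entry


def parse_service_log(text):
    """Turn free-format service notes into one structured record per PMT event."""
    records = []
    for line in text.splitlines():
        date_match = DATE_RE.search(line)
        if not date_match:
            continue  # skip notes that carry no date
        detectors = sorted({d.upper() for d in DETECTOR_RE.findall(line)})
        for action, pmt_id in PMT_RE.findall(line):
            records.append(
                ServiceRecord(date_match.group(1), int(pmt_id), action.lower(), detectors)
            )
    return records


# Made-up sample entries, not taken from real machine logs.
sample_log = """\
2009-03-14 engineer note: Replaced PMT 12, also serviced detector A.
General cleaning performed, no parts changed.
2009-07-02 Serviced PMT #3 and replaced PMT 7; detector A and detector B checked.
"""

records = parse_service_log(sample_log)
for record in records:
    print(record)
```

Each record pairs a PMT event with the detectors named in the same entry, which is the structured form the text describes. Real logs would likely need more patterns than the three assumed here.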
[2] "Bursty and Hierarchical Structure in Streams", Jon Kleinberg, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2002