Bibudh Lahiri ~~ projects

3125 Coover Hall
Department of Electrical & Computer Engineering
Iowa State University, Ames, Iowa

(515) 451-0307 (M)
(515) 294-6503 (O)



Network Monitoring Detecting exploit patterns from network flow streams: Network-based Intrusion Detection Systems (NIDS) try to detect malicious activities by monitoring network traffic. Research on network traffic measurement has identified various patterns that the typical exploits on today's Internet exhibit. The goal of our research is to devise single-pass (online) data stream algorithms for detecting these patterns from network traffic flow data, using a workspace that is much smaller than the size of the traffic flow. We aim to design algorithms with a provable guarantee on the space and time requirements and the degree of approximation in the estimates returned. A detailed proposal of the work can be found here.

The following projects are all parts of this work:

  Payload-based Anomaly Detection for Network Packet Streams (Spring 2009-current, with Yang Liu, Yong Guan and Srikanta Tirthapura): We address the problem of detecting anomalous behavior from payloads of network packet streams, using small-space approximation algorithms. An N-gram is a sequence of N adjacent bytes in a payload unit. Since there can be 256N different values of N-grams, any anomaly detection strategy that is based on keeping track of the exact frequencies of the N-grams, may need exponential space. Our goal is to develop a small-space sketch that can "learn" the normal traffic profile for a site/network from a training dataset, and then to use the sketch for detecting anomalous payloads from the test dataset. We are using random projection methods for buidling the sketches and non-adaptive combinatorial group testing techniques for comparing them.

  Identifying Heavy-Hitters from Data Streams with Time-Decay (Summer 2009): We propose small-space, one-pass approximation algorithms for identifying heavy-hitters from a data stream, where the weight of an item reduces with time according to some decay function. We consider the exponential, the monomial and the sliding window decay functions here. For exponential and monomial decay, we adopt the Misra-Gries sketch, and our algorithms take O(cmax/&epsilon) space to produce an &epsilon-approximate solution, where cmax is an apriori upper bound on the initial weight any element in the stream appears with. For small (O(1)) values of cmax, this is an improvement by a factor of &Omega(log m/&epsilon) over the space-complexity of the algorithm by Cormode et al. We also extend the count-min sketch to identify the &epsilon-approximate heavy-hitters over a sliding window on a cash-register stream.

  A Generic Framework for Detecting Top-k Items from a Stream (Summer 2009): We propose a generic framework for identifying the k most frequent items from a network data stream in a single pass, using a workspace much smaller than the stream itself. We observed that any small-space sketch that approximately maintains the frequencies of the items appearing in a stream can be utilized to extract the top-k items. We show theoretically how the Misra-Gries sketch and the count-min sketch can fit into this framework, analyze the space-complexities of both, and present experimental results on packet traces from a backbone network link. With the realistic assumption that the frequencies of the items follow a Zipfian distribution, we fully present an algorithm, based on the Misra-Gries sketch, that takes only O(kO(1)/&epsilon) space and guarantees that the solution provided is &epsilon-approximate. Under identical input distribution, the space-requirement of the top-k algorithm provided by Charikar et al is logarithmic in the size of the stream, so our algorithm achieves significant space-reduction; and does not demand any apriori knowledge from the user about the size of the stream. Here are the C++-based simulations of a naive and our small-space algorithm.

  Finding Correlated Heavy-Hitters over Data Streams (Fall 2008-Summer 2009, with Srikanta Tirthapura): We consider online mining of correlated heavy-hitters (CHH) from network data streams. Given a multi-dimensional dataset, a correlated aggregate query first filters a subset by applying a predicate along a primary dimension, and then computes aggregates along some secondary dimension of that subset data. Previous work has focused on computing the correlated sum (or count) where the predicate for the primary dimension involved a basic aggregate that can be maintained exactly in small space (e.g., minimum, maximum or average). To our knowledge, our work is the first to address queries of the following form: "On a stream of (x, y) tuples, maintain the y-values that occur frequently alongwith the x-values that appear frequently in the stream". Previous techniques for correlated aggregates cannot handle queries of this form, since even for a one-dimensional stream, heavy-hitters cannot be maintained exactly using small space.

Our online data stream algorithm is easy to implement and uses workspace which is orders of magnitude smaller than the stream itself. We present provable guarantees on the maximum error estimates, as well as experimental results, that demonstrate the space-accuracy trade-off, on more than one billion packet headers from a backbone network link. The theoretical and experimental results are in this paper that got accepted in IPCCC 2009. Here are the C++-based simulations of a naive and our small-space algorithm.

Distributed Data Mining Computing Frequent Elements using Gossip (Fall 2007-Spring 2009, with Srikanta Tirthapura): Due to the large scale of P2P and sensor networks, the values of aggregate functions over the data in the whole network are often more important than individual data at nodes. For example, in a P2P file-sharing network (like Gnutella or Napster), an administrator may be typically interested in identifying the most frequently accessed items over the whole network. Any reliance on central coordination limits the system’s scalability, therefore gossip-based protocols are emerging as an important communication paradigm. In gossip-based protocols, each node contacts one or a few nodes in each round (usually chosen at random), and exchanges information with these nodes. This mechanism leads to high fault-tolerance and self-stabilization. We designed efficient randomized distributed algorithms for identifying the frequently occurring data elements with probabilistic guarantees on accuracy, where the nodes exchange small-sized "sketches" or synopses of their individual data sets through gossip algorithms. The theoretical results are in this paper that we published in SIROCCO 2008. A Java-based simulation of the algorithm can be downloaded from here.

Sensor Networks Development of Distributed, Lightweight, Secure Hierarchical Directories for Tracking Mobile Objects in Sensor Networks (Fall 2006-Summer 2007, with Srikanta Tirthapura and Bojian Xu): The MTS300 Sensor BoardThe MICA2 mote Most practical techniques for locating remote objects in a distributed system suffer from problems of scalability and locality of reference. We are implementing the Arrow distributed directory protocol, a scalable and local mechanism for ensuring mutually exclusive access to mobile objects. This directory has communication complexity optimal within a factor of (1+MST-stretch(G))/2, where MST-stretch(G) is the minimum spanning tree stretch of the underlying network. In Arrow protocol, local change in the object’s position does not result in a global change in the network. This has been deployed on a fixed spanning-tree-based network of MICA2 motes, where the presence of the object is detected by measuring the amount of ambient light using MTS300 photo sensors, and the node detecting the object transmits this information to its neighbours so that the pointers are updated appropriately.

Business Applications

Trailblazer (Acuity) (January 2006-July 2006): This web-based application developed for McGraw-Hill Digital Learning enhanced the functionality of an existing application named TrailBlazer with creating test, exercise and assignments for students; scoring the students' response and generating predictive, diagnostic and summary reports. The system also monitors the structure of state curriculum and tracks the progress of a student in class.

Roles played:

  • Interacted with the onsite team in requirement collection
  • Developed the class and sequence diagrams using UML tool Enterprise Architect 6.0
  • Developed the application on Maverick framework using Eclipse 3.1

DADS (DUoS and Associated Distribution Services) (November 2002-December 2005): The goal of the DADS program was to meet the regulatory requirement to separate electricity distribution and supply businesses of United Utilities and develop an IT solution that will cater to the pure distribution business. The application developed by TCS provided a common web-based interface across all the distribution functions and it replaced a number of different legacy applications that were used for distribution billing previously.

Roles played:

  • Interacted with the onsite team in requirement collection
  • Developed prototypes using Dreamweaver MX 6.0, prepared specification documents for screens, batches and reports and Entity Relationship Diagram using System Architect
  • Developed web components for the Payment Management and Revenue Protection Service modules based on BC4J (Business Component for Java) framework using Oracle 9i JDeveloper 9.0.2
  • Developed Oracle-based reports using Oracle 9i Reports Developer
  • Developed batches and procedures on Oracle 9i database
  • Joined the project as a developer and later took the responsibility as a process owner of Payment Management and Revenue Protection Service modules
  • Involved in User Acceptance Test support for the Payment Management and Revenue Protection Service modules at the client site
  • Involved in development and timely execution of PL/SQL procedures on Oracle 9i for migrating data from legacy applications to DADS


Bibudh Lahiri
Department of Electrical & Computer Engineering
Iowa State University
3125 Coover Hall, Ames, Iowa