In data stream applications, a good approximation obtained in a timely manner is often better than the exact answer that’s delayed beyond the window of opportunity. Of course, the quality of the approximate is as important as its timely delivery. Unfortunately, algorithms capable of online processing do not conform strictly to a precise error guarantee. Since online processing is essential and so is the precision of the error, it is necessary that stream algorithms meet both criteria. Yet, this is not the case for mining frequent sets in data streams. We present EStream, a novel algorithm that allows online processing while producing results strictly within the error bound. Our theoretical and experimental results show that EStream is a better candidate for finding frequent sets in data streams, when both constraints need to be satisfied.
Book Title: Data Warehousing and Knowledge Discovery