public class CountSketch<T> extends BaseFrequency<T>
Implementation of the CountSketch algorithm from the paper 'Finding frequent items in data streams' written by 'Charikar, M., Chen, K., and Farach-colton, M. (2002)'.
| Modifier and Type | Field and Description | 
|---|---|
| protected int[][] | data
 Data structure to estimate the frequency. | 
| protected long | elementsCountedTotal number of occurences of all elements counted so far. | 
| protected Class<? extends HashFunction<?>> | functionClassThe hash function which will be used. | 
| protected List<HashFunction<T>> | h
 Hashfunctions which determine the bucket to add the
 s-hashfunctions. | 
| protected int | k
 As CountSketch also provides top-k estimation a parameter k can be provided. | 
| protected List<HashFunction<T>> | sHashfunctions which determine the value which will be added. | 
| protected Map<T,CountEntry<T>> | topItems
 Map for the top-k-algorithm. | 
defaultMinSupport| Constructor and Description | 
|---|
| CountSketch(int domain,
           int numberOfHashFunctions,
           int numberOfBuckets,
           int k)
 Constructor of the CountSketch algorithm. | 
| Modifier and Type | Method and Description | 
|---|---|
| boolean | add(T item,
   long incrementCount)
 Counts the item by passing it to the internal
 data strucutre. | 
| boolean | contains(T item)Tells whether the data structure contains the given item or not. | 
| long | estimateCount(T item)Estimate the frequency for the given item. | 
| long | estimateFrequency(T item)
 Estimates the frequency of the provided item. | 
| List<CountEntry<T>> | getFrequentItems(double minSupport) | 
| Collection<T> | getTopK()
 Returns the top-k items which were
 counted so far. | 
| protected void | initializeHashes(int domain,
                int nrOfHashFunctions,
                int nrOfbuckets,
                HashFunctionFactory<T> factory)
 Initialize all necessary hash functions. | 
| Set<T> | keySet() | 
| long | size() | 
| protected boolean | updateData(T item)
 Updates the data structure with the given item. | 
add, getDefaultMinSupport, getFrequentItems, peek, peek, setDefaultMinSupportprotected long elementsCounted
protected int[][] data
Data structure to estimate the frequency.
protected int k
As CountSketch also provides top-k estimation a parameter k can be provided. If <= 0 top-k overhead is disabled.
protected Map<T,CountEntry<T>> topItems
Map for the top-k-algorithm.
protected Class<? extends HashFunction<?>> functionClass
The hash function which will be used.
protected List<HashFunction<T>> h
Hashfunctions which determine the bucket to add the s-hashfunctions.
protected List<HashFunction<T>> s
public CountSketch(int domain,
           int numberOfHashFunctions,
           int numberOfBuckets,
           int k)
Constructor of the CountSketch algorithm. This construction can take quite a long time since the construction of the hashfunctions is rather time-consuming.
domain - The (estim.) domain, i.e. how many different items are expectednumberOfHashFunctions - The number of hashfunctions which determine a bucketnumberOfBuckets - The number of buckets where a counter will be maintainedk - parameter for the top-k variant. If you want to disable
 the top-k overhead, than set k to 0 or lowerprotected void initializeHashes(int domain,
                    int nrOfHashFunctions,
                    int nrOfbuckets,
                    HashFunctionFactory<T> factory)
Initialize all necessary hash functions.
domain - the (estim.) domain, i.e. how many different items are expectednrOfHashFunctions - the number of hashfunctions which determine a bucketnrOfbuckets - the number of buckets where a counter will be maintainedpublic boolean add(T item, long incrementCount)
Counts the item by passing it to the internal data strucutre.
If a k greater than zero was set, the top-k map will be maintained also.
item - The item to countincrementCount - public long estimateCount(T item)
ISimpleFrequencyitem - The item that will have the frequency estimatedpublic long size()
public List<CountEntry<T>> getFrequentItems(double minSupport)
public Collection<T> getTopK()
Returns the top-k items which were counted so far.
If k was set to 0 or lower, an empty Collection will be returned.
public long estimateFrequency(T item)
Estimates the frequency of the provided item.
item - the item which frequency shall be estimatedpublic boolean contains(T item)
IRichFrequencyitem - The item to be verifiedprotected boolean updateData(T item)
Updates the data structure with the given item.
item - the item to insert into the data structureCopyright © 2014. All rights reserved.