mirror of
https://github.com/trekhleb/javascript-algorithms.git
synced 2025-07-06 17:44:08 +08:00
Update bloom filter README.
# Bloom Filter

A Bloom filter is a space-efficient probabilistic data structure designed to test whether an element is present in a set. It is built to be blazingly fast and to use minimal memory, at the cost of potential false positives. False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set".

Burton Howard Bloom proposed the technique in 1970 for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied.
## Algorithm description

An empty Bloom filter is a bit array of `m` bits, all set to `0`. There must also be `k` different hash functions defined, each of which maps or hashes some set element to one of the `m` array positions, generating a uniform random distribution. Typically, `k` is a constant, much smaller than `m`, which is proportional to the number of elements to be added; the precise choice of `k` and the constant of proportionality of `m` are determined by the intended false positive rate of the filter.

Here is an example of a Bloom filter, representing the set `{x, y, z}`. The colored arrows show the positions in the bit array that each set element is mapped to. The element `w` is not in the set `{x, y, z}`, because it hashes to one bit-array position containing `0`. For this figure, `m = 18` and `k = 3`.
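The mapping from an element to `k` of the `m` positions can be sketched in TypeScript. This is a minimal sketch under several assumptions. Elements are strings. The hash is a hypothetical seeded FNV-1a-style function `hash32`. Instead of `k` truly independent hash functions, the `k` positions come from double hashing (`h1 + i * h2`). The setup mirrors the figure (`m = 18`, `k = 3`, set `{x, y, z}`), but the positions will not match the figure's arrows, because the hash functions differ.

```typescript
// Hypothetical seeded 32-bit FNV-1a-style string hash (an assumption, not
// the hash used by this repository).
function hash32(value: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < value.length; i += 1) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Map an element to k positions in [0, m). Instead of k independent hash
// functions, this uses double hashing: position_i = (h1 + i * h2) mod m.
function bitPositions(value: string, k: number, m: number): number[] {
  const h1 = hash32(value, 0);
  const h2 = hash32(value, 0x9e3779b9);
  const positions: number[] = [];
  for (let i = 0; i < k; i += 1) {
    positions.push((h1 + i * h2) % m);
  }
  return positions;
}

// Query: an element is "definitely not in set" if any of its bits is 0.
function mightContain(bits: number[], value: string, k: number): boolean {
  return bitPositions(value, k, bits.length).every((p) => bits[p] === 1);
}

// Build the filter for {x, y, z} with m = 18 and k = 3.
const m = 18;
const k = 3;
const bits: number[] = new Array<number>(m).fill(0);
for (const element of ["x", "y", "z"]) {
  for (const p of bitPositions(element, k, m)) {
    bits[p] = 1;
  }
}

// false means "definitely not in set"; true would be a false positive here.
const wMaybePresent = mightContain(bits, "w", k);
```

Whether `wMaybePresent` comes out `false` depends on the hash values. With only 18 bits and 9 bit assignments, a false positive for `w` is possible.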
## Operations

There are two main operations a Bloom filter can perform: _insertion_ and _search_. Search may result in false positives. Deletion is not possible, because clearing a bit could also erase the record of other items that hash to the same position.

In other words, the filter accepts items, and when we check whether an item has previously been inserted, it answers either "no" or "maybe".

Both insertion and search are `O(1)` operations: each one computes a fixed number of hashes (`k`), no matter how many items the filter already holds.
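The two operations can be sketched as a small TypeScript class. This is an illustrative sketch, not this repository's implementation. The class name `SimpleBloomFilter`, its `add`/`mayContain` methods, and the seeded FNV-1a-style hashing are all assumptions. Each of the `k` "hash functions" is the same hypothetical hash run with a different seed. `mayContain` can only answer "no" (`false`) or "maybe" (`true`), and there is intentionally no delete method.

```typescript
// Hypothetical seeded 32-bit FNV-1a-style string hash.
function seededHash(value: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < value.length; i += 1) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class SimpleBloomFilter {
  private readonly bits: boolean[];

  constructor(private readonly size = 100, private readonly hashCount = 3) {
    // All locations start out as false (an empty filter).
    this.bits = new Array<boolean>(size).fill(false);
  }

  // One index per hash function; each "hash function" is seededHash
  // with a different seed.
  private indexes(item: string): number[] {
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i += 1) {
      result.push(seededHash(item, Math.imul(i + 1, 0x9e3779b9)) % this.size);
    }
    return result;
  }

  // Insertion: set every index produced for the item to true.
  add(item: string): void {
    for (const index of this.indexes(item)) {
      this.bits[index] = true;
    }
  }

  // Search: false means "definitely not inserted"; true means "maybe".
  mayContain(item: string): boolean {
    return this.indexes(item).every((index) => this.bits[index]);
  }
}
```

For example, after `add("apple")`, `mayContain("apple")` is always `true`. By contrast, `mayContain("cherry")` is usually `false`, but it can be `true` if all of its indexes happen to land on positions already set by other items.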
## Making the filter

A Bloom filter is created with a fixed size. In our example, we use `100` as the default length, and all locations are initialized to `false`.
### Insertion

During insertion, a number of hash functions (`3` in our case) are applied to the input, and each one outputs an index. At every index received, we simply set the value in our Bloom filter to `true`.
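The insertion step can be sketched with a plain array of `100` booleans and three distinct hash functions. This sketch uses djb2, sdbm, and FNV-1a, three classic 32-bit string hashes, as the hash functions. That choice and the names `bits` and `insert` are assumptions for illustration; the repository's own hash functions may differ.

```typescript
// Three well-known 32-bit string hashes, chosen here (as an assumption)
// to play the role of the filter's three hash functions.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i += 1) h = (Math.imul(h, 33) + s.charCodeAt(i)) >>> 0;
  return h;
}

function sdbm(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i += 1) h = (s.charCodeAt(i) + Math.imul(h, 65599)) >>> 0;
  return h;
}

function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i += 1) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// The filter: 100 locations, all initialized to false.
const FILTER_SIZE = 100;
const bits: boolean[] = new Array<boolean>(FILTER_SIZE).fill(false);

// Insertion: each hash function yields one index; set each to true.
// Returns the indexes so the caller can see which locations were touched.
function insert(item: string): number[] {
  const indexes = [djb2, sdbm, fnv1a].map((hash) => hash(item) % FILTER_SIZE);
  for (const index of indexes) {
    bits[index] = true;
  }
  return indexes;
}
```

Two of the three hashes can collide on the same index, so one insertion sets between one and three locations.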
…

The probability of a false positive can be approximated with the following formula:

(1 - e<sup>-kn/m</sup>)<sup>k</sup>

where `k` is the number of hash functions, `m` is the filter size, and `n` is the number of items inserted.
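The formula can be evaluated directly, which makes it easy to compare parameter choices. In this sketch, `falsePositiveRate` computes `(1 - e^(-kn/m))^k`. `bestHashCount` (a hypothetical helper) tries `k = 1..maxK` for a fixed `m` and `n` and keeps the value with the lowest estimated rate.

```typescript
// Estimated false positive probability: (1 - e^(-kn/m))^k.
function falsePositiveRate(k: number, m: number, n: number): number {
  return Math.pow(1 - Math.exp((-k * n) / m), k);
}

// Try k = 1..maxK and return the k with the lowest estimated rate.
function bestHashCount(m: number, n: number, maxK = 20): { k: number; rate: number } {
  let best = { k: 1, rate: falsePositiveRate(1, m, n) };
  for (let k = 2; k <= maxK; k += 1) {
    const rate = falsePositiveRate(k, m, n);
    if (rate < best.rate) {
      best = { k, rate };
    }
  }
  return best;
}
```

For `m = 100` and `n = 10`, `falsePositiveRate(3, 100, 10)` is roughly `0.0174` (about 1.7%). `bestHashCount(100, 10)` picks `k = 7` (rate about 0.8%), close to the well-known optimum `k = (m / n) ln 2 ≈ 6.9`.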
These variables, `k`, `m`, and `n`, should be picked based on how many false positives are acceptable. If the values are picked and the resulting probability is too high, the values should be tweaked and the probability …

… but the cost is acceptable. It's OK if a user never sees a few articles, as long as they have other, brand-new ones to see every time they visit the site.

The popular blog site Medium does a version of this. Feel free to read [their article](https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff).
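The article-filtering idea can be sketched as follows. The `ArticleFilter` class, its `markRead`/`unread` methods, the use of string article IDs as keys, and the seeded FNV-1a-style hash are all hypothetical. A Bloom filter has no false negatives, so an article the user has read is never shown again. A false positive only hides an unread article, which is the acceptable cost described above.

```typescript
// Hypothetical seeded 32-bit FNV-1a-style hash of an article ID.
function hashId(id: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < id.length; i += 1) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Remembers which articles a user has read, in a fixed-size Bloom filter.
class ArticleFilter {
  private readonly bits: boolean[];

  constructor(private readonly size = 1000, private readonly hashCount = 3) {
    this.bits = new Array<boolean>(size).fill(false);
  }

  // One position per seeded hash function.
  private positions(id: string): number[] {
    return Array.from({ length: this.hashCount }, (_, i) =>
      hashId(id, Math.imul(i + 1, 0x9e3779b9)) % this.size,
    );
  }

  // Insert an article the user has read.
  markRead(id: string): void {
    for (const p of this.positions(id)) {
      this.bits[p] = true;
    }
  }

  // Keep only articles the filter reports as "definitely not" read.
  unread(ids: string[]): string[] {
    return ids.filter((id) => !this.positions(id).every((p) => this.bits[p]));
  }
}
```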
## References

- [Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter)