Update bloom filter README.

2025-07-06 17:44:08 +08:00 · 2018-06-30 20:23:02 +03:00
parent 9dbf1c9889
commit b33b1fe1bc
1 changed file with 39 additions and 15 deletions
--- a/src/data-structures/bloom-filter/README.md
+++ b/src/data-structures/bloom-filter/README.md
@@ -1,34 +1,61 @@
 # Bloom Filter

-A bloom filter is a data structure designed to
-test whether an element is present in a set. It
-is designed to be blazingly fast and use minimal
-memory at the cost of potential false positives.
+A bloom filter is a space-efficient probabilistic
+data structure designed to test whether an element
+is present in a set. It is fast and uses little
+memory, at the cost of occasional false positives:
+false positive matches are possible, but false
+negatives are not. In other words, a query returns
+either "possibly in set" or "definitely not in set".
+
+Bloom proposed the technique for applications where the 
+amount of source data would require an impractically large
+amount of memory if "conventional" error-free hashing 
+techniques were applied.
+
+## Algorithm description
+
+An empty Bloom filter is a bit array of `m` bits, all 
+set to `0`. There must also be `k` different hash functions
+defined, each of which maps or hashes some set element to 
+one of the `m` array positions, generating a uniform random 
+distribution. Typically, `k` is a constant, much smaller 
+than `m`, which is proportional to the number of elements 
+to be added; the precise choice of `k` and the constant of 
+proportionality of `m` are determined by the intended 
+false positive rate of the filter.
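
As a rough illustration of how `m` and `k` follow from the intended false positive rate, the sketch below uses the standard sizing formulas `m = -n * ln(p) / (ln 2)^2` and `k = (m / n) * ln 2`. The function name `suggestParameters` and the sample inputs are assumptions made for this example; they are not part of this repository's implementation.

```javascript
// Hypothetical helper (not part of the repository): suggests a bit-array
// size `m` and a hash-function count `k` for `n` expected items and a
// target false positive rate `p`.
function suggestParameters(n, p) {
  // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 ** 2));
  // k = (m / n) * ln 2, rounded to a whole number of hash functions (at least 1).
  const k = Math.max(1, Math.round((m / n) * Math.LN2));
  return { m, k };
}

// Assumed example inputs: 1000 items with a 1% target false positive rate.
const { m, k } = suggestParameters(1000, 0.01);
console.log(`bits: ${m}, hash functions: ${k}`);
```

With these inputs the filter needs a little under 10 bits per element, and `k` stays small compared to `m`, as the paragraph above describes.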
+
+Here is an example of a Bloom filter, representing the
+set `{x, y, z}`. The colored arrows show the positions
+in the bit array that each set element is mapped to. The
+element `w` is not in the set `{x, y, z}`, because one of
+the bit-array positions it hashes to contains `0`. For
+this figure, `m = 18` and `k = 3`.

 ![Bloom Filter](https://upload.wikimedia.org/wikipedia/commons/a/ac/Bloom_filter.svg)
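
The membership rule illustrated by the figure can be sketched in a few lines. The bit positions below are made up for the example and are not read off the figure; only `m = 18`, `k = 3`, and the element names come from the text above.

```javascript
// Illustrative 18-bit array (m = 18), initially all zeros.
const bits = new Array(18).fill(0);

// Hypothetical positions produced by k = 3 hash functions per element.
// These values are assumptions for the sketch, not taken from the figure.
const positions = {
  x: [1, 5, 13],
  y: [4, 11, 16],
  z: [3, 5, 11],
  w: [4, 13, 15], // position 15 is never set by x, y, or z
};

// Insert x, y, and z by setting each of their bits to 1.
for (const element of ['x', 'y', 'z']) {
  for (const index of positions[element]) {
    bits[index] = 1;
  }
}

// An element is "possibly in set" only if every one of its bits is 1;
// a single 0 means "definitely not in set".
function mayContain(element) {
  return positions[element].every((index) => bits[index] === 1);
}

console.log(mayContain('x')); // true
console.log(mayContain('w')); // false
```

Two of the three positions for `w` collide with bits set by other elements, yet the single remaining `0` is enough to rule it out.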

 ## Operations

 There are two main operations a bloom filter can
-perform: insertion and search. Search may result in
+perform: _insertion_ and _search_. Search may result in
 false positives. Deletion is not possible.

 In other words, the filter can take in items. When
 we go to check if an item has previously been
 inserted, it can tell us either "no" or "maybe".

-Both insertion and search are O(1) operations.
+Both insertion and search are `O(1)` operations.

 ## Making the filter

 A bloom filter is created by allotting a certain size.
-In our example, we use 100 as a default length. All
+In our example, we use `100` as a default length. All
 locations are initialized to `false`.

 ### Insertion

 During insertion, a number of hash functions,
-in our case 3 hash functions, are used to create
+in our case `3` hash functions, are used to create
 hashes of the input. These hash functions output
 indexes. At every index received, we simply change
 the value in our bloom filter to `true`.
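
A minimal sketch of the insertion step, assuming a filter of length `100` and `3` hash functions as described above. The hash functions `hashA`, `hashB`, and `hashC` are hypothetical stand-ins written for this example and may differ from the ones used in the actual implementation.

```javascript
const SIZE = 100; // default length mentioned in the text
const storage = new Array(SIZE).fill(false);

// Three hypothetical string hash functions; each returns an index in [0, SIZE).
function hashA(item) {
  let hash = 0;
  for (let i = 0; i < item.length; i += 1) {
    hash = (hash * 31 + item.charCodeAt(i)) % SIZE;
  }
  return hash % SIZE;
}

function hashB(item) {
  let hash = 5381;
  for (let i = 0; i < item.length; i += 1) {
    hash = (hash * 33 + item.charCodeAt(i)) % SIZE;
  }
  return hash % SIZE;
}

function hashC(item) {
  let hash = 7;
  for (let i = 0; i < item.length; i += 1) {
    hash = (hash * 17 + item.charCodeAt(i) * (i + 1)) % SIZE;
  }
  return hash % SIZE;
}

// Collect the index produced by each hash function.
function getHashValues(item) {
  return [hashA(item), hashB(item), hashC(item)];
}

// Insertion: set the location at every hashed index to true.
function insert(item) {
  getHashValues(item).forEach((index) => {
    storage[index] = true;
  });
}

insert('Bruce Wayne');
console.log(getHashValues('Bruce Wayne').map((index) => storage[index]));
```

Because insertion only ever flips locations to `true`, different items can share locations, which is why a later search can report a false positive.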
@@ -65,13 +92,13 @@ The formula to calculate probability of a false positive is:

 ( 1 - e <sup>-kn/m</sup> ) <sup>k</sup>

-k = # hash functions
+`k` = number of hash functions

-m = size
+`m` = filter size

-n = # items inserted
+`n` = number of items inserted
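
The formula above can be evaluated directly. The helper name `falsePositiveProbability` is an assumption for this sketch; `k = 3` and `m = 100` match the defaults mentioned earlier, and `n = 10` is an arbitrary sample value.

```javascript
// Hypothetical helper: evaluates (1 - e^(-kn/m))^k.
//   k = number of hash functions
//   m = filter size
//   n = number of items inserted
function falsePositiveProbability(k, m, n) {
  return (1 - Math.exp((-k * n) / m)) ** k;
}

// Assumed sample: 3 hash functions, 100 slots, 10 inserted items.
const probability = falsePositiveProbability(3, 100, 10);
console.log(probability.toFixed(4)); // roughly 0.017, i.e. about 1.7%
```

Calling the helper with a larger `n` and the same `k` and `m` shows how quickly the probability grows as the filter fills up.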

-These variables, k, m, and n, should be picked based
+These variables, `k`, `m`, and `n`, should be picked based
 on how acceptable false positives are. If the values
 are picked and the resulting probability is too high,
 the values should be tweaked and the probability
@@ -92,9 +119,6 @@ but the cost is acceptable. It's ok if a user never sees
 a few articles as long as they have other, brand new ones
 to see every time they visit the site.

-The popular blog site Medium does a version of this.
-Feel free to read [their article](https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff).
-
 ## References

 - [Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter)