317070 3 days ago

Why the golden ratio? Because the continued fraction of the golden ratio is all 1's [0]. So it is uniquely hard to approximate with a rational number. The golden ratio is the extremal case in Hurwitz's theorem [1]. And staying far from simple rational numbers is what you want for good hashing, because multiplying by (something close to) a simple rational doesn't mix your digits well.

[0] https://codegolf.stackexchange.com/questions/48589/generate-... [1] https://en.m.wikipedia.org/wiki/Hurwitz%27s_theorem_(number_...
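
A minimal sketch of what that multiplication looks like in practice (my own illustration, not code from the article; the constant is the usual 64-bit value of 2^64 divided by the golden ratio, and the table size is assumed to be a power of two):

    #include <cstdint>

    // 2^64 / golden ratio -- the usual 64-bit Fibonacci hashing constant (odd).
    constexpr uint64_t kGolden = 0x9E3779B97F4A7C15ull;  // 11400714819323198485

    // Map a 64-bit key onto 2^log2_buckets buckets by multiplying and keeping
    // the HIGH bits of the product (assumes 1 <= log2_buckets <= 63).
    inline uint64_t fibonacci_hash(uint64_t key, unsigned log2_buckets) {
        return (key * kGolden) >> (64 - log2_buckets);
    }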

  • arcastroe 3 days ago

    > So it is uniquely hard to approximate with a rational number.

    Fun fact: the best rational approximations to the golden ratio are ratios of consecutive Fibonacci numbers, i.e. fibonacci(n+1)/fibonacci(n).

    Or, for a more concrete example, 21/13.
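
    A quick way to see the convergence (my own throwaway snippet, not from the thread):

        #include <cstdio>

        int main() {
            // Ratios of consecutive Fibonacci numbers converge to the golden ratio.
            unsigned long long a = 1, b = 1;
            for (int i = 0; i < 10; ++i) {
                unsigned long long next = a + b;
                a = b;
                b = next;
                std::printf("%llu/%llu = %.9f\n", b, a, (double)b / a);
            }
            // ... 21/13 = 1.615384615, 34/21 = 1.619047619, tending to 1.618033989
        }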

jlebar 3 days ago

I think the author has misunderstood things here.

This technique is orthogonal to integer mod. Indeed the author multiplies by their magic constant and then does an integer mod to map into their hashtable's buckets.

This technique is actually just applying a fast integer hash on the input keys to the hashtable before mapping the keys to buckets. You can then map to buckets however you want.

The additional hash is useful if and only if the input hash function for your table's keys doesn't behave like a random function, i.e. it doesn't mix its bits for whatever reason. If your input hash function is indeed random-looking, then this is a (small but perhaps measurable) waste of time.

Using prime table sizes is another way to accomplish basically the same thing. Reducing the input hash key modulo a prime forces you to look at all the bits of the input. In practice the modulo by a constant prime is compiled down to multiplies and shifts, so it's basically a hash function. (Though I'd take a multiply by a magic number over a divide by a prime; the mul alone should be faster.)

Relatedly see this post by Daniel Lemire about an alternative to integer mod, https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-... which is interesting if your number of buckets is not a power of 2 for some reason.
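
Roughly, the trick from that post looks like this (a sketch for 32-bit hash values; fast_range is just my name for it):

    #include <cstdint>

    // Lemire's "fast alternative to the modulo reduction": treat x / 2^32 as a
    // fraction in [0, 1) and scale it by n, using one multiply and one shift.
    inline uint32_t fast_range(uint32_t x, uint32_t n) {
        return (uint32_t)(((uint64_t)x * (uint64_t)n) >> 32);
    }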

  • sdenton4 3 days ago

    I think the post talks about exactly this? The method is combining hashing the keys and finding a position in the target range. There's a bit where he talks about how Knuth uses the term 'hash function' as the combination of these two operations, while modern texts look at the two operations in isolation.

    So maybe one way of looking at this is as an efficient fused operation: neither part looks special in isolation, but together they are both fast and avoid problems with input patterns.

  • ithkuil 3 days ago

    The way I understood this article, the problem Fibonacci hashing seems to solve is that it turns a hashing strategy that would require a prime modulo into something that can use a power of two modulo.

    I think there are some hashing functions around that are already designed to solve that problem at "step 1".

    So the question just boils down to which is faster

beeforpork 2 days ago

Oh -- this has a name? And it is not commonly known to be better than modulo? Forgotten? I feel old.

The reason multiply+use-of-high-bits is better than modulo is not the golden ratio. In fact, any number using a lot of bits is good (not all are equally good: 0 is extremely bad, and 1 and all powers of two are quite bad, because they just shift; but on the other hand the golden ratio is not outstandingly good either -- many values work well). This is because in the result of a multiplication, higher bits depend on all lower bits of the operands, since the carry propagates from low bits to high bits through the result. In contrast, a power-of-two modulo simply cuts off the upper bits, so they contribute nothing to the distribution entropy. This means that if you take the high bits of a multiplication result to distribute into a small set of integers, you get a better distribution than using only the lower bits, because more bits of the input value are significant for the output result.
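
To make the contrast concrete (my own illustration; 1024 buckets assumed, multiplier chosen arbitrarily as any odd constant with many bits set):

    #include <cstdint>

    constexpr uint64_t kMul = 0x9E3779B97F4A7C15ull;  // any odd, bit-rich constant works

    inline uint64_t bucket_mask(uint64_t key) { return key & 1023; }                 // power-of-two modulo: low 10 bits only
    inline uint64_t bucket_mul(uint64_t key)  { return (key * kMul) >> (64 - 10); }  // high 10 bits of the product

    // Keys 0x1000, 0x2000, 0x3000, ... differ only above bit 9, so bucket_mask()
    // sends them all to bucket 0, while bucket_mul() spreads them out.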

I thought this was common knowledge. The man page on rand() used to tell you that you should always multiply into the target range instead of modulo into the target range. It is surprising to see that all big hash libraries seem to miss this.
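
Something along the lines of what that man page recommended (a sketch, not the exact snippet):

    #include <cstdlib>

    // Scale into [0, n) instead of taking rand() % n, so the high-order bits of
    // the generator (typically the better-mixed ones) decide the result.
    int random_bucket(int n) {
        return (int)((double)n * std::rand() / ((double)RAND_MAX + 1.0));
    }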

thot_experiment 3 days ago

I love this property of phi. I used it in a project a while back to order an initial keyframe stream where it was important to let the user quickly jump to any arbitrary point in the timeline. It's super easy to implement, and you're guaranteed to have something near the query point to show the user right away.
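
If I understand the scheme, it amounts to a low-discrepancy ordering of the timeline; a guess at what it might look like (names and details are mine, not the commenter's code):

    #include <cmath>
    #include <cstdio>

    int main() {
        const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
        // Any prefix of this sequence covers the [0, 1) timeline roughly evenly,
        // so there is always an already-sent frame near wherever the user seeks.
        for (int i = 1; i <= 10; ++i) {
            double pos = std::fmod(i * phi, 1.0);  // fractional part of i * phi
            std::printf("keyframe %2d at t = %.3f\n", i, pos);
        }
    }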

senderista 3 days ago

This article is both confused and confusing. What he's really talking about is not a hash function but a pseudorandom permutation (functionally the same as a block cipher), i.e. a pseudorandom bijective function on an integer domain. In the context of hash functions this is often called a "mixer" or "finalizer" (since it's applied as the final step before determining the bucket). Multiplication by the golden ratio is about the simplest such permutation, but it's far inferior to others (several of which I catalogued here, along with their inverses: https://github.com/senderista/hashtable-benchmarks/tree/mast...). The important thing to remember is that multiplication only mixes low bits into high bits, but you should also mix high bits into low bits (e.g. with XOR), as I do here with the "golden ratio" multiplier: https://github.com/senderista/hashtable-benchmarks/blob/mast....
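
For reference, a mixer of the kind being described is MurmurHash3's 64-bit finalizer, where the xor-shifts feed the high bits back down into the low bits between the multiplications (shown here as a generic example, not necessarily the exact variant in the linked repo):

    #include <cstdint>

    // MurmurHash3 fmix64: every output bit ends up depending on every input bit,
    // and each step is invertible, so the whole thing is a bijection on uint64_t.
    inline uint64_t fmix64(uint64_t k) {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdull;
        k ^= k >> 33;
        k *= 0xc4ceb9fe1a85ec53ull;
        k ^= k >> 33;
        return k;
    }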

One useful property of permutations is that they're reversible, so you can reconstruct a key from its hash code, which is useful for hash tables with integer keys. I used this technique here to avoid storing keys separately from hash codes: https://github.com/senderista/hashtable-benchmarks/blob/mast....
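
A sketch of why that works (not the code from the linked repo): an odd multiplier has a multiplicative inverse modulo 2^64, which can be computed in a handful of Newton/Hensel steps.

    #include <cstdint>

    // Multiplicative inverse of an odd constant modulo 2^64; the number of
    // correct low bits roughly doubles on each iteration.
    constexpr uint64_t mul_inverse(uint64_t a) {
        uint64_t x = a;              // correct to 3 bits for any odd a
        for (int i = 0; i < 5; ++i)  // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
            x *= 2 - a * x;
        return x;
    }

    // If h = key * kMul (kMul odd), then key == h * mul_inverse(kMul), so a table
    // can store just the mixed value and still reconstruct the original key.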

Applying a separate "mixer" function to the result of a user-supplied hash function makes sense only if the quality of the user's hash function is suspect. If you control the hash function yourself then there's no reason to use one (as I mentioned, a high-quality hash function will include a finalizer anyway). And if you do use a mixer, you should use a better one than this.

Genbox 3 days ago

I opened this post in a tab 3 days ago, but now it says "5 hours ago". Someone is playing around.

  • willvarfar 3 days ago

    No idea in this particular case, but in the past I've been pinged by HN mods suggesting I resubmit posts that I'd recently submitted but that bombed. I imagine HN has an automated system for finding posts with potential that never got the attention they deserved and boosting them to give them a second chance?

    The article is 2018.

  • IggleSniggle 3 days ago

    I h8 it when thir teens get all cheeky like that. Hopefully they'll have matured a bit by 21.

    Edit: it seems people don't like my Fibonacci joke. I thought I had this kind of thing figured out by 34.

    • Calwestjobs 3 days ago

      maybe they are confused because they never heard of the mathematical operation of maturing.

peterlada 5 days ago

To summarize: multiplication hashes are inferior, but when used with the golden-ratio-derived integer they are actually superior.

  • btilly 3 days ago

    Poor summary.

    Better summary. Fibonacci hashing isn't a great hash function, but it is a really good solution to mapping large integers to small integers. Using it for that doubles the speed of hashing in practice.

  • Sesse__ 3 days ago

    No, plenty of systems use other factors. The golden ratio has some nice properties, but it's not essential.

infoaddicted 5 days ago

The youtube video is marked private?

shaggie76 3 days ago

While reading this article it occurred to me to wonder if the CPU CRC32C instruction would be good for hash functions; I think the latency is about the same as an integer multiply.
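
It is exposed as an SSE4.2 intrinsic, so it's cheap to try; a hedged sketch (requires hardware CRC32C support and, on GCC/Clang, a flag such as -msse4.2):

    #include <cstdint>
    #include <nmmintrin.h>  // SSE4.2 intrinsics: _mm_crc32_u64

    // Fold a 64-bit key down to 32 bits with the CRC32C instruction.
    // Note: CRC is linear over GF(2), so it mixes bits but is a weak hash on its own.
    inline uint32_t crc32c_hash(uint64_t key) {
        return (uint32_t)_mm_crc32_u64(0, key);  // seed 0; result is in the low 32 bits
    }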

up2isomorphism 2 days ago

The information density in this article is extremely low.

igtztorrero 3 days ago

The Fibonacci series: as fascinating as its name, the origin of the spiral in nature and of the Debian logo, and ever-present in computing. The algorithm is surely better, as the author says, but most of the time we use what's available. Perhaps the Python, Golang, and JS core programmers could review it.