FUDforum: comp.lang.php » FYI performance of hash algorithms

Home » Imported messages » comp.lang.php » FYI performance of hash algorithms

Show: Today's Messages :: Polls :: Message Navigator

FYI performance of hash algorithms [message #177852]

Mon, 23 April 2012 19:42

M. Strobel
Messages: 386
Registered: December 2011

Karma: 0

Senior Member

I am just asking myself which of the many hash algorithms performs best, so I can use
it to index my translation strings.

So I wrote a test script. It iterates over an array of 3.917.116 strings calculating
the hash of each.

Here you have the results, with run time, and run time relative to hash length
(smaller is better).
Interesting that the tiger hashes seem to perform well.

There were only collisions with adler32 (many), crc32 and crc32b.

/Str.

Name Length Timing t/Byte (skal.)
crc32 8 2,2150330544 276,9
crc32b 8 2,2309110165 278,9
adler32 8 2,4038968086 300,5
md4 32 2,9871790409 93,3
md5 32 3,0691349506 95,9
tiger128,3 32 3,4266569614 107,1
tiger160,3 40 3,5735139847 89,3
tiger128,4 32 3,6424109936 113,8
tiger192,3 48 3,6483809948 76,0
sha1 40 3,6727621555 91,8
tiger160,4 40 3,7693319321 94,2
tiger192,4 48 3,8536069393 80,3
ripemd128 32 3,9478549957 123,4
ripemd256 64 4,2519729137 66,4
salsa20 128 4,4366490841 34,7
salsa10 128 4,4819700718 35,0
ripemd160 40 4,8140711784 120,4
sha224 56 5,2336411476 93,5
ripemd320 80 5,2477719784 65,6
sha256 64 5,3834650517 84,1
haval128,3 32 5,8804600239 183,8
haval160,3 40 5,9342179298 148,4
haval192,3 48 6,0202069283 125,4
haval224,3 56 6,0966057777 108,9
haval256,3 64 6,2231030464 97,2
sha384 96 6,4372200966 67,1
sha512 128 6,7605450153 52,8
haval128,4 32 7,0831079483 221,3
haval160,4 40 7,1777567863 179,4
haval192,4 48 7,2292518616 150,6
haval224,4 56 7,3201220036 130,7
haval256,4 64 7,4621391296 116,6
whirlpool 128 7,4762940407 58,4
haval128,5 32 7,9410018921 248,2
haval160,5 40 8,028870821 200,7
haval192,5 48 8,102463007 168,8
haval224,5 56 8,1795060635 146,1
haval256,5 64 8,3195581436 130,0
gost 64 12,0381391048 188,1
snefru 64 14,0808851719 220,0
snefru256 64 14,1807420254 221,6
md2 32 17,4794299603 546,2

Report message to a moderator

Re: FYI performance of hash algorithms [message #177853 is a reply to message #177852]

Mon, 23 April 2012 20:16

Jerry Stuckle
Messages: 2598
Registered: September 2010

Karma: 0

Senior Member

On 4/23/2012 3:42 PM, M. Strobel wrote:
> I am just asking myself which of the many hash algorithms performs best, so I can use
> it to index my translation strings.
>
> So I wrote a test script. It iterates over an array of 3.917.116 strings calculating
> the hash of each.
>
> Here you have the results, with run time, and run time relative to hash length
> (smaller is better).
> Interesting that the tiger hashes seem to perform well.
>
> There were only collisions with adler32 (many), crc32 and crc32b.
>
> /Str.
>

<snip>

Interesting, but only applicable on your system with your libraries and
your data. A different system could have different versions of the
libraries and completely different performance figures.

As for collisions - completely dependent on your data. While you can
say some things with certainty (i.e. hashing 2^16+1 values with CRC-16
is guaranteed to have at least one collision), by picking different
strings to be hashed your collisions will come out differently.

For instance, it's not really surprising that adler32 had a lot of
collisions. It was designed for CRC and works better with binary data;
it was never meant to be a general purpose hashing algorithm. Rather it
was designed for speed. It's also very short.

The same is true but to a lesser extent with the CRC-32 and CRC-32b
algorithms, so again I'm not surprised about the collisions.

But the real question here is - how much does it matter if hashing a
string takes 2 microseconds or 2.5 microseconds? Unless you're doing
millions of them at one time, no one is going to notice the difference.
Personally, when looking at such things I always elect the fewest
collisions over the fastest time.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================

Report message to a moderator

Re: FYI performance of hash algorithms [message #177854 is a reply to message #177853]

Mon, 23 April 2012 20:44

M. Strobel
Messages: 386
Registered: December 2011

Karma: 0

Senior Member

Am 23.04.2012 22:16, schrieb Jerry Stuckle:
> On 4/23/2012 3:42 PM, M. Strobel wrote:
>> I am just asking myself which of the many hash algorithms performs best, so I can use
>> it to index my translation strings.
>>
>> So I wrote a test script. It iterates over an array of 3.917.116 strings calculating
>> the hash of each.
>>
>> Here you have the results, with run time, and run time relative to hash length
>> (smaller is better).
>> Interesting that the tiger hashes seem to perform well.
>>
>> There were only collisions with adler32 (many), crc32 and crc32b.
>>
>> /Str.
>>
>
> <snip>
>
> Interesting, but only applicable on your system with your libraries and your data. A
> different system could have different versions of the libraries and completely
> different performance figures.

Agreed. But it gives you hints.

> As for collisions - completely dependent on your data. While you can say some things
> with certainty (i.e. hashing 2^16+1 values with CRC-16 is guaranteed to have at least
> one collision), by picking different strings to be hashed your collisions will come
> out differently.
>
> For instance, it's not really surprising that adler32 had a lot of collisions. It
> was designed for CRC and works better with binary data; it was never meant to be a
> general purpose hashing algorithm. Rather it was designed for speed. It's also very
> short.
>
> The same is true but to a lesser extent with the CRC-32 and CRC-32b algorithms, so
> again I'm not surprised about the collisions.

Neither am I. I just did not cut the list, they are listed by hash_algos().

> But the real question here is - how much does it matter if hashing a string takes 2
> microseconds or 2.5 microseconds? Unless you're doing millions of them at one time,
> no one is going to notice the difference. Personally, when looking at such things I
> always elect the fewest collisions over the fastest time.

Even for this simple test I had to bump up memory_limit to 3G. And a real collision
test is difficult.

The problem is indeed that hashes are often used without collision detection, I am
running the app without as well. I made this test to have an indication about how
much longer one of the longer hashes takes to compute, for example md5 / sha1.

For me the winner is ripemd256: a very long hash, running relatively fast.

/Str.

Report message to a moderator

Previous Topic:	Stats comp.lang.php (last 7 days)
Next Topic:	Download now

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Wed Nov 27 13:18:03 GMT 2024

Total time taken to generate the page: 0.02172 seconds