Portable hash

The Taligent Application Environment sometimes stores objects in disk files that are accessed via a hash. Examples of classes that do this are TDiskSetOf and TDiskDictionaryOf. In order for the index structures in these files to work when the files are transported across platforms, the hash functions used must return the same result on every platform.

To do a portable hash, follow these rules:

Only use other portable-hash results or portably converted primitives as elements in your calculations.

Store the hash result as a value; never interpret it as a sequence of bytes.

All of the primitives that you use must be converted to a portable form, with precisely the same results on all machines. This means that you convert all primitive values to HashResult (unsignedlong), using only 32 bits.

All arithmetic operations that you perform in combining elements must be portable; they must have precisely the same results on all hardware. This means that every operation on HashResult must be masked to 32 bits afterwards. Restrict yourself to the arithmetic operations (+, -, *, /, %) and logical operations (^, &, |, ~, >>, <<, RotateUp, RotateDown).

    result = (x + y) & kMask32
You must guarantee to all clients that you will never change the hashing algorithm, even if there are bugs in it. The only exception to this would be if you had a bug in your algorithm so severe that its prior results were useless. For example, if Hash() returns TRandomNumberGenerator::First(), the old data isn't retrievable. So changing Hash() to return 0 is permissible because it allows new applications to work, while the old ones already don't work.

Strive for uniform distribution

If you want your hash function to return hash values that are uniformly distributed across what the return type can represent, send your result as a seed to a random number generator that returns values of that type. Because the semantics of the random number generator are to return numbers that are uniformly distributed, this makes your hash function also generate values with the same property.

In the future, the Taligent Application Environment will define APIs to assist in computing portable hashes. In the meantime, you might want to write your own helpers, such as a masking function.

Do not implement Hash via member functions

For many objects, Hash (and comparison) aren't intrinsic properties of the objects, but of the collections they are inserted into. Thus, they shouldn't be implemented via member functions. To illustrate why this is a problem, consider TFontIdentifierStyle, which identifies the font to use for text in style sets and line layout. A hash function was added for line layout so that TFontIdentifierStyles could be found. This Hash function was based on the name of the font. However, no one realized that TStyle already defined a Hash function based on the type of the style--it was a TFontIdentifierStyle. So, when this new Hash function was introduced, TFontIdentifierStyles stopped working in style sets.


[Contents] [Previous] [Next]
Click the icon to mail questions or corrections about this material to Taligent personnel.
Copyright©1995 Taligent,Inc. All rights reserved.

Generated with WebMaker