Chapter 5 Hashing

Source: Internet | Published by: 工程优化方法及其应用 | Editor: 程序博客网 | Time: 2024/06/09 23:55

The implementation of hash tables is frequently called hashing.

Hashing is a technique used for performing insertions, deletions, and finds in constant average time.

Tree operations that depend on ordering information, such as FindMin, FindMax, and PrintTree, are not supported efficiently by a hash table.

5.1 General Idea

A hash table is merely an array of some fixed size, containing the keys. We will refer to the table size as TableSize. The table runs from 0 to TableSize - 1.

Each key is mapped into some number in the range 0 to TableSize - 1 and placed in the appropriate cell.

The mapping is called a hash function.

5.2 Hash Function

If the input keys are integers, then simply returning Key mod TableSize is generally a reasonable strategy.

It is usually a good idea to ensure that the table size is prime.
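To see why a prime table size helps, consider integer keys that all end in zero. The following is a minimal sketch, not code from the original notes: HashInt is the Key mod TableSize rule from the text, and CountDistinctCells is a hypothetical helper that counts how many different cells a set of keys lands in.

```c
#include <assert.h>
#include <stdlib.h>

/* Integer hash from the text: Key mod TableSize. */
unsigned int HashInt( unsigned int Key, int TableSize )
{
    assert( TableSize > 0 );
    return Key % (unsigned int)TableSize;
}

/* Hypothetical helper: number of distinct cells hit by Keys[0..N-1].
   Returns -1 if the scratch array cannot be allocated. */
int CountDistinctCells( const unsigned int *Keys, int N, int TableSize )
{
    int i, Count = 0;
    char *Used = calloc( (size_t)TableSize, 1 );

    if( Used == NULL )
        return -1;
    for( i = 0; i < N; i++ )
    {
        unsigned int Cell = HashInt( Keys[i], TableSize );

        if( !Used[Cell] )
        {
            Used[Cell] = 1;   /* first key to reach this cell */
            Count++;
        }
    }
    free( Used );
    return Count;
}
```

With the keys 10, 20, ..., 100, a table of size 10 sends every key to cell 0, while a table of size 11 spreads the same keys over ten different cells.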

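If input keys are strings, one option is to add up the ASCII values of the characters in the string. If the table size is large, this option does not distribute the keys well. For example, with TableSize = 10,007 and keys of at most eight characters, the sum is at most 127 × 8 = 1,016, so only the first 1,017 cells can ever be used.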

// Simple hash function for strings: add up the character values
typedef unsigned int Index;

Index Hash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += *Key++;
    return HashVal % TableSize;
}

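Another hash function assumes that Key has at least two characters plus the null terminator. The value 27 is the number of letters in the English alphabet plus the blank, and 729 is 27 squared: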

// A not too good hash function
Index Hash( const char *Key, int TableSize )
{
    return ( Key[0] + 27 * Key[1] + 729 * Key[2] ) % TableSize;
}

A good Hash function with Horner's rule:

Index Hash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        // For each additional character, multiply the running value by 32
        HashVal = ( HashVal << 5 ) + *Key++;
    return HashVal % TableSize;
}
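A small comparison may help show what Horner's rule buys. In the sketch below the two routines from this section are renamed SumHash and HornerHash (names chosen here only so that both can live in one file); the characters are also cast to unsigned char, which is an addition of this sketch.

```c
#include <assert.h>

typedef unsigned int Index;

/* Additive hash: the order of the characters does not affect the result. */
Index SumHash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    assert( TableSize > 0 );
    while( *Key != '\0' )
        HashVal += (unsigned char)*Key++;
    return HashVal % TableSize;
}

/* Horner's rule with multiplier 32: every step shifts the running
   value left by 5 bits (a multiplication by 32) and adds the next
   character, so the position of each character matters. */
Index HornerHash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    assert( TableSize > 0 );
    while( *Key != '\0' )
        HashVal = ( HashVal << 5 ) + (unsigned char)*Key++;
    return HashVal % TableSize;
}
```

For the keys "ab" and "ba" with TableSize = 10007, SumHash returns 195 for both (97 + 98), whereas HornerHash returns 97 * 32 + 98 = 3202 for "ab" and 98 * 32 + 97 = 3233 for "ba". Anagrams therefore always collide under the additive hash but usually do not under Horner's rule.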

5.3 Separate Chaining

Separate chaining keeps a list of all elements that hash to the same value.

Find: use the hash function to determine which list to traverse, then traverse the list and return the position where the item is found.

Insert/delete: hash to the appropriate list, then perform the ordinary linked list insertion or deletion.

#ifndef _HashSep_H
#define _HashSep_H

struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable( int TableSize );
void DestroyTable( HashTable H );
Position Find( ElementType Key, HashTable H );
void Insert( ElementType Key, HashTable H );
ElementType Retrieve( Position P );

#endif

// Place in the implementation file
// ListNode is the element of the linked list
struct ListNode
{
    ElementType Element;
    Position    Next;
};

typedef Position List;

struct HashTbl
{
    int TableSize;
    List *TheLists;   // array of lists, allocated at initialization
};

Initialization routine:

HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )   // first check if table is too small ...
        ...

    H = malloc( sizeof( struct HashTbl ) );   // check if space is enough...
    H->TableSize = NextPrime( TableSize );
    H->TheLists = malloc( sizeof( List ) * H->TableSize );   // check space

    // allocate list headers; skip this step if headers are not needed
    for( i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[i] = malloc( sizeof( struct ListNode ) );   // check space
        H->TheLists[i]->Next = NULL;
    }
    return H;
}

Find routine:

Position Find( ElementType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;   // skip the header node
    while( P != NULL && P->Element != Key )
        P = P->Next;
    return P;
}

Insert routine:

void Insert( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;

    Pos = Find( Key, H );
    if( Pos == NULL )   // key is not found
    {
        NewCell = malloc( sizeof( struct ListNode ) );   // check space
        L = H->TheLists[ Hash( Key, H->TableSize ) ];
        NewCell->Next = L->Next;   // insert at the front of the list
        NewCell->Element = Key;
        L->Next = NewCell;
    }
    // if the key is already there, we do nothing
}
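The notes give no deletion routine for separate chaining, so the following is a minimal, self-contained sketch of one. Several things are assumptions made only for this sketch: integer keys hashed with Key mod TableSize, header nodes stored in one array rather than allocated one by one, a table size that is used as given (no NextPrime), and an Insert that returns a status.

```c
#include <assert.h>
#include <stdlib.h>

typedef int ElementType;                 /* assumption: integer keys */

struct ListNode
{
    ElementType Element;
    struct ListNode *Next;
};
typedef struct ListNode *Position;

struct HashTbl
{
    int TableSize;
    struct ListNode *TheLists;           /* array of header nodes */
};
typedef struct HashTbl *HashTable;

unsigned int Hash( ElementType Key, int TableSize )
{
    return (unsigned int)Key % (unsigned int)TableSize;
}

HashTable InitializeTable( int TableSize )
{
    HashTable H;

    assert( TableSize > 0 );
    H = malloc( sizeof( struct HashTbl ) );
    if( H == NULL )
        return NULL;
    H->TableSize = TableSize;
    /* calloc zeroes the headers, so every Next starts out NULL */
    H->TheLists = calloc( (size_t)TableSize, sizeof( struct ListNode ) );
    if( H->TheLists == NULL )
    {
        free( H );
        return NULL;
    }
    return H;
}

Position Find( ElementType Key, HashTable H )
{
    Position P = H->TheLists[ Hash( Key, H->TableSize ) ].Next;

    while( P != NULL && P->Element != Key )
        P = P->Next;
    return P;
}

/* Returns 1 if Key is in the table afterwards, 0 if out of space. */
int Insert( ElementType Key, HashTable H )
{
    Position Header, NewCell;

    if( Find( Key, H ) != NULL )
        return 1;
    NewCell = malloc( sizeof( struct ListNode ) );
    if( NewCell == NULL )
        return 0;
    Header = &H->TheLists[ Hash( Key, H->TableSize ) ];
    NewCell->Element = Key;
    NewCell->Next = Header->Next;
    Header->Next = NewCell;
    return 1;
}

/* Unlink and free the node holding Key, if there is one. */
void Delete( ElementType Key, HashTable H )
{
    Position Prev = &H->TheLists[ Hash( Key, H->TableSize ) ];
    Position Doomed;

    /* Walk until Prev is the node just before the one holding Key */
    while( Prev->Next != NULL && Prev->Next->Element != Key )
        Prev = Prev->Next;
    if( Prev->Next != NULL )
    {
        Doomed = Prev->Next;
        Prev->Next = Doomed->Next;
        free( Doomed );
    }
}
```

Because every list has a header node, Delete never has to treat the first element as a special case: there is always a predecessor whose Next pointer can be redirected.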

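Any scheme besides linked lists could be used to resolve the collisions; a binary search tree or even another hash table would work. However, if the table is large and the hash function is good, all the lists should be short, so it is not worthwhile to try anything more complicated than a linked list.

The load factor, λ, is the ratio of the number of elements in the hash table to the table size. With separate chaining, the average length of a list is λ.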

The general rule for separate chaining hashing is to make the table size about as large as the number of elements expected (λ≈1).

5.4 Open Addressing

Open addressing hashing is an alternative to resolving collisions with linked lists.

In an open addressing hashing system, if a collision occurs, alternative cells are tried until an empty cell is found.

Cells h0(X), h1(X), h2(X), ... are tried in succession, where hi(X) = (Hash(X) + F(i)) mod TableSize, with F(0) = 0. The function F is the collision resolution strategy.

Since all the data go inside the table, a bigger table is needed for open addressing hashing than for separate chaining. Generally, the load factor should be kept below λ = 0.5.

5.4.1 Linear Probing

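In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells sequentially (with wraparound) in search of an empty cell. This method may lead to primary clustering: blocks of occupied cells form, and any key that hashes into a cluster needs several attempts to resolve the collision and then adds to the cluster.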
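Linear probing is short enough to sketch in full. The code below is an illustration under stated assumptions rather than part of the original routines: keys are non-negative integers hashed with Key mod TableSize, the table is a plain int array, and the sentinel EMPTY_CELL and the function name InsertLinear are made up for this example.

```c
#include <assert.h>

#define EMPTY_CELL (-1)   /* assumption: keys are non-negative */

/* Insert Key with linear probing, F(i) = i.
   Returns the cell that now holds Key, or -1 if the table is full. */
int InsertLinear( int *Table, int TableSize, int Key )
{
    int i, Pos;

    assert( TableSize > 0 && Key >= 0 );
    for( i = 0; i < TableSize; i++ )
    {
        /* h_i(X) = ( Hash(X) + i ) mod TableSize */
        Pos = ( Key % TableSize + i ) % TableSize;
        if( Table[Pos] == EMPTY_CELL || Table[Pos] == Key )
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 in that order into an empty table of size 10 places them in cells 9, 8, 0, 1, 2. The keys 58 and 69 each collide three times before finding a free cell, because the occupied cells 8, 9, 0 (and then 1) already form a cluster.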

5.4.2 Quadratic Probing

For Quadratic Probing, F(i) = i*i.

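If quadratic probing is used and the table size is prime, then a new element can always be inserted as long as the table is at least half empty.

Standard deletion cannot be performed in an open addressing hash table, because the cell might have caused a collision to go past it; emptying it would cut off the probe sequence for other keys. Lazy deletion is required instead: the cell is only marked as deleted.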
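The guarantee rests on the fact that, for a prime table size, the probes with i = 0, 1, ..., TableSize / 2 (integer division) all land in different cells. The brute-force check below is only a sketch to make that claim tangible; the function name FirstHalfProbesDistinct is invented here, and the starting cell Hash(X) is left out because adding the same offset to every probe does not change whether two probes coincide.

```c
#include <assert.h>

/* Returns 1 if the offsets i*i mod TableSize are pairwise distinct
   for 0 <= i <= TableSize / 2, and 0 otherwise. */
int FirstHalfProbesDistinct( int TableSize )
{
    int i, j;

    assert( TableSize > 0 );
    for( i = 0; i <= TableSize / 2; i++ )
        for( j = i + 1; j <= TableSize / 2; j++ )
            if( ( i * i ) % TableSize == ( j * j ) % TableSize )
                return 0;   /* probes i and j hit the same cell */
    return 1;
}
```

For TableSize = 11 the offsets are 0, 1, 4, 9, 5, 3, which are all different, so an element can always be placed while at least half of the cells are empty. For TableSize = 16 the offsets for i = 0 and i = 4 are both 0, so the probe sequence starts repeating early and the guarantee is lost.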

#ifndef _HashQuad_H
#define _HashQuad_H

typedef unsigned int Index;
typedef Index Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable( int TableSize );
void DestroyTable( HashTable H );
Position Find( ElementType Key, HashTable H );
void Insert( ElementType Key, HashTable H );
ElementType Retrieve( Position P, HashTable H );
HashTable Rehash( HashTable H );

#endif

// Place in the implementation file
enum KindOfEntry { Legitimate, Empty, Deleted };

struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;
};

typedef struct HashEntry Cell;

struct HashTbl
{
    int TableSize;
    Cell *TheCells;   // array of cells, allocated at initialization
};

Routine to initialize open addressing hash table:

HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )   // certain load factor
        ......

    // allocate table
    H = malloc( sizeof( struct HashTbl ) );   // check space
    H->TableSize = NextPrime( TableSize );
    H->TheCells = malloc( sizeof( Cell ) * H->TableSize );   // check space

    for( i = 0; i < H->TableSize; i++ )
        H->TheCells[i].Info = Empty;

    return H;
}

Find routine for hashing with quadratic probing:

Position Find( ElementType Key, HashTable H )
{
    Position CurrentPos;   // Position is unsigned int
    int CollisionNum;

    CollisionNum = 0;
    CurrentPos = Hash( Key, H->TableSize );
    // The order of the tests matters: check Info before looking at Element
    while( H->TheCells[CurrentPos].Info != Empty &&
           H->TheCells[CurrentPos].Element != Key )
    {
        // i*i = (i-1)*(i-1) + 2i - 1, so no multiplication is needed
        CurrentPos += 2 * ++CollisionNum - 1;
        if( CurrentPos >= H->TableSize )
            CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}

Insert routine for hash tables with quadratic probing:

void Insert( ElementType Key, HashTable H )
{
    Position Pos;

    Pos = Find( Key, H );
    if( H->TheCells[Pos].Info != Legitimate )   // can insert
    {
        H->TheCells[Pos].Info = Legitimate;
        H->TheCells[Pos].Element = Key;
    }
}
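The Deleted state in KindOfEntry exists for lazy deletion, but no Delete routine is listed. One way it might look is sketched below. To keep the example self-contained it assumes integer keys, a fixed table of TABLE_SIZE cells passed around as a plain array, and a table that stays at most half full; MakeEmpty, Delete and Contains are names introduced for this sketch.

```c
#include <assert.h>

#define TABLE_SIZE 11            /* assumption: a small prime table size */

typedef int ElementType;         /* assumption: non-negative integer keys */
enum KindOfEntry { Legitimate, Empty, Deleted };

struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;
};
typedef struct HashEntry Cell;

void MakeEmpty( Cell *Cells )
{
    int i;

    for( i = 0; i < TABLE_SIZE; i++ )
        Cells[i].Info = Empty;
}

/* Quadratic probing, as in the Find routine above. Probing stops only
   at an Empty cell or at a cell holding Key; Deleted cells are passed. */
unsigned int Find( ElementType Key, const Cell *Cells )
{
    unsigned int CurrentPos;
    int CollisionNum = 0;

    assert( Key >= 0 );
    CurrentPos = (unsigned int)Key % TABLE_SIZE;
    while( Cells[CurrentPos].Info != Empty &&
           Cells[CurrentPos].Element != Key )
    {
        CurrentPos += 2 * ++CollisionNum - 1;
        if( CurrentPos >= TABLE_SIZE )
            CurrentPos -= TABLE_SIZE;
    }
    return CurrentPos;
}

void Insert( ElementType Key, Cell *Cells )
{
    unsigned int Pos = Find( Key, Cells );

    if( Cells[Pos].Info != Legitimate )
    {
        Cells[Pos].Info = Legitimate;
        Cells[Pos].Element = Key;
    }
}

/* Lazy deletion: only mark the cell, so that probe sequences of other
   keys that passed through it still reach their targets. */
void Delete( ElementType Key, Cell *Cells )
{
    unsigned int Pos = Find( Key, Cells );

    if( Cells[Pos].Info == Legitimate )
        Cells[Pos].Info = Deleted;
}

int Contains( ElementType Key, const Cell *Cells )
{
    return Cells[ Find( Key, Cells ) ].Info == Legitimate;
}
```

As a usage example, the keys 1, 12 and 23 all hash to cell 1 when TABLE_SIZE is 11 and end up in cells 1, 2 and 5. After Delete( 12, Cells ), cell 2 is marked Deleted rather than Empty, so a search for 23 still probes past it and finds the key in cell 5.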

5.4.3 Double Hashing

For double hashing, one popular choice is F(i) = i * hash2(X).

hash2(X) = R - (X mod R), with R a prime smaller than TableSize.
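A worked sketch of double hashing with this choice of hash2 follows. As before, the setup is an assumption for illustration: non-negative integer keys, a plain int array as the table, and the made-up names EMPTY_CELL, Hash2 and InsertDouble.

```c
#include <assert.h>

#define EMPTY_CELL (-1)   /* assumption: keys are non-negative */

/* Second hash function from the text: R - (X mod R).
   The result lies in 1..R, so the step is never zero. */
int Hash2( int Key, int R )
{
    return R - ( Key % R );
}

/* Insert Key with double hashing, F(i) = i * Hash2(Key).
   Returns the cell that now holds Key, or -1 if none of the
   TableSize probes found a usable cell. */
int InsertDouble( int *Table, int TableSize, int R, int Key )
{
    int i, Pos, Step;

    assert( TableSize > 0 && R > 0 && R < TableSize && Key >= 0 );
    Step = Hash2( Key, R );
    for( i = 0; i < TableSize; i++ )
    {
        Pos = ( Key % TableSize + i * Step ) % TableSize;
        if( Table[Pos] == EMPTY_CELL || Table[Pos] == Key )
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}
```

With TableSize = 10 and R = 7, inserting 89, 18, 49, 58, 69 places them in cells 9, 8, 6, 3, 0. For instance, 49 collides with 89 in cell 9; Hash2(49) = 7 - 0 = 7, so the next probe is (9 + 7) mod 10 = 6. The table size of 10 is used here only to keep the arithmetic easy to follow; it is not prime, and in a non-prime table the probe sequence can miss cells.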

5.5 Rehashing

Rehashing: When the hash table is too full, we can build another table that is about twice as big (with an associated new hash function) and scan down the entire original hash table, computing the new hash value for each (nondeleted) element and inserting it in the new table.

Rehashing can be implemented in several ways:

1. Rehash as soon as the table is half full.

2. Rehash only when an insertion fails.

3. Rehash when the table reaches a certain load factor.

HashTable Rehash( HashTable H )
{
    int i, OldSize;
    Cell *OldCells;

    OldCells = H->TheCells;
    OldSize = H->TableSize;

    // Get a new, empty table about twice as big
    H = InitializeTable( 2 * OldSize );

    // Scan through the old table, reinserting into the new one
    for( i = 0; i < OldSize; i++ )
        if( OldCells[i].Info == Legitimate )
            Insert( OldCells[i].Element, H );

    free( OldCells );
    return H;
}
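The third strategy, rehashing at a chosen load factor, can be sketched as a wrapper around insertion. This is a simplified, self-contained illustration and not the routine above: it assumes non-negative integer keys, uses linear probing to stay short, and the names IntTable, CreateTable, Place and InsertWithRehash as well as the MaxLoad parameter are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define EMPTY_CELL (-1)   /* assumption: keys are non-negative */

struct IntTable
{
    int TableSize;
    int Count;            /* number of keys currently stored */
    int *Cells;
};

int IsPrime( int N )
{
    int D;

    if( N < 2 )
        return 0;
    for( D = 2; D * D <= N; D++ )
        if( N % D == 0 )
            return 0;
    return 1;
}

int NextPrime( int N )
{
    while( !IsPrime( N ) )
        N++;
    return N;
}

struct IntTable *CreateTable( int MinSize )
{
    struct IntTable *T = malloc( sizeof( *T ) );
    int i;

    if( T == NULL )
        return NULL;
    T->TableSize = NextPrime( MinSize );
    T->Count = 0;
    T->Cells = malloc( sizeof( int ) * T->TableSize );
    if( T->Cells == NULL )
    {
        free( T );
        return NULL;
    }
    for( i = 0; i < T->TableSize; i++ )
        T->Cells[i] = EMPTY_CELL;
    return T;
}

/* Linear probing; assumes the table has at least one empty cell. */
void Place( struct IntTable *T, int Key )
{
    int Pos = Key % T->TableSize;

    while( T->Cells[Pos] != EMPTY_CELL && T->Cells[Pos] != Key )
        Pos = ( Pos + 1 ) % T->TableSize;
    if( T->Cells[Pos] == EMPTY_CELL )
    {
        T->Cells[Pos] = Key;
        T->Count++;
    }
}

/* Insert Key, first moving to a table about twice as big if the
   insertion would push the load factor above MaxLoad.
   Returns the table to use from now on (possibly a new one). */
struct IntTable *InsertWithRehash( struct IntTable *T, int Key, double MaxLoad )
{
    struct IntTable *Bigger;
    int i;

    assert( Key >= 0 && MaxLoad > 0.0 && MaxLoad < 1.0 );
    if( (double)( T->Count + 1 ) / T->TableSize > MaxLoad )
    {
        Bigger = CreateTable( 2 * T->TableSize );
        if( Bigger != NULL )   /* on failure, keep using the old table */
        {
            for( i = 0; i < T->TableSize; i++ )
                if( T->Cells[i] != EMPTY_CELL )
                    Place( Bigger, T->Cells[i] );
            free( T->Cells );
            free( T );
            T = Bigger;
        }
    }
    Place( T, Key );
    return T;
}
```

Starting from CreateTable( 5 ) and inserting the keys 0 through 9 with MaxLoad = 0.5, the table grows from 5 to 11 cells on the third insertion and from 11 to 23 cells on the sixth, ending with 10 keys in 23 cells. The caller has to keep the returned pointer, since the old table is freed whenever a rehash takes place.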