Chapter 5 Hashing
Source: Internet  Published: Engineering Optimization Methods and Their Applications  Editor: 程序博客网  Time: 2024/06/09 23:55
The implementation of hash tables is frequently called hashing.
Hashing is a technique used for performing insertions, deletions, and finds in constant average time.
Tree operations that rely on ordering, such as FindMin, FindMax, and printTree, are not supported on a hash table.
5.1 General Idea
The hash table ADT is merely an array of some fixed size, containing the keys. We will refer to the table size as TableSize. The table runs from 0 to TableSize - 1.
Each key is mapped into some number in the range 0 to TableSize - 1 and placed in the appropriate cell.
The mapping is called a hash function.
5.2 Hash Function
If the input keys are integers, then simply returning Key mod TableSize is generally a reasonable strategy.
It is usually a good idea to ensure that the table size is prime.
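A small sketch of why a prime table size tends to help with integer keys. The helper name CountDistinctCells and the sample keys are made up for this illustration; the function only counts how many different cells a set of keys lands in under Key mod TableSize.

```c
#include <stdlib.h>

/* Hypothetical helper: counts how many distinct cells the keys occupy
   when each key is mapped with Key mod TableSize.
   Assumes non-negative keys. Returns -1 if memory runs out. */
int CountDistinctCells( const int *Keys, int N, int TableSize )
{
    char *Used = calloc( TableSize, sizeof(char) );
    int i, Count = 0;

    if( Used == NULL )
        return -1;
    for( i = 0; i < N; i++ )
    {
        int Cell = Keys[i] % TableSize;
        if( !Used[Cell] )
        {
            Used[Cell] = 1;   /* first key to reach this cell */
            Count++;
        }
    }
    free( Used );
    return Count;
}
```

With the keys 0, 10, 20, ..., 90, a table of size 10 sends every key to cell 0, while a table of size 11 spreads them over ten different cells.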
If input keys are strings, one option is to add up the ASCII values of the characters in the string. If the table size is large, this option does not distribute the keys well.
// Simple hash function to distribute strings
typedef unsigned int Index;

Index Hash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += *Key++;
    return HashVal % TableSize;
}

Another hash function assumes that Key has at least two characters plus the null terminator:
// A not too good hash function
Index Hash( const char *Key, int TableSize )
{
    return ( Key[0] + 27 * Key[1] + 729 * Key[2] ) % TableSize;
}

A good hash function with Horner's rule:
Index Hash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        // For each additional character, multiply the running value
        // by 32 (shift left by 5) and add the character.
        HashVal = ( HashVal << 5 ) + *Key++;
    return HashVal % TableSize;
}
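To make the difference between the hash functions of this section concrete, the sketch below places the additive version and the shift-based version side by side. The names AddHash and ShiftHash are invented here so that both fit in one file, and the casts to unsigned char are an addition of this sketch. The point to notice is that a sum of at most eight 7-bit ASCII characters can never exceed 8 * 127 = 1016, so in a table of size 10007 the additive version can never reach most of the cells.

```c
typedef unsigned int Index;

/* Additive hash from this section, renamed for the comparison. */
Index AddHash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += (unsigned char) *Key++;
    return HashVal % TableSize;
}

/* Shift-based hash from this section, renamed for the comparison. */
Index ShiftHash( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal = ( HashVal << 5 ) + (unsigned char) *Key++;
    return HashVal % TableSize;
}
```

The additive version also maps every anagram to the same cell: "ab" and "ba" both give 195, whereas the shift-based version gives 3202 and 3233 for a table of size 10007.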
5.3 Separate Chaining
Separate chaining keeps a list of all elements that hash to the same value.
Find: use the hash function to determine which list to traverse, then traverse the list and return the position where the item is found.
Insert/Delete: ordinary linked list operations on that list.
#ifndef _HashSep_H
#define _HashSep_H  // completes the include guard

struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable( int TableSize );
void DestroyTable( HashTable H );
Position Find( ElementType Key, HashTable H );
void Insert( ElementType Key, HashTable H );
ElementType Retrieve( Position P );

#endif

// place in implementation file
// ListNode is the element of the linked list
struct ListNode
{
    ElementType Element;
    Position Next;
};

typedef Position List;

struct HashTbl
{
    int TableSize;
    List *TheLists;  // array of lists, one per cell
};
Initialization routine:
HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )  // first check if table is too small
    {
        ... ...
    }

    H = malloc( sizeof(struct HashTbl) );
    // check if space is enough...

    H->TableSize = NextPrime( TableSize );
    H->TheLists = malloc( sizeof(List) * H->TableSize );
    // check space

    // allocate list headers; skip this step if headers are not needed
    for( i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[i] = malloc( sizeof(struct ListNode) );
        // check space
        H->TheLists[i]->Next = NULL;
    }
    return H;
}
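The initialization routine calls NextPrime, which is not shown in these notes. One possible sketch follows, assuming NextPrime should return the smallest prime greater than or equal to its argument. The trial-division approach and the helper IsPrime are assumptions of this sketch, not the book's code.

```c
/* Hypothetical helper: trial division by odd numbers up to sqrt(N). */
static int IsPrime( int N )
{
    int i;

    if( N < 2 )
        return 0;
    if( N % 2 == 0 )
        return N == 2;
    /* i <= N / i is the same test as i * i <= N, without overflow */
    for( i = 3; i <= N / i; i += 2 )
        if( N % i == 0 )
            return 0;
    return 1;
}

/* Smallest prime >= N (the assumed meaning of NextPrime). */
int NextPrime( int N )
{
    if( N < 2 )
        return 2;
    while( !IsPrime( N ) )
        N++;
    return N;
}
```

Under this assumption, NextPrime( 10 ) gives 11 and NextPrime( 100 ) gives 101, so a requested size of 100 produces a table with 101 lists.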
Find routine:
Position Find( ElementType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;  // skip the header node
    while( P != NULL && P->Element != Key )
        P = P->Next;
    return P;
}

Insert routine:
void Insert( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;

    Pos = Find( Key, H );
    if( Pos == NULL )  // key is not found
    {
        NewCell = malloc( sizeof(struct ListNode) );
        // check space
        L = H->TheLists[ Hash( Key, H->TableSize ) ];
        // insert at the front of the list, right after the header
        NewCell->Next = L->Next;
        NewCell->Element = Key;
        L->Next = NewCell;
    }
    // if key is there, we do nothing
}

If the hash table is large and the hash function is good, all the lists should be short, so a plain linked list is sufficient and it is not worthwhile to resolve collisions with anything more complicated (such as a binary search tree or another hash table).
The load factor, λ, is the ratio of the number of elements in the hash table to the table size.
The general rule for separate chaining hashing is to make the table size about as large as the number of elements expected (λ≈1).
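Deletion, which this section describes only as a linked list operation, might be sketched as follows. The function name DeleteFromList, its return value, and the choice of int for ElementType are assumptions made for this illustration; the list is assumed to start with a header node, as in the initialization routine above.

```c
#include <stdlib.h>

typedef int ElementType;          /* assumption: integer keys */

struct ListNode
{
    ElementType Element;
    struct ListNode *Next;
};

/* Removes Key from a list that starts with a header node.
   Returns 1 if a node was removed, 0 if Key was not present. */
int DeleteFromList( ElementType Key, struct ListNode *Header )
{
    struct ListNode *Prev = Header;
    struct ListNode *Doomed;

    /* stop at the node just before the one holding Key */
    while( Prev->Next != NULL && Prev->Next->Element != Key )
        Prev = Prev->Next;

    if( Prev->Next == NULL )
        return 0;

    Doomed = Prev->Next;
    Prev->Next = Doomed->Next;    /* unlink, then release */
    free( Doomed );
    return 1;
}
```

Inside the table this would be called on the list selected by the hash function, for example DeleteFromList( Key, H->TheLists[ Hash( Key, H->TableSize ) ] ).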
5.4 Open Addressing
Open addressing hashing is an alternative to resolving collisions with linked lists.
In an open addressing hashing system, if a collision occurs, alternative cells are tried until an empty cell is found.
Cell h0(X), h1(X), h2(X), ... are tried in succession, where hi(X) = (Hash(X) + F(i)) mod TableSize, with F(0)=0.
Since all the data go inside the table, a bigger table is needed for open addressing hashing. Generally, the load factor λ should be kept below 0.5.
5.4.1 Linear Probing
In linear probing, F is a linear function of i: F(i) = i. This method may lead to primary clustering.
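A minimal sketch of linear probing on a bare array of ints, to show where clustering comes from. The name LinearInsert, the EMPTY_CELL marker, and the use of Key mod TableSize as the hash function are assumptions of this example. It does not check for duplicates.

```c
#define EMPTY_CELL (-1)   /* assumption: keys are non-negative ints */

/* Inserts Key with linear probing, F(i) = i, and returns the cell used,
   or -1 if every cell is occupied. */
int LinearInsert( int *Table, int TableSize, int Key )
{
    int i;

    for( i = 0; i < TableSize; i++ )
    {
        /* h_i(X) = (Hash(X) + i) mod TableSize */
        int Pos = ( Key % TableSize + i ) % TableSize;
        if( Table[Pos] == EMPTY_CELL )
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 into an empty table of size 10 (a size chosen only to keep the arithmetic easy) places them in cells 9, 8, 0, 1, 2. The last three keys all collide and end up in one contiguous block, and any later key that hashes into that block has to walk past all of it.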
5.4.2 Quadratic Probing
For Quadratic Probing, F(i) = i*i.
If quadratic probing is used, the table is at least half empty, and the table size is prime, then we are always guaranteed to be able to insert a new element.
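The probe sequence F(i) = i*i can be sketched on a bare array of ints as follows. The name QuadraticInsert, the EMPTY_CELL marker, and the use of Key mod TableSize as the hash function are assumptions of this example, which is meant for small tables (i * i is computed directly) and does not check for duplicates.

```c
#define EMPTY_CELL (-1)   /* assumption: keys are non-negative ints */

/* Inserts Key with quadratic probing, F(i) = i * i, and returns the
   cell used, or -1 if no empty cell was found in TableSize probes. */
int QuadraticInsert( int *Table, int TableSize, int Key )
{
    int i;

    for( i = 0; i < TableSize; i++ )
    {
        /* h_i(X) = (Hash(X) + i*i) mod TableSize */
        int Pos = ( Key % TableSize + i * i ) % TableSize;
        if( Table[Pos] == EMPTY_CELL )
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 into an empty table of size 10 places them in cells 9, 8, 0, 2, 3. In a table of prime size 7, the keys 0, 7, 14 (which all hash to cell 0) land in cells 0, 1, 4.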
Standard deletion cannot be performed in an open addressing hash table, because the cell might have caused a collision to go past it. We can only apply lazy deletion.
#ifndef _HashQuad_H
#define _HashQuad_H  // completes the include guard

typedef unsigned int Index;
typedef Index Position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable( int TableSize );
void DestroyTable( HashTable H );
Position Find( ElementType Key, HashTable H );
void Insert( ElementType Key, HashTable H );
ElementType Retrieve( Position P, HashTable H );
HashTable Rehash( HashTable H );

#endif

// place in implementation file
enum KindOfEntry { Legitimate, Empty, Deleted };

struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;  // Deleted marks a lazily deleted cell
};

typedef struct HashEntry Cell;

struct HashTbl
{
    int TableSize;
    Cell *TheCells;  // array of cells, allocated in InitializeTable
};

Routine to initialize an open addressing hash table:
HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )  // certain load factor
    {
        ......
    }

    // allocate table
    H = malloc( sizeof(struct HashTbl) );
    // check space

    H->TableSize = NextPrime( TableSize );
    H->TheCells = malloc( sizeof(Cell) * H->TableSize );
    // check space

    for( i = 0; i < H->TableSize; i++ )
        H->TheCells[i].Info = Empty;
    return H;
}

Find routine for hashing with quadratic probing:
Position Find( ElementType Key, HashTable H )
{
    Position CurrentPos;  // Position is unsigned int
    int CollisionNum;

    CollisionNum = 0;
    CurrentPos = Hash( Key, H->TableSize );
    // The order of the two tests matters: Element is examined
    // only when the cell is not Empty.
    while( H->TheCells[CurrentPos].Info != Empty &&
           H->TheCells[CurrentPos].Element != Key )
    {
        // i*i = (i-1)*(i-1) + 2i - 1, so each step adds 2i - 1
        CurrentPos += 2 * ++CollisionNum - 1;
        if( CurrentPos >= H->TableSize )
            CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}

Insert routine for hash tables with quadratic probing:
void Insert( ElementType Key, HashTable H )
{
    Position Pos;

    Pos = Find( Key, H );
    if( H->TheCells[Pos].Info != Legitimate )  // can insert
    {
        H->TheCells[Pos].Info = Legitimate;
        H->TheCells[Pos].Element = Key;
    }
}
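The lazy deletion mentioned in this section has no routine in these notes. A minimal sketch is given below, written against a bare cell array so that it compiles on its own. The names FindCell and Delete, the int key type, and the use of Key mod TableSize as the hash function are assumptions of this sketch; FindCell mirrors the Find routine above and, like it, assumes the table is no more than half full.

```c
typedef int ElementType;                 /* assumption: integer keys */
enum KindOfEntry { Legitimate, Empty, Deleted };

struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;
};

/* Quadratic-probing search over a bare cell array. */
unsigned int FindCell( ElementType Key, const struct HashEntry *Cells,
                       unsigned int TableSize )
{
    unsigned int CurrentPos = (unsigned int) Key % TableSize;
    unsigned int CollisionNum = 0;

    /* Deleted cells are stepped over just like occupied ones */
    while( Cells[CurrentPos].Info != Empty &&
           Cells[CurrentPos].Element != Key )
    {
        CurrentPos += 2 * ++CollisionNum - 1;
        if( CurrentPos >= TableSize )
            CurrentPos -= TableSize;
    }
    return CurrentPos;
}

/* Lazy deletion: mark the cell instead of emptying it, so probe
   sequences that pass through this cell are not cut short. */
void Delete( ElementType Key, struct HashEntry *Cells,
             unsigned int TableSize )
{
    unsigned int Pos = FindCell( Key, Cells, TableSize );

    if( Cells[Pos].Info == Legitimate )
        Cells[Pos].Info = Deleted;
}
```

For example, in a table of size 11 the keys 5 and 16 both hash to cell 5, so 16 is stored in cell 6. After Delete( 5, ... ), cell 5 is marked Deleted rather than Empty, and a search for 16 still probes past it and finds cell 6.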
5.4.3 Double Hashing
For double hashing, one popular choice is F(i) = i * hash2(X).
hash2(X) = R - (X mod R), with R a prime smaller than TableSize.
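A short sketch of double hashing with the hash2 function given above, on a bare array of ints. The names Hash2 and DoubleHashInsert, the EMPTY_CELL marker, and the use of Key mod TableSize as the first hash function are assumptions of this example. Note that R - (X mod R) always lies between 1 and R, so the step is never zero.

```c
#define EMPTY_CELL (-1)   /* assumption: keys are non-negative ints */

/* Second hash function from the text: R - (X mod R). */
int Hash2( int X, int R )
{
    return R - ( X % R );
}

/* Inserts Key with double hashing, F(i) = i * Hash2(Key), and returns
   the cell used, or -1 if no empty cell was found in TableSize probes. */
int DoubleHashInsert( int *Table, int TableSize, int R, int Key )
{
    int i;
    int Step = Hash2( Key, R );

    for( i = 0; i < TableSize; i++ )
    {
        int Pos = ( Key % TableSize + i * Step ) % TableSize;
        if( Table[Pos] == EMPTY_CELL )
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}
```

With TableSize = 10 and R = 7 (small values chosen for easy arithmetic), inserting 89, 18, 49, 58, 69 places them in cells 9, 8, 6, 3, 0: each colliding key jumps by its own step (7, 5, and 1 respectively) instead of following a shared probe sequence.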
5.5 Rehashing
Rehashing: When the hash table is too full, we can build another table that is about twice as big (with an associated new hash function) and scan down the entire original hash table, computing the new hash value for each (nondeleted) element and inserting it in the new table.
Rehashing can be implemented in several ways:
1. Rehash as soon as the table is half full.
2. Rehash only when an insertion fails.
3. Rehash when the table reaches a certain load factor.
HashTable Rehash( HashTable H )
{
    int i, OldSize;
    Cell *OldCells;

    OldCells = H->TheCells;
    OldSize = H->TableSize;

    // get a new, empty table about twice as big
    H = InitializeTable( 2 * OldSize );

    // scan through the old table, reinserting into the new one
    for( i = 0; i < OldSize; i++ )
        if( OldCells[i].Info == Legitimate )
            Insert( OldCells[i].Element, H );

    free( OldCells );
    return H;
}
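The third strategy above (rehash at a certain load factor) could be wired into insertion roughly as follows. This is a self-contained sketch on a simplified table of ints with linear probing. The struct IntTable, the function names, the EMPTY_CELL marker, and the growth rule 2 * Size + 1 are all assumptions of this example, and MaxLoad is assumed to be below 1 so that an empty cell always exists.

```c
#include <stdlib.h>

#define EMPTY_CELL (-1)          /* assumption: keys are non-negative ints */

struct IntTable
{
    int Size;                    /* number of cells */
    int Count;                   /* number of stored keys */
    int *Cells;
};

/* Allocates a table with every cell empty; returns NULL on failure. */
struct IntTable *MakeTable( int Size )
{
    struct IntTable *T = malloc( sizeof(struct IntTable) );
    int i;

    if( T == NULL )
        return NULL;
    T->Cells = malloc( sizeof(int) * Size );
    if( T->Cells == NULL )
    {
        free( T );
        return NULL;
    }
    T->Size = Size;
    T->Count = 0;
    for( i = 0; i < Size; i++ )
        T->Cells[i] = EMPTY_CELL;
    return T;
}

/* Linear-probing insert with no load-factor check; needs a free cell. */
static void RawInsert( struct IntTable *T, int Key )
{
    int Pos = Key % T->Size;

    while( T->Cells[Pos] != EMPTY_CELL && T->Cells[Pos] != Key )
        Pos = ( Pos + 1 ) % T->Size;
    if( T->Cells[Pos] == EMPTY_CELL )
    {
        T->Cells[Pos] = Key;
        T->Count++;
    }
}

/* Inserts Key, first moving everything into a table about twice as
   large if the insertion would push the load factor above MaxLoad.
   Returns the table to use from now on, or NULL if the larger table
   could not be allocated (T is then left unchanged and still valid). */
struct IntTable *InsertWithRehash( struct IntTable *T, int Key,
                                   double MaxLoad )
{
    if( T->Count + 1 > MaxLoad * T->Size )
    {
        struct IntTable *Bigger = MakeTable( 2 * T->Size + 1 );
        int i;

        if( Bigger == NULL )
            return NULL;
        for( i = 0; i < T->Size; i++ )
            if( T->Cells[i] != EMPTY_CELL )
                RawInsert( Bigger, T->Cells[i] );
        free( T->Cells );
        free( T );
        T = Bigger;
    }
    RawInsert( T, Key );
    return T;
}
```

Starting from a table of size 5 with MaxLoad = 0.5, the first two insertions leave the size at 5, and the third one triggers a move to a table of size 11 before the key is stored.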