Hash Table
What is a Hash Table?
A hash table (or hash map) is a data structure that provides an efficient way to store and retrieve data. It uses a technique called hashing to map keys to values, allowing for average-case constant time complexity (O(1)
) for operations like insertion, deletion, and lookup.
Key Concepts
-
Hash Function
- Purpose: A hash function takes an input (or key) and returns an integer known as a hash code. The hash code is then used as an index into an array where the value corresponding to the key is stored.
- Properties: A good hash function should distribute keys uniformly across the hash table to minimize collisions.
-
Buckets
- Purpose: The array in a hash table is often referred to as a series of buckets, where each bucket can hold multiple key-value pairs.
- Structure: Each bucket corresponds to an index in the array, derived from the hash code of the key.
-
Collisions
- Definition: Collisions occur when two keys hash to the same index in the array.
- Handling Collisions: There are several strategies to handle collisions, including chaining and open addressing.
Collision Handling Techniques
-
Chaining
- Description: Each bucket in the hash table contains a list (or another data structure like a linked list) of key-value pairs. When a collision occurs, the new key-value pair is added to the list at the appropriate bucket.
- Pros: Simple to implement and handles collisions effectively.
- Cons: Can degrade to O(n) time complexity if many collisions occur, resulting in long lists.
-
Open Addressing
- Description: Instead of storing multiple elements in a single bucket, open addressing places each element directly into the array. If a collision occurs, the algorithm probes (or searches) for the next empty slot in the array.
- Probing Methods:
- Linear Probing: Increment the index by one until an empty slot is found.
- Quadratic Probing: Use a quadratic function to determine the next slot.
- Double Hashing: Use a second hash function to determine the step size for probing.
- Pros: Can lead to better cache performance due to fewer linked lists and reduced memory overhead.
- Cons: More complex implementation and can suffer from clustering, where groups of occupied slots create long probe sequences.
Operations on Hash Tables
-
Insertion
- Calculate the hash code of the key.
- Determine the index from the hash code.
- Place the key-value pair at the appropriate bucket (or probe to find an empty slot in open addressing).
-
Deletion
- Calculate the hash code of the key.
- Determine the index from the hash code.
- Remove the key-value pair from the bucket or mark the slot as deleted in open addressing.
-
Lookup
- Calculate the hash code of the key.
- Determine the index from the hash code.
- Retrieve the value from the appropriate bucket or probe for the key in open addressing.
Advantages of Hash Tables
- Speed: Average-case constant time complexity (O(1)) for insertion, deletion, and lookup operations.
- Efficiency: Efficient memory usage when the load factor (number of elements / number of buckets) is maintained.
Disadvantages of Hash Tables
- Collisions: Handling collisions can add complexity and degrade performance.
- Memory Overhead: Requires a balance between speed and memory usage, as too many empty buckets waste space, while too few increase collisions.
- Complexity: Implementation can be complex, especially with open addressing and probing strategies.
Use Cases
- Databases: Hash tables are used in indexing and quick data retrieval.
- Caches: Storing recently accessed data for quick lookups.
- Symbol Tables: In compilers and interpreters to store variable bindings.
- Sets and Maps: Implementing abstract data types like sets and dictionaries.