Hash Table Implementation Python

Hash tables are fundamental data structures in computer science used for efficient data storage and retrieval. They offer fast insertion, deletion, and lookup operations, making them indispensable in various applications such as database indexing, caching systems, and symbol tables in compilers. In this article, we’ll explore the concept of hash tables and implement a basic version in Python.

Table of Contents

Understanding Hash Tables

A hash table, also known as a hash map, is a data structure that stores key-value pairs. It works by mapping keys to values through a hashing function, which transforms the key into an index in an array. This index is used to access the corresponding value directly, providing constant-time complexity for basic operations on average.

The key features of hash tables include:

Key-Value Mapping: Each key in a hash table is associated with a value, allowing efficient storage and retrieval of data.
Hashing Function: A hashing function is used to convert keys into indices within an array. This function should generate unique indices for each key to minimize collisions.
Collision Handling: Collisions occur when multiple keys hash to the same index. Hash tables employ collision resolution techniques to manage such situations efficiently.
Load Factor: The load factor of a hash table determines how full the table is. It’s calculated as the ratio of the number of elements stored to the size of the array. A higher load factor increases the likelihood of collisions but allows for more efficient memory usage.

Hash Table Implementation in Python

Now, let’s dive into the implementation of a hash table in Python. We’ll start with a basic version and gradually enhance it with collision handling and resizing.

python

class HashTable:

    def __init__(self, size):

        self.size = size

        self.table = [[] for _ in range(size)]
    def _hash_function(self, key):

        return hash(key) % self.size
    def insert(self, key, value):

        index = self._hash_function(key)

        self.table[index].append((key, value))
    def search(self, key):

        index = self._hash_function(key)

        for k, v in self.table[index]:

            if k == key:

                return v

        raise KeyError(f"Key {key} not found")

def delete(self, key): index = self._hash_function(key) for i, (k, v) in enumerate(self.table[index]): if k == key: del self.table[index][i] return raise KeyError(f"Key {key} not found")

In this implementation:

We initialize the hash table with a specified size, creating an array of empty lists to store key-value pairs.
The _hash_function method calculates the index for a given key by taking its hash code and applying the modulo operation with the size of the table.
The insert method inserts a key-value pair into the table by computing the index using the hash function and appending the pair to the corresponding list.
The search method looks for a key in the table, calculates the index, and iterates through the list at that index to find the matching key.
The delete method removes a key-value pair from the table by searching for the key and deleting it from the corresponding list.

Testing the Hash Table

Let’s test our implementation with some sample usage:

python

hash_table = HashTable(10)
hash_table.insert('apple', 5)

hash_table.insert('banana', 7)
print(hash_table.search('apple'))  # Output: 5
hash_table.delete('banana')

try: print(hash_table.search('banana')) except KeyError as e: print(e) # Output: Key 'banana' not found

Collision Handling and Resizing

Our basic implementation doesn’t handle collisions or resizing, which are crucial for real-world applications. Collision handling becomes necessary as the number of elements increases, while resizing helps maintain a suitable load factor to balance efficiency and memory usage.

Here’s an overview of collision resolution techniques and resizing strategies:

Collision Resolution:
- Separate Chaining: Each bucket in the hash table stores a linked list of key-value pairs that hash to the same index.
- Open Addressing: In this approach, collisions are resolved by finding an alternative location within the table itself.
Resizing:
- Load Factor Threshold: When the load factor exceeds a certain threshold (typically 0.7), the table is resized to maintain a lower load factor.
- Rehashing: During resizing, all elements are rehashed into a new, larger table to accommodate more entries and reduce collisions.

Implementing collision resolution and resizing adds complexity to the hash table but significantly improves its performance and scalability.

Conclusion

Hash tables are versatile data structures with widespread applications in computer science and software engineering. By efficiently mapping keys to values using hashing functions, hash tables offer fast access to stored data, making them invaluable in various domains.

In this article, we’ve explored the fundamentals of hash tables and implemented a basic version in Python. We’ve discussed key concepts such as hashing functions, collision handling, and resizing, highlighting their importance in designing robust hash table implementations.

Understanding hash tables and their implementation in Python equips you with essential skills for solving real-world problems efficiently and effectively. Experimenting with different strategies for collision resolution and resizing can further enhance your understanding and proficiency in working with hash tables.

hash table implementation python