In this article, we will learn about the most popular question “What is Hashing?”
Before that, let’s try to think of a solution to the below problem:
Problem: We want to design a system for storing student record keys using phone numbers. And want to perform the following queries efficiently:-
- Search a phone number and get information
- Delete a phone number and information associated with it.
- Insert a phone number and information associated with it.
We can solve the above problem using the following data structures:
- Phone number and record can be stored in the form of Arrays
- Phone numbers can be stored as keys in Balanced Binary Trees
- Phone number and record can be stored in the form of Linked List
- Phone number and record can be accessed from Table
Linked List and Arrays
- Problem with both is that searching costs Linear Time complexity which is O(n)
- Even if we store phone numbers in Arrays in Sorted Form, then search complexity can be reduced to O(logn) but the problem is to maintain that sorted order. It will become very difficult to place a new record and delete a record in the array because we want the array to be maintained in sorted order.
- In Tables, we can create a huge array where the phone number of students will be the index of the array. The array stores a pointer that points to the record associated with the phone number. If we don’t have any record then it will be NULL.
- Time complexity will be O(1) because we can fetch the information directly by using an index. Also, insertion will take O(1), we will use the phone number as an index and will store a pointer that will point to the record associated with it.
- Problem with the above solution is we need a huge extra space. The below table will explain to you how :
- In the above table, we can see we used only 5 slots to fill up the information. For this specific example, things work well. Our array needs to be big enough to store the largest string. Also, the above example may also cause a problem of Collisions. It means what if there is more than one key with the same length of the given name.
Balanced Binary Search Trees
From this, we get O(log n ) time to search, insert and delete.
Due to all the above limitations, we will be using Hashing. By using hashing we can get the search time complexity of O(1) ( under some assumptions ) and O(n) in the worst case. Also, hashing performs extremely well compared to Balanced Binary Search Trees, Arrays, etc.
The idea is to use a hash function that can convert a phone number or any other key to a smaller number and use the small number as an index in the Hash table.
Hash Function: This function will convert the phone number into a small integer which will be practical to store. Now this integer will be used as an index in the hash table. If the key is in the form of String, then also it maps it to a small integer which is called index in the hash table.
Hash Table: It’s an array that stores pointers pointing to the record corresponding to the index mapped. In the current example, the array will store pointers that point to the record associated with the phone number.
Collision Handling: In the hash function, the function gets a small number for a big key, there might be a possibility the 2 keys end up having the same value. This may lead to mapping a new key to an already occupied slot in the hash table. This is called Collision. There are 2 ways to handle such a situation :
- Open Addressing
- Chaining :
- In this technique, an array of Linked List is used. Each index has its own linked list
- Key-value pairs of the same index will be stored in the linked list of that index in the array.
- The insertion in the hash table will take O(1) time complexity.
- The Linked List can be grown as long as we have enough space for a particular index.
2. Open Addressing :
- All the elements are stored in the hash table.
- Each entry in the table either contains a record or its NULL.
- When we search an element, we visit each slot one by one until we get the final element. If we don’t get the desired element, it means it’s not present in the hash table.