How to implement own Hashtable, how Hashtables are faster than linked list and arrays

 Category: Data Structure And Algorithms Tags: Hashtable Code Files

Introduction

Hashtable is a data structure which works on concept of hashing, a hash is an index which is drawn from a key and that index is mapped to memory slots. Hashtable is created to access an element in O(1) time. as we seen in out last article, Linked List is a data structure which can take O(n) time to search an element in worst case, but if we use an array we can access a location directly in O(1) time. Here Hashtable comes which is combination of Linked List and Array using concept of hashing.
** I would recommend to read the last article on Linked List which concept is used with Hashable in this article.

Hashing: Hashing is mapping a key to a location, as well hashing is transforming data into some unique data, and hashing is also mapping arbitrary data to fixed size data.
Suppose we have 50 students and we have to store their age in memory and to do so we have an array which has 50 locations so we write a hash function which takes student name(name can be any size) and generate an index number between 1 and 50 for every name, if we suppose every name is different and hash function generates a unique index for every name, then it is easy to store every student's age in 1-D array, because no need to store unique id of student with age, hash function can map student name directly to the location where his age is stored.

Hash Function: Hash function is a logic which maps all the input data to a fixed sized data irrespective of input data length. As we seen in above example names can have 1-n number of characters, but a hash function has to generate an index value between 1-50. Output of the hash function is called "Hash".

Collision: Collision might occur, we can make a hash function which can distribute the input keys across the available space uniformly but again there are chances that two different keys might generate same hash. Because hash function is not something which will generate the unique keys, hash function we use to distribute the keys in uniform manner and decreases the chances of collision. Suppose we have an array sized 100 and having 1000 keys to the array, so average of 10 keys probably can hashed to one index. If every index has 10 collisions then it would be perfect hashing in this case, if one index has 1000 collisions then it would be worst hashing.

Handling Collisions: What to do when two or more keys are hashed to same index? One answer is to have linked list at every index, so if collision happens it can attach n number of keys in chain, chaining is perfect way to handle the collision.

Implementation

In this section, we will implement our own hash table using own hash function and our own methods to manipulate the table. We will use chaining (Linked List) to deal with collisions, where we could analyze how actually does a hash table work as well hash function. Below is the basic structure of class contains members and method signatures (Removed implementation, just a skeleton).

public class MyHashTable
{
private Node[] universe;
private readonly int tableSize;
public MyHashTable(int maxTableSize) { tableSize = maxTableSize; universe = new Node[tableSize]; }
private int HashFuncation(string key) { } public void Insert(string key, object value) { } public object GetValue(string key) { } }

Above skeleton gives an idea about what out class contains, member "universe" holds the keys (with value and pointer for chaining) and "tableSize" is used to define the universe size as well to calculate the hash. Here you can see universe is an array of "Node" type, remember the node is similar to which we created in Linked List article. Implementation of Node

public class Node
{
public string Key { get; set; }
public object Value { get; set; }
public Node Next { get; set; }
public Node Previous { get; set; }
}

Above we can see the implementation of Node, which has key, value as well pointers for chaining, Next and Previous. "universe" is an array of Node in our code so we can chain out any collision. Below is implementation of our first method HashFuncation which is the first most important function:

/// <summary>
/// Hash funcation
/// </summary>
/// <param name="key">key</param>
/// <returns>returns index from the space based on some calculation</returns>
private int HashFuncation(string key)
{
int index = 7;
int asciiVal = 0;
for (int i = 0; i < key.Length; i++)
{
asciiVal = (int)key[i] * i;
index = index * 31 + asciiVal;
}
return index % tableSize;
}

**We have to be very careful while writing hash functions because a good hash function can decrease the chances of collision and distribute data uniformly, There are set of rules to choose  hash function and some algorithms as well, Here we used division method which divides and gives reminder, here we can see we used 7(a prime) to start with and 31(again prime), primes are good make random hash, as well I multiplied ASCII value with character position in string so "aSp" and "AsP" won't be same as well "asp" and "psa". If possible make table size also prime, but here I'm taking table size from user and this is not area of concern of this article so taking whatever user gives.
Now we are going to see implementation of "Insert" function below:

/// <summary>
/// Inserts a value with key to table
/// </summary>
/// <param name="key">key</param>
/// <param name="value">value associated to key</param>
public void Insert(string key, object value)
{
int genIndex = HashFuncation(key);
Node node = universe[genIndex];
if (node == null) { universe[genIndex] = new Node() { Key = key, Value = value }; return; }
if (node.Key == key) throw new Exception("Can't use same key!");
while (node.Next != null) { node = node.Next; if (node.Key == key) throw new Exception("Can't use same key!"); }
Node newNode = new Node() { Key = key, Value = value, Previous = node, Next = null }; node.Next = newNode; }

Above Insert function first generates the hash index then whether on the index some node is there or it is null, if null then creates a node with value, if already a node then it is collision then it uses linked list logic to iterate until  reach to end and at end we chain a new node, meanwhile we check also if some key is exactly same, if we find same key then we throw an exception saying can't use same key again. Now we will implement "GetValue" method below:

/// <summary>
/// fetch value of a key
/// </summary>
/// <param name="key">key</param>
/// <returns>value</returns>
public object GetValue(string key)
{
int genIndex = HashFuncation(key);
Node node = universe[genIndex];
while (node != null)
{
if (node.Key == key)
{
return node.Value;
}
node = node.Next;
}
throw new Exception("Don't have the key in hash!"); }

Above method is to get values by their key, it would first calculate the hash index where it might be in array then it would search the linked list to match the key, if it has it returns else throws an exception. Below the code how can we test out hash table

MyHashTable hashTable = new MyHashTable(50);
hashTable.Insert("ab", 5);
hashTable.Insert("bb", 6);
hashTable.Insert("aB", 7);
hashTable.Insert("Bc", 8);
hashTable.Insert("cA", 9);
hashTable.Insert("CC", 10);

Console.WriteLine("Value of key 'bb' is: " + hashTable.GetValue("bb"));

Above code can be used to test the hash table, the output of the code will be:
Value of key 'bb' is: 6

Analysis

Let's suppose we are given a array size 100 and we have 1000 keys to store so the best hash function would be indexing 1000/100 = 10 keys per index and worst function will hash all the 1000 keys to any one location of 100. So If we compare the hash table with array, for array we need an array sized 1000 to store all the keys but we can access any one of them in O(1) time but again we have to loop over the array to find the specific value so searching would be O(n) time, but if we see in hash table with best hashing, a key might chain so in would take O(11) time to reach to the end if we know the key.
The hashtables are designed to access a key value in O(1) time but if number of keys are greater than locations then we use Linked List along hashing. In that case, we define a expected length element which is α = n/m where n is number of keys and m is universe (length of array technically where to store keys). So the number of elements chain in single index is α. and searching for last element in chain cost you O(1 + α). where 1 is to find index via hash function and α is to loop over linked list to reach last element because average α elements are chained in the linked list.
Same if we compare hashtable with linked list again hashtable wins because in previous article on linked list we have seen the searching is O(n) time in linked list but in hash table its only O(1 + α).

 Like 0 People Nikhil Joshi Ceo & Founder at Dotnetlovers Atricles: 133 Questions: 9 Given Best Solutions: 9 *

Login via:   x 