Java Collections: Custom Implementation of Collections and Data Structures
Going for a job interview? Be prepared with the Java Collections framework. It is the topic interviewers
prefer most, because it gives them an idea of how much effort the interviewee has taken to understand
the framework and clear the interview.
And guess what, similar questions are now also asked in the US of H1B visa holders and MS postgraduates
looking for a job.
This blog explains the internal working of the Collections framework. For interview questions on
collections, refer to Java Collection Interview Questions.
Recently, Blockchain has also made its way into Java interviews. Internally, a blockchain is also a data
structure. To know more, see: Implementing Blockchain using Java
The Java Collections framework internally uses core elements like arrays and data structures like linked
lists, trees, etc. So if you are asked to explain the internal working of any of the collection
classes, don't be surprised. Be it an interview for a junior Java developer or even for an architect, Java
Collections is always something that you will have on your plate.
Java provides many collection classes that can be used to store data. Knowing which collection class to
use for best performance and optimum result is the key.
The blue ones are the Interfaces and the red ones are the implementation classes
The table below shows the different concrete classes implementing these interfaces.
List
The List interface promises that elements maintain the order in which they are added. That means it is an
ordered collection. List implementations do not sort the elements.
ArrayList
add(E element)
ArrayList stores its elements in an internal array called elementData. When a new element is added, the
capacity of elementData is checked. If the array is completely filled (with the default capacity, that
means all 10 slots are used), a new, larger array is created using Arrays.copyOf and the existing
elements are copied into it.
If the elementData array is not exhausted, the new element is simply placed in the next free slot.
So adding an element can occasionally take more time, because a completely new array with greater
capacity has to be created and the data in the old array transferred into it.
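To make the growth step concrete, here is a minimal sketch of a growable array. SimpleArrayList is a hypothetical class written for this blog, not the JDK source; the 50% growth step is an assumption that mirrors what recent JDK versions do, and capacity() exists only so the growth can be observed.

```java
import java.util.Arrays;

// Hypothetical class for illustration; not the real java.util.ArrayList.
class SimpleArrayList<E> {
    private Object[] elementData = new Object[10]; // default capacity of 10
    private int size = 0;

    public void add(E element) {
        if (size == elementData.length) {
            // Array is full: allocate a larger one (about 50% bigger) and copy everything over.
            int newCapacity = elementData.length + (elementData.length >> 1);
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = element; // place the element in the next free slot
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return (E) elementData[index]; // direct array access, no traversal
    }

    public int size() {
        return size;
    }

    // Exposed only so the growth can be observed in this sketch.
    public int capacity() {
        return elementData.length;
    }
}
```

Adding an 11th element to a fresh SimpleArrayList is the call that triggers the copy: the capacity goes from 10 to 15, while the first ten adds touch only the existing array.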
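add(int index, E element)
On adding an element at a particular index, ArrayList first checks that the index lies between 0 and the
current size (otherwise it throws IndexOutOfBoundsException) and that the array has room for one more
element, growing it if needed. If the index equals the current size, the element is simply appended.
Otherwise the element at that index and all subsequent elements are shifted one position to the right,
and the new element is placed in the vacated slot.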
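For example:
List<Integer> l = new ArrayList<>(Arrays.asList(1, 2, 4));
l.add(1, 3); // insert 3 at index 1
for (int i : l) {
    System.out.println(i);
}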
Output
1
3
2
4
In the example above we add 3 at position 1. Since position 1 already holds the value 2, the element at
index 1 and the remaining elements are shifted to the right, leaving index 1 vacant. Then the
element 3 is placed at index 1. A new array is created only if the existing one has no spare capacity.
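The shifting step can be sketched on a plain int array. InsertAtIndexDemo and its insert method are hypothetical names used only for illustration, and the growth formula is an assumption; the point is the single System.arraycopy call that moves the tail one slot to the right.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; it works on a plain int[] instead of ArrayList.
class InsertAtIndexDemo {

    // Inserts value at index among the first `size` used slots and returns the
    // (possibly reallocated) array.
    static int[] insert(int[] data, int size, int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (size == data.length) {
            // No spare slot left: grow before shifting (growth step is an assumption).
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        // Shift the elements from index onwards one position to the right.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value; // fill the vacated slot
        return data;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 4, 0}; // three slots in use, one spare
        data = insert(data, 3, 1, 3);
        System.out.println(Arrays.toString(data)); // prints [1, 3, 2, 4]
    }
}
```

Because the array in main has a spare slot, no new array is allocated; only the elements 2 and 4 move.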
get(int index)
The element present at that index in the array is returned. This is very fast. Why? Because the array is
indexed directly, so there is no need to traverse through nodes as is the case with LinkedList.
Linked List
Ordered
As opposed to ArrayList, LinkedList does not store elements in an array. A linked list is actually a
collection of objects linked together by references to each other. In essence, java.util.LinkedList
is a doubly linked list.
For more details about the data structures of singly linked lists and doubly linked lists, refer to this
link: Internal Working of LinkedList
add(E element)
Every time we call add(var), a new instance of the 'Entry' class (named 'Node' in Java 7 and later) is
created and attached at the end of the list.
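add(int index, E element)
Inserts the specified element at the specified position in this list.
The element currently at that position (if any) and any subsequent elements end up one position to the
right. Nothing is physically moved: LinkedList walks to the node at that position and relinks the
neighbouring references around the new node. The walk makes insertion in the middle proportional to the
length of the list, so in general it is not faster than ArrayList, even though the relinking itself is cheap.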
get(int index)
It iterates through the list, starting from whichever end is closer to the index, and returns the element.
This is expensive and time consuming compared with ArrayList.get(int index).
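The three operations above can be sketched with a small doubly linked list. SimpleLinkedList is a hypothetical, simplified class and not the JDK implementation; in particular, this sketch always walks from the first node, whereas the real class can also start from the last one.

```java
// Hypothetical, simplified doubly linked list; not the real java.util.LinkedList.
class SimpleLinkedList<E> {

    private static class Node<T> {
        T item;
        Node<T> prev;
        Node<T> next;

        Node(Node<T> prev, T item, Node<T> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // Appends at the end: one new node, two reference updates.
    public void add(E element) {
        Node<E> oldLast = last;
        Node<E> node = new Node<>(oldLast, element, null);
        last = node;
        if (oldLast == null) {
            first = node;
        } else {
            oldLast.next = node;
        }
        size++;
    }

    // Inserts before the node currently at index: walk there, then relink.
    public void add(int index, E element) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (index == size) {
            add(element);
            return;
        }
        Node<E> successor = node(index);
        Node<E> predecessor = successor.prev;
        Node<E> node = new Node<>(predecessor, element, successor);
        successor.prev = node;
        if (predecessor == null) {
            first = node;
        } else {
            predecessor.next = node;
        }
        size++;
    }

    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return node(index).item;
    }

    // The expensive part: follows `next` references index times.
    private Node<E> node(int index) {
        Node<E> current = first;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current;
    }

    public int size() {
        return size;
    }
}
```

Both add(int, E) and get(int) go through node(index), which is where the cost of a linked list shows up compared with direct array indexing.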
Map
As we saw above, a List allows us to add values to it. But to find a value you may need to traverse
the complete list. Map is a special collection provided by Java that helps to find an added element
quickly.
Instead of just adding a value or element to a collection, a Map lets you add two things together: one
called the key and the other the value.
Consider the key to be an employee id and the value the employee object. The logic to save and fetch
a value is based upon the key.
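The employee example can be sketched as follows. The Employee class, its fields, and the sample names are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class EmployeeLookupDemo {

    // Hypothetical Employee type, used only for this illustration.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Builds a map from employee id (key) to employee object (value).
    static Map<Integer, Employee> buildIndex(Employee... employees) {
        Map<Integer, Employee> byId = new HashMap<>();
        for (Employee employee : employees) {
            byId.put(employee.id, employee);
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<Integer, Employee> byId = buildIndex(new Employee(101, "Asha"), new Employee(102, "Ravi"));
        System.out.println(byId.get(102).name); // prints Ravi
        System.out.println(byId.get(999));      // prints null, no such key
    }
}
```

The lookup by id does not scan the employees one by one, which is the difference from searching a List.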
HashMap
HashMap works on the principle of hashing. It stores values in the form of key-value pairs, and to access
a value you need to provide the key.
HashMap is basically an array of buckets in which each bucket is a singly linked list, so it can grow in
both directions: more buckets, and longer lists within a bucket.
For efficient use of HashMap, the key class should implement the equals() and hashCode() methods.
equals() defines when two objects are meaningfully equal. hashCode() helps HashMap distribute
elements across buckets, so elements with the same hash code are always kept together in the same bucket.
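As a minimal sketch of such a key, here is a class that overrides both methods consistently. EmployeeId and its fields are hypothetical names chosen for illustration.

```java
import java.util.Objects;

// Hypothetical key class for illustration.
final class EmployeeId {
    private final String department;
    private final int number;

    EmployeeId(String department, int number) {
        this.department = department;
        this.number = number;
    }

    // Two ids are meaningfully equal when both fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof EmployeeId)) {
            return false;
        }
        EmployeeId that = (EmployeeId) other;
        return number == that.number && Objects.equals(department, that.department);
    }

    // Built from the same fields as equals(), so equal ids always share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(department, number);
    }
}
```

With this in place, a value stored under new EmployeeId("ENG", 7) can be fetched with a second, separately created new EmployeeId("ENG", 7). Without the two overrides the lookup would compare object identity and miss.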
Now what the hell is a bucket?
Observe the diagram below. All elements that are stored horizontally are said to be in the same bucket.
So when we want to fetch an element using get(Object key), HashMap first uses the hash code of the key
passed in to identify the bucket that holds all elements with that hash code. Once it knows the
bucket, it only has to traverse that bucket to fetch the actual object.
It then uses the equals() method to identify the actual object present in the bucket.
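Let's see how HashMap implements this logic internally.
class Entry<K, V> {
    K key;            // the key supplied to put()
    V value;          // the value stored for that key
    Entry<K, V> next; // next entry in the same bucket, or null
    int hash;         // cached hash of the key
}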
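Using an entry of this shape, put and get can be sketched as below. SimpleHashMap is a hypothetical, stripped-down class: it assumes a fixed table of 16 buckets, never resizes, and uses the key's hashCode() directly, so treat it as an outline of the idea rather than the JDK code.

```java
import java.util.Objects;

// Hypothetical, simplified map; the real java.util.HashMap also resizes its table.
class SimpleHashMap<K, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int hash;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed number of buckets (assumption made to keep the sketch short).
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Maps a hash code to a bucket index in the range 0..15.
    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    public V put(K key, V value) {
        int hash = Objects.hashCode(key);
        int index = indexFor(hash);
        // Walk the bucket; replace the value if an equal key is already there.
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Not found: link a new entry at the head of the bucket.
        table[index] = new Entry<>(hash, key, value, table[index]);
        return null;
    }

    public V get(K key) {
        int hash = Objects.hashCode(key);
        // Only the one bucket selected by the hash is traversed.
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

In this sketch the Integer keys 1 and 17 land in the same bucket (both give index 1), and the equals() check inside the loop is what tells them apart.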
HashMap also has some more variables, which define the initial size of the bucket array and how full it
may get before it is resized.
DEFAULT_LOAD_FACTOR = 0.75f;
DEFAULT_INITIAL_CAPACITY = 16;
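A quick way to read these two numbers together: the table is resized roughly once the number of entries passes capacity × load factor. The helper below is a hypothetical illustration of that arithmetic, not a method of HashMap.

```java
class ResizeThresholdDemo {

    // Number of entries the table can hold before a resize is due.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // prints 12
        System.out.println(threshold(32, 0.75f)); // prints 24
    }
}
```

So with the defaults, a HashMap holds 12 entries in its 16 buckets comfortably, and growing beyond that leads to a larger table.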
For a more detailed, internal understanding of HashMap, refer to the Internal Working of HashMap blog.
That blog creates a CustomHashMap and explains the process step by step.
TreeMap
Sorted
TreeMap is a structure designed to work as a Red-Black tree. Each node has at most two child nodes,
and insertion into the tree follows the same strategy as insertion into the Java Binary
Search Tree explained here, followed by rebalancing.
If you look below, the Binary Search Tree and the TreeMap look the same except for the node. A TreeMap
node holds a key and a value, while a BST node holds only the value.
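TreeMap internally maintains a tree of nodes, each node being an instance of the Entry<K,V> class, which
is an implementation of the Map.Entry<K,V> interface.
The basic structure of this Entry class is
class Entry<K, V> {
    K key;
    V value;
    Entry<K, V> left = null;   // left child (smaller keys)
    Entry<K, V> right = null;  // right child (larger keys)
    Entry<K, V> parent;        // parent node, null for the root
    boolean color = BLACK;     // node colour used for Red-Black balancing
}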
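The insertion strategy can be sketched with a plain binary search tree that stores key-value pairs. SimpleTreeMap is a hypothetical class for illustration; it deliberately leaves out the colour field and the Red-Black rebalancing, so it shows only where a new entry is attached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: binary search tree insertion only, no Red-Black rebalancing.
class SimpleTreeMap<K extends Comparable<K>, V> {

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> left;
        Entry<K, V> right;
        final Entry<K, V> parent;

        Entry(K key, V value, Entry<K, V> parent) {
            this.key = key;
            this.value = value;
            this.parent = parent;
        }
    }

    private Entry<K, V> root;

    public V put(K key, V value) {
        if (root == null) {
            root = new Entry<>(key, value, null);
            return null;
        }
        Entry<K, V> current = root;
        Entry<K, V> parent;
        int cmp;
        // Walk down: smaller keys go left, larger keys go right.
        do {
            parent = current;
            cmp = key.compareTo(current.key);
            if (cmp < 0) {
                current = current.left;
            } else if (cmp > 0) {
                current = current.right;
            } else {
                V old = current.value; // key already present: replace the value
                current.value = value;
                return old;
            }
        } while (current != null);
        Entry<K, V> entry = new Entry<>(key, value, parent);
        if (cmp < 0) {
            parent.left = entry;
        } else {
            parent.right = entry;
        }
        return null;
    }

    public V get(K key) {
        Entry<K, V> current = root;
        while (current != null) {
            int cmp = key.compareTo(current.key);
            if (cmp == 0) {
                return current.value;
            }
            current = cmp < 0 ? current.left : current.right;
        }
        return null;
    }

    // In-order traversal visits the keys in sorted order.
    public List<K> keysInOrder() {
        List<K> keys = new ArrayList<>();
        collect(root, keys);
        return keys;
    }

    private void collect(Entry<K, V> node, List<K> out) {
        if (node == null) {
            return;
        }
        collect(node.left, out);
        out.add(node.key);
        collect(node.right, out);
    }
}
```

Putting the keys 30, 10 and 20 in that order and then calling keysInOrder() gives [10, 20, 30]: the sorting falls out of where each entry was attached.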
Set
A Set is a collection that cannot contain duplicate values. So if we want a collection of unique
elements, Set is the obvious choice.
HashSet
add(E element)
When we add a value to a HashSet, the HashSet internally adds it to a HashMap named 'map' by calling
put(E, o), where E, the key, is the element passed to add(E element), and 'o', the value, is a dummy
object created with new Object() that is shared by all keys entered in 'map'.
The HashMap checks whether the key, that is 'element', is already present by using the hashCode() and
equals() methods of 'element'.
add() returns false if the key is already present in the HashMap.
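This delegation is small enough to sketch in full. SimpleHashSet is a hypothetical class written for illustration; it relies on the fact that HashMap.put returns the previous value for the key, or null if there was none.

```java
import java.util.HashMap;

// Hypothetical set built on top of HashMap, for illustration only.
class SimpleHashSet<E> {
    // One shared dummy value for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // put() returns null only when the key was not in the map before.
    public boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    public boolean contains(E element) {
        return map.containsKey(element);
    }

    public int size() {
        return map.size();
    }
}
```

The first add("a") returns true, a second add("a") returns false, and size() stays at 1.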
TreeSet
Non Duplicate
Sorted
Just as HashSet uses HashMap internally, TreeSet uses TreeMap internally. TreeSet ensures that the
elements added are not duplicated and that they are sorted. Sorting is done by the TreeMap.
add(E element)
When we add a value to a TreeSet, the TreeSet internally adds it to a TreeMap named 'map' by calling
put(E, o), where E, the key, is the element passed to add(E element) of TreeSet, and 'o', the value, is a
dummy object created with new Object() that is shared by all keys entered in 'map'.
The TreeMap checks whether the key, that is 'element', is already present by comparing it with the
existing keys, using compareTo() or the Comparator supplied to the set, rather than equals().
add() returns false if the key is already present in the TreeMap.
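A short usage sketch shows both properties, and also that "duplicate" is decided by the ordering in use. TreeSetDemo and sortedUnique are hypothetical names; TreeSet, Comparator.naturalOrder() and String.CASE_INSENSITIVE_ORDER are standard JDK APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetDemo {

    // Adds the values to a TreeSet using the given ordering and returns them as a list.
    static List<String> sortedUnique(Comparator<String> order, String... values) {
        TreeSet<String> set = new TreeSet<>(order);
        for (String value : values) {
            set.add(value); // returns false, and changes nothing, for a duplicate
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Natural ordering: sorted, with the second "pear" dropped.
        System.out.println(sortedUnique(Comparator.naturalOrder(), "pear", "apple", "pear", "fig"));
        // prints [apple, fig, pear]

        // Case-insensitive ordering: "pear" compares equal to "Pear", so it is a duplicate.
        System.out.println(sortedUnique(String.CASE_INSENSITIVE_ORDER, "Pear", "pear", "apple"));
        // prints [apple, Pear]
    }
}
```

In the second call "Pear" and "pear" are different according to equals(), yet only one survives, because the comparison result is what the underlying TreeMap looks at.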
LinkedHashSet
Non Duplicate
Ordered
LinkedHashSet extends HashSet, which means it also rejects duplicates. The difference from HashSet is
that LinkedHashSet is ordered: it returns elements in the order in which they were inserted.
Internally it is backed by a LinkedHashMap, which uses a doubly linked list running through all its
entries to hold the order together.
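A small usage sketch of that ordering guarantee follows. LinkedHashSetDemo and insertionOrder are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkedHashSetDemo {

    // Adds the values to a LinkedHashSet and returns them in iteration order.
    static List<String> insertionOrder(String... values) {
        Set<String> set = new LinkedHashSet<>();
        for (String value : values) {
            set.add(value); // a repeated value is ignored and keeps its original position
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder("pear", "apple", "pear", "fig"));
        // prints [pear, apple, fig]
    }
}
```

Compare this with TreeSet, which would return the same three values sorted, and with HashSet, which makes no promise about the order at all.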