An Emphasized Dual Similarity Measure Integration For Online Image Retrieval System Using SimRank
Abstract-In real-world scenarios the use of images is growing rapidly, and an image-rich network is one that comprises billions of images. Social media websites such as Picasa, Flickr and Facebook comprise billions of end-user-posted images along with their annotations. Similarly, electronic commerce websites such as Flipkart, Myntra and Amazon are furnished with product-related images. In this paper, we describe how to perform efficient information retrieval in an online image-rich system. We propose Mok-SimRank to compute link-based similarity and a dual similarity integration algorithm that combines link-based and content-based similarity. Experimental results on an online electronic commerce site show that our approach is significantly better than traditional methods in terms of relevance.

... then use visual similarity to search for visually relevant images. However, existing works cannot handle the link structure. We use the Mok-SimRank algorithm to estimate link-structure similarity. When considering the images in the network, image similarity can also be estimated from content features, such as the RGB histogram and SIFT. We then propose the algorithm DSIL to provide a novel way of integrating both link and content information.
I. INTRODUCTION
feasible. In many uses, we need to select a very small set of images to show from potentially millions of images. Unlike ranking, the goal is not to reorder the full set of images but to select only the “best” ones to show.

In “Image Retrieval: Current Techniques, Promising Directions, and Open Issues”, Y. Rui and T. S. Huang state that image retrieval has been approached from two different views: one purely text based and the other based on visual features. The most popular framework of image retrieval at that time was to first annotate the images with text and then use text-based database management systems (DBMS) to perform image retrieval. Image meta search is a type of search engine specialised in finding pictures, images, animations and so on. Like text search, image search is an information retrieval system designed to help find information on the Internet; it allows the user to look for images using keywords or search phrases and to receive a set of thumbnail images sorted by relevancy. A common misunderstanding about image search is that the technology is based on detecting information in the image itself. In fact, most image search works like other search engines: the metadata of the image is indexed and stored in a large database, and when a search query is performed the image search engine looks up the index and matches the query against the stored information. The results are presented in order of relevancy.

III. CONTENT BASED IMAGE SIMILARITIES

solely based on human annotations [4] may also lead to unsatisfying results if the annotation is wrong, too general, or incomplete. In addition, if an image does not link to any object in the information network, an approach based only on link information cannot work. Fig. 3 shows several examples that are all linked to the tag “flower” but are not visually similar.

Figure 3: Images annotated by the tag “flower,” but with low visual similarity.

A. Similarity Metric
Image similarity can be estimated from image content features such as colour histogram, edge histogram, colour correlogram, CEDD, GIST, texture features, Gabor features, shape [3] and SIFT. Normalize the feature $F \in \mathbb{R}^D$, where $D$ is the number of dimensions in the feature space, to be of unit length: for any $f_d$, the value of feature $F$ on dimension $d$ ($d = 1, \ldots, D$), divide it by the sum of values on all dimensions:

$$f_d \leftarrow \frac{f_d}{\sum_{d'=1}^{D} f_{d'}}$$

The chi-square test statistic is used as the distance between two feature vectors $F_i$ and $F_j$.
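The normalization and the chi-square distance described under the Similarity Metric heading can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names `l1_normalize` and `chi_square_distance` are hypothetical, and the distance uses one common form, sum over d of (f_{i,d} - f_{j,d})^2 / (f_{i,d} + f_{j,d}), which is an assumption (some definitions add a factor of 1/2).

```python
from typing import List, Sequence


def l1_normalize(feature: Sequence[float]) -> List[float]:
    """Divide each dimension by the sum of the values on all dimensions."""
    total = sum(feature)
    if total == 0:
        # An all-zero feature cannot be normalized; return zeros unchanged.
        return [0.0 for _ in feature]
    return [value / total for value in feature]


def chi_square_distance(fi: Sequence[float], fj: Sequence[float]) -> float:
    """Assumed form: sum_d (fi_d - fj_d)^2 / (fi_d + fj_d)."""
    if len(fi) != len(fj):
        raise ValueError("feature vectors must have the same dimension")
    distance = 0.0
    for a, b in zip(fi, fj):
        denom = a + b
        if denom > 0:  # bins empty in both vectors contribute nothing
            distance += (a - b) ** 2 / denom
    return distance
```

Normalizing first makes histograms of images with different pixel counts comparable, since the distance then depends only on the shape of the distribution.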
require human-built hierarchies, SimRank is applicable to any domain with object-to-object relationships, including the Web. Nevertheless, existing work on SimRank leaves two important issues unaddressed. Firstly, although SimRank iterative similarity scores are known to converge, a real-life computation naturally involves performing a finite number of iterations. Secondly, optimization of SimRank computation is not the primary focus of the original SimRank proposal.

For a node $v$ in a directed graph, we denote by $I(v)$ and $O(v)$ the sets of in-neighbours and out-neighbours of $v$, respectively. Individual in-neighbours are denoted as $I_i(v)$, for $1 \le i \le |I(v)|$, and individual out-neighbours are denoted as $O_i(v)$, for $1 \le i \le |O(v)|$.

Let us denote the similarity between objects $a$ and $b$ by $s(a,b) \in [0,1]$. Following the earlier motivation, a recursive equation is written for $s(a,b)$. If $a = b$ then $s(a,b)$ is defined to be $1$. Otherwise,

$$s(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s(I_i(a), I_j(b))$$

where $C$ is a constant between $0$ and $1$. A slight technicality here is that either $a$ or $b$ may not have any in-neighbours. Since there is no way to infer any similarity between $a$ and $b$ in this case, similarity is set to $s(a,b) = 0$, so the summation in the above equation is defined to be $0$ when $I(a) = \emptyset$ or $I(b) = \emptyset$.

The proposed architecture for the dual similarity measure starts with the input of the search query, which is compared against the database for link-based and content-based similarity. The match type is then determined and the requested web search is performed, which gives the relevant search results as shown in Fig. 4. The system architecture thus shows the processing flow in this work. In content-based image retrieval, most methods and systems compute image similarity based on image content features. Hybrid approaches combine text features and image content features. Most commercial image search engines use textual similarity to return semantically relevant images and then use visual similarity to search for visually relevant images. Integration-based approaches use a linear or nonlinear combination of the textual and visual features.

A. Link-Based Similarity
SimRank [2] is one of the most popular link-based algorithms for evaluating similarity between nodes in information networks. It computes node similarity based on the idea that “two nodes are similar if they are linked by similar nodes in the network.” In the spirit of PageRank, SimRank computes the similarity between each pair of nodes in an iterative fashion with a theoretical guarantee of convergence. Similar images are likely to link to similar tags and groups, so we define the link-based semantic similarity between images as a combination of the similarity of their groups and the similarity of their tags. This module iteratively calculates the similarity between image pairs, between group pairs and between tag pairs until convergence is reached.
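The SimRank recursion described in this section can be sketched as a fixed number of naive iterations. This is plain SimRank only, not Mok-SimRank or HMok-SimRank, whose details are not given here. The function name `simrank`, the default `c = 0.8`, the iteration count and the toy graph are assumptions chosen for illustration.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def simrank(in_neighbours: Dict[Node, List[Node]],
            c: float = 0.8,
            iterations: int = 10) -> Dict[Tuple[Node, Node], float]:
    """Naive SimRank; every node must appear as a key of in_neighbours."""
    nodes = list(in_neighbours)
    # Initial scores: s(a, b) = 1 if a == b, otherwise 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbours[a], in_neighbours[b]
            if not ia or not ib:
                # No in-neighbours on one side: similarity is defined as 0.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy network: one tag node pointing at two image nodes.
graph = {"flower": [], "img1": ["flower"], "img2": ["flower"]}
scores = simrank(graph)
```

In this toy graph the two images share their only in-neighbour, so their score equals the decay constant `c`. The cost of each iteration grows with the square of the number of nodes, which is why practical variants restrict the computation to candidate pairs.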
integrated feature space has a fixed number of dimensions, our approach is also applicable. The image vector information is extracted from the image content based on colour and histogram, and this vector information is used by the cosine similarity function to measure the similarity. Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Algorithm: Content-Based Similarity Measures
Input: Social information So
Output: Requested images.
Method:
Step 3. Calculate the histogram by using the following function: hist(R, G, B) = θ
Step 4. Calculate the similarity function Cos(θ) = 0, 1, -1

similarities: First, perform HMok-SimRank to compute the link-based similarities; second, perform feature learning considering the link-based similarity to update the feature weights, and then update the node similarities based on the new content similarity.

Algorithm: Dual Similarity Integration (DSI)
Input: G, the image-rich information network.
1. Construct kd-tree (or LSH and cv-tree index) over the image features;
2. Find top k similar candidates of each object;
3. Initialize similarity scores;
4. Iterate {
5. Calculate the link similarity for image pairs via HMok-SimRank;
8. Compute link-based similarity for all group and tag pairs via HMok-SimRank;
9. } until converge or stop criteria satisfied.
Output: S, Similarity scores.
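Steps 1 and 2 of the Dual Similarity Integration algorithm select, for each image, its top k most similar candidates by content features. A minimal sketch follows, under assumptions that are not in the source: a brute-force search stands in for the kd-tree or LSH index, cosine similarity over colour-histogram vectors serves as the content measure, and the names `cosine_similarity`, `top_k_candidates` and the toy histograms are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_candidates(features: Dict[str, Sequence[float]],
                     k: int) -> Dict[str, List[str]]:
    """Brute-force stand-in for an index lookup: k most similar other images."""
    candidates = {}
    for name, vec in features.items():
        scored = [(cosine_similarity(vec, other_vec), other)
                  for other, other_vec in features.items() if other != name]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        candidates[name] = [other for _, other in scored[:k]]
    return candidates


# Hypothetical 3-bin colour histograms (R, G, B proportions).
histograms = {
    "red_rose": [0.8, 0.1, 0.1],
    "red_tulip": [0.7, 0.2, 0.1],
    "blue_iris": [0.1, 0.1, 0.8],
}
candidates = top_k_candidates(histograms, k=1)
```

Restricting later similarity updates to these candidate pairs is what keeps the iteration tractable on large networks. An index such as a kd-tree would replace the quadratic scan used here.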
Algorithm: Link-Based Similarity Measures
Figure 6: The query image from our Dual Similarity Integration System.

The images retrieved by the electronic commerce website “FlipKart” for the keyword iphone6 clearly demonstrate that the existing system returns some irrelevant results. The top-K images are retrieved for the specified keyword; the system obtains the best results in terms of relevance for both semantic and visual appearance. The object is tagged with “iPhone, invisible shield, accessories” and belongs to the category “Smart phone.”

C. Performance Evaluation Method
The mean average precision (MAP) is used to measure the retrieval performance of the various algorithms. For each image in the synthetic data set, we gather a ranked list of relevant images computed by each algorithm and compute the average precision based on the approximate ground truth before removing tags. The final MAP score for each algorithm is the mean of the average precision over all images. No training data set is needed, i.e. all the algorithms are unsupervised. Figure 7 shows the results on Amazon data. We can observe that link-based similarity performs better than text-based similarity; VLWC achieves better performance than traditional algorithms by linearly combining visual and link information. Algorithm DSIL further improves the performance by introducing a novel way of integrating content and link information.

VII. CONCLUSIONS

REFERENCES
[1] Botterill, Mills, Green, “Speeded-Up Bag-Of-Words Algorithm for Robot Localization through Scene Recognition”, IEEE Conference on Image and Vision Computing, pp. 1-6, Year 2008.
[2] Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, Denis Turdakov, “Accuracy Estimate and Optimization Techniques for SimRank Computation”, The VLDB Journal, Volume 19, Issue 1, pp. 45-66, Year 2010.
[3] Greg Mori, Serge Belongie, Jitendra Malik, “Efficient Shape Matching Using Shape Context”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 27, Issue 11, pp. 1832-1837, Year 2005.
[4] Jinhui Tang, Haojie Li, Guo Jun Qi, Tat Seng Chua, “Image Annotation by Graph-Based Inference with Integrated Multiple/Single Instance Representations”, IEEE Transactions on Multimedia, Volume 12, Issue 2, pp. 131-141, Year 2010.
[5] Kalervo Jarvelin and Jaana Kekalainen, “IR Evaluation Methods for Retrieving Highly Relevant Documents”, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 41-48, Year 2000.
[6] Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, “The PageRank Citation Ranking: Bringing Order to the Web”, technical report, Stanford University, Year 1998.
[7] Raghu Krishnapuram, Swarup Medasani, Sung Hwan Jung, Young Sik Choi, Rajesh Balasubramaniam, “Content-based image retrieval based on a fuzzy approach”, IEEE Transactions on Knowledge and Data Engineering, Volume 16, Issue 10, pp. 1185-1199, Year 2004.
[8] Shumeet Baluja, Yushi Jing, “VisualRank: Applying PageRank to Large-Scale Image Search”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 30, Issue 11, pp. 1877-1890, Year 2008.
[9] Siddiquie, Feris, Davis, “Image Ranking and Retrieval based on Multi-Attribute Queries”, IEEE Conference on Computer Vision and Pattern Recognition, pp. 801-808, Year 2011.
[10] Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Dhiraj Joshi and Jiawei Han, “Reinforced Similarity Integration in Image-Rich Information Networks”, IEEE Transactions on Knowledge and Data Engineering, Volume 25, Issue 2, pp. 448-460, Year 2013.
Integrated Intelligent Research (IIR) International Journal of Computing Algorithm
Volume: 04 Issue: 01 June 2015 Pages:1-5
ISSN: 2278-2397