Euclidean Distance Score

Finding similarities using the Euclidean distance score


Category: Machine Learning Tags: Python, Python 3


Introduction

    Every day we visit websites such as Amazon and YouTube for shopping and entertainment. Many people share similar tastes: for example, people who listen to songs by Taylor Swift can be considered to have similar taste in music. If two people have the same taste in music, one of them might like songs that he has never heard but that the other person listens to. Based on that we can create recommendations, but first we need to find similar people. To find similarities we can use a distance score, a value between 0 and 1 where 0 means least similar and 1 means most similar.

Implementation

    Let's start with the data. Suppose we have a data set in which users rated singers. Create a file OnlineMusic.py and paste in the data below:

online_music = {
'Donald':{'Taylor Swift':3.5,'Rihanna':3.0,'Justin Bieber':4.0},
'Chandler':{'Taylor Swift':3,'Rihanna':3.5,'Justin Bieber':4.5},
'Ruby':{'Rihanna':5.0,'Justin Bieber':2.0,'Demi Lovato':3.5, 'MJ':3.0},
'Zoya':{'Taylor Swift': 3.0, 'Rihanna':2.0, 'Justin Bieber':4.0,'Demi Lovato':3.0},
'Sam': {'Rihanna':3.0, 'Justin Bieber':3.5, 'MJ':4.0},
'Robert': {'Rihanna':1.0,'Justin Bieber':2.5,'Demi Lovato':2.5}
}

    Above is a nested dictionary with 6 users and 5 singers, each rated by some of the users. We can see that Donald rated 3 singers and Sam also rated 3 singers, of which Justin Bieber and Rihanna are common to both. We want to find people who have similar taste to Donald. For that, create another file EuclideanDistanceScore.py and define a function that computes the Euclidean distance score:

# importing required packages
import sys
import math
sys.path.append('/playPython/Data')  # path where the OnlineMusic file is stored
from OnlineMusic import online_music

# calculates the distance score between two people
def euclidean_distnce(music_data, person1, person2):
    common_item = {}
    # singers rated by both person1 and person2
    for item in music_data[person1]:
        if item in music_data[person2]:
            common_item[item] = True

    # if no item is common
    if len(common_item) == 0:
        return 0

    # calculate distance
    # √((x1-x2)^2 + (y1-y2)^2)
    distance = sum([math.pow(music_data[person1][itm] - music_data[person2][itm], 2) for itm in common_item.keys()])
    distance = math.sqrt(distance)
    # return the score: a smaller distance gives a higher score
    return 1/(distance + 1)

    The code above first collects the singers rated by both person1 and person2 into the common_item dictionary. If they have nothing in common it returns 0, meaning a distance score of zero; otherwise it calculates the distance. Recall the formula we learned in school for the distance between two points P1(X1, Y1) and P2(X2, Y2) in 2-D geometry:

    Distance = √((X1 - X2)^2 + (Y1 - Y2)^2)

    Let's suppose we represent Taylor Swift on the X-axis and Rihanna on the Y-axis, and then plot the users' ratings:


    [Figure: Euclidean Distance]

    In the 2-D representation above we can see how people are plotted: Chandler(3, 3.5), Zoya(3, 2) and Donald(3.5, 3). If we calculate using the distance formula, Chandler is closer to Donald than Zoya is.
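    The comparison above can be checked with a few lines of Python. This is only a sketch: the points dictionary, the helper distance_2d and the variable names are made up for illustration and are not part of the article's files.

```python
import math

# (Taylor Swift, Rihanna) ratings for the three plotted users
points = {
    'Chandler': (3.0, 3.5),
    'Zoya': (3.0, 2.0),
    'Donald': (3.5, 3.0),
}

def distance_2d(p, q):
    """Straight-line distance between two 2-D points (hypothetical helper)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

chandler_to_donald = distance_2d(points['Chandler'], points['Donald'])
zoya_to_donald = distance_2d(points['Zoya'], points['Donald'])

print(round(chandler_to_donald, 3))  # 0.707
print(round(zoya_to_donald, 3))      # 1.118
```

    The smaller value belongs to Chandler, which matches what the plot shows.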

    We did the same calculation in the code above: we sum the squares of the differences and then take the square root of the result. A smaller distance means more similar, so we return 1 / (distance + 1), which gives us a score between 0 and 1 and flips the meaning so that a higher score means more similar. We add one to the distance to avoid a divide-by-zero exception when the distance is 0.
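    To see how this conversion behaves, here is a small sketch. The helper name distance_to_score is hypothetical and not part of the article's code; it only isolates the 1 / (distance + 1) step.

```python
def distance_to_score(distance):
    """Map a non-negative distance to a similarity score in (0, 1]."""
    return 1 / (distance + 1)

# Identical ratings give distance 0, so the score is exactly 1
print(distance_to_score(0))    # 1.0
# Larger distances push the score towards 0 without ever reaching it
print(distance_to_score(0.5))  # 0.6666666666666666
print(distance_to_score(2.5))  # 0.2857142857142857
```

    The inputs 0.5 and 2.5 are the distances from Donald to Sam and from Donald to Robert in the sample data.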

    Now let's execute this function. In the same file we will write:

# calling the function
similarities = [(euclidean_distnce(online_music, 'Donald', other), other) for other in online_music.keys() if 'Donald' != other]
# sort in descending order so the highest score comes first
similarities.sort(reverse=True)
print(similarities)

    Here we build a list of scores by comparing everyone else with Donald, then sort it in descending order so that the highest score comes first. The last line prints the list:

[(0.6666666666666666, 'Sam'), (0.5358983848622454, 'Chandler'), (0.4721359549995794, 'Zoya'), (0.2857142857142857, 'Robert'), (0.2612038749637414, 'Ruby')]
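    If the same ranking is needed for other users, the call can be wrapped in a helper. The sketch below is one possible way to do it, not the article's own code: similarity_score and rank_similar are hypothetical names, the data is repeated so the block runs on its own, and the score is computed the same way as in the function above.

```python
import math

# Same ratings as in OnlineMusic.py, repeated so this sketch is self-contained
online_music = {
    'Donald': {'Taylor Swift': 3.5, 'Rihanna': 3.0, 'Justin Bieber': 4.0},
    'Chandler': {'Taylor Swift': 3, 'Rihanna': 3.5, 'Justin Bieber': 4.5},
    'Ruby': {'Rihanna': 5.0, 'Justin Bieber': 2.0, 'Demi Lovato': 3.5, 'MJ': 3.0},
    'Zoya': {'Taylor Swift': 3.0, 'Rihanna': 2.0, 'Justin Bieber': 4.0, 'Demi Lovato': 3.0},
    'Sam': {'Rihanna': 3.0, 'Justin Bieber': 3.5, 'MJ': 4.0},
    'Robert': {'Rihanna': 1.0, 'Justin Bieber': 2.5, 'Demi Lovato': 2.5},
}

def similarity_score(music_data, person1, person2):
    """Return 1 / (distance + 1) over the singers both people rated."""
    # Dictionary key views support set intersection
    common = music_data[person1].keys() & music_data[person2].keys()
    if not common:
        return 0
    squared = sum((music_data[person1][s] - music_data[person2][s]) ** 2
                  for s in common)
    return 1 / (math.sqrt(squared) + 1)

def rank_similar(music_data, person, top_n=None):
    """List (score, name) pairs for everyone except person, best match first."""
    scores = [(similarity_score(music_data, person, other), other)
              for other in music_data if other != person]
    scores.sort(reverse=True)
    return scores[:top_n]  # top_n=None keeps the whole list

print(rank_similar(online_music, 'Donald', top_n=2))
# [(0.6666666666666666, 'Sam'), (0.5358983848622454, 'Chandler')]
```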

Conclusion

    We have seen that the most similar person to Donald is Sam. We didn't plot Sam in the graph because he has not rated Taylor Swift; his only ratings in common with Donald are Rihanna and Justin Bieber. If we run the code above for Sam instead of Donald, then Donald is the most similar to him, followed by Chandler and Zoya with equal scores.

    This method gives us a distance score for any two people. From the scores we can tell which people are likely to have the same taste, and we can use that to recommend things to users.
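    As a rough illustration of that last step, the sketch below suggests singers that the most similar user has rated but the target user has not. It is only one simple way to use the scores, not a full recommender. The helper unrated_by is a hypothetical name, and the best match (Sam) is taken from the printed list rather than recomputed.

```python
# Ratings for the two users involved, copied from OnlineMusic.py
online_music = {
    'Donald': {'Taylor Swift': 3.5, 'Rihanna': 3.0, 'Justin Bieber': 4.0},
    'Sam': {'Rihanna': 3.0, 'Justin Bieber': 3.5, 'MJ': 4.0},
}

def unrated_by(music_data, person, similar_person):
    """Singers rated by similar_person that person has not rated yet."""
    return {singer: rating
            for singer, rating in music_data[similar_person].items()
            if singer not in music_data[person]}

# Sam had the highest score for Donald, so look at what Sam rated
suggestions = unrated_by(online_music, 'Donald', 'Sam')
print(suggestions)  # {'MJ': 4.0}
```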


Last modified on 1 September 2018
Nikhil Joshi

CEO & Founder at Dotnetlovers