Abstracting data reveals many important properties that cannot be seen by analyzing it in record form. For example, summarized graphs of a survey are usually more informative than the same data in a table. In this article we will learn how to draw data points in 2D.
Below is the adjacency matrix of a graph with four vertices (you can learn about adjacency matrices here). The only difference is that each entry gives the distance between two vertices instead of just a 0/1 flag.
From the table above, the distance between vertices (A, B) = 0.1, (A, C) = 0.4, (B, D) = 0.6, and so on. If we draw the graph on paper based on this table, it would look like this:
In data science, data is usually given in relative form, such as a distance score or a similarity score; we are not given any coordinates at which to draw the data points. To overcome this, we place random points and rearrange them until they satisfy the scores. For the table above, we know we have to draw four data points and we know the distance between every pair, so we place four points with random coordinates and keep rearranging them until they are all at the correct distance from each other.
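The starting step described above can be sketched as follows. This is only a minimal illustration: the helper name `random_layout` and its `seed` parameter are assumptions added here for repeatability, and the distance values are the ones used later in this article.

```python
import random

# Distance table from this article: rows and columns are A, B, C, D.
labels = ["A", "B", "C", "D"]
distances = [
    [0.0, 0.1, 0.4, 0.7],
    [0.1, 0.0, 0.8, 0.6],
    [0.4, 0.8, 0.0, 0.5],
    [0.7, 0.6, 0.5, 0.0],
]

def random_layout(n, seed=None):
    """Return n points whose x and y are random values in [0, 1).

    `seed` is a hypothetical convenience parameter: passing the same
    seed gives the same starting layout on every run.
    """
    rng = random.Random(seed)
    return [[rng.random(), rng.random()] for _ in range(n)]

# One random starting position per row of the distance table.
points = random_layout(len(distances), seed=42)
for label, (x, y) in zip(labels, points):
    print(label, round(x, 3), round(y, 3))
```

These starting positions carry no information yet; they only give the rearranging step something to work on.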
To arrange the data points, we pick each random point in turn and calculate its distance to every other random point. Based on the difference between the actual distance (given in the table above) and the calculated distance, the point is moved towards or away from the other point.
In the figure above, A, B, C and D have been placed at random locations. After calculating the distances, we find that B has to be moved towards A by 0.2 to satisfy the given distance (A, B) = 0.1. Similarly, C has to be moved towards A by 0.3, and D has to be moved away from A by 0.2.
Now we will create a method that places random data points and arranges them based on the given data:
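To make the arithmetic concrete, here is a small sketch of measuring the error for one pair and moving a point along the line to another point. The coordinates are hypothetical (the figure's real positions are not known); they are chosen so that B starts 0.3 from A and D starts 0.5 from A. The helper names `relative_error` and `move_toward` are also assumptions for illustration.

```python
import math

def relative_error(p, q, target):
    """Return (current distance - target) / target for 2D points p and q.

    Positive means the points are too far apart, negative means too close.
    """
    current = math.hypot(p[0] - q[0], p[1] - q[1])
    return (current - target) / target

def move_toward(p, anchor, amount):
    """Return p shifted by `amount` along the line toward `anchor`.

    A negative amount moves p away from the anchor.
    """
    dx, dy = anchor[0] - p[0], anchor[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return list(p)  # no direction is defined when the points coincide
    return [p[0] + amount * dx / dist, p[1] + amount * dy / dist]

a = [0.0, 0.0]
b = [0.3, 0.0]  # hypothetical: 0.3 away from A, target distance 0.1
d = [0.0, 0.5]  # hypothetical: 0.5 away from A, target distance 0.7

print(relative_error(b, a, 0.1) > 0)  # True: B is too far, move it closer
print(relative_error(d, a, 0.7) < 0)  # True: D is too close, move it away

b_new = move_toward(b, a, 0.2)   # B moves 0.2 towards A
d_new = move_toward(d, a, -0.2)  # D moves 0.2 away from A
```

This sketch moves each point by the full correction in one step. With several points pulling on each other, it is usually safer to move by only a fraction of the correction each time and repeat.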
import math
import random

def getCoordinates(data, gradientRate=0.1):
    # number of data points given
    n = len(data)
    # initialize n random points in 2D
    randomPoints = [[random.random(), random.random()] for i in range(n)]
    randomDistance = [[0.0 for i in range(n)] for j in range(n)]
    lastError = None
    for k in range(1000):
        # distance between every pair of random points: sqrt((x1-x2)^2 + (y1-y2)^2)
        for i in range(n):
            for j in range(n):
                randomDistance[i][j] = math.sqrt(sum([pow(randomPoints[i][x] - randomPoints[j][x], 2) for x in range(len(randomPoints[i]))]))
        # how far each point has to move along x and y to reduce the error, initialized to zero
        gradMoveBy = [[0.0, 0.0] for i in range(n)]
        totalError = 0.0
        # work out how far to move towards or away, based on the given distance and the current distance
        for i in range(n):
            for j in range(n):
                if i == j: continue
                actualDist = data[i][j]  # from the given data
                randomDist = randomDistance[i][j]  # calculated from the randomly placed points
                # error percentage in distance: (random - actual) / actual
                errorPercentage = (randomDist - actualDist) / actualDist
                # every point moves towards or away in proportion to the error percentage
                gradMoveBy[i][0] += ((randomPoints[i][0] - randomPoints[j][0]) / randomDist) * errorPercentage
                gradMoveBy[i][1] += ((randomPoints[i][1] - randomPoints[j][1]) / randomDist) * errorPercentage
                totalError += abs(errorPercentage)
        print(totalError)
        # stop as soon as the error gets worse than in the previous iteration
        if lastError is not None and lastError < totalError: break
        lastError = totalError
        # correct the positions
        for i in range(n):
            randomPoints[i][0] -= gradientRate * gradMoveBy[i][0]
            randomPoints[i][1] -= gradientRate * gradMoveBy[i][1]
    return randomPoints
The loop above runs up to 1000 times and breaks when the previous iteration's error is less than the current total error. In every iteration we calculate the total error and check whether moving the data points reduced or increased it. We keep rearranging the data points as long as the error keeps decreasing; as soon as it increases, we break out of the loop, and the current positions become the final positions of the points. The rest of the logic follows what we have already discussed with the illustrations.
Now we will write a method to draw these coordinates in a JPEG image:
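The stopping rule can be looked at in isolation with a short sketch. The function name `iterations_until_error_rises` is hypothetical, and in the sample list only 19.81, 3.43 and 3.45 are values reported in this article; the numbers in between are made up for illustration.

```python
def iterations_until_error_rises(errors):
    """Return how many error values are read before the loop stops."""
    last_error = None
    used = 0
    for total_error in errors:
        used += 1
        # Comparing against None (rather than relying on truthiness)
        # keeps a previous error of exactly 0.0 from being ignored.
        if last_error is not None and last_error < total_error:
            break
        last_error = total_error
    return used

# Only 19.81, 3.43 and 3.45 come from the article; 12.0, 6.5 and 3.40 are invented.
sample = [19.81, 12.0, 6.5, 3.43, 3.45, 3.40]
print(iterations_until_error_rises(sample))  # 5
```

Note that the rule stops at the first increase, so the trailing 3.40 is never reached. That is a simple choice, and it can stop early if the error wobbles before settling.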
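from PIL import Image, ImageDraw, ImageFont

def draw2d(data, labels, jpeg='DataPoints2D.jpg'):
    # white 2000x2000 canvas
    img = Image.new('RGB', (2000, 2000), (255, 255, 255))
    draw = ImageDraw.Draw(img)
    # arial.ttf has to be available on the system for this call to succeed
    font = ImageFont.truetype("arial.ttf", 20)
    for i in range(len(data)):
        # scale each coordinate by 500 so the points are clearly separated in the image
        x = data[i][0] * 500
        y = data[i][1] * 500
        draw.text((x, y), labels[i], (0, 0, 0), font)
    img.save(jpeg, 'JPEG')
And now we run the above methods: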
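One thing to keep in mind is that the rearranged coordinates are not guaranteed to stay between 0 and 1, so a fixed multiplier can place a label outside the canvas. A possible alternative, sketched here under the assumption of a square canvas, is to fit the points into the image with a margin. The helper `to_pixels` and its parameters are hypothetical and not part of the methods above.

```python
def to_pixels(points, size=2000, margin=100):
    """Map 2D points into a size x size canvas, leaving `margin` pixels on each side.

    The same scale is used for x and y so that relative distances are preserved.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Largest extent over both axes; fall back to 1.0 if all points coincide.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 2 * margin) / span
    return [(margin + (x - min_x) * scale, margin + (y - min_y) * scale)
            for x, y in points]

# Example with a negative coordinate, which a plain multiplier would draw off-canvas.
pixels = to_pixels([[-0.2, 0.1], [0.8, 0.6]])
print(pixels)
```

The returned pairs can be passed to a drawing call in place of the scaled coordinates.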
# given data as a list of lists
data = [
    [0.0, 0.1, 0.4, 0.7],
    [0.1, 0.0, 0.8, 0.6],
    [0.4, 0.8, 0.0, 0.5],
    [0.7, 0.6, 0.5, 0.0]
]
# generate coordinates starting from random points
pointsOn2d = getCoordinates(data)
print("Created coordinates:")
print(pointsOn2d)
# create the image file
draw2d(pointsOn2d, ['A', 'B', 'C', 'D'])
The output will be printed on the console as shown below, and a DataPoints2D.jpg file will be created in the working directory:
[[0.4998823452942812, 0.6392305175643206], [0.3589745795332138, 0.5375150862110177], [0.8397926615825899, 0.304522095212853], [0.282124871482567, 0.030701644666
In the output above, the total error is printed in every iteration. It decreases from 19.81 to 3.43, and once it increases to 3.45 we break the loop and print the final coordinates. Below is the image created by the code:
Many packages already exist for displaying data points in two dimensions; the code above is meant to show the idea behind such packages and how we can create our own method to display data. We drew A, B, C and D in an image file, and we can see the similarity between Fig 2 and Fig 4.
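Besides comparing the figures by eye, the result can also be checked numerically. The sketch below is one possible way to do that: it measures, for every pair, how far the drawn distance is from the given one. The function name `layout_errors` is an assumption, and the three-point example is hypothetical, chosen only because its distances are easy to verify by hand.

```python
import math

def layout_errors(points, distances):
    """Return {(i, j): relative error} for every pair of points with i < j."""
    errors = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            current = math.hypot(points[i][0] - points[j][0],
                                 points[i][1] - points[j][1])
            errors[(i, j)] = (current - distances[i][j]) / distances[i][j]
    return errors

# Hypothetical example: a 3-4-5 right triangle whose longest side
# was supposed to be 4 instead of 5.
points = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
targets = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 4.0],
    [4.0, 4.0, 0.0],
]
for pair, err in layout_errors(points, targets).items():
    print(pair, round(err, 3))
```

Pairs with an error close to zero are drawn at the right distance. A set of distances cannot always be matched exactly in two dimensions, so some leftover error is expected.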