
Covariance Calculation Using Python

A guide on how to calculate covariance without using NumPy.

Covariance measures the joint variability of two random variables, i.e. how the value of one variable y changes as the value of another variable x increases. If x and y tend to get larger together (and smaller together), the covariance is positive. If they move in opposite directions, the covariance is negative, and if they are uncorrelated (there is no linear relationship between them), the covariance is 0 (zero).

To calculate covariance, we can use NumPy's np.cov function:

```python
import numpy as np

a = [[1, 2, 3], [6, 7, 8]]  # each inner list (row) is treated as one variable
c1 = np.cov(a)
print(c1)
```

Output:

```
[[1. 1.]
 [1. 1.]]
```
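As a sanity check on that output, the entries can be recomputed by hand in plain Python. This sketch relies on np.cov's default behaviour of treating each inner list (row) as one variable and dividing by N - 1; the intermediate variable names are only illustrative.

```python
# Hand-check of the np.cov result in plain Python.
a = [[1, 2, 3], [6, 7, 8]]
row1, row2 = a
n = len(row1)  # N = 3 observations per variable

mean1 = sum(row1) / n  # 2.0
mean2 = sum(row2) / n  # 7.0

dev1 = [v - mean1 for v in row1]  # [-1.0, 0.0, 1.0]
dev2 = [v - mean2 for v in row2]  # [-1.0, 0.0, 1.0]

var1 = sum(d * d for d in dev1) / (n - 1)                 # 1.0
var2 = sum(d * d for d in dev2) / (n - 1)                 # 1.0
cov12 = sum(p * q for p, q in zip(dev1, dev2)) / (n - 1)  # 1.0

print([[var1, cov12], [cov12, var2]])  # [[1.0, 1.0], [1.0, 1.0]]
```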

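We can also implement covariance without NumPy or any other external package. First, we need to understand how it is calculated. For a data matrix X with N observations (rows) and K variables (columns), the sample covariance matrix can be calculated using the following formula (source: Wikipedia):

$$q_{jk} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right)$$

where $\bar{x}_j$ is the mean of the j-th column.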

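q_jk is the element of the covariance matrix in the j-th row and k-th column. In other words, for each pair of columns j and k, we compute the mean of each column, sum the products of the two columns' deviations from their means over all observations, and divide the sum by N - 1, where N is the number of observations.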
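Read literally, the formula indexes a data matrix X whose rows are the N observations and whose columns are the variables. A minimal sketch that follows the indices i, j and k one-to-one could look like this; the function name sample_covariance_matrix is hypothetical. Note that this layout is the transpose of the list a from the NumPy example, where each inner list holds one variable.

```python
def sample_covariance_matrix(X):
    """Sample covariance matrix of a data matrix X (hypothetical helper).

    X is a list of N rows (observations) with K columns (variables), so
    X[i][j] plays the role of x_ij. Returns a K x K list of lists Q with
    Q[j][k] = q_jk.
    """
    n = len(X)          # N observations
    k_vars = len(X[0])  # K variables
    # Column means: mean_j = (1/N) * sum over i of x_ij
    means = [sum(X[i][j] for i in range(n)) / n for j in range(k_vars)]
    return [
        [
            sum((X[i][j] - means[j]) * (X[i][k] - means[k]) for i in range(n)) / (n - 1)
            for k in range(k_vars)
        ]
        for j in range(k_vars)
    ]


# The NumPy example's data, one observation per row.
X = [[1, 6], [2, 7], [3, 8]]
print(sample_covariance_matrix(X))  # [[1.0, 1.0], [1.0, 1.0]]
```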

To implement this, we first define a helper function, cov_value, that computes a single covariance value q_jk from two equal-length data vectors v_j and v_k (the j-th and k-th variables):

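```python
def cov_value(x, y):
    # Sample covariance of two equal-length vectors x and y
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each element from its vector's mean
    sub_x = [i - mean_x for i in x]
    sub_y = [i - mean_y for i in y]

    # Sum of the products of paired deviations, divided by N - 1
    sum_value = sum(sub_x[i] * sub_y[i] for i in range(len(x)))
    denom = len(x) - 1

    cov = sum_value / denom
    return cov
```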

With the helper in place, we can calculate the covariance value for every cell of the covariance matrix. The function below takes a matrix whose inner lists are the variable vectors (the same layout np.cov uses by default) and returns the covariance matrix as a list of lists:

```python
def covariance(arr):
    # Entry [j][k] is the covariance of the j-th and k-th vectors of arr
    c = [[cov_value(a, b) for a in arr] for b in arr]
    return c
```
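Because q_jk = q_kj, the function above computes every off-diagonal value twice. A possible refinement, sketched here under the hypothetical name covariance_symmetric, computes only the pairs with j <= k and mirrors them, which reduces the number of cov_value calls from K^2 to K(K+1)/2 for K vectors. A condensed copy of cov_value is included so the block runs on its own.

```python
def cov_value(x, y):
    # Sample covariance of two equal-length vectors (divides by N - 1)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)


def covariance_symmetric(arr):
    # Hypothetical variant: compute each pair once (j <= m) and mirror it
    k = len(arr)
    c = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for m in range(j, k):
            c[j][m] = c[m][j] = cov_value(arr[j], arr[m])
    return c


print(covariance_symmetric([[1, 2, 3], [6, 7, 8]]))  # [[1.0, 1.0], [1.0, 1.0]]
```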

That's it, we're done! Additionally, we can add a few extra checks to verify that the format and size of the input are correct. For example, we can write a helper function like the one below to check that all vectors have the same length before calculating the covariance matrix:

```python
def check_vectors(arr):
    # Every vector must have as many elements as the first one
    length = len(arr[0])
    if any(len(a) != length for a in arr):
        raise ValueError('all vectors must have the same length')
```
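The length check covers ragged input only. A stricter variant might also reject an empty input and vectors with fewer than two observations, where the N - 1 denominator would be zero. The sketch below is hypothetical (validate_vectors is not part of the original code) and raises ValueError in each of these cases.

```python
def validate_vectors(arr):
    """Hypothetical stricter input check for the covariance function.

    Raises ValueError if arr is empty, if the vectors differ in length,
    or if there are fewer than two observations (N - 1 would be zero).
    """
    if not arr:
        raise ValueError("input must contain at least one vector")
    length = len(arr[0])
    if any(len(v) != length for v in arr):
        raise ValueError("all vectors must have the same length")
    if length < 2:
        raise ValueError("need at least two observations per vector")


validate_vectors([[1, 2, 3], [6, 7, 8]])  # passes silently

try:
    validate_vectors([[1, 2, 3], [6, 7]])
except ValueError as err:
    print(err)  # all vectors must have the same length
```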

The final code looks like this:

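```python
def cov_value(x, y):
    # Sample covariance of two equal-length vectors x and y
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each element from its vector's mean
    sub_x = [i - mean_x for i in x]
    sub_y = [i - mean_y for i in y]

    # Sum of the products of paired deviations, divided by N - 1
    sum_value = sum(sub_x[i] * sub_y[i] for i in range(len(x)))
    denom = len(x) - 1

    cov = sum_value / denom
    return cov


def check_vectors(arr):
    # Every vector must have as many elements as the first one
    length = len(arr[0])
    if any(len(a) != length for a in arr):
        raise ValueError('all vectors must have the same length')


def covariance(arr):
    # Validate the input, then fill in every cell of the covariance matrix
    check_vectors(arr)
    c = [[cov_value(a, b) for a in arr] for b in arr]
    return c
```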
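A short usage sketch, with a condensed copy of the final code so that it runs on its own. It reproduces the NumPy result from the beginning of the article and makes one assumption explicit: covariance expects one inner list per variable, so data stored one observation per row (as in a CSV table) should be transposed first, here with zip(*rows).

```python
def cov_value(x, y):
    # Sample covariance of two equal-length vectors (divides by N - 1)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)


def covariance(arr):
    # One inner list per variable, as in the final code above
    if any(len(v) != len(arr[0]) for v in arr):
        raise ValueError("all vectors must have the same length")
    return [[cov_value(a, b) for a in arr] for b in arr]


# Same input as the NumPy example: matches np.cov's [[1. 1.] [1. 1.]]
print(covariance([[1, 2, 3], [6, 7, 8]]))  # [[1.0, 1.0], [1.0, 1.0]]

# Data stored one observation per row must be transposed first,
# so that each inner list becomes one variable.
rows = [[1, 6], [2, 7], [3, 8]]
columns = [list(col) for col in zip(*rows)]  # [[1, 2, 3], [6, 7, 8]]
print(covariance(columns))  # [[1.0, 1.0], [1.0, 1.0]]
```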
