NumPy Autocorrelation
In data science, the variables of a dataset are often related to one another. The relationship can be directly proportional or inversely proportional, and a small change in one variable may change another slightly or drastically. This relationship is known as correlation.
Autocorrelation is the correlation of a time signal with a delayed (lagged) copy of itself. The two copies are separated by a time difference, called the lag.
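To make the idea of a lagged copy concrete, here is a minimal sketch that computes the unnormalized autocorrelation at a single lag by hand. The helper name autocorr_at_lag and the sample signal are illustrative assumptions, not part of NumPy.

```python
import numpy as np


def autocorr_at_lag(signal, lag):
    """Sum of products of the signal with a copy of itself shifted by `lag` samples.

    Hypothetical helper for illustration; `lag` is assumed to be a
    non-negative integer smaller than the signal length.
    """
    signal = np.asarray(signal)
    n = signal.size
    # Pair each sample with the sample `lag` positions later and sum the products.
    return int(np.sum(signal[: n - lag] * signal[lag:]))


signal = [1, 2, 3, 4, 5]
lags = [autocorr_at_lag(signal, k) for k in range(len(signal))]
print(lags)  # [55, 40, 26, 14, 5]
```

The value at lag 0 is the sum of squares of the signal, and the values shrink as the lag grows because fewer samples overlap.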
Calculate Autocorrelation in NumPy
NumPy, the widely used data science library, has a built-in function, correlate(), that computes the correlation between two 1D sequences. It accepts two 1D arrays and an optional mode parameter. The mode can be "valid", "same", or "full", and the default value is "valid".
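As a rough illustration of how the three modes differ, the sketch below correlates a short array with itself in each mode and compares the output sizes. The array a is an arbitrary example chosen for this sketch.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# "valid" keeps only positions where the two arrays overlap completely.
# For two arrays of equal length that is a single value (the zero lag).
valid = np.correlate(a, a, mode="valid")

# "same" returns an output as long as the longer input.
same = np.correlate(a, a, mode="same")

# "full" returns every possible overlap: 2 * len(a) - 1 values.
full = np.correlate(a, a, mode="full")

print(valid.size, same.size, full.size)  # 1 5 9
```

Because the default mode is "valid", calling correlate() on two equal-length arrays without a mode argument yields only the zero-lag value, which is why autocorrelation examples usually pass mode="full".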
To learn more about this function, refer to the official NumPy documentation.
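import numpy
# Sample data as a plain Python list, converted to a NumPy array
myArray = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myArray = numpy.array(myArray)
# Correlate the array with itself over every possible lag
result = numpy.correlate(myArray, myArray, mode="full")
# Keep only the zero lag and the positive lags
result = result[result.size // 2 :]
print(result)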
Output:
[385 330 276 224 175 130 90 56 29 10]
In the above code, we first define a list of numbers and convert it to a NumPy array using NumPy’s array() function. We then call correlate(), passing the same array as both inputs, to compute the data’s autocorrelation. The calculation uses the "full" mode.
The result is stored in the result variable and then sliced. The slicing is crucial because, in "full" mode, correlate() returns an array of size 2 * N - 1, where N is the length of the input array, covering both negative and positive lags. The values of interest, the zero lag followed by the positive lags, lie in the second half, that is, in the index range [result.size // 2, result.size).
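The slicing logic can be checked with a short sketch. It assumes a real-valued input, for which the "full" autocorrelation is symmetric around the zero lag; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 11)  # same data as the example: 1, 2, ..., 10
n = x.size

full = np.correlate(x, x, mode="full")

# The full output has 2 * n - 1 entries, so its middle index is n - 1.
zero_lag = full.size // 2

# The middle entry is the zero lag, i.e. the sum of squares of the data.
print(full.size == 2 * n - 1)            # True
print(full[zero_lag] == np.sum(x * x))   # True

# Negative and positive lags mirror each other for real-valued input.
print(np.array_equal(full, full[::-1]))  # True

# Slicing from the middle keeps lags 0, 1, ..., n - 1.
non_negative_lags = full[zero_lag:]
```

Since the first half is a mirror image of the second, dropping it loses no information.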