How to Use Axis Argument to Manipulate a NumPy Array in Python
This article explains what an axis is in NumPy and how to work with NumPy axis arguments. We will also learn how to use the axis argument to manipulate a NumPy array in Python quickly and concisely.
Use an axis Argument to Manipulate a NumPy Array in Python
To demonstrate, we need some data to work with, but nothing too large or complicated. Arrays are the first thing that comes up when learning NumPy, so we have created two small test arrays.
import numpy as np
Temperature_Array = np.array(
[
[[26, 25, 24], [24, 25, 26]],
[[27, 25, 23], [25, 28, 24]],
[[27, 24, 26], [24, 27, 25]],
[[27, 26, 24], [28, 25, 26]],
]
)
Timeseries_Temperature = np.array(
[[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)
The first test array, Temperature_Array, is supposed to represent a gridded forecast. Let’s say we have gridded observations and are trying to emulate four time steps with six stations arranged in two rows and three columns.
The following sub-array is the first time step, and so on.
[[26, 25, 24], [24, 25, 26]]
You will notice that we used a different number of elements along each axis: there are four time steps, two rows per time step, and three columns (three stations) in each row.
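A quick way to confirm this layout, as a minimal sketch, is NumPy's shape attribute (not used elsewhere in this article): it lists the number of elements along each axis, in axis order.

```python
import numpy as np

# Same data as above: 4 time steps x 2 rows x 3 columns
Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

print(Temperature_Array.shape)     # (4, 2, 3): time steps, rows, columns
print(Temperature_Array[0].shape)  # (2, 3): one time step, 2 rows x 3 columns
```

Because every axis has a different length, each entry of the shape tuple can only belong to one direction, which makes mistakes easier to spot.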
A common mistake when experimenting with NumPy is to build a 3x3 or 3x3x3 test array and think you know what is going on, only to find that your code does not work on real data.
That is because real data usually has different numbers of elements in different directions, and your slicing or other operations did not account for that.
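The following sketch illustrates the pitfall. The square array is a hypothetical 3x3 example built with np.arange; reducing it along axis 0 or axis 1 returns three values either way, so using the wrong axis goes unnoticed. The 3x6 array introduced below as Timeseries_Temperature returns results of different lengths, so the mix-up shows immediately.

```python
import numpy as np

# Hypothetical square test array: values 0..8 in a 3x3 grid
square = np.arange(9).reshape(3, 3)
# Both reductions return 3 values, so a wrong axis is easy to miss
print(square.min(axis=0).shape, square.min(axis=1).shape)  # (3,) (3,)

# Non-square data: 3 stations x 6 hours
Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)
# The two reductions now have different lengths
print(Timeseries_Temperature.min(axis=0).shape)  # (6,)
print(Timeseries_Temperature.min(axis=1).shape)  # (3,)
```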
The second array, Timeseries_Temperature, is simpler. It represents three stations that observe temperature every hour for six hours.
Our rows are stations, and columns are time.
In an array with more dimensions, those dimensions might be rows, columns, depth, and maybe time; each of them is an axis of the array. An axis is simply one direction along which you can move through a NumPy array.
Let’s check the number of dimensions of Timeseries_Temperature using the ndim attribute, which gives the number of dimensions (axes) of an array.
Timeseries_Temperature.ndim
Output:
2
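Applied to both test arrays, ndim confirms what the nested brackets suggest. As a small sketch, it also equals the length of the shape tuple, since the shape has one entry per axis.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)
Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

# ndim is the number of axes; it always equals len(shape)
print(Timeseries_Temperature.ndim)  # 2
print(Temperature_Array.ndim)       # 3
print(Temperature_Array.ndim == len(Temperature_Array.shape))  # True
```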
Let’s say we want some information about minimum values. We might start with this:
Timeseries_Temperature.min()
We get 20 back, because 20 is indeed the lowest value in this array, but that is probably not what we want. We may want to know which station experienced the lowest temperature at any time in the data, or the lowest temperature each station experienced.
Or maybe we want to know the minimum temperature at each hour and where it was coldest at any given time in those six hours. This is where the axis argument can help us a lot.
We do not need loops, and we do not need manual slicing.
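For comparison, here is a minimal sketch of the loop-based approach that the axis argument replaces: a list comprehension that takes the minimum of each station's row, next to the equivalent single min(axis=1) call.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)

# Loop approach: minimum of each row (station), one row at a time
loop_mins = [int(row.min()) for row in Timeseries_Temperature]
# Axis approach: a single call collapses axis 1 (the hours)
axis_mins = Timeseries_Temperature.min(axis=1)

print(loop_mins)  # [23, 23, 20]
print(axis_mins)  # [23 23 20]
```

The axis version runs inside NumPy instead of a Python loop and generalizes to any axis of any array without rewriting the loop.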
But to understand how it works, let’s make a couple of slices first.
Timeseries_Temperature[0, :]
This takes index 0 along the 0th dimension (axis 0) and everything along axis 1, which gives us the first row.
array([23, 24, 25, 24, 23, 25])
Now let’s see what happens if we take everything along axis 0 (that is what the colon means) and only index 0 along axis 1.
Timeseries_Temperature[:, 0]
This gives us the 0th column and all rows.
array([23, 25, 20])
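Slicing and reducing can already answer one question at a time. In this sketch, the minimum of the first row is the lowest temperature at the first station, and the minimum of the first column is the lowest temperature across all stations in the first hour. These are the first entries of min(axis=1) and min(axis=0), which the axis argument computes for every row or column at once.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)

first_station_min = Timeseries_Temperature[0, :].min()  # lowest value at station 0
first_hour_min = Timeseries_Temperature[:, 0].min()     # lowest value across stations at hour 0
print(first_station_min, first_hour_min)  # 23 20
```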
Now let’s work with Timeseries_Temperature again and call the min() method. In Jupyter, pressing Shift+Tab shows its signature, and we see that it has an axis argument whose default value is None.
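The same signature is documented for ndarray.min in the NumPy reference. A minimal sketch: axis=None, the default, reduces over every element, and the function form np.min accepts the same axis argument as the method.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)

# axis=None (the default) reduces over all elements and returns one value
print(Timeseries_Temperature.min(axis=None))  # 20
# The function form takes the same axis argument as the method
print(np.min(Timeseries_Temperature, axis=0))  # [20 23 21 22 23 23]
```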
Now we are going to pass axis=0.
Timeseries_Temperature.min(axis=0)
This collapses axis 0 (the stations) and gives the minimum value in each column, rather than a single value for the whole array.
array([20, 23, 21, 22, 23, 23])
The result has the same shape as a single row from our slicing example (six values), but instead of slicing, we used the axis argument. Each value is the minimum temperature across all stations at that hour.
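One way to see what collapsing an axis means is to compare shapes. In this sketch, min(axis=0) removes axis 0 from the (3, 6) shape and leaves (6,). Passing keepdims=True, a standard parameter of ndarray.min, keeps the collapsed axis with length 1 instead, which can help when the result has to broadcast back against the original array.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)

# Axis 0 is removed from the shape (3, 6)
hourly_min = Timeseries_Temperature.min(axis=0)
print(hourly_min.shape)  # (6,)

# keepdims=True keeps axis 0, but with length 1
hourly_min_2d = Timeseries_Temperature.min(axis=0, keepdims=True)
print(hourly_min_2d.shape)  # (1, 6)
```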
Now we will collapse axis 1, which runs along the columns (the hours), and get the minimum temperature of each station.
Timeseries_Temperature.min(axis=1)
Output:
array([23, 23, 20])
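The questions at the start asked which station was coldest, not only how cold it was. NumPy's argmin method (not used in the original walkthrough) takes the same axis argument and returns the index of the minimum rather than its value. In this sketch, row indices stand for stations and column indices for hours; when there is a tie, argmin returns the first occurrence.

```python
import numpy as np

Timeseries_Temperature = np.array(
    [[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)

# Index of the coldest station (row) at each hour
coldest_station_per_hour = Timeseries_Temperature.argmin(axis=0)
print(coldest_station_per_hour)  # [2 2 2 2 0 1]

# Index of the coldest hour (column) for each station
coldest_hour_per_station = Timeseries_Temperature.argmin(axis=1)
print(coldest_hour_per_station)  # [0 5 0]

# Station and hour of the single lowest reading in the whole array
station, hour = np.unravel_index(
    Timeseries_Temperature.argmin(), Timeseries_Temperature.shape
)
print(station, hour)  # 2 0
```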
Now let’s look at the more complicated case. We will print Temperature_Array again to show what it looks like.
Temperature_Array
Output:
array([[[26, 25, 24],
        [24, 25, 26]],

       [[27, 25, 23],
        [25, 28, 24]],

       [[27, 24, 26],
        [24, 27, 25]],

       [[27, 26, 24],
        [28, 25, 26]]])
In Temperature_Array, we have three dimensions (axes): time steps, rows, and columns. If we type Temperature_Array[0, :, :], we get the first block, because axis 0 represents the time steps in this case; each level of square brackets effectively corresponds to an axis.
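Indexing along the other axes follows the same pattern. In this short sketch, Temperature_Array[0] is shorthand for Temperature_Array[0, :, :], and fixing both the row and the column while keeping the time axis gives the four-step series for one station (row 0, column 0, chosen arbitrarily).

```python
import numpy as np

Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

# Trailing ":" can be omitted: both select the first time step
print(np.array_equal(Temperature_Array[0], Temperature_Array[0, :, :]))  # True

# All time steps for the station in row 0, column 0
station_series = Temperature_Array[:, 0, 0]
print(station_series)  # [26 27 27 27]
```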
array([[26, 25, 24],
[24, 25, 26]])
This time, instead of the minimum, we will compute means of Temperature_Array using the mean() method.
Temperature_Array.mean()
Output:
25.458333333333332
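Without an axis, mean() averages all 24 values. A quick sketch to confirm the number divides the sum of every element by the element count from the size attribute.

```python
import numpy as np

Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

total = Temperature_Array.sum()  # 611
count = Temperature_Array.size   # 24 = 4 * 2 * 3
print(total / count)             # 25.458333333333332
```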
Now we will use axis=0, which collapses the 0th axis: the outermost set of square brackets, which holds our time steps.
Temperature_Array.mean(axis=0)
We get a 2x3 array (two rows, three columns) that holds, for each station, the average over all time steps of Temperature_Array.
array([[26.75, 25. , 24.25],
[25.25, 26.25, 25.25]])
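As a sanity check, this sketch adds the four 2x3 time-step blocks by hand and divides by four; the result should match mean(axis=0) element by element.

```python
import numpy as np

Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

# Average the four 2x3 time-step blocks element by element
manual = (
    Temperature_Array[0] + Temperature_Array[1] + Temperature_Array[2] + Temperature_Array[3]
) / 4
print(np.allclose(manual, Temperature_Array.mean(axis=0)))  # True
print(manual)
```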
Which axis you need depends on how your data are arranged. Next, we use axis=1.
Temperature_Array.mean(axis=1)
Here we collapse the rows, so for each time step we get the mean of each column, averaged over its two rows.
array([[25. , 25. , 25. ],
[26. , 26.5, 23.5],
[25.5, 25.5, 25.5],
[27.5, 25.5, 25. ]])
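The equivalent manual computation, as a sketch: average the row-0 and row-1 slices of every time step, which keeps the time axis and the column axis.

```python
import numpy as np

Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

# Average the two rows within each time step
manual = (Temperature_Array[:, 0, :] + Temperature_Array[:, 1, :]) / 2
print(manual.shape)  # (4, 3)
print(np.allclose(manual, Temperature_Array.mean(axis=1)))  # True
```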
Now we pass 2 to the axis argument. With axis=2, we collapse the innermost dimension, the columns, which gives a row-wise average at each time step: a 4x2 array.
Temperature_Array.mean(axis=2)
Output:
array([[25. , 25. ],
[25. , 25.66666667],
[25.66666667, 25.33333333],
[25.66666667, 26.33333333]])
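Two more forms of the axis argument that NumPy reductions accept, shown as a sketch: a negative axis counts from the end, so axis=-1 is the same as axis=2 here, and a tuple collapses several axes at once, so axis=(1, 2) averages all six stations at each time step.

```python
import numpy as np

Temperature_Array = np.array(
    [
        [[26, 25, 24], [24, 25, 26]],
        [[27, 25, 23], [25, 28, 24]],
        [[27, 24, 26], [24, 27, 25]],
        [[27, 26, 24], [28, 25, 26]],
    ]
)

# axis=-1 refers to the last axis (the columns), same as axis=2
print(np.allclose(Temperature_Array.mean(axis=-1), Temperature_Array.mean(axis=2)))  # True

# Collapse rows and columns together: one spatial average per time step
per_step = Temperature_Array.mean(axis=(1, 2))
print(per_step)
```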
Full Code:
# In[1]:
import numpy as np
Temperature_Array = np.array(
[
[[26, 25, 24], [24, 25, 26]],
[[27, 25, 23], [25, 28, 24]],
[[27, 24, 26], [24, 27, 25]],
[[27, 26, 24], [28, 25, 26]],
]
)
Timeseries_Temperature = np.array(
[[23, 24, 25, 24, 23, 25], [25, 26, 27, 29, 25, 23], [20, 23, 21, 22, 25, 29]]
)
# In[2]:
Timeseries_Temperature.ndim
# In[3]:
Timeseries_Temperature.min()
# In[4]:
Timeseries_Temperature[0, :]
# In[5]:
Timeseries_Temperature[:, 0]
# In[6]:
Timeseries_Temperature.min(axis=0)
# In[7]:
Timeseries_Temperature.min(axis=1)
# In[8]:
Temperature_Array
# In[9]:
Temperature_Array.ndim
# In[10]:
Temperature_Array[0, :, :]
# In[11]:
Temperature_Array.mean()
# In[12]:
Temperature_Array.mean(axis=0)
# In[13]:
Temperature_Array.mean(axis=1)
# In[14]:
Temperature_Array.mean(axis=2)
Hello! I am Salman Bin Mehmood (Baum), a software developer, and I help organizations address complex problems. My expertise lies in back-end development, data science, and machine learning. I am a lifelong learner, currently working on the metaverse, and enrolled in a course on building an AI application with Python. I love solving problems and developing bug-free software for people. I write content related to Python and hot technologies.