Selection Methods for a Random Sample From Matrix or Array With Dataset in MATLAB
- Extract Random Samples Using the randsample Function in MATLAB
- Extract Random Samples Using the datasample Function in MATLAB
- Extract Random Sample Subsets of a Column From a Dataset Matrix Using datasample in MATLAB
We will look at different methods to select random samples from a dataset, array, or matrix in MATLAB.
To give you a full picture of how to obtain random samples, we will explain the functions randn, randsample, and datasample, with code examples that extract random samples with and without replacement, along with snippets showing what the output looks like.
Let us assume that we have a matrix containing our dataset with 50,000 rows, and we want to select a random sample of 50 entries from it. More than one random sampling method can do this. Before listing these methods, keep in mind that a random sample is a subset of the data whose entries are chosen at random from the dataset. We use random sampling to avoid bias in what gets selected. It is not quite as straightforward as it appears, though: selecting a random sample takes more than simply picking, say, 10 entries out of 500, because we must also make sure that the sample really is random.
Continuing with our assumption, we can use MATLAB to extract random samples from our dataset. MATLAB provides several functions for this. For example, the function randsample chooses samples at random from a vector of data, both with and without replacement.
Extract Random Samples Using the randsample Function in MATLAB
To pick N_obs observations uniformly at random from the entries in the dataset, we use the function:
O_put = randsample(ourdata,N_obs)
where N_obs is the number of observations to draw. By default, randsample samples without replacement; passing true as a third argument samples with replacement. If ourdata is a vector, the output O_put is a vector of N_obs values drawn from it. If ourdata is a single positive integer n, the values are drawn from the integers 1 to n.
Let us use this function to solve our assumed problem.
Code:
% A scalar first argument n makes randsample draw from the integers 1:n,
% so 50000 stands for the entry numbers of a dataset with 50,000 entries.
ourdata = 50000;
% We want to obtain 5 random samples from this dataset
N_obs = 5;
% Draw N_obs distinct values; no trailing semicolon, so the result is displayed
O_put = randsample(ourdata,N_obs)
Output:
O_put =
46700
33937
42457
32788
1786
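randsample also accepts a population vector and an optional third argument that switches replacement on. The sketch below assumes a hypothetical column vector named population holding 50,000 placeholder values, and it fixes the seed with rng only so that the run is repeatable; both are illustrative choices rather than part of the example above.

```matlab
% Hypothetical data: 50,000 distinct values stored in a column vector.
rng(1);                              % fixed seed, for repeatability only
population = (1:50000)' * 0.5;       % stand-in for real measurements

% Five values without replacement (the default): no entry is picked twice.
s_without = randsample(population, 5);

% Five values with replacement: the same entry may be picked more than once.
s_with = randsample(population, 5, true);
```

Because the values in population are all distinct, s_without never contains a repeated value, while s_with may.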
Extract Random Samples Using the datasample Function in MATLAB
If we want to sample along a particular dimension of the data, for example whole rows or whole columns of a matrix, we use the function below.
y = datasample(ourdata,N_obs,'Replace',false)
If Replace is true, we choose the sample with replacement; otherwise, we choose the sample without replacement. If Replace is set to false, N_obs must not exceed the number of elements available along the sampled dimension of the dataset.
Replace is true by default.
true = sample with replacement.
false = sample without replacement.
We can accomplish this in a single line of code. Keeping the above assumptions in mind, we write our code as below.
Code:
% Assume a dataset with 50,000 entries, represented by the integers 1:50000.
% Draw five unique values from it in one line with datasample.
% No trailing semicolon, so the result is displayed.
O_put = datasample(1:50000,5,'Replace',false)
Output:
O_put =
24489 22279 32315 35467 37732
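When the dataset is a matrix, datasample samples whole rows by default, which matches the 50,000-row scenario described at the start. Here is a minimal sketch, assuming a hypothetical 50,000-by-3 matrix named ourdata filled with placeholder values. The second output, idx, reports which rows were chosen.

```matlab
% Hypothetical dataset: 50,000 observations (rows), 3 variables (columns).
ourdata = rand(50000, 3);

% Draw 50 distinct rows; idx holds the row numbers that were selected.
[O_put, idx] = datasample(ourdata, 50, 'Replace', false);

size(O_put)                        % 50 rows, 3 columns
isequal(O_put, ourdata(idx, :))    % the sample consists of exactly those rows
```

Keeping idx is useful when the same rows must later be looked up in another variable, such as a label vector.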
Extract Random Sample Subsets of a Column From a Dataset Matrix Using datasample in MATLAB
For this purpose, we will first create a test matrix with the randn function in MATLAB. It creates arrays of random values drawn from the standard normal distribution.
I_put=randn(A)
produces an A-by-A matrix of randomly generated elements when A is a scalar.
With two arguments, as in randn(m,n), it produces an m-by-n matrix; a size vector such as randn([m n]) gives the same result.
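As a quick check of what randn returns, the following sketch builds two matrices and inspects their size and sample statistics. The variable names X and Y are illustrative only, and the printed mean and standard deviation will vary slightly from run to run.

```matlab
X = randn(1000);       % scalar argument: 1000-by-1000 matrix, one million values
Y = randn(10, 4);      % two arguments: 10 rows, 4 columns

size(Y)                % 10 rows, 4 columns
mean(X(:))             % close to 0
std(X(:))              % close to 1
```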
Now, to get our random samples, we will use the datasample function with its third argument set to 2, which samples along the second dimension and returns a random subset of the columns of our data matrix.
Code:
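% 10-by-100000 matrix of standard normal values: 10 rows, 100,000 columns
I_put = randn(10,100000);
% Sample 5 columns (dimension 2) without replacement
O_put = datasample(I_put,5,2,'Replace',false)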
Output:
O_put =
-0.5995 -0.7377 -1.1902 -0.6021 -1.0812
-0.0572 -0.7831 0.4746 0.7105 -0.8038
0.8401 1.0824 -0.3507 0.4069 -2.0817
-1.1358 -0.9041 -0.1702 0.5950 0.3954
-1.0887 -0.7766 -1.6901 -0.5047 1.1286
-0.0187 -0.3354 -0.7458 1.8554 0.8492
0.3251 -0.4219 0.2440 -0.4750 0.7628
1.4713 -1.9788 -1.6672 0.0035 -0.4316
0.6880 1.4387 -1.3525 -0.6950 0.6411
-0.2777 -0.4776 -0.9841 1.2752 0.2645
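The values shown above will differ on every run, because both randn and datasample draw from MATLAB's global random number stream. If a repeatable sample is needed, one option is to reset the generator with rng before sampling. The sketch below uses an arbitrary seed of 7 and a smaller stand-in matrix, both of which are assumptions made for illustration.

```matlab
I_put = randn(10, 1000);    % smaller stand-in for the matrix above

rng(7);                     % arbitrary seed
first = datasample(I_put, 5, 2, 'Replace', false);

rng(7);                     % same seed again
second = datasample(I_put, 5, 2, 'Replace', false);

isequal(first, second)      % same seed, same five columns
```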
Mehak is an electrical engineer, a technical content writer, a team collaborator and a digital marketing enthusiast. She loves sketching and playing table tennis. Nature is what attracts her the most.