- How to normalize vectors to unit norm in Python
- Here’s how to l2-normalize vectors to a unit vector in Python
- How to l1-normalize vectors to a unit vector in Python
- How to Normalize NumPy Arrays
- How to Use NumPy to Normalize a Vector
- Normalize a NumPy Array using Sklearn
- Normalize 2-Dimensional NumPy Arrays Using Sklearn
- Conclusion
- How to Normalize a Vector in Python
- Normalize a Vector Math Formula
- Normalize a Vector with NumPy
- Normalize a Vector in Python
- Use the Mathematical Formula to Normalize a Vector in Python
- Use the numpy.linalg.norm() Function to Normalize a Vector in Python
- Use the sklearn.preprocessing.normalize() Function to Normalize a Vector in Python
How to normalize vectors to unit norm in Python
There are many ways to normalize vectors. A common preprocessing step in machine learning is to normalize a vector before passing it to an algorithm, e.g., before training a support vector machine (SVM).
One way to normalize a vector is to scale it so that it has a length of 1, i.e., a unit norm. There are different ways to define “length”, such as the l1 or l2 norm. With l2-normalization, “unit norm” means that if we square each element in the vector and sum the results, the total equals 1.
(Note: a vector normalized this way is also often referred to as a unit vector or a vector of length 1.)
So given a matrix X, where the rows represent samples and the columns represent features of the sample, you can apply l2-normalization to normalize each row to a unit norm. This can be done easily in Python using sklearn.
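For reference, the two notions of length mentioned above can be written out explicitly. This is the standard formulation; the symbols x, n, p, and \hat{x} are notation introduced here for illustration and do not appear in the original text.

```latex
\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}, \qquad
\hat{x} = \frac{x}{\|x\|_p} \quad \text{for } p \in \{1, 2\},\ \|x\|_p \neq 0
```

Dividing by the l2 norm makes the squared elements sum to 1, and dividing by the l1 norm makes the absolute values sum to 1.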
Here’s how to l2-normalize vectors to a unit vector in Python
    import numpy as np
    from sklearn import preprocessing

    # 2 samples, with 3 dimensions.
    # The 2 rows indicate 2 samples.
    # The 3 columns indicate 3 features for each sample.
    # Note: np.float was removed in NumPy 1.24; the built-in float is equivalent.
    X = np.asarray([[-1, 0, 1],
                    [0, 1, 2]], dtype=float)  # Float is needed.

    # Before normalization.
    print(X)
    # Output,
    # [[-1.  0.  1.]
    #  [ 0.  1.  2.]]

    # l2-normalize the samples (rows).
    X_normalized = preprocessing.normalize(X, norm='l2')

    # After normalization.
    print(X_normalized)
    # Output,
    # [[-0.70710678  0.          0.70710678]
    #  [ 0.          0.4472136   0.89442719]]
It normalized each sample (row) in the X matrix so that the squared elements sum to 1.
We can check that this is the case:
    # Square all the elements/features.
    X_squared = X_normalized ** 2
    print(X_squared)
    # Output,
    # [[ 0.5  0.   0.5]
    #  [ 0.   0.2  0.8]]

    # Sum over the rows.
    X_sum_squared = np.sum(X_squared, axis=1)
    print(X_sum_squared)
    # Output,
    # [ 1.  1.]

    # Yay! Each row sums to 1 after being normalized.
As we see, if we square each element, and then sum along the rows, we get the expected value of “1” for each row.
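The same row-wise l2-normalization can also be sketched with NumPy alone, which makes the arithmetic explicit. This is a minimal illustration, not how sklearn implements it internally; the names row_norms and X_normalized_np are introduced here only for the example.

```python
import numpy as np

# Same example matrix as in the text: 2 samples (rows), 3 features (columns).
X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# l2 norm of every row; keepdims=True keeps shape (2, 1) so it broadcasts
# against the (2, 3) matrix in the division below.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide each row by its own norm.
X_normalized_np = X / row_norms
print(X_normalized_np)
```

Note that a row containing only zeros would have a norm of 0 and would need special handling before the division.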
How to l1-normalize vectors to a unit vector in Python
Now you might ask yourself, well that worked for L2 normalization. But what about L1 normalization?
In L2 normalization we normalize each sample (row) so that the squared elements sum to 1, while in L1 normalization we normalize each sample (row) so that the absolute values of the elements sum to 1.
Let’s do another example for L1 normalization (where X is the same as above)!
    X_normalized_l1 = preprocessing.normalize(X, norm='l1')
    print(X_normalized_l1)
    # [[-0.5         0.          0.5       ]
    #  [ 0.          0.33333333  0.66666667]]
Okay looks promising! Let’s do a quick sanity check.
    # Absolute value of all elements/features.
    X_abs = np.abs(X_normalized_l1)
    print(X_abs)
    # [[0.5        0.         0.5       ]
    #  [0.         0.33333333 0.66666667]]

    # Sum over the rows.
    X_sum_abs = np.sum(X_abs, axis=1)
    print(X_sum_abs)
    # Output,
    # [ 1.  1.]

    # Yay! Each row sums to 1 after being normalized.
We can now see that taking the absolute value of each element, and then summing across each row, gives the expected value of “1” for each row.
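As with the L2 case, the L1 version can be written directly in NumPy. The sketch below assumes the same X as above; the names l1_norms and X_l1 are introduced here only for illustration.

```python
import numpy as np

X = np.asarray([[-1, 0, 1],
                [0, 1, 2]], dtype=float)

# L1 norm of each row: the sum of the absolute values of its elements.
l1_norms = np.abs(X).sum(axis=1, keepdims=True)

# Divide each row by its own L1 norm.
X_l1 = X / l1_norms
print(X_l1)
```

The second row sums to 3 in absolute value, so its entries become 0, 1/3 and 2/3.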
How to Normalize NumPy Arrays
In this tutorial, you’ll learn how to normalize NumPy arrays, including multi-dimensional arrays. Normalization is an important skill for any data analyst or data scientist. Normalizing a vector means scaling it so that its magnitude is equal to 1, turning it into a unit vector. This is a common preprocessing step in machine learning, and it can be especially helpful when working with distance-based models, such as the K-Nearest Neighbor algorithm.
By the end of this tutorial, you’ll have learned:
- How to use NumPy functions to normalize an array
- How to normalize multi-dimensional arrays in NumPy
How to Use NumPy to Normalize a Vector
In order to normalize a vector in NumPy, we can use the np.linalg.norm() function, which returns the vector’s norm (by default, the L2 norm). We can then divide each value in the array by that norm to get the normalized array.
We can generate a NumPy array of random values using the np.random.rand() function. By first setting a seed with np.random.seed(), we can reproduce our results:
    # Generating a Random Array
    import numpy as np

    np.random.seed(123)
    arr = np.random.rand(10)
    print(arr)

    # Returns:
    # [0.69646919 0.28613933 0.22685145 0.55131477 0.71946897 0.42310646
    #  0.9807642  0.68482974 0.4809319  0.39211752]
Because NumPy operations happen element-wise, we can divide the whole array by its norm in a single expression. First, let’s see how to calculate the norm in Python:
    # Calculating a Vector Norm with NumPy
    import numpy as np

    # Generate an Array
    np.random.seed(123)
    arr = np.random.rand(10)

    # Calculate the vector norm
    vector_norm = np.linalg.norm(arr)
    print(vector_norm)

    # Returns: 1.8533621078442797

In the code above, we calculated the vector norm. Once we have this value, we can divide each value in the array by it to get the normalized vector.
    # Normalizing a NumPy Vector
    import numpy as np

    np.random.seed(123)
    arr = np.random.rand(10)

    normalized_vector = arr / np.linalg.norm(arr)
    print(normalized_vector)

    # Returns:
    # [0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
    #  0.5291811  0.36950671 0.2594916  0.21157092]
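A quick sanity check is to confirm that the normalized vector really has a norm of 1. The sketch below repeats the steps above and compares the result against 1 with np.isclose(), since floating-point rounding can leave the value very slightly off; the name result_norm is introduced here for illustration.

```python
import numpy as np

np.random.seed(123)
arr = np.random.rand(10)
normalized_vector = arr / np.linalg.norm(arr)

# The norm of the result should be 1, up to floating-point rounding.
result_norm = np.linalg.norm(normalized_vector)
print(np.isclose(result_norm, 1.0))

# Returns: True
```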
Normalize a NumPy Array using Sklearn
When working on machine learning projects, you may be working with sklearn. Scikit-learn comes with a function that allows you to normalize NumPy arrays. The function allows your code to be a bit more explicit than the method shown above.
Let’s see how we can use the normalize() function from Scikit-learn to normalize an array:
    # Normalize a NumPy Array with Scikit-learn
    import numpy as np
    from sklearn.preprocessing import normalize

    np.random.seed(123)
    arr = np.random.rand(10)
    print(normalize([arr]))

    # Returns:
    # [[0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
    #   0.5291811  0.36950671 0.2594916  0.21157092]]

We can see that this method returned the same values as above. It’s important to note that the function expects a 2-dimensional input of multiple samples. Because of this, we wrapped the array in a list so that it is treated as a single sample (row), which is also why the result is nested.
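An alternative to nesting the array in a list is to reshape it into a single row explicitly. This is a minimal sketch of that approach; the names as_row, normalized_row, and normalized_1d are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import normalize

np.random.seed(123)
arr = np.random.rand(10)

# reshape(1, -1) turns the 1-D array into a 2-D array with one row: shape (1, 10).
as_row = arr.reshape(1, -1)
normalized_row = normalize(as_row)

# Index row 0 to get back a 1-D array of shape (10,).
normalized_1d = normalized_row[0]
print(normalized_1d.shape)
```

Indexing the first row at the end removes the extra nesting, so the result can be used like the pure NumPy version.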
Normalize 2-Dimensional NumPy Arrays Using Sklearn
In this section, you’ll learn how to normalize a 2-dimensional array. We can create a reproducible array using the same function but reshaping it into multiple dimensions. Let’s see how we can do this using the reshape() method.
    # Creating a 2-Dimensional NumPy Array
    import numpy as np
    from sklearn.preprocessing import normalize

    np.random.seed(123)
    arr = np.random.rand(20).reshape(2, 10)
    print(arr)

    # Returns:
    # [[0.69646919 0.28613933 0.22685145 0.55131477 0.71946897 0.42310646
    #   0.9807642  0.68482974 0.4809319  0.39211752]
    #  [0.34317802 0.72904971 0.43857224 0.0596779  0.39804426 0.73799541
    #   0.18249173 0.17545176 0.53155137 0.53182759]]

Now that we have our array created, we can pass it into the normalize() function from sklearn, which normalizes each row independently:
    # Normalize a 2-Dimensional Array in NumPy
    import numpy as np
    from sklearn.preprocessing import normalize

    np.random.seed(123)
    arr = np.random.rand(20).reshape(2, 10)
    print(normalize(arr))

    # Returns:
    # [[0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
    #   0.5291811  0.36950671 0.2594916  0.21157092]
    #  [0.23254994 0.49403067 0.29719255 0.04043992 0.26972931 0.5000926
    #   0.12366305 0.11889251 0.3601986  0.36038577]]
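By default, normalize() works along axis=1, so each row is scaled on its own. The function also accepts axis=0 to scale each column instead. The sketch below shows both and checks the resulting norms; the names by_rows and by_cols are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import normalize

np.random.seed(123)
arr = np.random.rand(20).reshape(2, 10)

# Default axis=1: each row (sample) is scaled to unit L2 norm.
by_rows = normalize(arr, axis=1)
print(np.linalg.norm(by_rows, axis=1))  # one norm per row

# axis=0: each column (feature) is scaled to unit L2 norm instead.
by_cols = normalize(arr, axis=0)
print(np.linalg.norm(by_cols, axis=0))  # one norm per column
```

Which axis to use depends on whether the samples or the features should end up with unit length.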
Conclusion
In this tutorial, you learned how to normalize a NumPy array. Normalizing arrays allows you to more easily compare arrays of different scales. You first learned how to use purely NumPy to normalize an array. Then, you learned how to use Scikit-learn to make your code more explicit. Finally, you learned how to use Scikit-learn in order to normalize multi-dimensional arrays.
How to Normalize a Vector in Python
To normalize a vector in Python we can either use a standard mathematical formula or use the numpy.linalg.norm() function to complete the task for us.
Normalize a Vector Math Formula
Let’s begin with an example of using the math formula to normalize a vector generated by NumPy.
    import numpy as np

    vector = np.random.rand(5)
    normalised_vector = vector / np.sqrt(np.sum(vector**2))
    print(normalised_vector)
[0.78077874 0.473772 0.27891699 0.09011581 0.28285882]
Note: if the supplied array contains only zeros, its norm is 0 and the division yields NaN values (with a RuntimeWarning) rather than a usable unit vector. It is probably worth adding some logic to handle that case unless you are absolutely sure your input will never be a zero vector.
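One way to add such a guard is a small wrapper that checks the norm before dividing. This is only a sketch: safe_normalize is a hypothetical helper name, and returning the input unchanged for a zero norm is one possible policy among several (raising an exception would be another).

```python
import numpy as np

def safe_normalize(vector):
    """Return vector / ||vector||_2, or an unchanged copy if the norm is 0.

    Hypothetical helper for illustration; not part of NumPy.
    """
    vector = np.asarray(vector, dtype=float)
    norm = np.sqrt(np.sum(vector ** 2))
    if norm == 0:
        # Zero (or empty) vector: nothing to scale, so skip the division.
        return vector.copy()
    return vector / norm

print(safe_normalize([3.0, 4.0]))  # [0.6 0.8]
print(safe_normalize([0.0, 0.0]))  # [0. 0.]
```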
Normalize a Vector with NumPy
If you don’t want to deal with writing the math formula for vector normalising, use the numpy.linalg.norm() function. Supply the original vector as the first argument, then divide the original vector by the result.

    import numpy as np

    vector = np.random.rand(5)
    normalised_vector = vector / np.linalg.norm(vector)
    print(normalised_vector)
[0.25828374 0.55423074 0.37070243 0.55211261 0.42879968]
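numpy.linalg.norm() computes the L2 norm by default, but its ord parameter selects other norms as well. The sketch below uses a small hand-picked vector so the values are easy to check by hand; the variable names are introduced here for illustration.

```python
import numpy as np

vector = np.array([3.0, -4.0])

l2 = np.linalg.norm(vector)                # default: sqrt(3^2 + 4^2) = 5.0
l1 = np.linalg.norm(vector, ord=1)         # |3| + |-4| = 7.0
linf = np.linalg.norm(vector, ord=np.inf)  # max(|3|, |-4|) = 4.0

# Dividing by the L1 norm makes the absolute values sum to 1.
l1_normalised = vector / l1
print(np.abs(l1_normalised).sum())
```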
Normalize a Vector in Python
- Use the Mathematical Formula to Normalize a Vector in Python
- Use the numpy.linalg.norm() Function to Normalize a Vector in Python
- Use the sklearn.preprocessing.normalize() Function to Normalize a Vector in Python
A common practice in machine learning is to normalize a vector or dataset before passing it to an algorithm.
When we talk about normalizing a vector, we mean scaling it so that its magnitude is 1, which makes it a unit vector.
In this tutorial, we will convert a numpy array to a unit vector.
Use the Mathematical Formula to Normalize a Vector in Python
In this method, we will compute the norm of an array using the mathematical formula: the square root of the sum of the squared elements. When we divide the array by this norm, which is a single number, we get the normalized vector. The following code implements this.

    import numpy as np

    v = np.random.rand(10)
    normalized_v = v / np.sqrt(np.sum(v**2))
    print(normalized_v)
[0.10366807 0.05821296 0.11852538 0.42957961 0.27653372 0.36389277 0.47575824 0.32059888 0.2721495 0.41856126]
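Note that this method does not produce a usable result if the norm of the vector is 0 (for example, an all-zeros array): the division yields NaN values and NumPy emits a RuntimeWarning.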
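To see the zero-norm case concretely, we can try the formula on an all-zeros array. This is a small sketch; in the NumPy versions this was written against, 0/0 gives NaN with a warning instead of raising an exception, and the np.errstate context manager is only there to silence that warning for the demonstration.

```python
import numpy as np

v = np.zeros(3)

# Suppress the floating-point warning that 0/0 would otherwise trigger.
with np.errstate(invalid="ignore", divide="ignore"):
    result = v / np.sqrt(np.sum(v ** 2))

# Every entry is NaN, so the "normalized" vector is unusable.
print(np.isnan(result).all())

# Returns: True
```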
Use the numpy.linalg.norm() Function to Normalize a Vector in Python
The NumPy module in Python has the linalg.norm() function, which returns the norm of an array. We then divide the array by this norm to get the normalized vector. For example, in the code below, we will create a random array and find its normalized form using this method.

    import numpy as np

    v = np.random.rand(10)
    normalized_v = v / np.linalg.norm(v)
    print(normalized_v)
[0.10881785 0.32038649 0.51652046 0.05670539 0.12873248 0.52460815 0.32929967 0.32699446 0.0753471 0.32043046]
Use the sklearn.preprocessing.normalize() Function to Normalize a Vector in Python
The sklearn module has efficient methods available for data preprocessing and other machine learning tools. The normalize() function in this library is usually used with 2-D matrices and provides the option of L1 and L2 normalization. The code below uses this function with a 1-D array by first turning the array into a single column with np.newaxis and then normalizing along axis=0, so that the whole column is treated as one vector.

    import numpy as np
    from sklearn.preprocessing import normalize

    v = np.random.rand(10)
    normalized_v = normalize(v[:, np.newaxis], axis=0).ravel()
    print(normalized_v)
[0.19361438 0.36752554 0.26904722 0.10672546 0.32089067 0.48359538 0.01824837 0.47591181 0.26439268 0.33180998]
The ravel() method used in the code above flattens the resulting 2-D column back into a 1-D array.
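Since the three approaches in this tutorial all compute the same L2 normalization, they should agree on the same input. The sketch below checks this on one seeded random vector; the seed value and the names by_formula, by_norm, and by_sklearn are choices made here for illustration.

```python
import numpy as np
from sklearn.preprocessing import normalize

np.random.seed(0)  # fixed seed so all three methods see the same vector
v = np.random.rand(10)

by_formula = v / np.sqrt(np.sum(v ** 2))
by_norm = v / np.linalg.norm(v)
by_sklearn = normalize(v[:, np.newaxis], axis=0).ravel()

# All three results should match up to floating-point rounding.
print(np.allclose(by_formula, by_norm) and np.allclose(by_norm, by_sklearn))

# Returns: True
```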