Genetic Algorithm in Machine Learning using Python
One of the more advanced algorithms in computer science is the genetic algorithm, inspired by the biological process of passing genes from one generation to the next. It is a heuristic, generally used for optimization, and has many applications, for example solving NP-hard problems, game theory, and code-breaking.
Another trending and impactful modern technology is machine learning, which finds patterns in large amounts of data for classification and regression.
But can we somehow involve genetic algorithms in machine learning? How would that affect the results? Let's find out.
Here are quick steps for how the genetic algorithm works:
- Initial Population – Initialize the population randomly based on the data.
- Fitness function – Find the fitness value of each chromosome (a chromosome is a set of parameters that defines a proposed solution to the problem the genetic algorithm is trying to solve).
- Selection – Select the fittest chromosomes as parents to pass their genes on to the next generation and create a new population.
- Crossover – Create new chromosomes by combining the parents and add them to the new population set.
- Mutation – Perform mutation, which alters one or more gene values in the chromosomes of the newly generated population. Mutation helps maintain diversity in the search. The resulting population is used in the next generation.
Repeat steps 2-5 for each generation.
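Before turning to the library-based example, here is a minimal, self-contained sketch of these five steps on the classic OneMax toy problem (maximize the number of 1-bits in a chromosome); all names and parameter values below are illustrative assumptions, not taken from any library:

import random

def fitness(chromosome):
    # Step 2: fitness of a chromosome; here we simply count 1-bits (OneMax)
    return sum(chromosome)

def select(population, k=3):
    # Step 3: tournament selection, keep the fittest of k random chromosomes
    return max(random.sample(population, k), key=fitness)

def crossover(parent_a, parent_b):
    # Step 4: single-point crossover combines two parents into one child
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate=0.01):
    # Step 5: flip each gene with a small probability to keep the search diverse
    return [1 - gene if random.random() < rate else gene for gene in chromosome]

# Step 1: a random initial population of 50 chromosomes with 20 genes each
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]

for generation in range(40):  # repeat steps 2-5 for each generation
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(len(population))]

best = max(population, key=fitness)
print("best fitness:", fitness(best))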
Now, let’s get our hands on the code:
Initially, we will run the logistic regression algorithm on the breast cancer data.
Import libraries
We will import the Python libraries required for this algorithm.
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
%matplotlib inline
Import some other libraries required to implement the machine learning algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
Data
Import the dataset from the Python library scikit-learn.
# import the breast cancer dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
label = cancer["target"]
Split the dataset into training and test sets.
# splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df, label, test_size=0.30, random_state=101)
Train a model using the logistic regression technique:
# training a logistic regression model
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
print("Accuracy =", accuracy_score(y_test, predictions))
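To involve the genetic algorithm, the trained model can be wrapped in genetic feature selection using the GeneticSelectionCV class documented in the API reference below; the parameter values in this sketch are illustrative assumptions, not prescribed settings:

from genetic_selection import GeneticSelectionCV

# wrap logistic regression in a genetic feature selector (illustrative settings)
selector = GeneticSelectionCV(LogisticRegression(), cv=5, n_population=50, n_generations=20)
selector = selector.fit(X_train, y_train)

# evaluate on the test set using only the selected features
predictions = selector.predict(X_test)
print("Accuracy score after genetic algorithm is", accuracy_score(y_test, predictions))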
API Reference
class genetic_selection.GeneticSelectionCV(estimator, cv=None, scoring=None, fit_params=None, max_features=None, verbose=0, n_jobs=1, n_population=300, crossover_proba=0.5, mutation_proba=0.2, n_generations=40, crossover_independent_proba=0.1, mutation_independent_proba=0.05, tournament_size=3, n_gen_no_change=None, caching=False)
Feature selection with genetic algorithm.
- estimator (object) – A supervised learning estimator with a fit method.
- cv (int, cross-validation generator or an iterable, optional) – Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
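For instance, passing a cross-validation generator object is equivalent to the integer form for a classifier; a brief illustrative sketch (the choice of estimator here is an assumption):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from genetic_selection import GeneticSelectionCV

# for a classifier with binary/multiclass y, cv=5 resolves to the same strategy
selector = GeneticSelectionCV(LogisticRegression(), cv=StratifiedKFold(n_splits=5))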
Attributes:
- n_features_ (int) – The number of selected features with cross-validation.
- support_ (array of shape [n_features]) – The mask of selected features.
- generation_scores_ (array of shape [n_generations]) – The maximum cross-validation score for each generation.
- estimator_ (object) – The external estimator fit on the reduced dataset.
An example showing genetic feature selection.
>>> import numpy as np
>>> from sklearn import datasets, linear_model
>>> from genetic_selection import GeneticSelectionCV
>>> iris = datasets.load_iris()
>>> E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))
>>> X = np.hstack((iris.data, E))
>>> y = iris.target
>>> estimator = linear_model.LogisticRegression(solver="liblinear", multi_class="ovr")
>>> selector = GeneticSelectionCV(estimator, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False], dtype=bool)
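Continuing the example above, the fitted attributes documented earlier can be inspected; a short illustrative sketch:

# inspect the fitted attributes, continuing from the example above
print(selector.n_features_)          # number of selected features
print(selector.generation_scores_)   # best cross-validation score per generation
print(selector.estimator_)           # the estimator refit on the selected features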
fit(X, y, groups=None)
Fit the GeneticSelectionCV model and the underlying estimator on the selected features.
- X ({array-like, sparse matrix}, shape = [n_samples, n_features]) – The training input samples.
- y (array-like, shape = [n_samples]) – The target values.
- groups (array-like, shape = [n_samples], optional) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).
predict(X)
Reduce X to the selected features and then predict using the underlying estimator.
- X (array of shape [n_samples, n_features]) – The input samples.
Returns: y – The predicted target values.
score(X, y)
Reduce X to the selected features and return the score of the underlying estimator.
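As a brief usage sketch, continuing the iris example above, the two methods compose with the fitted selector (illustrative, not part of the reference):

# predict keeps only the features flagged True in selector.support_
y_pred = selector.predict(X)

# score applies the same feature reduction, then delegates to the estimator's score
print(selector.score(X, y))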