
Grid Search CV:

Grid Search CV (Cross-Validation) is a method for locating a machine learning algorithm's ideal hyperparameters.

Hyperparameters are the parameters that are defined before training rather than being learned from the data during the training process.

The learning rate, the number of hidden layers in a neural network, and the number of trees in a random forest are a few examples of hyperparameters.

When using Grid Search CV, you provide a grid of hyperparameter values, and cross-validation is used to systematically assess the model's performance for each combination of hyperparameters.

Typically, the grid is defined as a dictionary (or a list of dictionaries) mapping each hyperparameter name to the collection of values that must be assessed.

 For instance, a grid search for a neural network might evaluate various hidden layer sizes and learning rates.
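As a concrete illustration of such a grid, here is a hypothetical list-of-dictionaries grid in the scikit-learn style; the names follow MLPClassifier's hyperparameters, and the specific values are assumptions for illustration only:

```python
# A hypothetical grid for a neural network, written as a list of
# dictionaries in the scikit-learn style. The parameter names match
# MLPClassifier's hyperparameters; the values are illustrative.
param_grid = [
    {
        "hidden_layer_sizes": [(50,), (100,), (50, 50)],  # hidden layer shapes
        "learning_rate_init": [0.001, 0.01, 0.1],         # learning rates
    },
]
```

Grid search would then train and evaluate a model for each of the 3 × 3 = 9 combinations in this grid.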

The model is trained and assessed for each set of hyperparameters during the grid search in order to determine how well it generalizes to unseen data.

Any performance statistic suitable for the specific problem, such as accuracy, F1-score, or area under the ROC curve, can be used to assess the model's performance.

The performance metric is used to pick the model with the best performance after all hyperparameter combinations have been assessed.

This model can then be used for prediction and further analysis on new data.

Grid Search CV can be computationally expensive, especially when searching over a large number of hyperparameters in complex models and huge datasets.

However, compared to using default hyperparameters, it helps ensure that the model is tuned to the problem at hand and can improve performance.

A demonstration is shown below:

 

 

Codeblock E.1. Grid search CV 1 demonstration.

 

In this illustration, we first load the iris dataset and then define a hyperparameter grid for a decision tree classifier.

The hyperparameters to be searched are the maximum depth of the tree, the minimum number of samples required to split an internal node, and the minimum number of samples required at a leaf node.

We then build a decision tree classifier and define a grid search object with GridSearchCV().

We specify 5-fold cross-validation to assess the performance of each hyperparameter combination.
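The notebook code itself is not reproduced here, so the following is a minimal sketch of what it describes, assuming scikit-learn's DecisionTreeClassifier and GridSearchCV; the specific grid values are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset.
X, y = load_iris(return_X_y=True)

# Hyperparameter grid: tree depth, minimum samples to split an internal
# node, and minimum samples at a leaf node (values are illustrative).
param_grid = {
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Build the classifier and the grid search object with 5-fold CV.
clf = DecisionTreeClassifier(random_state=0)
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best CV score:", grid_search.best_score_)
```

After fitting, `best_params_` holds the winning combination and `best_estimator_` is a model refit on the full data with those hyperparameters.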

 

Download. Download the Grid Search CV 1.ipynb file used here.

 

A second demo is given below:

 

 

Codeblock E.2. Grid search CV 2 demonstration.

 

To determine the optimal hyperparameters for a support vector machine classifier on the breast cancer dataset, this code uses grid search cross-validation. The steps are:

  • Load the breast cancer dataset.

  • Split the data into training and test sets.

  • Create a support vector machine classifier.

  • Define the parameter grid for the search.

  • Create a grid search object with the classifier and parameter grid, and specify the number of folds for cross-validation (in this case, 5).

  • Fit the grid search object to the training data.

  • Print the optimal parameters and results of the grid search.
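The steps above can be sketched as follows, assuming scikit-learn's SVC; the grid values for C and gamma are illustrative assumptions, since the original notebook's grid is not shown:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the breast cancer dataset and split it into training and test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create the SVM classifier and define the parameter grid
# (these candidate values are illustrative).
svm = SVC()
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}

# Grid search with 5-fold cross-validation, fit on the training data.
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Print the optimal parameters and results.
print("Best parameters:", grid_search.best_params_)
print("Test accuracy:", grid_search.score(X_test, y_test))
```

Note that the held-out test set is touched only once, after the search, so the reported accuracy is not biased by the hyperparameter selection.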

 

Download. Download the Grid Search CV 2.ipynb file used here.

 

 

A third example is shown below:

 

 

Codeblock E.3. Grid search CV 3 demonstration.

 

In this illustration, we divide the scikit-learn wine dataset into training and testing sets. The regularization parameter C and the kernel coefficient gamma of an SVM model are then defined as the hyperparameters to be tuned with GridSearchCV.

We define the SVM model and use GridSearchCV to find the best hyperparameters via cross-validation.

We print the best hyperparameters together with the corresponding accuracy score on the test set. Additionally, we compute the confusion matrix for the test set and plot it as a heatmap.

The decision boundary for the SVM model with the best hyperparameters is then plotted. The resulting plot shows how the model separates the three wine classes using the dataset's first two features.

 

Figure E.1. Best hyperparameters: {'C': 10, 'gamma': 0.1}; accuracy: 0.51.

 

Using the wine dataset, this code conducts a grid search to identify the ideal hyperparameters for a support vector machine (SVM) model. The code's components are listed below:

  1. Import the required modules, including the load_wine function from sklearn.datasets to load the wine dataset, the train_test_split function and GridSearchCV class from sklearn.model_selection to divide the data into training and testing sets and to perform the search, the SVC class from sklearn.svm to build the SVM model, and the confusion_matrix function from sklearn.metrics to calculate the confusion matrix.

  2. Load the wine dataset using the load_wine function.

  3. Using the train_test_split function and a random seed of 0, divide the dataset into training and testing sets.

  4. Create a dictionary of possible parameter values. In this instance, we are looking for distinct values of the kernel coefficient gamma and regularization parameter C.

  5. Using the GridSearchCV class, create a grid search object with the SVM model as the estimator, the parameter grid, and 5-fold cross-validation.

  6. Utilizing the fit method, fit the grid search object to the training data.

  7. Print out the best hyperparameters identified by the grid search together with the model's accuracy score on the test data.

  8. Compute the confusion matrix for the testing data using the confusion_matrix function and the labels predicted by the grid search.

  9. Use the plt.imshow, plt.xticks, plt.yticks, plt.xlabel, plt.ylabel, plt.colorbar, and plt.title functions from matplotlib.pyplot to plot the confusion matrix as a heatmap.
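The core of the steps above can be sketched as follows, assuming scikit-learn; the candidate C and gamma values are illustrative assumptions, and the matplotlib heatmap and decision-boundary plots from the notebook are omitted here:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the wine dataset; keep only the first two features so the decision
# boundary can be plotted in 2D, as in the description above.
X, y = load_wine(return_X_y=True)
X = X[:, :2]

# Split into training and testing sets with a random seed of 0.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate values for C and gamma (illustrative; the original grid
# is not reproduced in this text).
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Grid search with 5-fold cross-validation over an SVM estimator.
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best hyperparameters:", grid_search.best_params_)
print("Accuracy:", grid_search.score(X_test, y_test))

# Confusion matrix on the test set, using the labels predicted
# by the fitted grid search object.
cm = confusion_matrix(y_test, grid_search.predict(X_test))
print(cm)
```

Because only two of the thirteen features are used (to keep the problem plottable), the accuracy is modest, consistent with the roughly 0.51 reported in Figure E.1.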

 

Download. Download the Grid Search CV 3.ipynb file used here.

 

 

---- Summary ----

You now know the basics of Grid Search CV.

  • A method for locating the ideal hyperparameters for a machine learning model is called grid search CV.

  • Hyperparameters are variables that are explicitly set prior to training rather than being learned by the model during training.

  • Grid search CV entails building a grid of potential hyperparameter values and assessing the model's performance for every conceivable set of hyperparameters.

  • To assess how well the model performs with each set of hyperparameters, cross-validation is utilized.

  • The set of hyperparameters that achieves the best performance metric during cross-validation is selected as the best.

  • Particularly for big hyperparameter grids and complicated models, grid search CV can be computationally demanding.

  • In contrast to manual tuning or using default hyperparameter values, it can result in noticeably higher performance.

  • Grid search CV is a commonly used machine learning technique that is supported by numerous tools and frameworks, including Python's scikit-learn.



________________________________________________________________________________________________________________________________

Copyright © 2022-2023. Anoop Johny. All Rights Reserved.