Automated machine learning (AutoML) refers to automating the process of applying machine learning to solve real-world problems. AutoML software enables users to develop accurate machine learning models for their data without needing expertise in ML.
The traditional approach to machine learning involves passing data through several stages, including data pre-processing, feature engineering, and hyperparameter tuning, to make an effective ML model. These steps can be challenging and sometimes very time-consuming.
AutoML software runs the entire ML pipeline independently, making machine learning easier for non-experts. The process includes the following steps:
Data preprocessing: AutoML software automatically applies the necessary processing according to the data type, such as handling missing values and removing outliers.
Feature selection: The software automatically picks the features that are relevant for predicting the output. For example, a house's name carries no information about its price, so the software will ignore such features.
Model selection: The software chooses the machine learning models that perform best on the given data. For example, convolutional neural networks (CNNs) typically perform well on image detection tasks.
Hyperparameter tuning: After selecting candidate models, the software varies the hyperparameters of each model and returns the configuration with the best results, typically measured on validation data held out from training.
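The model selection and hyperparameter tuning steps above can be sketched as a small search loop. The following is a minimal illustration using only the Python standard library and a synthetic dataset; the names `make_data`, `MajorityClass`, `KNearest`, `cross_val_accuracy`, and `auto_select` are hypothetical, and this is not how any particular AutoML library is implemented. It tries a baseline model and a k-nearest-neighbours model with several values of k, scores each with cross-validation, and ranks them.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic binary data: the label is 1 when the first feature exceeds 0.5.
    The second feature is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        rows.append((x, int(x[0] > 0.5)))
    return rows


class MajorityClass:
    """Baseline model: always predicts the most common training label."""

    def fit(self, rows):
        labels = [y for _, y in rows]
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, x):
        return self.label


class KNearest:
    """k-nearest-neighbours classifier using squared Euclidean distance."""

    def __init__(self, k):
        self.k = k  # hyperparameter: number of neighbours that vote

    def fit(self, rows):
        self.rows = rows
        return self

    def predict(self, x):
        def distance(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], x))

        nearest = sorted(self.rows, key=distance)[: self.k]
        votes = [y for _, y in nearest]
        return int(sum(votes) * 2 > len(votes))


def cross_val_accuracy(make_model, rows, folds=5):
    """Average accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        valid = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        model = make_model().fit(train)
        correct = sum(model.predict(x) == y for x, y in valid)
        scores.append(correct / len(valid))
    return sum(scores) / folds


def auto_select(rows):
    """Score every candidate (model type plus hyperparameter) and rank them."""
    candidates = {"majority": MajorityClass}
    for k in (1, 5, 15):
        candidates[f"knn(k={k})"] = lambda k=k: KNearest(k)
    return sorted(
        ((cross_val_accuracy(factory, rows), name) for name, factory in candidates.items()),
        reverse=True,
    )


leaderboard = auto_select(make_data())
for score, name in leaderboard:
    print(f"{name:12s} accuracy={score:.3f}")
best_score, best_name = leaderboard[0]
print("Best candidate:", best_name)
```

Real AutoML tools search far larger spaces of models and hyperparameters and add preprocessing and feature selection, but the core idea is the same: evaluate many candidates on held-out data and keep the best one.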
Although AutoML automates most of the work in developing an ML model, ML experts are still required to interpret the results correctly. The software may give inconsistent results, so ML expertise is needed to investigate the output.
Since the AutoML field is still developing, its results may not match those achieved by a professional ML engineer.
Let’s demonstrate how to use AutoML to search for the best ML model for diabetes prediction. The dataset contains patient features such as age and blood glucose, and we have to predict a boolean output variable that indicates whether a patient is likely to get diabetes. We will use the AutoML Python library PyCaret to find the best model.
# load sample dataset
from pycaret.datasets import get_data
dataset = get_data('diabetes')

from pycaret.classification import ClassificationExperiment
s = ClassificationExperiment()
s.setup(data = dataset, target = 'Class variable', session_id = 123)

best_model = s.compare_models()
print("\nBest model is ", best_model)
Line 2: We import the get_data function from the pycaret.datasets module to fetch and load datasets that come with the PyCaret library.
Line 3: We use the get_data() function to load the “diabetes” dataset from PyCaret.
Line 5: We import the ClassificationExperiment class from the pycaret.classification module to set up and manage classification experiments.
Line 6: We create an instance of the ClassificationExperiment class.
Line 7: We initialize the experiment using the setup() function, which takes the following parameters:
data: This is the pandas DataFrame that contains the dataset for the experiment.
target: This specifies the name of the target column in the data. In our case, we used “Class variable,” which is a column name in the data DataFrame that contains the labels or classes for the classification task.
session_id: This parameter is used to control the randomness of the experiment. We set the session id to 123. Setting this ensures the reproducibility of the results whenever we re-run the experiment with the same session id.
Line 9: We use the compare_models() function to train and compare the available models and return the best one for our data. PyCaret also displays a table containing the performance of each ML model, sorted from best to worst (by accuracy by default for classification).
Line 10: We print the best-performing model.
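To see why fixing session_id makes results reproducible, consider the sketch below. It is a standard-library illustration of the general idea (a seeded random generator producing the same train/test split every time), not PyCaret's internal implementation; the function `split_rows` and its parameters are hypothetical.

```python
import random


def split_rows(rows, seed, test_size):
    """Shuffle a copy of the rows with a seeded generator, then split off a test set."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(20))  # stand-in for the rows of a dataset

train_a, test_a = split_rows(rows, seed=123, test_size=6)
train_b, test_b = split_rows(rows, seed=123, test_size=6)
train_c, test_c = split_rows(rows, seed=456, test_size=6)

print("Same seed, same split:", test_a == test_b)
print("Different seed, same split:", test_a == test_c)
```

With the same seed, the two splits are identical, so any model trained and scored on them sees exactly the same data. With a different seed, the split will usually differ, which is one reason scores can change between runs when no seed is set.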