What is automated machine learning?

Standard approach to machine learning

The traditional approach to machine learning involves passing data through several stages, including data preprocessing, feature engineering, and hyperparameter tuning, to produce an effective ML model. These steps can be difficult and are often time-consuming.

How AutoML works

AutoML software runs the entire ML pipeline automatically, which makes machine learning more accessible to non-experts. The process includes the following steps:

Data preprocessing: AutoML software automatically applies the processing each data type needs, such as handling missing values and removing outliers.
Feature selection: The software automatically picks the data features that are useful for predicting the output. For example, the name of a house is unimportant for predicting its price, so the software will ignore such a feature.
Model selection: The software chooses the machine learning models that give the best output on the given data. For example, convolutional neural networks (CNNs) typically perform better than other model families on image detection tasks.
Hyperparameter tuning: After selecting candidate models, the software varies the hyperparameters of each one and returns the model with the best results on held-out data (for example, cross-validation folds or a validation split), not just on the training data.
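The four steps above can be sketched in miniature using only the Python standard library. This is an illustrative toy, not how any particular AutoML tool is implemented: the synthetic data, the helper names (impute_median, drop_constant_features, majority_model, knn_model), and the small candidate list are all assumptions made for this example.

```python
import random
import statistics

def impute_median(rows):
    """Preprocessing: replace missing values (None) with the column median."""
    cols = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def drop_constant_features(rows):
    """Feature selection (very crude): drop columns whose value never varies."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if len(set(col)) > 1]
    return [[row[i] for i in keep] for row in rows], keep

def majority_model(train_X, train_y):
    """Baseline model: always predict the most common training label."""
    label = statistics.mode(train_y)
    return lambda x: label

def knn_model(train_X, train_y, k):
    """k-nearest-neighbours classifier; k is the hyperparameter being tuned."""
    def predict(x):
        nearest = sorted(
            zip(train_X, train_y),
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )[:k]
        return statistics.mode(label for _, label in nearest)
    return predict

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

# Hypothetical synthetic data: [glucose, age, constant column], about 5% missing glucose.
rng = random.Random(123)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    glucose = rng.gauss(150 if label else 100, 15)
    age = rng.gauss(50, 10)
    X.append([glucose if rng.random() > 0.05 else None, age, 1.0])
    y.append(label)

X = impute_median(X)                    # step 1: preprocessing
X, kept = drop_constant_features(X)     # step 2: feature selection
train_X, val_X = X[:150], X[150:]       # hold out data for scoring
train_y, val_y = y[:150], y[150:]

# Steps 3 and 4: try several model families and hyperparameter values.
candidates = {"majority": lambda: majority_model(train_X, train_y)}
for k in (1, 5, 15):
    candidates[f"knn(k={k})"] = lambda k=k: knn_model(train_X, train_y, k)

# Score every candidate on the held-out rows and rank them, best first.
leaderboard = sorted(
    ((accuracy(build(), val_X, val_y), name) for name, build in candidates.items()),
    reverse=True,
)
best_score, best_name = leaderboard[0]
print(leaderboard)
print("best:", best_name)
```

Real AutoML libraries search far larger model and hyperparameter spaces and usually score candidates with cross-validation, but the overall loop of preparing the data, trying candidates, scoring them on held-out data, and keeping the best has the same shape.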

Disadvantages

Although AutoML automates most of the work of developing an ML model, ML experts are still needed to interpret the results correctly. The software may give inconsistent results, and investigating such output requires ML expertise.
Because the AutoML field is still developing, its results may not match those of a model built by an experienced ML engineer.

Code example: Diabetes test

Let’s demonstrate how to use AutoML to search for the best ML model for diabetes testing. The dataset contains patient features such as age and blood glucose, and the task is to predict a boolean output variable that indicates whether a patient is likely to get diabetes. We will use PyCaret, an AutoML library for Python, to find the best model.


Code explanation

Line 2: We import the get_data function from the pycaret.datasets module to fetch and load datasets that come with the PyCaret library.
Line 3: We use the get_data() function to load the “diabetes” dataset from PyCaret.
Line 5: We import the ClassificationExperiment class from the pycaret.classification module to set up and manage classification experiments.
Line 6: We create an instance of the ClassificationExperiment class.
Line 7: We initialize the experiment using the setup() function, which takes the following parameters:
- data: This is the pandas DataFrame, which contains the dataset for the experiment.
- target: This specifies the name of the target column in the data. In our case, we used “Class variable”, which is the column in the data DataFrame that contains the labels (classes) for the classification task.
- session_id: This parameter is used to control the randomness of the experiment. We set the session id to 123. Setting this ensures the reproducibility of the results whenever we re-run the experiment with the same session id.
Line 9: We use the compare_models() function to compare candidate models and find the best one for our data. PyCaret displays a table with the performance of each ML model, sorted in descending order of the ranking metric (accuracy by default for classification), and the function returns the best-performing model.
Line 10: We print the best model found.
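To see why a fixed session_id makes runs repeatable, it helps to look at seeding in isolation. The following is a minimal standard-library sketch, not PyCaret's internal implementation: the split_indices helper and its parameters are hypothetical and only show that the same seed yields the same random train/test split.

```python
import random

def split_indices(n_rows, n_train, seed):
    """Shuffle row indices with a seeded generator and split them in two."""
    rng = random.Random(seed)      # a private generator, fully determined by the seed
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]

first = split_indices(10, 7, seed=123)
second = split_indices(10, 7, seed=123)   # same seed as the first call
other = split_indices(10, 7, seed=7)      # a different seed

print(first == second)   # identical splits when the seed is the same
print(first, other)      # compare against the split from the other seed
```

Because steps such as data splitting and model initialization depend on random numbers, fixing the seed is what lets two runs of the same experiment be compared fairly.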
