Numerical data are values that can be measured and ordered logically. They are numbers that describe the various properties of an object.
Data in the real world can exist in different forms, such as images and text.
However, to work with real data in data science, we typically convert every type of data into numerical data.
In image data, we have pixel values. Images are stored in machines as a matrix of numbers. The size of this matrix is determined by the number of pixels in each image.
Images can be of two types:
Grayscale images have a single matrix of pixels containing only black, white, and shades of gray. A grayscale image uses an 8-bit color format.
Color images have three matrices, one for each of the RGB (red, green, and blue) channels. In color images, every color is represented as a combination of these three channels in a 24-bit color format.
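The bit depths above translate directly into how many distinct values a pixel can take. The short sketch below works through that arithmetic; the variable names are illustrative only.

```python
# Number of distinct values for a given bit depth: 2 ** bits
gray_bits = 8
rgb_bits = 3 * gray_bits  # one 8-bit channel each for red, green, and blue

gray_levels = 2 ** gray_bits  # intensity levels available to a grayscale pixel
rgb_colors = 2 ** rgb_bits    # distinct colors available to an RGB pixel

print(gray_levels)  # 256
print(rgb_colors)   # 16777216
```

An 8-bit channel therefore covers exactly the integers 0 through 255.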
The value of each pixel lies between 0 and 255, but what does this number mean? It represents the intensity, or brightness, of the pixel. Black corresponds to smaller values (closer to 0), while white corresponds to larger values (closer to 255).
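To make the matrix representation concrete, here is a minimal sketch that builds two tiny images by hand with NumPy. The pixel values are made up for illustration. It assumes the common NumPy layout of height × width for grayscale and height × width × 3 for color, which is one way of storing the three RGB matrices together.

```python
import numpy as np

# Hypothetical 2x3 grayscale image: one matrix, one 8-bit value per pixel
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# Hypothetical 2x3 color image: three values (R, G, B) per pixel
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]      # pure red pixel
color[1, 2] = [255, 255, 255]  # white pixel: all channels at maximum

print(gray.shape)   # (2, 3)
print(color.shape)  # (2, 3, 3)
print(gray.min(), gray.max())  # 0 255
```

In the grayscale matrix, 0 is the black pixel and 255 is the white one. The values in between are shades of gray.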
The Python code below reads an image and prints its pixel values as the output.
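import matplotlib.pyplot as plt

# Read the image file into a NumPy array of pixel values.
# Note: for PNG files, matplotlib returns floats scaled to the
# range 0 to 1 instead of integers from 0 to 255.
image = plt.imread('__ed_input.png')
print(image)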
One of the most common applications of machine learning techniques is text analysis. In machine learning, vectorization converts textual data into numerical data. It's a crucial step because most machine learning techniques can't be applied directly to text, as they only accept numerical input.
Let’s take a look at the code below.
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()

# Read the file; each line becomes one document
with open('text.txt', 'r') as filedata:
    text = filedata.readlines()

# Learn the vocabulary, then encode each line as a vector of word counts
vect.fit(text)
train = vect.transform(text)
print(train.toarray())
The code above converts the text into numerical data using the CountVectorizer class.
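To show what count-based vectorization does under the hood, here is a minimal pure-Python sketch. It is not scikit-learn's implementation. The sample sentences are hypothetical stand-ins for the lines of text.txt, and the tokenization is a plain whitespace split, whereas CountVectorizer's default tokenizer also drops punctuation and single-character tokens.

```python
from collections import Counter

# Hypothetical sample documents, standing in for the lines of text.txt
docs = ["the cat sat", "the dog sat on the mat"]

# Build a sorted vocabulary of every distinct word
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})

# Encode each document as a list of counts, one entry per vocabulary word
vectors = []
for doc in docs:
    counts = Counter(doc.lower().split())
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Each row is one document and each column is one vocabulary word, so the 2 in the second row records that "the" appears twice in the second sentence.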