8/20/18

# Machine Learning Matrices

by David Paper

Machine learning aficionados rely heavily on matrices to store, process, and even share numeric data for predictive purposes. Machine learning algorithms (estimators) use matrix operations to make predictions. Simply put, a matrix is a two-dimensional array of numbers. For data science applications, a learning problem dataset consists of a set of *n* samples, each with *m* features. That is, each dataset generally has two pieces: a feature set represented as a two-dimensional matrix (one row per sample, one column per feature), and a target (outcomes) represented as a one-dimensional array. Each sample in the feature set thereby has an associated target value that can be compared against the predictions of a machine learning algorithm to gauge accuracy.

An illustration of a learning problem dataset should help reduce any confusion. Figure 1 shows a few data elements from the popular *iris* dataset.

^{Figure 1: The first three data elements from the iris dataset.}

Figure 1 shows the first three elements from the feature set and the target. Each feature set element has four features: sepal length, sepal width, petal length, and petal width, measured in centimeters. Each target is classified as species setosa, versicolor, or virginica. Since machine learning algorithms can only process numeric data, species is represented as 0 (setosa), 1 (versicolor), or 2 (virginica). So, the first feature set element has sepal length 5.1 cm, sepal width 3.5 cm, petal length 1.4 cm, and petal width 0.2 cm, and the first target is 0 (setosa). A good machine learning estimator should therefore predict 0 (setosa) given the features from the first feature set element. Notice that the feature set is represented as a two-dimensional matrix, and the target as a one-dimensional array.
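The layout described above can be inspected directly. The following is a minimal sketch, assuming the scikit-learn copy of the iris dataset (`load_iris`) is the one being illustrated; the variable names `X` and `y` are illustrative choices, not taken from the article.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix: one row per sample, one column per feature
y = iris.target  # target array: one numeric class label per sample

print(iris.feature_names)         # names of the four feature columns
print(X[:3])                      # first three feature set elements
print(y[:3])                      # their numeric targets
print(iris.target_names[y[:3]])   # numeric targets mapped back to species names
print(X.shape, y.shape)           # two-dimensional features, one-dimensional target
```

Indexing `iris.target_names` with the numeric targets is a convenient way to turn the 0/1/2 encoding back into species names.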

A dataset of images cannot be passed directly to an estimator that expects a two-dimensional feature matrix, because it is structured in three dimensions (samples by rows by columns). The image data must be transformed from three to two dimensions. An illustration of the transformation should help you to understand this process. Figure 2 shows the 500^{th} image in the *digits* dataset represented as an 8x8 matrix.

^{Figure 2: An image element represented as an 8x8 matrix.}

The dataset supplying the image data in Figure 2 consists of 1,797 samples of 8x8 images, as shown in Figure 3. Each sample represents a digit from 0 to 9.

^{Figure 3: Shape of three-dimensional digits dataset.}
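As a rough sketch of how the three-dimensional structure can be checked, the snippet below assumes the scikit-learn `load_digits` dataset. The index `idx = 500` is an assumption: the article's "500th image" could correspond to index 499 or 500 depending on how it was counted.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# (samples, rows, columns): a stack of 8x8 images
print(digits.images.shape)

idx = 500  # assumed index for the article's "500th image"
image = digits.images[idx]
print(image.shape)  # a single sample is an 8x8 matrix of pixel intensities
print(image)
```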

Since the dataset is three-dimensional (1,797x8x8), each 8x8 image must be transformed (flattened) into a 64-element vector as shown in Figure 4.

^{Figure 4: Flattened vector of 500th image in digits dataset.}
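Flattening a single image might look like the sketch below. It assumes NumPy's default row-major ordering and the scikit-learn `load_digits` dataset; `idx = 500` is again an assumed index for the "500th image".

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
idx = 500  # assumed index for the article's "500th image"

image = digits.images[idx]   # 8x8 matrix
vector = image.reshape(-1)   # rows laid end to end: 8 * 8 = 64 values

print(vector.shape)
print(vector)

# scikit-learn also ships the already-flattened samples in digits.data
same = np.array_equal(vector, digits.data[idx])
print(same)
```

The `-1` tells `reshape` to infer the length from the number of elements, so the same line works for images of other sizes.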

Figure 4 shows the image flattened into a vector that can now be used for prediction. The target (actual) digit for this vector is the digit 8. After flattening every image into a vector, the feature set and target shapes are 1,797x64 and 1,797 respectively, as shown in Figure 5.

^{Figure 5: Digit samples shape and target shape.}
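The same reshape can be applied to the whole dataset at once. This is a minimal sketch assuming scikit-learn's `load_digits`; the names `X` and `y` are illustrative.

```python
from sklearn.datasets import load_digits

digits = load_digits()

n_samples = len(digits.images)
# Keep the sample axis, collapse the two pixel axes into one
X = digits.images.reshape(n_samples, -1)
y = digits.target

print(X.shape)  # feature set: samples x pixels
print(y.shape)  # target: one label per sample
```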

Figure 5 shows that there are 1,797 sample images, each represented by 64 pixel values, and 1,797 corresponding target values. Now, we can train a machine learning classification algorithm with the image samples to see how well it predicts target values. To visualize an image, we must transform it from a flattened array of 64 pixels back to a two-dimensional (8x8) matrix, such as the one in Figure 6, so it can be displayed.
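The training step mentioned above could be sketched as follows. The article does not name a classifier, so the choice of `KNeighborsClassifier`, the 75/25 split, and the `random_state` value are all assumptions made purely for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # 1,797 x 64 feature matrix
y = digits.target                                  # 1,797 target values

# Hold out a quarter of the samples to measure accuracy on unseen images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = KNeighborsClassifier(n_neighbors=3)  # assumed estimator, not from the article
clf.fit(X_train, y_train)

# Fraction of held-out images whose predicted digit matches the target
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Any scikit-learn classifier could be swapped in here, since they all accept the same samples-by-features matrix and target array.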

^{Figure 6: Transformed vector (the same matrix as in Figure 2).}
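Reversing the flattening is one more `reshape`. The sketch below assumes scikit-learn's `load_digits` and uses `idx = 500` as an assumed index for the "500th image"; it also checks that the round trip recovers the original matrix.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
idx = 500  # assumed index for the article's "500th image"

vector = digits.data[idx]       # flattened sample: 64 pixel values
matrix = vector.reshape(8, 8)   # back to 8 rows of 8 pixels

print(matrix)

# The reshaped vector matches the original 8x8 image exactly
round_trip_ok = np.array_equal(matrix, digits.images[idx])
print(round_trip_ok)
```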

The matrix in Figure 6 shows the image vector transformed from a flattened vector into an 8x8 matrix. A representation of the image can now be displayed by a plotting library function. In this case, the 500^{th} image target value is the digit 8. Let’s see if the displayed image of the matrix that represents the 500^{th} image looks like an 8. Figure 7 shows each value from the matrix translated into a shade of grey.

^{Figure 7: The image from Figure 6 plotted for display purposes.}
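One way to produce a plot like this is with matplotlib's `imshow`. The article only says "a plotting library function", so matplotlib, the `gray_r` colormap, the output filename, and the index `idx = 500` are all assumptions in this sketch.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()
idx = 500  # assumed index for the article's "500th image"

matrix = digits.data[idx].reshape(8, 8)  # flattened vector back to 8x8

fig, ax = plt.subplots(figsize=(3, 3))
# Each matrix value becomes a shade of grey; gray_r draws high values dark
ax.imshow(matrix, cmap="gray_r", interpolation="nearest")
ax.set_title(f"target: {digits.target[idx]}")
ax.axis("off")

fig.savefig("digit.png")  # hypothetical output file
```

In an interactive session, calling `plt.show()` instead of saving would display the figure in a window.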

In my experience, representing numeric feature sets, images, and other kinds of data as matrices allows machine learning algorithms to make precise predictions. Additionally, the *n* samples by *m* features structure scales to a matrix of any size. That is, you can have a small matrix with a few samples of a few features all the way to a huge matrix with millions (or more) of samples and many features (the only limitation being the computational expense of large datasets).

**About the Author**

**Dr. David Paper** is a full professor at Utah State University in the Management Information Systems department. He is author of *Data Science Fundamentals for Python and MongoDB*. David has over 70 publications in refereed journals such as Organizational Research Methods, Communications of the ACM, Information & Management, Information Resource Management Journal, Communications of the AIS, Journal of Information Technology Case and Application Research, and Long Range Planning. He has also served on several editorial boards in various capacities, including associate editor. Besides growing up in family businesses, Dr. Paper has worked for Texas Instruments, DLS, Inc., and the Phoenix Small Business Administration. He has performed IS consulting work for IBM, AT&T, Octel, Utah Department of Transportation, and the Space Dynamics Laboratory. Dr. Paper's teaching and research interests include data science, process reengineering, object-oriented programming, electronic customer relationship management, change management, e-commerce, and enterprise integration.

*This article was contributed by Dr. David Paper, author of Data Science Fundamentals for Python and MongoDB*.