Chapter 1: What is Data

Let’s begin with the simplest question: What is data?

At a basic level, data is recorded information. It can be numbers, text, images, or signals collected from the world.

However, the meaning of “data” changes depending on the field.

Despite these differences, there is a common structure behind most data: it can often be represented as numbers. Once data is represented numerically, we can organize it into different forms such as vectors and matrices, which allow us to analyze and transform it.
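As a small illustration of representing data numerically, the sketch below turns a short piece of text into Unicode code points and writes a tiny grayscale image as a grid of pixel intensities. The sample text and the 2x2 image are made-up values chosen for illustration, not data from this chapter.

```python
# Text as numbers: each character maps to its Unicode code point.
text = "data"  # hypothetical sample text
codes = [ord(ch) for ch in text]
print(codes)  # [100, 97, 116, 97]

# An image as numbers: a made-up 2x2 grayscale image, where each value
# is a pixel intensity from 0 (black) to 255 (white).
image = [[0, 255],
         [128, 64]]
print(image)
```

The mapping is reversible for text: applying `chr` to each code point recovers the original string.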

Scalar

A scalar is a single numerical value.

The term originates from physics, where it refers to a quantity fully described by its magnitude (e.g., temperature, mass).

In data science, we simplify this concept and treat a scalar as a single number.

$$x = 1$$
x = 1
print(type(x), x)
<class 'int'> 1

Vector

A vector is an ordered collection of scalars.

It represents one data point with multiple features in a multidimensional space.

For example, a person’s data might consist of several features, which together form a vector.

Mathematically, a vector is often written as:

$$x = (1, 2, 3, 4, 5)$$

or as a column vector:

$$x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix}$$

These represent the same vector, but in different orientations.
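A plain Python list has no built-in notion of orientation, so the sketch below adopts one possible convention (an assumption, not something this chapter prescribes): a flat list stands for the row form, and a list of one-element lists stands for the column form.

```python
row = [1, 2, 3, 4, 5]          # row orientation: one flat list
column = [[v] for v in row]    # column orientation: one value per inner list
print(column)  # [[1], [2], [3], [4], [5]]

# Flattening the column form recovers the same ordered values.
flattened = [r[0] for r in column]
print(flattened == row)  # True
```

Both forms hold the same five scalars in the same order; only the layout differs.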

lst = [1, 2, 3, 4, 5]
print(type(lst), lst)
<class 'list'> [1, 2, 3, 4, 5]

Matrix

A matrix is a two-dimensional array of numbers.

It represents multiple data points, where each row is typically one data point and each column is one feature.

Mathematically:

$$x = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 6 & 7 & 8 & 9 & 10 \end{bmatrix}$$

This can be interpreted as two data points, each with five features:

lst = [[1,2,3,4,5],
       [6,7,8,9,10]]
print(type(lst), lst)
<class 'list'> [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
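To make the two-dimensional structure concrete, here is a minimal sketch of reading the shape, a row, a column, and a single element from the nested list. It assumes the common convention that each row is one data point and each column is one feature; the variable names are illustrative.

```python
matrix = [[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 10]]

n_rows = len(matrix)       # number of data points
n_cols = len(matrix[0])    # number of features per data point
print(n_rows, n_cols)      # 2 5

first_row = matrix[0]                    # one data point: [1, 2, 3, 4, 5]
third_col = [row[2] for row in matrix]   # one feature across all points: [3, 8]
element = matrix[1][3]                   # row index 1, column index 3: 9
print(first_row, third_col, element)
```

Note that Python indices start at 0, so `matrix[1][3]` refers to the second row and fourth column.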

To work with data mathematically, we need structured representations. The simplest of these are scalars, vectors and matrices.
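One way to see how the three structures relate is by nesting depth: a scalar is a bare number, a vector is a list of numbers, and a matrix is a list of equal-length vectors. The helper `describe` below is a hypothetical function written for this illustration, not part of any library.

```python
def describe(value):
    """Hypothetical helper: classify a value by how deeply it is nested."""
    if isinstance(value, (int, float)):
        return "scalar"
    if isinstance(value, list) and all(isinstance(v, (int, float)) for v in value):
        return "vector"
    if (isinstance(value, list)
            and all(describe(row) == "vector" for row in value)
            and len({len(row) for row in value}) == 1):
        # Every row is a vector and all rows have the same length.
        return "matrix"
    return "unknown"


print(describe(1))                                     # scalar
print(describe([1, 2, 3, 4, 5]))                       # vector
print(describe([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]))   # matrix
```

A nested list whose rows differ in length is reported as "unknown", since it does not form a rectangular matrix.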