Once you have performed feature scaling on your data, all the fun stuff can begin. When the temperature data is plotted over time in Fahrenheit, the y-axis shows how the temperature fluctuates over the year. When the data is scaled, this internal structure is preserved. Before scaling, this is how our data looks when plotted.
After feature scaling the data looks very similar, but with a different y-axis.
The fact that the temperature started out in Fahrenheit is no longer relevant; instead, the relative change is preserved and can be read as a decimal percentage, where 0 is the coldest observed day and 1 is the hottest observed day.
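This normalization is a plain min-max scaling. A minimal sketch, using hypothetical temperature values (the actual dataset is not shown here):

```python
# Hypothetical daily temperatures in Fahrenheit.
temps_f = [31.0, 48.5, 62.0, 87.5, 104.0]

t_min, t_max = min(temps_f), max(temps_f)

# Min-max scaling: each value maps into [0, 1], where 0 is the
# coldest observed day and 1 is the hottest observed day.
scaled = [(t - t_min) / (t_max - t_min) for t in temps_f]

print(scaled)
```

The relative spacing between the days is unchanged; only the unit is gone.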
Intuitively, to map this to how many people should come and help out in the store, we can say that when the temperature is at 100% we should bring in 100% of the available people. In the naive example from the video, people and temperature are linearly related.
If we give the program the current temperature (a value that does not exist in the dataset), it can now make its first naive prediction. The prediction consists of the following steps:
- Input is given in Fahrenheit
- Input is normalized to [0, 1]
- The normalized input is multiplied by the maximum number of people we can bring in
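The steps above can be sketched as a single function. The bounds below are assumptions for illustration, not values from the actual dataset:

```python
T_MIN_F = 31.0   # coldest observed day (hypothetical)
T_MAX_F = 104.0  # hottest observed day (hypothetical)
MAX_STAFF = 20   # maximum number of people we can bring in (hypothetical)

def predict_staff(temp_f: float) -> float:
    # Step 1: input arrives in Fahrenheit.
    # Step 2: normalize it to [0, 1] using the observed range.
    normalized = (temp_f - T_MIN_F) / (T_MAX_F - T_MIN_F)
    # Step 3: multiply by the maximum number of people,
    # moving from the temperature scale to the "employment" scale.
    return normalized * MAX_STAFF

print(predict_staff(67.5))
```

A day halfway up the observed range asks for half the staff; the hottest observed day asks for all of it.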
The last step here is particularly interesting, and I suggest you take your time to really understand what is going on. It might seem trivial, but what we have effectively done is move from one scale, the Fahrenheit scale, to another handcrafted scale, the employment scale. We can do this because we know the relationship between these scales. In this naive prediction it is quite easy, since they are linearly related. Intuitively, this means, for example, that if the temperature is at 87% we should bring in 87% of our staff.
By normalizing our data we found this common ground for “muchness”, where we can go from one scale to another via the normalization procedure.