Pragmatic Data Science | 6, linear combination theory

In the previous examples we made a naive prediction based on the linear dependency between temperature and staff. However, temperature by itself is too simplistic a model. Our CSV file with weather data contains more information, such as humidity and wind speed. To use these additional variables in our prediction we are going to use something called a linear combination.

The linear combination for our new prediction consists of two parts: weights (theta) and values (x). Each variable we want to use in the prediction is represented by a pair of one weight and one value. To get the final score we multiply each weight by its value and sum the products.

score = (theta_1 * x_1) + (theta_2 * x_2)

The parentheses are not strictly necessary, but they make the pair for each variable in the linear combination easier to see. With this model we can append any number of variable pairs.
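
As a minimal sketch of the formula above, the score can be computed in Python by multiplying each weight by its value and summing the products. The function name `linear_combination` and the example numbers are illustrative assumptions, not values taken from the weather data.

```python
def linear_combination(thetas, xs):
    """Return the sum of theta_i * x_i over all weight/value pairs."""
    if len(thetas) != len(xs):
        raise ValueError("each value needs exactly one weight")
    return sum(theta * x for theta, x in zip(thetas, xs))


# Hypothetical weights and values for two variables (made-up numbers).
thetas = [0.5, 2.0]
xs = [20.0, 3.0]

score = linear_combination(thetas, xs)
print(score)  # (0.5 * 20.0) + (2.0 * 3.0) = 16.0
```

Because the function works on lists, adding a third variable only means appending one more weight and one more value.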

But what do the thetas and x values really do to the score? A good analogy here might be a recording studio. The thetas can be thought of as the sliders on a mixer board.

They signify how much volume is let through to the speakers (the score in our case). We let more volume through if we drag the slider up a bit and less if we slide it down. The slider (the theta) can therefore be seen as a weight that decides how much this particular channel is allowed to affect the final score.

If the singer is screaming at the top of his lungs into the microphone, the volume is going to be higher than if he is whispering. The input to the channel from the microphone is represented by our x variable. We can compensate for “screaming” in the x variable by using a lower theta for that channel if it affects the final score too much. The reverse is also true: to compensate for “whispering” and amplify the input value, we can set a higher theta for that x value.
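
The mixer analogy can be sketched with a few numbers. Everything below is made up for illustration: one input is "screaming" (a large x), the other is "whispering" (a small x), and the thetas are picked by hand so that both channels contribute equally.

```python
# Hypothetical inputs: one channel is "screaming", the other "whispering".
loud_x = 100.0
quiet_x = 2.0

# With equal weights the loud channel dominates the score.
equal_score = (1.0 * loud_x) + (1.0 * quiet_x)

# A lower theta turns the loud channel down,
# a higher theta turns the quiet channel up.
loud_theta = 0.25
quiet_theta = 12.5
loud_part = loud_theta * loud_x      # 25.0
quiet_part = quiet_theta * quiet_x   # 25.0
balanced_score = loud_part + quiet_part

print(equal_score)     # 102.0
print(balanced_score)  # 50.0
```

In the first score almost all of the 102.0 comes from the loud channel. After adjusting the thetas, each channel contributes the same 25.0 to the total.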

By having a weight for each x we can make a linear combination of multiple variables to form a single final scalar score.