Linear Algebra/Topic: Line of Best Fit
Scientists are often presented with a system that
has no solution and they must find an answer anyway.
That is, they must
find a value that is as close as possible to being an answer.
For instance,
suppose that we have a coin to use in flipping.
This coin has some proportion $m$ of heads
to total flips, determined by how it is
physically constructed, and we want to know if $m$ is near $1/2$.
We can get experimental data by flipping it
many times.
This is the result of a penny experiment, including some
intermediate numbers.

  
Because of randomness, we do not find the exact proportion
with this sample; there is no solution to this system.

$$\begin{array}{rcl}30m&=&16\\60m&=&34\\90m&=&51\end{array}$$
That is, the vector of experimental data is not in the subspace
of solutions.

$$\begin{pmatrix}16\\34\\51\end{pmatrix}\not\in\{m\begin{pmatrix}30\\60\\90\end{pmatrix}\,\big|\,m\in\mathbb{R}\}$$
However, as described above, we want to find the $m$ that most nearly works.
An orthogonal projection of the data vector into the line subspace
gives our best guess.

$$\frac{\begin{pmatrix}16\\34\\51\end{pmatrix}\cdot\begin{pmatrix}30\\60\\90\end{pmatrix}}{\begin{pmatrix}30\\60\\90\end{pmatrix}\cdot\begin{pmatrix}30\\60\\90\end{pmatrix}}\cdot\begin{pmatrix}30\\60\\90\end{pmatrix}=\frac{7110}{12600}\cdot\begin{pmatrix}30\\60\\90\end{pmatrix}$$
The estimate ($m=7110/12600\approx 0.56$) is a bit high but not much,
so probably the penny is fair enough.
The line with the slope $m\approx 0.56$
is called the line of best fit for this data.
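
The projection arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the flip and head counts from the system above; the helper names `dot` and `projection_coefficient` are illustrative and are not taken from the book's own program.

```python
# Coin data from the system 30m = 16, 60m = 34, 90m = 51.
flips = [30, 60, 90]   # vector spanning the line subspace
heads = [16, 34, 51]   # vector of experimental data


def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def projection_coefficient(v, s):
    """Coefficient c such that c*s is the orthogonal projection of v into the line spanned by s."""
    return dot(v, s) / dot(s, s)


m = projection_coefficient(heads, flips)
projection = [m * x for x in flips]   # the point of the line closest to the data

print("numerator:", dot(heads, flips))
print("denominator:", dot(flips, flips))
print("estimated proportion m:", round(m, 2))
```

The numerator and denominator printed here are the two dot products in the fraction above, so the slope of the line of best fit is simply their quotient.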

Minimizing the distance
between the given vector and the vector used as the right-hand side
minimizes the total of the squares of the vertical lengths
between the data points and the line,
and consequently
we say that the line has been
obtained through fitting by least-squares
(the vertical scale here has been exaggerated ten times
to make the lengths visible).
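
To see the minimization numerically, the sketch below compares the sum of squared vertical gaps for the projected slope against two nearby slopes. The function name `sum_squared_gaps` and the comparison slopes 0.50 and 0.60 are arbitrary choices made for illustration.

```python
# Same coin data as in the system 30m = 16, 60m = 34, 90m = 51.
flips = [30, 60, 90]
heads = [16, 34, 51]


def sum_squared_gaps(m):
    """Sum of squared vertical gaps between the points (flips, heads) and the line y = m*x."""
    return sum((h - m * x) ** 2 for x, h in zip(flips, heads))


best = 7110 / 12600   # slope found by orthogonal projection

# The projected slope should give a smaller total than any other candidate.
for candidate in (0.50, best, 0.60):
    print(f"m = {candidate:.4f}   sum of squared gaps = {sum_squared_gaps(candidate):.4f}")
```

The squared distance between the data vector and a multiple of the spanning vector is exactly this sum, which is why the orthogonal projection and the least-squares line agree.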
We arranged the equation above so that the line
must pass through $(0,0)$
because we take it to be (our best guess at)
the line whose slope is this coin's true proportion
of heads to flips.
We can also handle cases where the line need not
pass through the origin.
For example, the different denominations of U.S. money have different average
times in circulation
(the $2 bill is left off as a special case).
How long should we expect a $25 bill to last?

  
The plot (see below) looks roughly linear.
It isn't a perfect line, i.e.,
the linear system with equations $b+1m=1.5$, ..., $b+100m=20$ has no
solution, but we can again use orthogonal projection
to find a best approximation.
Consider the matrix of coefficients of that linear system and also its
vector of constants, the experimentally-determined values.

$$A=\begin{pmatrix}1&1\\1&5\\1&10\\1&20\\1&50\\1&100\end{pmatrix}\qquad \vec{v}=\begin{pmatrix}1.5\\2\\3\\5\\9\\20\end{pmatrix}$$
The final result in the subsection on Projection into a Subspace says
that the coefficients $b$ and $m$ that make the linear combination
of the columns of $A$ as close as possible to the vector $\vec{v}$
are the entries of $(A^{\rm trans}A)^{-1}A^{\rm trans}\cdot\vec{v}$.
Some calculation
gives an intercept of $b=1.05$ and a slope
of $m=0.18$.

Plugging $x=25$ into the equation of the line shows that such a
bill should last between five and six years.
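
The "some calculation" for the currency data can be sketched as follows. This is not the book's program: it is a small plain-Python illustration that writes out the two-by-two system for $A^{\rm trans}A$ and $A^{\rm trans}\vec{v}$ and solves it by Cramer's rule, with the function name `fit_line` chosen here for convenience.

```python
# Second column of A (denominations) and the vector v (average years in circulation).
denominations = [1, 5, 10, 20, 50, 100]
years = [1.5, 2, 3, 5, 9, 20]


def fit_line(xs, ys):
    """Return (b, m) solving (A^T A)(b, m)^T = A^T v, where the rows of A are (1, x)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # A^T A = [[n, sx], [sx, sxx]] and A^T v = [sy, sxy].
    det = n * sxx - sx * sx
    b = (sxx * sy - sx * sxy) / det   # intercept
    m = (n * sxy - sx * sy) / det     # slope
    return b, m


b, m = fit_line(denominations, years)
print("intercept b:", round(b, 2))
print("slope m:", round(m, 2))
print("predicted life of a $25 bill (years):", round(b + 25 * m, 2))
```

For a matrix with only two columns the inverse can be written in closed form as above; with more columns a linear algebra library would be the more natural choice.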
We close by considering
the times for the men's mile race (Oakley & Baker 1977).
These are the world records that were in force on January first
of the given years.
We want to project when a 3:40 mile
will be run.

  
We can see below that the data is surprisingly linear.
With this input

$$A=\begin{pmatrix}1&1860\\1&1870\\\vdots&\vdots\\1&1990\\1&2000\end{pmatrix}\qquad \vec{v}=\begin{pmatrix}280.0\\268.8\\\vdots\\226.3\\223.1\end{pmatrix}$$

the Python program at this Topic's end gives $\text{slope}=-0.35$
and $\text{intercept}=925.53$
(rounded to two places; the original data
is good to only about a quarter of a second
since much of it was hand-timed).

When will a $220$ second mile be run?
Solving the equation of the line of best fit
gives an estimate of the year $2008$.
This example is amusing, but serves as a caution: obviously the
linearity of the data will break down someday (as indeed it does
prior to 1860).
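
Solving the line's equation for the year is a one-line computation, sketched below with the helper name `year_for_time` (an illustrative choice). Note that the estimate is sensitive to rounding: the two-place coefficients quoted above give a year near 2016, while a slope of about $-0.3514$, which also rounds to $-0.35$, gives a year near 2008. The unrounded coefficients are not stated in the text, so that second slope is only an assumption used to show the effect.

```python
def year_for_time(seconds, intercept, slope):
    """Solve intercept + slope * year = seconds for the year."""
    return (seconds - intercept) / slope


# Coefficients as quoted in the text, rounded to two places.
rounded_estimate = year_for_time(220, intercept=925.53, slope=-0.35)

# Hypothetical slope with more digits (an assumption, not a value from the text).
finer_estimate = year_for_time(220, intercept=925.53, slope=-0.3514)

print("year from rounded coefficients:", round(rounded_estimate, 1))
print("year from the hypothetical finer slope:", round(finer_estimate, 1))
```

Because the years are large numbers, a change in the third decimal place of the slope moves the predicted year by several years, so the unrounded fit should be used when extrapolating.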


== Exercises ==
The calculations here are best done on a computer. In addition, some of the problems require more data, available in your library, on the net, in the answers to the exercises, or in the section following the exercises.

Solutions
Computer Code


== Additional Data ==
Data on the progression of the world's records (taken from the Runner's World web site) is below.


== References ==
Bennett, William (March 15, 1993), "Quantifying America's Decline", Wall Street Journal 
Dalal, Siddhartha; Folkes, Edward; Hoadley, Bruce (Fall 1989), "Lessons Learned from Challenger: A Statistical Perspective", Stats: the Magazine for Students of Statistics: 14-18
Gardner, Martin (April 1970), "Mathematical Games, Some mathematical curiosities embedded in the solar system", Scientific American: 108-112
Oakley, Cletus; Baker, Justine (April 1977), "Least Squares and the 3:40 Mile", Mathematics Teacher