1015SCG
Lecture 4
Slope and intercept form: \(y = mx + b\)
Simple linear regression: \(\hat{y} = \beta_1 x + \beta_0\)
If \(\hat{y}_i = \beta_1 x_i + \beta_0,\) we define the residuals as \(e_i = y_i - \hat{y}_i.\)
Then the residual sum of squares (RSS) is \( \ds \sum_{i} e_i^2.\)
The least squares approach chooses $\beta_1\,$ and $\,\beta_0$ to minimize the RSS.
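To make the residual and RSS definitions concrete, here is a minimal Python sketch. The function names `residuals` and `rss` are hypothetical helpers, and the three data points are made up so the arithmetic can be checked by hand.

```python
def residuals(xs, ys, beta1, beta0):
    """Residuals e_i = y_i - y_hat_i for the line y_hat = beta1 * x + beta0."""
    return [y - (beta1 * x + beta0) for x, y in zip(xs, ys)]


def rss(xs, ys, beta1, beta0):
    """Residual sum of squares: the sum of e_i**2 over all data points."""
    return sum(e ** 2 for e in residuals(xs, ys, beta1, beta0))


# Made-up points, chosen only to keep the arithmetic easy to check by hand.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 5.0]
print(residuals(xs, ys, beta1=1.5, beta0=0.5))  # [0.0, 0.5, 0.0]
print(rss(xs, ys, beta1=1.5, beta0=0.5))        # 0.25
```

Trying other values of `beta1` and `beta0` changes the RSS; least squares picks the pair that makes it smallest.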
For a dataset of pairs \( (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), \) we assume that the relationship between \(x\) and \(y\) can be described by a straight line plus a random "error":
\( y_i = \beta_1 x_i + \beta_0 + \varepsilon_i \)
Least-squares coefficient estimates
\(\;\ds \beta_1 = \frac{\ds\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\ds\sum_{i=1}^{n} (x_i-\bar{x})^2},\) \(\;\;\;\; \beta_0 = \bar{y} - \beta_1 \,\bar{x} \)
where \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\;\) and \(\;\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i\)
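The coefficient formulas above translate directly into code. This is a minimal Python sketch (the function name `least_squares` and the data are illustrative assumptions, not from the lecture); Excel's `SLOPE` and `INTERCEPT` functions should give the same two numbers for the same data.

```python
def least_squares(xs, ys):
    """Least-squares estimates (beta1, beta0) for y_hat = beta1 * x + beta0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the beta1 formula
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return beta1, beta0


# Made-up points for illustration only
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 5.0]
beta1, beta0 = least_squares(xs, ys)
print(round(beta1, 4), round(beta0, 4))  # 1.5 0.6667
```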
The data in the Lecture sheets → workshop tab gives the workshop attendance, workshop mark, Inference/Maths Task mark, Scientific Critique Task mark and overall course mark (total marks).
Excel Instructions:
The line of best fit to our data is \[ \hat{y} = \beta_1 x + \beta_0 \]
Source: Randall Munroe, xkcd.com/1725
Linear regression is used in AI.
AI relies on Calculus, Probability, Statistics and Linear Algebra.
Look at the fatalities tab in the Lecture sheets Excel file.
Group 1 results: \( x = 1.643 \pm 0.015 \, \text{s}\)
Group 2 results: \( x = 1.64882 \pm 0.00040 \, \text{s}\)
The theory says that \(t = \ds \sqrt{\dfrac{2L^2}{h \times g}} = 1.648731\, \text{s},\) where $g = 9.81\, \text{m}/\text{s}^2,$ $h= 0.3\, \text{m}$ and $L = 2\,\text{m}.$
| | Result (seconds) |
|---|---|
| Group 1 | \( 1.643 \pm 0.015\) |
| Group 2 | \( 1.64882 \pm 0.00040 \) |
| Theory | \(\ds 1.648731\) |
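As a quick check of the theoretical value, the formula can be evaluated in Python. A minimal sketch; the helper name `incline_time` is hypothetical, and the inputs are the lecture's values of $L$, $h$ and $g$.

```python
import math


def incline_time(L, h, g=9.81):
    """Theoretical time t = sqrt(2 L^2 / (h g)), in seconds (L, h in m, g in m/s^2)."""
    return math.sqrt(2 * L ** 2 / (h * g))


t_theory = incline_time(L=2.0, h=0.3)
print(f"{t_theory:.6f}")  # 1.648731
```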
Claim: Experimental results agree with the theory.
Is this claim true or false?
We will use theoretical probability distributions to test our hypothesis.
Hypothesis testing - part of the scientific method.
In statistics, we frame the questions as a hypothesis:
We start with the null hypothesis $\text{H}_0$, the hypothesis that matches our predictions.
Then we use statistical methods to either reject the hypothesis or accept it (strictly speaking, fail to reject it).
Do the experimental results agree with the expected value?
There is a true/expected value $\mu_T$ for our experimental result.
Repeat measurement $n$ times.
It is more likely that the measurements are scattered about the true value than that they are biased.
We can estimate how likely it is that the true value matches our experiment, given $\mu_T$ and the standard error $\text{SE}.$
How likely is it that the true value \( \mu_T \) matches our experiment?
Example: The true value lies in the range \[ 1.1 \pm 0.4 \,\text{m},\] with 95% confidence level ($\text{CL}$).
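A range like $1.1 \pm 0.4\,\text{m}$ is commonly computed as mean $\pm\, t_c \times \text{SE}$, with $\text{SE} = \text{SD}/\sqrt{n}$. The Python sketch below assumes that convention; the function name and the five measurements are hypothetical (not lecture data), and $t_c = 2.776$ is the 95% value for $\text{df} = 4$ from the $t$-table.

```python
import math
import statistics


def confidence_interval(measurements, t_crit):
    """Return (mean, half_width) for the interval mean ± t_crit * SE."""
    n = len(measurements)
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)  # sample SD (n - 1), like Excel's STDEV.S
    se = sd / math.sqrt(n)               # standard error SE = SD / sqrt(n)
    return mean, t_crit * se


# Hypothetical lengths in metres (not lecture data); n = 5, so df = 4.
lengths = [1.0, 1.2, 1.1, 0.9, 1.3]
mean, half_width = confidence_interval(lengths, t_crit=2.776)  # 95% CL, df = 4
print(f"{mean:.2f} ± {half_width:.2f} m")  # 1.10 ± 0.20 m
```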
What it means:
There are two ways to approach confidence levels and confidence intervals:
Goal: To determine and compare two values
1. the $t$-critical value and
2. the $t$-statistic value.
Does the estimate \( \mu \) match the expected or theoretical value \( \mu_0 \)?
Here $t$ is the $t$-statistic and $t_c$ is the $t$-critical value.
If $t\leq t_c,$ the estimate \( \mu \) agrees with the expected/theoretical value \( \mu_0 .\)
If not, the estimate \( \mu \) does not agree with the expected/theoretical value \( \mu_0.\)
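The decision rule can be sketched in Python, assuming the one-sample $t$-statistic $t = |\mu - \mu_0| / \text{SE}$ (the same form the spreadsheet uses for its t-stat row). The function names and numbers are hypothetical; $t_c = 2.776$ is the 95% value for $\text{df} = 4$ from the $t$-table.

```python
def t_statistic(mu, mu0, se):
    """t = |mu - mu0| / SE: distance of the estimate from the expected value, in SEs."""
    return abs(mu - mu0) / se


def agrees(t, t_c):
    """Decision rule: the estimate agrees with mu0 when t <= t_c."""
    return t <= t_c


# Hypothetical numbers: estimate 1.1 with SE 0.05, expected value 1.0,
# n = 5 measurements so df = 4; t_c = 2.776 for 95% CL from the t-table.
t = t_statistic(1.1, 1.0, 0.05)
print(round(t, 3), agrees(t, 2.776))  # 2.0 True
```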
| $\text{df}$ | 50% | 60% | 70% | 80% | 90% | 95% | 98% | 99% | 99.5% | 99.8% | 99.9% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 1.376 | 1.963 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 | 127.321 | 318.309 | 636.619 |
| 2 | 0.816 | 1.061 | 1.386 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 14.089 | 22.327 | 31.599 |
| 3 | 0.765 | 0.978 | 1.250 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 7.453 | 10.215 | 12.924 |
| 4 | 0.741 | 0.941 | 1.190 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 5.598 | 7.173 | 8.610 |
| 5 | 0.727 | 0.920 | 1.156 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 4.773 | 5.893 | 6.869 |
| 20 | 0.687 | 0.860 | 1.064 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.153 | 3.552 | 3.850 |
| 100 | 0.677 | 0.845 | 1.042 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 | 2.871 | 3.174 | 3.390 |
| ∞ | 0.674 | 0.842 | 1.036 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 2.807 | 3.090 | 3.291 |
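For rows that appear in the table, $t_c$ can be looked up directly. This Python sketch copies a subset of the table (the 90%, 95%, 98% and 99% columns); the names `T_TABLE` and `t_critical` are hypothetical. For other $\text{df}$ values, SciPy's `scipy.stats.t.ppf(1 - probability / 2, df)` is the usual equivalent of Excel's `T.INV.2T`, if SciPy is available.

```python
# Two-tailed t-critical values copied from the table (subset of columns).
# Outer key: degrees of freedom; inner key: confidence level in percent.
T_TABLE = {
    1: {90: 6.314, 95: 12.706, 98: 31.821, 99: 63.657},
    2: {90: 2.920, 95: 4.303, 98: 6.965, 99: 9.925},
    3: {90: 2.353, 95: 3.182, 98: 4.541, 99: 5.841},
    4: {90: 2.132, 95: 2.776, 98: 3.747, 99: 4.604},
    5: {90: 2.015, 95: 2.571, 98: 3.365, 99: 4.032},
    20: {90: 1.725, 95: 2.086, 98: 2.528, 99: 2.845},
    100: {90: 1.660, 95: 1.984, 98: 2.364, 99: 2.626},
}


def t_critical(cl_percent, df):
    """Look up t_c; raises KeyError if the df row or CL column is not in the table."""
    return T_TABLE[df][cl_percent]


print(t_critical(95, 5))  # 2.571
```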
When comparing the best estimate of $n$ experiments with a single value, $\text{df} = n-1$:
=T.INV.2T(probability, deg_freedom)
probability: a number from 0 to 1, the statistical significance level
probability = \(\ds 1 -\frac{\text{CL}}{100\%}\)
\(\text{CL}=\) confidence level
deg_freedom $=n-1$
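The two arguments of `=T.INV.2T(probability, deg_freedom)` follow from the confidence level and the number of measurements. A minimal Python sketch with a hypothetical helper name; for six measurements at 95% CL it gives probability 0.05 and 5 degrees of freedom, and the $t$-table's 95% column then gives $t_c = 2.571$.

```python
def t_inv_2t_args(cl_percent, n):
    """Arguments for Excel's =T.INV.2T(probability, deg_freedom)."""
    probability = 1 - cl_percent / 100  # probability = 1 - CL/100%
    deg_freedom = n - 1                 # one best estimate vs one value: df = n - 1
    return probability, deg_freedom


p, df = t_inv_2t_args(95, 6)
print(round(p, 4), df)  # 0.05 5
```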
A research study was conducted to examine the differences between older and younger adults in perceived life satisfaction. Several older adults and several younger adults were given a life satisfaction test. Scores on the test range from 0 to 60, with high scores indicating high life satisfaction and low scores indicating low life satisfaction. Twenty years ago, the scores were 45 for older adults and 37 for younger adults. Are the life satisfaction scores of older and younger adults the same now as they were then? Use \(\text{CL}\) 95%.
Note: Data available in your Excel file.
Do the two experiments agree?
If yes, the experiments agree with each other at the given $\text{CL}$.
If not, the experiments do not agree with each other.
$t$-critical: =T.INV.2T(probability, deg_freedom)
Welch $t$-test: Data → Data Analysis → t-Test: Two-Sample Assuming Unequal Variances
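Welch's test compares two means without assuming equal variances. The Python sketch below computes the standard Welch $t$-statistic, $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and the Welch–Satterthwaite degrees of freedom; the values should be comparable to the "t Stat" and "df" that Excel's tool reports (Excel may display df rounded). The function name and sample data are hypothetical, not the course data.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # sample variances (n - 1)
    n1, n2 = len(sample1), len(sample2)
    a, b = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores, chosen so the result is easy to verify by hand
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(t, df)  # -1.0 8.0
```

The two groups agree at the chosen $\text{CL}$ if $|t| \le t_c$, where $t_c$ comes from `T.INV.2T` with this df.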
A research study was conducted to examine the differences between older and younger adults in perceived life satisfaction. Several older adults and several younger adults were given a life satisfaction test. Scores on the test range from 0 to 60, with high scores indicating high life satisfaction and low scores indicating low life satisfaction. Can you say, at the 98% confidence level, that the life satisfaction scores are the same for these two groups?
Note: Data available in your Excel file.
Group 1 results: \( x = 1.643 \pm 0.015 \, \text{s}\)
Group 2 results: \( x = 1.64882 \pm 0.00040 \, \text{s}\)
The theory says that \(t = \ds \sqrt{\dfrac{2L^2}{h \times g}} = 1.648731\, \text{s},\) where $g = 9.81\, \text{m}/\text{s}^2,$ $h= 0.3\, \text{m}$ and $L = 2\,\text{m}.$
| | A | B | C |
|---|---|---|---|
| 1 | | Group 1 | Group 2 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| 8 | 6 | 1.694 | 1.650007 |
| 9 | Average | =AVERAGE(B3:B8) | =AVERAGE(C3:C8) |
| 10 | SD | =STDEV.S(B3:B8) | =STDEV.S(C3:C8) |
| 11 | n | =COUNT(B3:B8) | =COUNT(C3:C8) |
| 12 | SE | =STDEV.S(B3:B8)/SQRT(B11) | =STDEV.S(C3:C8)/SQRT(C11) |
| 13 | Confidence Level | =0.95 | =0.95 |
| 14 | t-critical | =T.INV.2T(1-B13, B11-1) | =T.INV.2T(1-C13, C11-1) |
| 15 | t-stat | =ABS(B9-1.648731)/B12 | =ABS(C9-1.648731)/C12 |
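The spreadsheet rows 9 to 15 can be mirrored in Python to check a column of times. A minimal sketch: `sheet_summary` is a hypothetical helper, the six times are made up (only row 8 of the real data is shown above), and `t_crit = 2.571` is the table value for 95% CL with $\text{df} = 6 - 1 = 5$, standing in for the sheet's `T.INV.2T` cell.

```python
import math
import statistics

T_THEORY = 1.648731  # theoretical time from the lecture, in seconds


def sheet_summary(times, t_crit):
    """Mirror spreadsheet rows 9-15 for one column of measured times."""
    n = len(times)                       # =COUNT(...)
    mean = statistics.mean(times)        # =AVERAGE(...)
    sd = statistics.stdev(times)         # =STDEV.S(...)
    se = sd / math.sqrt(n)               # SE = SD/SQRT(n)
    t_stat = abs(mean - T_THEORY) / se   # =ABS(mean - 1.648731)/SE
    return {"mean": mean, "sd": sd, "n": n, "se": se,
            "t_crit": t_crit, "t_stat": t_stat, "agrees": t_stat <= t_crit}


times = [1.64, 1.65, 1.66, 1.64, 1.65, 1.66]  # hypothetical, not the lecture data
summary = sheet_summary(times, t_crit=2.571)  # 95% CL, df = 5
print(round(summary["t_stat"], 3), summary["agrees"])
```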
Compare the t-stat and t-critical values: if t-stat ≤ t-critical, that group's result agrees with the theory at the chosen confidence level.
But there are many more...
See you in Week 5!