1015SCG
Lecture 3
Every measurement comes with an error:
Measurement errors:
Sampling error:
Two groups of students slide a block down an incline and measure the time it takes the block to slide along the smooth surface.
- Group 1 uses a stopwatch to measure the time.
- Group 2 uses a fancy photogate.
Each group repeats the measurement 6 times.
Average is defined as
\(\mu=\) \(\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}\) \(=\ds \frac{1}{n} \sum_{i=1}^{n}x_i\)
\(n\) - number of experiments
\(x_i\) - result of the experiment number \(i,\) where $i = 1, 2, \ldots, n$
Note: The average is also known as the mean or the best estimate.
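Here is a minimal Python sketch of this definition. The six times are hypothetical values made up for illustration (the real measurements are in the Excel file). The standard library's `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

# Hypothetical stopwatch times in seconds (illustration only, not the class data)
times = [1.62, 1.65, 1.61, 1.66, 1.63, 1.69]

def average(xs: list[float]) -> float:
    """Average x_bar = (1/n) * sum of x_i, written out as in the definition."""
    n = len(xs)
    return sum(xs) / n

x_bar = average(times)
print(f"x_bar = {x_bar:.6f} s")

# The standard library gives the same result (up to floating-point rounding)
assert abs(x_bar - mean(times)) < 1e-12
```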
Find the average \(\bar{x}=\dfrac{1}{6}\ds \sum_{i=1}^{6}x_i\)
How far is $x_i$ from the average?
For a data set of \(n\) observations, the sample variance is
\(\ds \sigma^2 = \frac{\left(x_1-\bar{x}\right)^2+ \left(x_2-\bar{x}\right)^2+\cdots + \left(x_n-\bar{x}\right)^2}{n-1} =\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2\)
The sample standard deviation (SD) is its square root:
\(\sigma = \ds \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}\)
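FAQ: Why \(n-1\)? 🤔 The deviations \(x_i-\bar{x}\) always add up to zero because \(\bar{x}\) is computed from the same data. Once \(n-1\) of them are known, the last one is fixed, so estimating \(\bar{x}\) uses up one "degree of freedom". Dividing by \(n-1\) instead of \(n\) makes \(\sigma^2\) an unbiased estimate of the population variance. Also, you can't compute the SD of a single data point!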
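The FAQ can be checked numerically. This sketch uses hypothetical times, not the class data. It shows that the deviations sum to zero. It also shows that Python's `statistics.variance` divides by \(n-1\), while `statistics.pvariance` divides by \(n\).

```python
from statistics import pvariance, variance

times = [1.62, 1.65, 1.61, 1.66, 1.63, 1.69]  # hypothetical data, seconds

n = len(times)
x_bar = sum(times) / n
deviations = [x - x_bar for x in times]

# Deviations from the sample mean always sum to zero (up to rounding),
# so only n - 1 of them are free to vary.
print(f"sum of deviations = {sum(deviations):.2e}")

ss = sum(d ** 2 for d in deviations)  # sum of squared deviations
var_sample = ss / (n - 1)             # sample variance (sigma^2 on the slide)
var_pop = ss / n                      # population formula, divides by n

# statistics.variance uses n - 1; statistics.pvariance uses n
assert abs(var_sample - variance(times)) < 1e-12
assert abs(var_pop - pvariance(times)) < 1e-12

sd = var_sample ** 0.5
print(f"SD with n-1 = {sd:.6f} s, SD with n = {var_pop ** 0.5:.6f} s")
```

Dividing by \(n\) always gives a slightly smaller spread. The difference matters most for small samples like this one.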
Find the SD \( \sigma = \ds \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \)
For a data set of $n$ observations, the Standard Error of the sample average is
\(\text{SE} = \) \(\sigma_{\bar{x}}= \ds \frac{\sigma}{\sqrt{n}}\qquad\)
that is, the standard deviation divided by the square root of the sample size.
|  | A | B | C |
|---|---|---|---|
| 1 |  | Group 1 | Group 2 |
| 2 | Trial | time (s) | time (s) |
| 3 | 1 | 1.621 | 1.649906 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| 8 | 6 | 1.694 | 1.650007 |
| 9 | Average | =AVERAGE(B3:B8) | =AVERAGE(C3:C8) |
| 10 | SD | =STDEV(B3:B8) | =STDEV(C3:C8) |
| 11 | SE | =STDEV(B3:B8)/SQRT(COUNT(B3:B8)) | =STDEV(C3:C8)/SQRT(COUNT(C3:C8)) |
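Here is a sketch of a Python equivalent of the spreadsheet above. `statistics.stdev` divides by \(n-1\), like Excel's `STDEV`. Both data lists are hypothetical because only trials 1 and 6 appear in the table, so the printed numbers will not match the slide. The helper name `summarise` is also an illustrative choice.

```python
import math
from statistics import mean, stdev

def summarise(xs: list[float]) -> tuple[float, float, float]:
    """Return (average, SD, SE), mirroring =AVERAGE, =STDEV and =STDEV/SQRT(COUNT)."""
    avg = mean(xs)
    sd = stdev(xs)                # sample SD, n - 1 in the denominator (like STDEV)
    se = sd / math.sqrt(len(xs))  # standard error of the average
    return avg, sd, se

# Hypothetical measurements in seconds, NOT the class data
stopwatch = [1.62, 1.65, 1.61, 1.66, 1.63, 1.69]
photogate = [1.6499, 1.6500, 1.6485, 1.6479, 1.6492, 1.6474]

for label, data in [("Group 1", stopwatch), ("Group 2", photogate)]:
    avg, sd, se = summarise(data)
    print(f"{label}: average = {avg:.6f} s, SD = {sd:.6f} s, SE = {se:.6f} s")
```

As in the real data, the less scattered photogate times give a much smaller SE.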
Find the standard error of the sample average
SE = \( \sigma_{\bar{x}}= \ds \frac{\sigma}{\sqrt{n}}\)
For Group 1, the average and standard error are \(\bar{x} = 1.643333\,\text{s}\) and \(\,\Delta x = 0.015198\,\text{s},\) respectively. Thus, the interval around the mean that indicates the precision of the estimate is \( x = 1.643333 \pm 0.015198 \, \text{s}.\)
For Group 2, the interval is \( x = 1.648820 \pm 0.000395 \, \text{s}.\)
Group 1: \(\, x = 1.643333 \pm 0.015198 \, \text{s}\)
Group 2: \(\, x = 1.648820 \pm 0.000395 \, \text{s}\)
Significant figures are the digits in a number that carry meaning about its precision. They include all certain digits plus the first uncertain digit.
✏️ Rule of thumb: More significant figures → more precise measurement.
How many sig. fig.?
Group 1: \(\, x = 1.643333 \pm 0.015198 \, \text{s}\)
- round the error to 1 or 2 significant figures
- round the average to the same number of decimal places as the error
1 s.f. \(\rightarrow\) \(\Delta x = 0.02,\,\) \(\bar{x} = 1.64 \) \(\qquad \Ra x = 1.64\pm 0.02 \, \text{s}\)
2 s.f. \(\rightarrow\) \(\Delta x = 0.015,\,\) \(\bar{x} = 1.643 \) \(\qquad \Ra x = 1.643\pm 0.015 \, \text{s}\)
Group 2: \(\, x = 1.648820 \pm 0.000395 \, \text{s}\)
- round the error to 1 or 2 significant figures
- round the average to the same number of decimal places as the error
1 s.f. \(\rightarrow\) \(\Delta x = 0.0004,\,\) \(\bar{x} = 1.6488 \) \(\qquad \Ra x = 1.6488\pm 0.0004 \, \text{s}\)
2 s.f. \(\rightarrow\) \(\Delta x = 0.00040,\,\) \(\bar{x} = 1.64882 \) \(\qquad \Ra x = 1.64882\pm 0.00040 \, \text{s}\)
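The two rounding rules can be automated. This is a minimal sketch that assumes values are passed as strings and that halves are rounded up. It uses Python's `decimal` module so that numbers such as 0.000395 are not distorted by binary floating point. The function name `round_measurement` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_measurement(value: str, error: str, sig_figs: int = 1) -> tuple[str, str]:
    """Round the error to `sig_figs` significant figures and the value
    to the same number of decimal places as the rounded error."""
    v, e = Decimal(value), Decimal(error)
    # Decimal exponent of the error's leading digit, e.g. 0.015198 -> -2
    leading = e.adjusted()
    # Place value of the last digit to keep, e.g. 10**-2 for 1 s.f. of 0.015198
    quantum = Decimal(1).scaleb(leading - sig_figs + 1)
    # Note: if rounding carries into a new digit (e.g. 0.096 -> 0.10),
    # this sketch keeps the extra figure rather than re-rounding.
    e_rounded = e.quantize(quantum, rounding=ROUND_HALF_UP)
    v_rounded = v.quantize(quantum, rounding=ROUND_HALF_UP)
    return str(v_rounded), str(e_rounded)

for value, error in [("1.643333", "0.015198"), ("1.648820", "0.000395")]:
    for sf in (1, 2):
        v, e = round_measurement(value, error, sf)
        print(f"{sf} s.f.: x = {v} ± {e} s")
```

Running it reproduces the four results above for Groups 1 and 2.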
When we have many measurements that follow a normal distribution, their histogram forms a bell curve.
With 68.27% confidence the true value \(x\) lies in the interval \(\mu - \text{SE}\lt x \lt \mu +\text{SE}\), where \(\mu\) is the mean/average and \(\text{SE}\) is the standard error.
With 95.45% confidence the true value \(x\) lies in the interval \(\mu - 2\text{SE}\lt x \lt \mu +2\text{SE}\).
With 99.73% confidence the true value \(x\) lies in the interval \(\mu - 3\text{SE}\lt x \lt \mu +3\text{SE}\).
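These percentages come from the normal distribution. The probability of lying within \(k\) standard errors of the mean is \(\operatorname{erf}(k/\sqrt{2})\). Here is a short sketch, assuming the sample average is normally distributed:

```python
import math

def coverage(k: float) -> float:
    """Probability that a normally distributed quantity lies within
    k standard deviations (here: standard errors) of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SE: {100 * coverage(k):.2f}%")
```

Rounded to two decimals, this gives 68.27%, 95.45% and 99.73%.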
A research study was conducted to examine the differences between older and younger adults in perceived life satisfaction. Several older adults and several younger adults were given a life satisfaction test. Scores on the test range from 0 to 60: high scores indicate high life satisfaction, and low scores indicate low life satisfaction. Find the average and the standard error of the life satisfaction score for the older and the younger adults.
=AVERAGE(B1:B12)
✨Note: The data are available in your Excel file.
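What is the difference between 12.3 and 12.30?
Significant figures reflect the precision to which a value is reported: 12.3 has 3 s.f. (reported to the nearest 0.1), while 12.30 has 4 s.f. (reported to the nearest 0.01).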
If errors are not given:
If errors are given, we must consider error propagation.
Find the age of the universe \(( t= 1.4 \times 10^{10}\,\text{years} )\) in weeks.
First, note that the value \(1.4 \times 10^{10}\,\text{years}\) has 2 s.f.
Now, $\,1 \,\text{year} = 52 \, \text{weeks}$ \(\;\Ra\; 1 = \dfrac{52\, \text{weeks}}{1 \,\text{year}}\) 👈 Conversion factor
Then $\, t = 1.4 \times 10^{10}\,\text{years}$ \( =1.4 \times 10^{10}\,\text{years} \times \dfrac{52\, \text{weeks}}{1 \,\text{year}}\)
\( =1.4 \times 52 \times 10^{10}\text{weeks}\) \( =72.8\times 10^{10}\,\text{weeks}\)
\(= 7.28\times 10^{11}\,\text{weeks}\;\;\) (it has 3 s.f.) 👈
Round to 2 s.f., the precision of the given value \(\rightarrow 7.3\times 10^{11}\,\text{weeks}\)
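The same rounding can be scripted. This is a minimal sketch with a hypothetical helper, `round_sig`, which rounds a number to a given count of significant figures. It is applied here to the age of the universe, using the slide's factor of 52 weeks per year.

```python
import math

def round_sig(x: float, sig_figs: int) -> float:
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    # Decimal exponent of the leading digit, e.g. 7.28e11 -> 11
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig_figs - 1 - exponent)

t_years = 1.4e10                     # given with 2 s.f.
weeks_per_year = 52                  # conversion factor used on the slide
t_weeks = t_years * weeks_per_year   # 7.28e11: more figures than the data justify
print(f"{round_sig(t_weeks, 2):.1e} weeks")
```

The same helper reproduces the concert average: `round_sig(74403.20 / 1203, 4)` gives 61.85.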
1. 1203 people attend a concert and spend $74,403.20
2. 1200 people attend a concert and spend $74,000
Given:
1203 people attend, total spend = $74,403.20
$\dfrac{74{,}403.20}{1203}$ $= 61.848046...$ $\approx 61.85$
Given:
1200 people attend, total spend = $74,000
$\dfrac{74{,}000}{1200}$ $= 61.666...$
What will be the average if we consider $1.200\times 10^3$ and $7.4000\times 10^4$? 🤔
A bucket contains $5.23 \pm 0.07$ L of water. How many gallons is it?
👉 \(1\,\text{gallon} = 3.79\, \text{L}\)
Change units in both the value and the error!
\(\text{V} = 5.23 \,\text{L}\) \(=5.23 \,\text{L} \times \dfrac{1\,\text{gal}}{3.79\, \text{L}}\) \(=1.379947 \,\text{gal}\)
\(\Delta \text{V} = 0.07 \,\text{L}\) \(=0.07 \,\text{L} \times \dfrac{1\,\text{gal}}{3.79\, \text{L}}\) \(=0.018470 \,\text{gal}\)
Thus \(\text{V} = 1.38 \,\text{gal}\;\) and \(\;\Delta \text{V} = 0.02 \,\text{gal}\)
\(\Ra \text{V} = 1.38 \pm 0.02\,\text{gal} \)
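Here is a small sketch of the rule "change units in both the value and the error": the same conversion factor multiplies both. The constant `GALLON_IN_LITRES` and the helper `litres_to_gallons` are illustrative names, using the 3.79 L per gallon given above.

```python
GALLON_IN_LITRES = 3.79  # 1 gallon = 3.79 L, as given above

def litres_to_gallons(value_l: float, error_l: float) -> tuple[float, float]:
    """Convert a value and its uncertainty with the same factor."""
    factor = 1 / GALLON_IN_LITRES  # conversion factor 1 gal / 3.79 L
    return value_l * factor, error_l * factor

v, dv = litres_to_gallons(5.23, 0.07)
print(f"V = {v:.6f} gal, dV = {dv:.6f} gal")
# Here 2 decimal places means 1 s.f. in the error, and the value matches it
print(f"V = {v:.2f} ± {dv:.2f} gal")
```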
When we combine measured quantities whose uncertainties come from repeated measurements, the error in each quantity contributes to the error in the final answer.
The formulae below approximate this propagated error. They become increasingly accurate as the number of measurements increases, or when the cross terms (correlations) between the contributing errors are small.
| Process | Value | Uncertainty |
|---|---|---|
| Average | \(\bar x = \dfrac{x_1+x_2+\cdots+x_n}{n}\) | \(\Delta x = \text{SE} = \dfrac{\sigma}{\sqrt{n}} = \ds \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}\) |
| Constant factor | \(\bar{z} = A\bar{x} ,\) with \(A\) a constant | \(\Delta z = \lvert A\rvert\,\Delta x \) |
| Addition / Subtraction | \(\bar{z} = A\bar{x} \pm B\bar{y}\) | \(\Delta z = \sqrt{A^2(\Delta x)^2+B^2(\Delta y)^2} \) |
| Multiplication / Division | \(\bar{z} = \bar{x} \times \bar{y}\) or \(\bar{z} = \dfrac{\bar{x} }{ \bar{y}}\) | \(\Delta z = \lvert\bar{z}\rvert\sqrt{\left(\dfrac{\Delta x}{\bar{x}}\right)^2+\left(\dfrac{\Delta y}{\bar{y}}\right)^2}\) |
| Function | \(f(x)\) | \(\Delta f = \left\lvert\dfrac{df}{dx}\right\rvert\Delta x\) |
If we have a general function of independent variables \[f\left(x_1,x_2,\cdots, x_n\right)\]
- Each variable has an error $x_i \pm \Delta x_i$
- The error of the function is
$\ds \Delta f = \sqrt{\left(\dfrac{\partial f}{\partial x_1}\Delta x_1\right)^2+ \cdots+ \left(\dfrac{\partial f}{\partial x_n}\Delta x_n\right)^2} =\sqrt{\sum_{i=1}^{n}\left(\dfrac{\partial f}{\partial x_i}\Delta x_i\right)^2}$
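The general formula can also be evaluated numerically when the partial derivatives are tedious to work out. This is a sketch, not the method used on these slides. The hypothetical helper `propagate` estimates each \(\partial f/\partial x_i\) with a central finite difference, so it assumes \(f\) is smooth and the errors are independent.

```python
import math
from typing import Callable, Sequence

def propagate(f: Callable[..., float], values: Sequence[float],
              errors: Sequence[float], h: float = 1e-6) -> tuple[float, float]:
    """Return (f, delta_f) with delta_f = sqrt(sum((df/dx_i * dx_i)**2)).

    Each partial derivative is estimated by a central finite difference.
    """
    f0 = f(*values)
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        step = h * max(abs(x), 1.0)
        up = list(values)
        up[i] = x + step
        down = list(values)
        down[i] = x - step
        dfdx = (f(*up) - f(*down)) / (2 * step)  # approximates df/dx_i
        total += (dfdx * dx) ** 2
    return f0, math.sqrt(total)

# Water/methanol mixture: V_w = 70.0 ± 0.5 ml, V_m = 30.0 ± 0.4 ml
v_tot, dv_tot = propagate(lambda vw, vm: vw + vm, [70.0, 30.0], [0.5, 0.4])
frac, dfrac = propagate(lambda vw, vm: vm / (vw + vm), [70.0, 30.0], [0.5, 0.4])
print(f"V_tot = {v_tot:.1f} ± {dv_tot:.2f} ml")
print(f"f = {frac:.3f} ± {dfrac:.5f}")
```

For the mixture example, this gives \(\Delta V_{\text{tot}}\approx 0.64\) ml and \(\Delta f \approx 0.00318\), matching the hand calculations.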
Given: water = \(70.0 \pm 0.5\, \text{ml}\), methanol = \(30.0 \pm 0.4\, \text{ml}\).
\(V_w = 70.0\, \text{ml}\;\) and \(\;V_m= 30.0\, \text{ml}\)
\(\Delta V_w = 0.5\, \text{ml}\;\) and \(\;\Delta V_m= 0.4\, \text{ml}\)
👉 Notice we have 1 s.f. in each uncertainty!
Sum (addition) — uncertainties combined in quadrature (independent errors):
\(V_{\text{tot}} = V_w + V_m \) \(= 70.0\, \text{ml}+ 30.0 \, \text{ml} \) \(= 100.0 \, \text{ml}\)
\( \Delta{V_{\text{tot}}}=\sqrt{(0.5\, \text{ml})^{2}+(0.4\, \text{ml})^{2}} \) \( =\sqrt{0.25\, \text{ml}^2+0.16\, \text{ml}^2}\)
\( =\sqrt{0.41\, \text{ml}^2} \) \( =0.64\, \text{ml}\;(\text{approx.}) \)
Hence, \(V_{\text{tot}} = 100.0 \pm 0.6\) ml.
Fraction of methanol: \(\displaystyle f = \frac{V_m}{V_{\text{tot}}}\) \(\displaystyle =\frac{30.0\, \text{ml}}{100.0\, \text{ml}} = 0.300 = 30.0\%\).
Propagate the uncertainty using partial derivatives (since \(V_m\) appears in both the numerator and the denominator):
\(\ds f=\frac{V_m}{V_w+V_m},\quad \) \(\ds \frac{\partial f}{\partial V_m}=\frac{V_w}{(V_w+V_m)^2},\quad\) \(\ds \frac{\partial f}{\partial V_w}=-\frac{V_m}{(V_w+V_m)^2} \)
Evaluate at \(V_w=70.0,\ V_m=30.0\):
\( \ds \Delta f=\sqrt{\left( \tfrac{70}{100^2}\cdot 0.4 \right)^2 + \left( \tfrac{30}{100^2}\cdot 0.5 \right)^2} \) \( \ds \approx 0.00318 \)
Uncertainty as a percentage: \(0.00318\times100\approx0.318\%\). Rounding the uncertainty sensibly to 1 s.f. gives \(0.3\%\).
Therefore, Methanol = \(30.0\pm 0.3\, \%\) (%v/v).
We remove \(15 \pm 5 \, \text{ml}\) from the total \(100.0 \pm 0.64 \, \text{ml}\).
Remaining volume:
\( R = V_{\text{tot}} - V_{\text{removed}} \) \( = 100.0\, \text{ml} - 15.0 \, \text{ml} \) \( = 85.0 \, \text{ml}\)
Subtraction - Independent errors → quadrature:
\(\Delta R=\sqrt{0.64^2\, \text{ml}^2 + 5.0^2\, \text{ml}^2} \) \(=\sqrt{0.4096\, \text{ml}^2+25\, \text{ml}^2} \)
\(=5.04\, \text{ml}\ (\text{approx.}) \)
Round uncertainty to 1 s.f. → \(5\). Align value to same precision.
Thus, the remaining mixture \(\,= 85 \pm 5\, \text{ml}\) .
Methanol fraction from (2): \(f = 0.300 \pm 0.00318\) (dimensionless).
Remaining volume from (3): \(R = 85.0 \pm 5.04\, \text{ml}\)
Amount of methanol left: \[ M_{\text{left}} = f\cdot R = 0.300\times 85.0\,\text{ml} = 25.5\,\text{ml} \]
Propagate the uncertainty for the product (treating \(f\) and \(R\) as independent here):
\[ \Delta M=\sqrt{(R\,\Delta f)^2 + (f\,\Delta R)^2} =\sqrt{(85\cdot 0.00318)^2 + (0.300\cdot 5.04)^2}\,\text{ml} \approx 1.54\,\text{ml} \]
The uncertainty has leading digit 1 → keep two significant figures → \(1.5\,\text{ml}\). Align the value to one decimal place.
Answer: Methanol left ≈ \(25.5 \pm 1.5\,\text{ml}\).
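The whole chain can be reproduced in a few lines. This sketch uses the closed-form rules from the table: quadrature for sums and differences, partial derivatives for the fraction, and the product rule for \(f\cdot R\). The helper name `quad` is hypothetical. The sketch keeps full precision between steps, so the final uncertainty comes out as about 1.536 ml, which still rounds to \(1.5\) ml.

```python
import math

def quad(*terms: float) -> float:
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(t * t for t in terms))

# (1) Total volume: addition
v_w, dv_w = 70.0, 0.5  # water, ml
v_m, dv_m = 30.0, 0.4  # methanol, ml
v_tot = v_w + v_m
dv_tot = quad(dv_w, dv_m)

# (2) Methanol fraction f = V_m / (V_w + V_m), via its partial derivatives
f = v_m / v_tot
df = quad(v_w / v_tot ** 2 * dv_m, -v_m / v_tot ** 2 * dv_w)

# (3) Remove 15 ± 5 ml: subtraction
removed, d_removed = 15.0, 5.0
r = v_tot - removed
dr = quad(dv_tot, d_removed)

# (4) Methanol left M = f * R, treating f and R as independent (as on the slide)
m = f * r
dm = quad(r * df, f * dr)

for name, val, err in [("V_tot (ml)", v_tot, dv_tot), ("f", f, df),
                       ("R (ml)", r, dr), ("M_left (ml)", m, dm)]:
    print(f"{name:12s} = {val:.4f} ± {err:.4f}")
```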
Where do they come from?
How to propagate them?
Correlation measures the relationship between two variables, indicating how they change together.
It is used to understand the relationship between variables.
To measure correlation between variables we use the formula:
\(r = \frac{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\displaystyle\sum_{i=1}^{n} (y_i - \bar{y})^2}}\)
\(-1\leq r \leq 1\)
It is also known as the Pearson correlation coefficient.
It is also common to use \(r^2 \in [0,1]\).
In Excel we use the function: CORREL(A1:A10,B1:B10)
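Here is a direct Python transcription of the formula above, as a sketch. The name `pearson_r` is hypothetical. Excel's `CORREL` computes the same quantity, and so does `statistics.correlation` in Python 3.10+.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, written out term by term."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfectly linear, increasing
print(pearson_r(x, [-v for v in x]))         # perfectly linear, decreasing
print(pearson_r(x, [v ** 2 for v in x]))     # exact but non-linear relation
```

The last case gives \(r = 0\) even though \(y\) is completely determined by \(x\), because \(r\) only measures linear association.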
Figure: Sales of ice cream 🍦 vs. No. of people getting ☀️ sunburn 🥵
🍦 is causing more sunburns on people ❌
Ice cream is not causing sunburn‼️
We must consider an external (confounding) factor: the Sun ☀️
The Sun ☀️ might be increasing both the sales of ice cream and the number of sunburns.
Find more examples here 👉 Spurious correlations
The correlation coefficient is not a measure of how linear the plot is! Data sets with very different shapes can share the same \(r\).
"Don't rely solely on summary statistics—always visualise your data."
Source: Same Stats, Different Graphs
The data in the Lecture sheets → workshop tab give each student's workshop attendance, workshop mark, Inference/Maths Task mark, Scientific Critique Task mark and overall course mark (total marks). We want to establish whether there is any relationship between students' workshop attendance, their performance in the assessment tasks, and their overall course marks.
See you in Week 4!