Statistics Review ChE 479 Winter 2022 January 19, 2022 Dr. Harding

Pertinent Quote
Jacob Bernoulli (1713): “For even the most stupid of men, by some instinct of nature, by himself and without any instruction (which is a remarkable thing), is convinced that the more observations have been made, the less danger there is of wandering from one’s goal.”
From Lemons, An Introduction to Stochastic Processes in Physics, Johns Hopkins University Press, p. 13

Statistical Methods
1. Repeated Data Points
2. Comparing Averages of Measured Variables
3. Linear Regression
   - Confidence Interval
   - Prediction Interval
4. Sensitivity Analysis
- Propagation of Error (Don’t worry about!)

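1. Repeated Data Points
- Use t-test based on measured st dev (s):
  $$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}, \quad \text{where } t = f\!\left(\frac{\alpha}{2},\; n-1\right)$$
  ($\mu$ = true mean, $\bar{x}$ = measured mean)
- In Excel, with r = n − 1:
  - T.INV(α, r) for one-tailed test (α = 0.025 for 95% confidence interval; T.INV returns the lower-tail value, so use its absolute value)
  - T.INV.2T(α, r) for two-tailed test (α = 0.05 for 95% confidence interval)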
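The interval above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation. The helper names (`t_pdf`, `t_cdf`, `t_two_tailed`, `mean_confidence_interval`) and the sample measurements are hypothetical. The critical t value is found by numerically integrating the t density and bisecting, as a stand-in for Excel's T.INV.2T.

```python
import math
import statistics


def t_pdf(x, r):
    """Student's t probability density with r degrees of freedom."""
    log_coef = math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
    return math.exp(log_coef) / math.sqrt(r * math.pi) * (1 + x * x / r) ** (-(r + 1) / 2)


def t_cdf(x, r, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, r) + t_pdf(x, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, r)
    return 0.5 + total * h / 3


def t_two_tailed(alpha, r):
    """Critical t with P(|T| > t) = alpha, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, r) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(data, alpha=0.05):
    """Return (mean, half_width) so that true mean = mean ± half_width."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    t = t_two_tailed(alpha, n - 1)
    return x_bar, t * s / math.sqrt(n)


measurements = [40.1, 41.3, 39.8, 40.7, 40.4]  # hypothetical repeated data points
x_bar, half_width = mean_confidence_interval(measurements)
print(f"true mean = {x_bar:.2f} ± {half_width:.2f} (95% confidence)")
```

The bisection range of [0, 100] is an assumption that covers common confidence levels; very small α combined with one or two degrees of freedom would need a wider range.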

Gaussian Distribution
- 68.27% of distribution lies within one σ of the mean
- 95.45% of distribution lies within two σ of the mean
- 99.73% of distribution lies within three σ of the mean
- t-test is used when we do not have enough data points (< 30)
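These percentages follow from the Gaussian error function: the fraction within k standard deviations of the mean is erf(k/√2). A short check, with `fraction_within` as a hypothetical helper name:

```python
import math


def fraction_within(k):
    """Fraction of a Gaussian distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```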

2. Comparing averages of measured variables
Experiments were completed on two separate days.
- Day 1: $\bar{x}_1 = 40.9$, $s_{x_1} = 3.27$, $n_{x_1} = 7$
- Day 2: $\bar{x}_2 = 37.2$, $s_{x_2} = 2.67$, $n_{x_2} = 9$
When comparing means at a given confidence level (e.g. 95%), is there a difference between the means?

2. Comparing averages of measured variables
New formula: T
Step 1 (compute T):
$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_{x_1}-1)\,s_{x_1}^2 + (n_{x_2}-1)\,s_{x_2}^2}{n_{x_1}+n_{x_2}-2}\left(\dfrac{1}{n_{x_1}}+\dfrac{1}{n_{x_2}}\right)}}$$
- Larger T: more likely different
- For this example, T = 2.5
Step 2 (compute net r):
$$r = n_{x_1} + n_{x_2} - 2$$
In Excel:
- T.INV(α, r) for one-tailed test (α = 0.025 for 95% confidence interval)
- T.INV.2T(α, r) for two-tailed test (α = 0.05 for 95% confidence interval)
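Steps 1 and 2 can be checked numerically with the Day 1 and Day 2 values from the example. This is a minimal sketch; the function name `pooled_t_statistic` is hypothetical.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Return (T, r): the pooled two-sample T statistic and net degrees of freedom."""
    r = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / r
    T = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return T, r


# Day 1 and Day 2 values from the example
T, r = pooled_t_statistic(40.9, 3.27, 7, 37.2, 2.67, 9)
print(f"T = {T:.2f}, r = {r}")  # T = 2.50, r = 14
```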

2. Comparing averages of measured variables
Step 3: Compute net t from net r
Step 4: Compare T with t
At a given confidence level (e.g. 95%, or α = 0.05), there is a difference if:
$$T > t\!\left(\frac{\alpha}{2},\; r\right)$$
2-tail: T = 2.5 > t = 2.145
95% confident there is a difference! (but not 98% confident)
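The comparison in Step 4 can be sketched as follows. The critical values are two-tailed entries for r = 14 from a standard t table (2.145 at α = 0.05, as on the slide, and 2.624 at α = 0.02); `means_differ` and `T_CRITICAL` are hypothetical names, and the absolute value is an assumption so that the sign of the difference does not matter.

```python
def means_differ(T, t_critical):
    """True if the T statistic exceeds the two-tailed critical t value."""
    return abs(T) > t_critical


T = 2.5  # from Step 1 of the example
# Two-tailed critical t values for r = 14, keyed by alpha (standard t-table values)
T_CRITICAL = {0.05: 2.145, 0.02: 2.624}

for alpha, t in T_CRITICAL.items():
    verdict = "difference" if means_differ(T, t) else "no difference shown"
    print(f"{100 * (1 - alpha):.0f}% confidence: T = {T} vs t = {t} -> {verdict}")
```

Since 2.145 < 2.5 < 2.624, the means differ at 95% confidence but not at 98%.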

3. Linear Regression
- y = mx + b
- Fit m and b, get r²
- Find confidence intervals for m and b
  - m = 3.56 ± 0.02, etc.
  - Use standard error and t-statistic
  - Excel add-on (or Igor, Matlab, Python)
- Find confidence intervals for line
  - Use standard error around mean
  - Narrow-waisted curves around line
  - Depends on n
  - Meaning: How many ways can I draw a line through the data?
- Find prediction band for line
  - Meaning: Where are the bounds of where the data should lie?
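A least-squares fit with confidence intervals on m and b might look like the sketch below. It uses the usual textbook standard-error formulas for simple linear regression, which the slide does not spell out, so treat them as an assumption. The data, the function name `linear_fit`, and the choice to pass in the critical t value (3.182 for n − 2 = 3 degrees of freedom at 95%, from a standard t table) are all illustrative.

```python
import math


def linear_fit(x, y, t_crit):
    """Fit y = m*x + b and return m, b, r2, and the ± half-widths for m and b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    s = math.sqrt(sse / (n - 2))                                 # residual standard error
    se_m = s / math.sqrt(sxx)                                    # standard error of slope
    se_b = s * math.sqrt(1 / n + x_bar**2 / sxx)                 # standard error of intercept
    return {"m": m, "b": b, "r2": 1 - sse / sst,
            "m_ci": t_crit * se_m, "b_ci": t_crit * se_b}


x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = linear_fit(x, y, t_crit=3.182)
print(f"m = {fit['m']:.3f} ± {fit['m_ci']:.3f}")
print(f"b = {fit['b']:.3f} ± {fit['b_ci']:.3f}")
print(f"r2 = {fit['r2']:.4f}")
```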

3. Linear Regression (Confidence Interval)
[Figure: confidence interval and prediction band around a fitted line]

Example
Coefficient values ± 95% Confidence Interval
- a = 0.51617 ± 1.83
- b = 0.92291 ± 0.178
[Figure: Y Values vs. X Values, showing the data, linear curve fit, 95% confidence interval of line, and 95% prediction band]

Confidence Intervals and Prediction Bands
- What good is the confidence interval for a line?
  - Shows how many ways the line can fit the points
  - Lets you state the confidence region for any predicted point
  - r² still helps determine how good the fit is
- What good is the prediction band?
  - Shows where the data should lie
  - Helps identify outlying data points to consider discarding
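The difference between the two bands can be seen in their half-widths at a chosen point x0. The sketch below uses the standard formulas for simple linear regression (an assumption, since the slides show the bands only graphically). The data, the function name `band_half_widths`, and the tabulated critical t value are hypothetical.

```python
import math


def band_half_widths(x, y, x0, t_crit):
    """Return (y0, ci, pi): fitted value at x0 and the ± half-widths of the
    confidence interval of the line and of the prediction band at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b = y_bar - m * x_bar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))               # residual standard error
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx   # grows as x0 moves away from the mean of x
    ci = t_crit * s * math.sqrt(spread)        # where the line itself could lie
    pi = t_crit * s * math.sqrt(1 + spread)    # where a new data point should lie
    return m * x0 + b, ci, pi


x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182                  # two-tailed 95%, n - 2 = 3 degrees of freedom (t table)
for x0 in (1, 3, 5):
    y0, ci, pi = band_half_widths(x, y, x0, T_CRIT)
    print(f"x0 = {x0}: y = {y0:.2f}, line ± {ci:.2f}, new point ± {pi:.2f}")
```

The confidence interval is narrowest at the mean of x, which gives the narrow-waisted shape, while the prediction band is always wider because it also includes the scatter of an individual point.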

4. Sensitivity Analysis
- Used to determine how independent variable values will impact a particular dependent variable under a given set of assumptions
  - i.e., how sensitive the output is to changes in one input variable while all other inputs are kept constant
- Useful for the following reasons:
  - Testing the robustness of the results of a model or system in the presence of uncertainty
  - Increased understanding of the relationships between input and output variables in a system
  - Uncertainty reduction, through the identification of model inputs that cause significant uncertainty in the output and should therefore be the focus of attention to increase robustness (perhaps by further research)

Sensitivity Analysis Usefulness (Cont.):
- Searching for errors in the model (by encountering unexpected relationships between inputs and outputs)
- Model simplification: fixing model inputs that have no effect on the output, or identifying and removing redundant parts of the model structure
- Finding regions in the space of input factors for which the model output is either maximum or minimum or meets some optimum criterion
- When calibrating models with a large number of parameters, a primary sensitivity test can ease the calibration stage by focusing on the sensitive parameters. Not knowing the sensitivity of parameters can result in time being uselessly spent on non-sensitive ones
- Identifying important connections between observations, model inputs, and predictions or forecasts, leading to the development of better models

Sensitivity Analysis Mechanics
- First, the base case output is defined, using the average value of all input variables for a particular base case
- Then the value of the output is calculated using a new value for the one input under consideration while keeping the other inputs constant
- Find the percentage change in the output and the percentage change in the input
- The sensitivity is calculated by dividing the percentage change in output by the percentage change in input
- This process is repeated until the sensitivity figure for each of the inputs is obtained
- Conclusion: the higher the sensitivity figure, the more sensitive the output is to any change in that input, and vice versa
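The mechanics above can be sketched as a one-at-a-time loop. The model, its input names, the base values, and the 1% perturbation size are all hypothetical choices made for illustration; the sketch also assumes the base inputs and the base output are nonzero so the percentage changes are defined.

```python
def sensitivities(model, base_inputs, pct_change=1.0):
    """Return {input name: (% change in output) / (% change in input)},
    changing one input at a time and holding the others at their base values."""
    base_output = model(**base_inputs)
    result = {}
    for name, base_value in base_inputs.items():
        perturbed = dict(base_inputs)  # copy so the other inputs stay constant
        perturbed[name] = base_value * (1 + pct_change / 100)
        new_output = model(**perturbed)
        pct_output = 100 * (new_output - base_output) / base_output
        result[name] = pct_output / pct_change
    return result


def example_model(flow, diameter):
    """Hypothetical model: output proportional to flow / diameter^2."""
    return flow / diameter**2


base_case = {"flow": 2.0, "diameter": 0.05}  # hypothetical average input values
for name, value in sensitivities(example_model, base_case).items():
    print(f"{name}: sensitivity = {value:.2f}")
```

In this example the output is roughly twice as sensitive to `diameter` as to `flow`, and the negative sign shows that the output falls as `diameter` rises.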

QUESTIONS?
