# statistics exam #1

Question | Answer |
---|---|

biased samples | more likely to produce some outcomes than others |

convenience samples | samples that are easy to take |

volunteer response sample | self-selected sample of people who responded to a general appeal |

simple random sample | avoids bias; any group of a given size is just as likely to be picked as any other group |

sampling frame | list of individuals from which we choose our sample |

probability sample | sampling method based on randomness, so selection is not biased; ex) random number generator, dice |

stratified random sample | population is divided into groups (strata), and random samples are taken within each group |

cluster sample | population is divided into naturally occurring clusters, and whole clusters are randomly selected |

systematic sample | select every kth individual from the sampling frame |
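
The sampling methods above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 numbered individuals, the two strata, and the function names are all made up for the example, and the random start in the systematic sample is one common convention rather than something the cards specify.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of the same size within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(1)                 # fixed seed so the sketch is repeatable
frame = list(range(1, 101))            # hypothetical sampling frame: individuals 1..100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"low": frame[:50], "high": frame[50:]}   # hypothetical strata
stratified = stratified_sample(strata, 5, rng)
```

A cluster sample would instead pick whole groups at random and keep everyone inside the chosen groups.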

multi-stage samples | combination of two or more of the sampling techniques |

bad sampling frame | subjects missing, difficult to get lists |

under-coverage | sampling does not include the whole population; data entry errors |

non-response | some may not respond |

response-rate | % of the contacted individuals who gave a response |

response bias | responses given to the questions differ from the truth; sensitive questions can produce dishonest answers, and question wording can also sway responses |

quantitative data | numbers; summarized with the mean; displayed with histograms and dot plots |

categorical data | how many in each category; summarized with the proportion; displayed with pie charts and bar charts |

distribution description and the graph's "story" | shape, center, variability |

skewed | skewed left (tail to the left), skewed right (tail to the right), or symmetrical |

bimodal | two peaks |

population mean | μ |

sample mean | x̄ |

1st quartile | value that has 25% of the data below it; median of the lower half |

3rd quartile | value that has 25% of the data above it; median of the upper half |

box plot | min, max, median, Q1, Q3; layout: min, whisker, box from Q1 to Q3 with a line at the median, whisker, max |
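
As a rough sketch, the five numbers behind a box plot can be computed with the "median of the lower half / median of the upper half" rule from the quartile cards. The data set is made up, and leaving the middle value out of both halves when the count is odd is an assumption here; textbooks and software differ on that detail.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # lower half of the sorted data
    upper = data[half + n % 2:]      # upper half (middle value skipped when n is odd)
    return min(data), median(lower), median(data), median(upper), max(data)


data = [2, 4, 4, 5, 7, 8, 9, 11, 12]      # made-up values
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1                              # width of the middle 50%
```

For this data the box would run from Q1 to Q3 with a line at the median, and the whiskers would reach out to the min and max.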

parameter | a numerical summary of a variable for a population; what you are trying to find; p or μ |

statistic | a numerical summary of a variable for a sample; x̄ or p̂ |

measures of center | mean, median; where the main chunk of the data is |

measures of spread | SD, IQR; how spread out the values are |

measure of variability: inter-quartile range | IQR = Q3 - Q1; the width of the middle 50%; summarizes variability but does not use all the values |

population variance | σ² |

sample variance | s² |

higher standard deviation= | higher variability in the values |
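
A small sketch of the mean, variance, and standard deviation using Python's standard `statistics` module. The values are made up; the point is only that the population variance divides by n while the sample variance divides by n - 1.

```python
from statistics import mean, pvariance, variance, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up values

x_bar = mean(values)                   # sample mean
pop_var = pvariance(values)            # population variance: divides by n
samp_var = variance(values)            # sample variance: divides by n - 1
s = stdev(values)                      # sample standard deviation, sqrt of samp_var
```

Because the sample variance divides by a smaller number, it always comes out a little larger than the population formula applied to the same values.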

change in distances from the mean/measures of variability with a shift transformation (adding or subtracting a constant) | NONE |

change in distances from the mean/measures of variability with a multiplication transformation (multiplying by a constant) | multiplied by the same factor (its absolute value) |
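
A quick check of the two transformation rules, with made-up values and arbitrary constants (add 5, multiply by 3):

```python
from statistics import mean, pstdev

values = [10, 12, 15, 19]                  # made-up values

shifted = [v + 5 for v in values]          # adding a constant
scaled = [v * 3 for v in values]           # multiplying by a constant

sd_original = pstdev(values)
sd_shifted = pstdev(shifted)               # same as sd_original
sd_scaled = pstdev(scaled)                 # 3 times sd_original
mean_shifted = mean(shifted)               # center moves by 5, spread does not
```

Adding moves every value and the mean together, so distances from the mean stay put; multiplying stretches those distances by the factor.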

range | simplest; good for very quick looks |

standard deviation | most common; sensitive to outliers |

IQR | best for comparing skewed data sets or when outliers are present |

standard score | z-score; how many standard deviations a value is above or below the mean |
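
The standard score is simply the distance from the mean measured in standard deviations. A minimal sketch, with made-up numbers (mean 20, SD 3):

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


z_above = z_score(26, 20, 3)   # 2 SDs above the mean
z_below = z_score(17, 20, 3)   # 1 SD below the mean
```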

theoretical distributions | mathematical models for distributions of data |

normal distribution | bell-shaped theoretical model used to answer questions about a distribution |

symmetric | mean=median |

center is determined by the | mean (M) |

table z gives percentages _____ a point | below |

sentence #1 | answer the question; ex: no, the analyst's model is not appropriate for the farmer's herd |

sentence #2 | what are the appropriate facts? ex: the analyst's model is a normal distribution (nd) with a mean of 20 and an SD of 3 |

sentence #3 | what does this imply? ex: this implies that almost all of the cows would produce between 11 and 29 liters (within 3 SDs of the mean) |

sentence #4 | how does this implication lead to the answer? ex: since we expect very few cows over 29 L and the farmer knows 20% of the cows produce more than 30 L, the model does not match the data. Therefore the model is not appropriate |
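
The numbers in the four-sentence example can be checked with `statistics.NormalDist` from the Python standard library. This sketch uses the model stated in the example (mean 20, SD 3) and the farmer's 20% figure; the variable names are just for illustration.

```python
from statistics import NormalDist

model = NormalDist(mu=20, sigma=3)            # the analyst's model from the example

within_3_sd = model.cdf(29) - model.cdf(11)   # share of cows the model puts between 11 and 29 L
above_30 = 1 - model.cdf(30)                  # share the model puts above 30 L
observed_above_30 = 0.20                      # what the farmer actually sees
```

The model puts well over 99% of cows between 11 and 29 L and far less than 1% above 30 L, which is nowhere near the farmer's 20%.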

sampling distribution | statistics from random samples have a predictable distribution |

binary questions | have yes or no answers; summarized with a proportion |

population proportion | p |

different samples= different | proportions; they vary from sample to sample, so we need to know the SD of the proportion |

sd of the proportion depends on the | population proportion; largest at p=0.5 |

sd of the sample proportion decreases equally on each side of | 0.5 |

the sampling distribution of the sample proportion will be centered at | the true population proportion (assumes a random sample) |

sample value | a good estimate of the population proportion |

as sample size increases, the sd of the sample proportion | decreases |

nd of the sample proportion depends on | p and n |
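
The cards on the SD of the sample proportion follow from the standard formula sqrt(p(1 - p) / n). The formula is not written out on the cards, so treat this as a sketch of the usual textbook version, with arbitrary values of p and n:

```python
import math


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion for a random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


sd_at_half = sd_sample_proportion(0.5, 100)      # largest SD for a given n
sd_at_03 = sd_sample_proportion(0.3, 100)        # smaller, and equal to the SD at p = 0.7
sd_at_07 = sd_sample_proportion(0.7, 100)
sd_bigger_n = sd_sample_proportion(0.5, 400)     # 4 times the sample size, half the SD
```

Quadrupling the sample size only halves the SD, because n sits under a square root.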

as sample size increases, the sd of the sample mean | decreases, because sample means vary less from sample to sample |

right skewed population | approximately normal sampling distribution for n=30 and n=100, but not for n=5 |

bell shaped population | approximately normal sampling distribution for all sample sizes |

bimodal population | approximately normal sampling distribution for n=100 |

nd=good model | with symmetrical parent population and large sample size |

Central Limit Theorem (CLT) | sample must be random and large (n=30+); works regardless of the population shape; about the sample mean, not the individuals |
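
A small simulation can make the CLT concrete. This sketch assumes a right-skewed population (an exponential distribution with mean 1 and SD 1, chosen only for illustration) and draws many random samples of size 30. The sample means should cluster around the population mean with a spread close to SD / sqrt(n), which is the usual textbook result and is not stated on the card itself.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)     # fixed seed so the simulation is repeatable
n = 30                     # size of each sample
num_samples = 5000         # number of repeated samples

# Right-skewed population: exponential with mean 1 and SD 1 (an assumption for this sketch).
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)          # should be close to the population mean, 1
spread = stdev(sample_means)         # should be close to 1 / sqrt(30)
predicted_spread = 1 / math.sqrt(n)
```

A histogram of `sample_means` would look roughly bell-shaped even though individual values from this population are strongly skewed.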

margin of error (MOE) | numeric indication of how far a statistic may be from the true parameter; use the sampling distribution to determine the MOE; only accounts for random sampling variability (not bias…) |

size of MOE depends on | SD and z-score |

confidence level | the proportion of samples that will produce a confidence interval that contains the true population parameter; the % of possible samples for which the MOE works; z-score: the confidence coefficient, chosen by the user, which changes the size of the MOE; doesn't tell us about biases |

larger sample size= | smaller MOE and no change in confidence |

MOE + confidence interval | quantifies random sampling variability; based on the sampling distribution |
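
Putting the last few cards together, here is a sketch of a margin of error and confidence interval for a proportion. It assumes the usual textbook form MOE = z × SD of the sample proportion, with the sample proportion standing in for p; the sample values (40% of 600 respondents) and the function name are made up for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """MOE for a sample proportion: z-score for the confidence level times the SD."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95% confidence
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.40, 600                       # made-up sample result
moe = margin_of_error(p_hat, n)            # about 0.039, or 3.9 percentage points
interval = (p_hat - moe, p_hat + moe)      # 95% confidence interval

moe_larger_sample = margin_of_error(p_hat, 4 * n)        # larger n, smaller MOE
moe_higher_confidence = margin_of_error(p_hat, n, 0.99)  # higher confidence, larger MOE
```

None of this says anything about bias: a biased sample gives a tidy interval around the wrong value.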