statistics exam #1

January 18, 2018 By admin
Question Answer
biased samples more likely to produce some outcomes than others
convenience samples samples that are easy to take
volunteer response sample self-selected sample of people who responded to a general appeal
simple random sample avoids bias
any group of a given size is just as likely to be picked as any other group of that size
sampling frame list of individuals from which we choose our sample
probability sample sampling methods based on randomness, so the selection itself adds no bias
ex) random number generator, dice
stratified random sample population divided into groups or strata
random samples taken within these groups
cluster sample divided into naturally occurring clusters
systematic sample select every kth individual from the sampling frame
multi-stage samples combination of several of these techniques
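The sampling methods above can be sketched in a few lines of Python. This is only an illustration: the sampling frame (the numbers 1 to 100), the two strata, the five clusters, and all function names are made up for the example, and the cluster sample assumes the usual approach of randomly picking whole clusters and keeping everyone in them.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every group of n individuals is equally likely to be chosen.
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    # Random starting point, then every kth individual from the frame.
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # A separate random sample is taken inside each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep every individual in them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(0)          # fixed seed so the run is repeatable
frame = list(range(1, 101))     # hypothetical sampling frame of 100 individuals

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"first_half": frame[:50], "second_half": frame[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"block_{i}": frame[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, rng)
```

A multi-stage sample would chain these, for example picking clusters first and then taking a simple random sample inside each chosen cluster.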
bad sampling frame subjects missing, difficult to get lists
under-coverage sampling does not include the whole population
data entry errors
non-response some may not respond
response-rate % of the contacted individuals who gave a response
response bias responses given to the questions differ from the truth
-sensitive questions=dishonest answers sometimes
question wording
quantitative data summarized with the mean
numbers
histograms
dot plots
categorical data summarized with the proportion
pie charts
bar charts
how many in each category
distribution description and the graph's "story" shape
center
variability
skewed left (tail to left)
right (tail to right)
symmetrical
bimodal two peaks
population mean μ
sample mean x̄
1st quartile value that has 25% of the data below
median of the lower half
3rd quartile value that has 25% of the data above
median of the upper half
box plot
min, max, median, Q1, Q3
min ---[ Q1 | median | Q3 ]--- max
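A minimal sketch of the five numbers behind a box plot, using the "median of the lower half / median of the upper half" rule from the quartile cards. The data set and the function name are made up, and leaving the middle value out of both halves when n is odd is an assumption (textbooks and software differ on this).

```python
from statistics import median

def five_number_summary(values):
    # Assumes at least two values.
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower),      # median of the lower half
        "median": median(data),
        "q3": median(upper),      # median of the upper half
        "max": data[-1],
    }

summary = five_number_summary([2, 4, 4, 5, 7, 8, 9, 10, 12])
# Width of the box in the box plot (the middle 50% of the data).
iqr = summary["q3"] - summary["q1"]
```

For this data the box runs from 4 to 9.5 with the median line at 7, and the whiskers reach 2 and 12.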
parameter a numerical summary of a variable for a population
what you are trying to find
p or μ
statistic a numerical summary of a variable for a sample
x̄ or p̂
measures of center mean
median
main chunk of data
measures of spread SD, IQR
how spread values are
measure of variability: inter-quartile range IQR=Q3-Q1
the width of the middle 50%
summarizes variability
does not summarize all values
population variance σ²
sample variance s²
higher standard deviation= higher variability in the values
measures of variability (distances from the mean) after adding or subtracting a constant no change
measures of variability (distances from the mean) after multiplying by a constant change by the same factor as the data
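A quick check of the two transformation cards on a small made-up data set: adding a constant moves the mean but leaves the standard deviation alone, while multiplying by a (positive) constant scales both. The numbers 5 and 3 are arbitrary choices for the illustration.

```python
from statistics import mean, pstdev

data = [10, 12, 15, 19, 24]          # hypothetical values
shifted = [x + 5 for x in data]      # add a constant
scaled = [x * 3 for x in data]       # multiply by a constant

# Center moves with both transformations.
mean_original, mean_shifted, mean_scaled = mean(data), mean(shifted), mean(scaled)

# Spread ignores the shift but follows the multiplication.
sd_original, sd_shifted, sd_scaled = pstdev(data), pstdev(shifted), pstdev(scaled)
```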
range simplest
very quick looks
standard deviation common
sensitive to outliers
IQR best for comparing skewed data sets
outliers present
standard score z-score
how many standard deviations above or below the mean
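As a small sketch, the standard score is the distance from the mean measured in standard deviations. The function name and the example numbers (a value of 26 in a distribution with mean 20 and sd 3) are just for illustration.

```python
def z_score(value, mean, sd):
    # Positive result: above the mean. Negative result: below the mean.
    return (value - mean) / sd

above = z_score(26, 20, 3)   # 2.0, two standard deviations above the mean
below = z_score(17, 20, 3)   # -1.0, one standard deviation below the mean
```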
theoretical distributions mathematical models for distributions of data
normal distribution model that answers questions about a distribution
symmetric mean=median
center is determined by the mean (μ)
table z gives percentages _____ a point below
sentence #1 answer the question
ex: no, the analyst's model is not appropriate for the farmer's herd
sentence #2 what are the appropriate facts?
ex: the analyst's model is a normal distribution with a mean of 20 and an sd of 3
sentence #3 what does this imply?
ex: this implies that almost all of the cows would produce between 11 and 29 liters
sentence #4 how does this implication lead to the answer?
ex: since we expect very few cows over 29 L and the farmer knows 20% of the cows produce more than 30 L, the model does not match the data. Therefore the model is not appropriate.
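The four-sentence example can be checked numerically. This sketch uses Python's built-in `statistics.NormalDist` in place of table z, with the mean of 20 and sd of 3 from the example; the variable names are made up.

```python
from statistics import NormalDist

model = NormalDist(mu=20, sigma=3)   # the analyst's model from the example

# Share of cows the model puts within 3 sd of the mean (11 to 29 liters).
share_11_to_29 = model.cdf(29) - model.cdf(11)

# Share of cows the model expects to produce more than 30 liters.
share_above_30 = 1 - model.cdf(30)

# The farmer observes about 20% above 30 liters.
observed_above_30 = 0.20
model_fits = abs(share_above_30 - observed_above_30) < 0.05
```

The model puts well under 1% of cows above 30 liters, nowhere near the farmer's 20%, which is the mismatch sentence #4 points to. The 0.05 cutoff is only an arbitrary threshold for the illustration.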
sampling distribution statistics from random samples have a predictable distribution
binary questions have yes or no answers
summarized with a proportion
population proportion p
different samples= different proportions
-vary from sample to sample
-need to know SD of proportion
sd of the proportion depends on the population proportion
largest: p=0.5
sd of the sample proportion decreases equally on each side of 0.5
the sampling distribution of the sample proportion will be centered at the true population proportion
-assumes random sample
sample value a good estimate of the population proportion
sample size increases the sd of the sample proportion decreases
nd of the sample proportion depends on p and n
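The cards above describe how the sd of the sample proportion behaves without giving the formula. Assuming the standard formula sqrt(p(1 - p)/n) for a random sample, a short sketch shows each claim: largest at p = 0.5, equal on both sides of 0.5, and smaller as n grows. The function name is made up.

```python
from math import sqrt

def sd_sample_proportion(p, n):
    # Standard deviation of the sample proportion for a random sample of size n.
    return sqrt(p * (1 - p) / n)

at_half = sd_sample_proportion(0.5, 100)       # largest possible for n = 100
at_03 = sd_sample_proportion(0.3, 100)         # same distance below 0.5 ...
at_07 = sd_sample_proportion(0.7, 100)         # ... and above 0.5 give the same sd
bigger_sample = sd_sample_proportion(0.5, 400) # 4 times the sample, half the sd
```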
sample size increases sd of the sample mean decreases
because sample means vary less from sample to sample as n grows
right skewed population approximately normal sampling distribution for n=30,100 but not for n=5
bell shaped population approximately normal sampling distribution for all sample sizes
bimodal population approximately normal sampling distribution for n=100
nd=good model with a symmetrical parent population or a large sample size
Central Limit Theorem (CLT) sample must be random and large (n = 30 or more)
works regardless of the population shape
about sample mean, not the individuals
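A small simulation can make the CLT cards concrete. This sketch assumes a right-skewed population (an exponential distribution with mean 1 and sd 1, chosen only for illustration) and draws 2000 random samples at each of the sample sizes mentioned above. The seed, the repetition count, and the function name are all arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps, rng):
    # Draw `reps` random samples of size n and record each sample's mean.
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(42)
results = {}
for n in (5, 30, 100):
    sample_means = simulate_sample_means(n, 2000, rng)
    results[n] = {
        "center": mean(sample_means),   # should sit near the population mean, 1
        "sd": stdev(sample_means),      # should sit near 1 / sqrt(n)
        "expected_sd": 1 / sqrt(n),
    }
```

The sample means stay centered on the population mean while their sd shrinks as n grows. A histogram of the sample means for n=5 would still lean right, and the ones for n=30 and n=100 would look close to bell shaped.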
margin of error (MOE) numeric indication of distance a statistic may be from the true parameter
use the sampling distribution to determine MOE
-only accounts for random sampling variability (not bias…)
size of MOE depends on SD and z-score
confidence level the proportion of samples that will produce a confidence interval that contains the true population parameter. % of possible samples for which MOE works
z-score: confidence coefficient-chosen by user, can change size of MOE, doesn't tell us about biases
larger sample size= smaller MOE and no change in confidence
MOE + confidence interval -quantifies random sampling variability
-based on sampling distribution
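Putting the MOE cards together for a sample proportion: a minimal sketch, assuming MOE = z × sd of the sample proportion with sd = sqrt(p̂(1 - p̂)/n), and taking z from `statistics.NormalDist` instead of table z. The function names and the example numbers are made up, and the result only reflects random sampling variability, not bias.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    # z-score (confidence coefficient) for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, confidence=0.95):
    moe = margin_of_error(p_hat, n, confidence)
    return p_hat - moe, p_hat + moe

moe_1000 = margin_of_error(0.5, 1000)          # about 3 percentage points
moe_4000 = margin_of_error(0.5, 4000)          # larger sample, smaller MOE
moe_99 = margin_of_error(0.5, 1000, 0.99)      # higher confidence, larger MOE
low, high = confidence_interval(0.5, 1000)
```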