For this lab we will use a candy dataset collected by www.fivethirtyeight.com. Additional details about the dataset are available below (courtesy of Kaggle).
library(tidyverse)  # provides read_csv() and the %>% / group_by() / sample_n() verbs used below
candy <- read_csv('http://math.montana.edu/ahoegh/teaching/stat446/candy-data.csv')
candy
What’s the best (or at least the most popular) Halloween candy? That was the question this dataset was collected to answer. The data were collected through a website where participants were shown two fun-sized candies and asked to click on the one they would prefer to receive. In total, more than 269 thousand votes were collected from 8,371 different IP addresses.
candy-data.csv includes attributes for each candy along with its ranking. For binary variables, 1 means yes, 0 means no. The data contains the following fields:
This dataset is Copyright (c) 2014 ESPN Internet Ventures and distributed under an MIT license. Check out the analysis and write-up here: The Ultimate Halloween Candy Power Ranking. Thanks to Walt Hickey for making the data available.
Assume we are interested in understanding the winpercentage for four groups of candies:
Compare and contrast stratified sampling with domain estimation. How are they similar, and how are they different?
A stratified sample with ten candies from each stratum has been taken for you. Compute the point estimate of the mean winpercentage for each stratum.
stratified_sample <- candy %>% group_by(chocolate, pluribus) %>% sample_n(10) %>% ungroup()
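As a minimal sketch of the stratum-mean calculation, the block below groups a stratified sample by its two stratifying variables and takes the sample mean in each group. It runs on a small made-up data frame (toy_candy, with invented values) so that it is self-contained. The column name winpercent is an assumption based on the public fivethirtyeight file; adjust it if your copy of the data uses a different name.

```r
library(dplyr)

# Toy stand-in for the candy data (hypothetical values, not the real dataset).
set.seed(446)
toy_candy <- tibble(
  chocolate  = rep(c(0, 0, 1, 1), each = 20),
  pluribus   = rep(c(0, 1, 0, 1), each = 20),
  winpercent = runif(80, min = 20, max = 85)  # assumed column name
)

# Stratified sample: 10 units from each chocolate x pluribus stratum.
toy_strat <- toy_candy %>%
  group_by(chocolate, pluribus) %>%
  sample_n(10) %>%
  ungroup()

# The point estimate of each stratum mean is the stratum sample mean.
strat_means <- toy_strat %>%
  group_by(chocolate, pluribus) %>%
  summarise(n_h = n(), ybar_h = mean(winpercent), .groups = "drop")

strat_means
```

Because the design fixes the sample size in every stratum, n_h is 10 in each row by construction; only the means are random.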
A simple random sample (SRS) of size 40 is also taken. Compute the point estimate of the mean winpercentage within each stratum.
srs_sample <- candy %>% sample_n(40)
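With an SRS, the four groups act as domains: the number of sampled candies landing in each group is random rather than fixed by the design. A minimal sketch of the domain means follows, again on a made-up data frame (toy_candy, invented values) and again assuming the outcome column is named winpercent.

```r
library(dplyr)

# Toy stand-in for the candy data (hypothetical values, not the real dataset).
set.seed(446)
toy_candy <- tibble(
  chocolate  = rep(c(0, 0, 1, 1), each = 20),
  pluribus   = rep(c(0, 1, 0, 1), each = 20),
  winpercent = runif(80, min = 20, max = 85)  # assumed column name
)

# Simple random sample of 40 units, ignoring the group structure.
toy_srs <- toy_candy %>% sample_n(40)

# Domain means: average over the sampled units that fall in each domain.
# n_d is random here, unlike the fixed n_h of a stratified design.
domain_means <- toy_srs %>%
  group_by(chocolate, pluribus) %>%
  summarise(n_d = n(), ybar_d = mean(winpercent), .groups = "drop")

domain_means
```

The n_d column sums to 40 but will generally differ across domains, which is the main practical difference from the stratified sample.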
Compute the estimated variance of the mean winpercentage for each domain. You can assume that N and N_d are known.
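One common large-sample approximation for the variance of a domain mean under SRS is (1 - n/N) * s_d^2 / n_d, where s_d^2 is the sample variance within domain d. The sketch below applies that approximation to a made-up data frame (toy_candy, invented values; winpercent is an assumed column name). It is one reasonable choice rather than the only one: this form does not use N_d, and conditioning on n_d with the finite population correction (1 - n_d/N_d) is an alternative when N_d is known.

```r
library(dplyr)

# Toy stand-in for the candy data (hypothetical values, not the real dataset).
set.seed(446)
toy_candy <- tibble(
  chocolate  = rep(c(0, 0, 1, 1), each = 20),
  pluribus   = rep(c(0, 1, 0, 1), each = 20),
  winpercent = runif(80, min = 20, max = 85)  # assumed column name
)

N_pop <- nrow(toy_candy)  # population size N (treated as known)
n_srs <- 40               # overall SRS size n

toy_srs <- toy_candy %>% sample_n(n_srs)

domain_var <- toy_srs %>%
  group_by(chocolate, pluribus) %>%
  summarise(
    n_d    = n(),              # random domain sample size
    ybar_d = mean(winpercent), # domain mean estimate
    s2_d   = var(winpercent),  # within-domain sample variance
    .groups = "drop"
  ) %>%
  mutate(
    # Approximate variance of the domain mean: (1 - n/N) * s_d^2 / n_d
    var_ybar_d = (1 - n_srs / N_pop) * s2_d / n_d,
    se_ybar_d  = sqrt(var_ybar_d)
  )

domain_var
```

If a domain happens to contain fewer than two sampled units, var() returns NA and the variance cannot be estimated for that domain from this sample.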