Design with Sampling Distribution Segments

Hagar, Luke

Design with Sampling Distribution Segments

dc.contributor.advisor	Stevens, Nathaniel
dc.contributor.author	Hagar, Luke
dc.date.accessioned	2024-07-09T15:00:51Z
dc.date.available	2024-07-09T15:00:51Z
dc.date.issued	2024-07-09
dc.date.submitted	2024-07-04
dc.description.abstract	In most settings where data-driven decisions are made, these decisions are informed by two-group comparisons. Characteristics – such as median survival times for two cancer treatments, defect rates for two assembly lines, or average satisfaction scores for two consumer products – quantify the impact of each choice available to decision makers. Given estimates for these two characteristics, such comparisons are often made via hypothesis tests. This thesis focuses on sample size determination for hypothesis tests with interval hypotheses, including standard one-sided hypothesis tests, equivalence tests, and noninferiority tests in both frequentist and Bayesian settings. To choose sample sizes for nonstandard hypothesis tests, simulation is used to estimate sampling distributions of e.g., test statistics or posterior summaries corresponding to various sample sizes. These sampling distributions provide context as to which estimated values for the two characteristics are plausible. By considering quantiles of these distributions, one can determine whether a particular sample size satisfies criteria for the operating characteristics of the hypothesis test: power and the type I error rate. It is standard practice to estimate entire sampling distributions for each sample size considered. The computational cost of doing so impedes the adoption of non-simplistic designs. However, only quantiles of the sampling distributions must be estimated to assess operating characteristics. To improve the scalability of simulation-based design, we could focus only on exploring the segments of the sampling distributions near the relevant quantiles. This thesis proposes methods to explore sampling distribution segments for various designs. These methods are used to determine sample sizes and decision criteria for hypothesis tests with orders of magnitude fewer simulation repetitions. Importantly, this reduction in computational complexity is achieved without compromising the consistency of the simulation results that is guaranteed when estimating entire sampling distributions. In parametric frequentist hypothesis tests, test statistics are often constructed from exact pivotal quantities. To improve sample size determination in the absence of exact pivotal quantities, we first propose a simulation-based method for power curve approximation with such hypothesis tests. This method leverages low-discrepancy sequences of sufficient statistics and root-finding algorithms to prompt unbiased sample size recommendations using sampling distribution segments. We also propose a framework for power curve approximation with Bayesian hypothesis tests. The corresponding methods leverage low-discrepancy sequences of maximum likelihood estimates, normal approximations to the posterior, and root-finding algorithms to explore segments of sampling distributions of posterior probabilities. The resulting sample size recommendations are consistent in that they are suitable when the normal approximations to the posterior and sampling distribution of the maximum likelihood estimator are appropriate. When designing Bayesian hypothesis tests, practitioners may need to specify various prior distributions to generate and analyze data for the sample size calculation. Specifying dependence structures for these priors in multivariate settings is particularly difficult. The challenges with specifying such dependence structures have been exacerbated by recommendations made alongside recent advances with copula-based priors. We prove theoretical results that can be used to help select prior dependence structures that align with one's objectives for posterior analysis. We lastly propose a comprehensive method for sample size determination with Bayesian hypothesis tests that considers our recommendations for prior specification. Unlike our framework for power curve approximation, this method recommends probabilistic cutoffs that facilitate decision making while controlling both power and the type I error rate. This scalable approach obtains consistent sample size recommendations by estimating segments of two sampling distributions - one for each operating characteristic. We also extend our design framework to accommodate more complex two-group comparisons that account for additional covariates.	en
dc.identifier.uri	http://hdl.handle.net/10012/20714
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	experimental design	en
dc.subject	sample size determination	en
dc.subject	quasi-Monte Carlo methods	en
dc.subject	hypothesis testing	en
dc.title	Design with Sampling Distribution Segments	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	Statistics and Actuarial Science	en
uws-etd.degree.discipline	Statistics	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Stevens, Nathaniel
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Hagar_Luke.pdf
Size:: 4.25 MB
Format:: Adobe Portable Document Format
Description:: Luke Hagar's thesis with British spelling used

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 6.4 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Theses
Statistics and Actuarial Science