Design with Sampling Distribution Segments

dc.contributor.author: Hagar, Luke
dc.date.accessioned: 2024-07-09T15:00:51Z
dc.date.available: 2024-07-09T15:00:51Z
dc.date.issued: 2024-07-09
dc.date.submitted: 2024-07-04
dc.description.abstract: In most settings where data-driven decisions are made, these decisions are informed by two-group comparisons. Characteristics – such as median survival times for two cancer treatments, defect rates for two assembly lines, or average satisfaction scores for two consumer products – quantify the impact of each choice available to decision makers. Given estimates for these two characteristics, such comparisons are often made via hypothesis tests. This thesis focuses on sample size determination for hypothesis tests with interval hypotheses, including standard one-sided hypothesis tests, equivalence tests, and noninferiority tests in both frequentist and Bayesian settings. To choose sample sizes for nonstandard hypothesis tests, simulation is used to estimate sampling distributions of, e.g., test statistics or posterior summaries corresponding to various sample sizes. These sampling distributions provide context as to which estimated values for the two characteristics are plausible. By considering quantiles of these distributions, one can determine whether a particular sample size satisfies criteria for the operating characteristics of the hypothesis test: power and the type I error rate.
It is standard practice to estimate entire sampling distributions for each sample size considered. The computational cost of doing so impedes the adoption of non-simplistic designs. However, only quantiles of the sampling distributions must be estimated to assess operating characteristics. To improve the scalability of simulation-based design, we could focus only on exploring the segments of the sampling distributions near the relevant quantiles. This thesis proposes methods to explore sampling distribution segments for various designs. These methods are used to determine sample sizes and decision criteria for hypothesis tests with orders of magnitude fewer simulation repetitions. Importantly, this reduction in computational complexity is achieved without compromising the consistency of the simulation results that is guaranteed when estimating entire sampling distributions.
In parametric frequentist hypothesis tests, test statistics are often constructed from exact pivotal quantities. To improve sample size determination in the absence of exact pivotal quantities, we first propose a simulation-based method for power curve approximation with such hypothesis tests. This method leverages low-discrepancy sequences of sufficient statistics and root-finding algorithms to prompt unbiased sample size recommendations using sampling distribution segments. We also propose a framework for power curve approximation with Bayesian hypothesis tests. The corresponding methods leverage low-discrepancy sequences of maximum likelihood estimates, normal approximations to the posterior, and root-finding algorithms to explore segments of sampling distributions of posterior probabilities. The resulting sample size recommendations are consistent in that they are suitable when the normal approximations to the posterior and sampling distribution of the maximum likelihood estimator are appropriate.
When designing Bayesian hypothesis tests, practitioners may need to specify various prior distributions to generate and analyze data for the sample size calculation. Specifying dependence structures for these priors in multivariate settings is particularly difficult. The challenges with specifying such dependence structures have been exacerbated by recommendations made alongside recent advances with copula-based priors. We prove theoretical results that can be used to help select prior dependence structures that align with one's objectives for posterior analysis.
We lastly propose a comprehensive method for sample size determination with Bayesian hypothesis tests that considers our recommendations for prior specification. Unlike our framework for power curve approximation, this method recommends probabilistic cutoffs that facilitate decision making while controlling both power and the type I error rate. This scalable approach obtains consistent sample size recommendations by estimating segments of two sampling distributions, one for each operating characteristic. We also extend our design framework to accommodate more complex two-group comparisons that account for additional covariates. (en)
dc.identifier.uri: http://hdl.handle.net/10012/20714
dc.language.iso: en (en)
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.subject: experimental design (en)
dc.subject: sample size determination (en)
dc.subject: quasi-Monte Carlo methods (en)
dc.subject: hypothesis testing (en)
dc.title: Design with Sampling Distribution Segments (en)
dc.type: Doctoral Thesis (en)
uws-etd.degree: Doctor of Philosophy (en)
uws-etd.degree.department: Statistics and Actuarial Science (en)
uws-etd.degree.discipline: Statistics (en)
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0 (en)
uws.contributor.advisor: Stevens, Nathaniel
uws.contributor.affiliation1: Faculty of Mathematics (en)
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)
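
The abstract describes the standard simulation-based workflow: for each candidate sample size, estimate the whole sampling distribution of a test statistic and read off power or the type I error rate. The Python sketch below illustrates only that baseline idea, not the thesis's segment-based methods. Everything in it is an assumption made for illustration: the one-sided two-sample z-test with known standard deviation, the function names `simulate_statistics` and `estimated_power`, and all default parameter values.

```python
import random
from statistics import NormalDist, fmean


def simulate_statistics(n, delta, sigma=1.0, reps=1000, seed=2024):
    """Simulate the sampling distribution of a two-sample z statistic.

    Hypothetical setup: both groups have n normal observations with known
    standard deviation sigma, and the true difference in means is delta.
    """
    rng = random.Random(seed)
    std_error = sigma * (2.0 / n) ** 0.5
    draws = []
    for _ in range(reps):
        mean_1 = fmean([rng.gauss(delta, sigma) for _ in range(n)])
        mean_2 = fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        draws.append((mean_1 - mean_2) / std_error)
    return draws


def estimated_power(n, delta, alpha=0.05, **sim_args):
    """Estimate the rejection rate of the one-sided test H0: delta <= 0.

    With delta > 0 this estimates power; with delta = 0 it estimates the
    type I error rate.
    """
    critical_value = NormalDist().inv_cdf(1.0 - alpha)
    draws = simulate_statistics(n, delta, **sim_args)
    return sum(z > critical_value for z in draws) / len(draws)


if __name__ == "__main__":
    # Scan a few candidate sample sizes under an assumed effect of 0.5.
    for n in (20, 40, 60, 80):
        print(n, estimated_power(n, delta=0.5))
```

Each call to `estimated_power` simulates every repetition in full, even though the decision only depends on how much of the distribution lies beyond the critical value. This is the per-sample-size cost that the abstract says motivates exploring only the relevant segments of the sampling distribution.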

Files

Original bundle

Name: Hagar_Luke.pdf
Size: 4.25 MB
Format: Adobe Portable Document Format
Description: Luke Hagar's thesis with British spelling used

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: