Doing Transparent and Reproducible Quantitative Sociology

McLevey, John; Browne, Pierson
2024-01-08; 2023-12-18
http://hdl.handle.net/10012/20218

The ongoing replication crisis (Baker 2016; Gelman and Loken 2016; Freese and Peterson 2017; Wiggins and Christopherson 2019; Bird 2020; Colling and Szűcs 2021) has laid bare quantitative sociology’s need for better standards of transparency and reproducibility in all published research. This dissertation’s core contribution is the proposal and articulation of a ‘foundational cycle’ of three interrelated methodological practices: Causal Inference (Rubin 1974; Pearl 2009b, 2009a; Pearl, Glymour, and Jewell 2016), Principled Data Processing (Ball 2016a, 2016b; Barrett 2022), and Bayesian Inference (Bayes 1763; Jaynes 2003). By enshrining the principles and practices of the ‘foundational cycle,’ researchers ensure that transparency and reproducibility are woven into each critical juncture of the research project, which permits other researchers to comprehend and reconstruct all aspects contributing to the published findings. The first of the dissertation’s four substantive chapters contributes an account of the development of causal inference, with a particular focus on the role the ‘graphical’ paradigm played in motivating the development (and, later, dismantling) of causal methods in quantitative sociology. It also provides a brief description of Judea Pearl’s theory of inferred causation (Pearl 2009b) and argues that quantitative sociology should adopt it as the baseline model for the purposes of causal transparency and reproducibility. The second substantive chapter, which builds directly on the first, addresses a gap in the extant sociological literature about the prevalence of explicit causal methodology in the field. This chapter contributes a review of the causal methods employed in the quantitative articles published in ‘top’ sociological journals in the year 2022 (see: Jacobs 2016).
The review, which examined 283 quantitative sociological articles (out of a total of 574 in the review’s corpus), found that, as judged by the criteria articulated in Pearl (2009b), only 5 of them were ‘causally adequate.’ The third substantive chapter’s contribution takes the form of a software-based implementation of Patrick Ball’s Principled Data Processing framework (Ball 2016b, 2016a), which is designed to permit the development and maintenance of transparent, reproducible data processing pipelines, even in the context of large, distributed, collaborative, and technically complex research efforts. The software package, titled pdpp (Browne et al. 2021), is an accessibility-oriented iteration on Ball’s original framework. The fourth and final substantive chapter makes two contributions. The first is the development and articulation of an ‘ameliorative’ class of argumentation designed to address gaps in Gelman’s typology of arguments in favour of Bayesian inference (Gelman 2008): extant modes of argumentation focus on ‘winning’ the debate between Frequentism and Bayesianism, whereas ‘ameliorative’ arguments seek to address Frequentists’ concerns and trepidations about the Bayesian paradigm or the transition thereto. The second contribution is an ameliorative argument reified as a software package titled pyKrusch (Browne 2021), which automates the creation of, and builds upon the functionality of, John Kruschke’s Bayesian dependency structure diagrams (Kruschke 2014).

en
Bayesian inference; causal inference; principled data processing; transparency; reproducibility; computational social science
Doctoral Thesis