Bayesian Statistics As An Alternative To Null Hypothesis Significance Testing For The Speech, Language, And Hearing Sciences by Caroline Larson
Source: The Internet Archive
1. Bayesian Statistics As An Alternative To Null Hypothesis Significance Testing For The Speech, Language, And Hearing Sciences
By Caroline Larson, Inge-Marie Eigsti, Teresa Girolamo, David Kaplan and Sara Kover
Researchers: Anonymous for masked review
The null hypothesis “…is a bad starting point because, based on previous research, ‘something is going on’” (p. 843; van de Schoot et al., 2014).
Replication Crisis and Speech, Language, and Hearing Sciences
The replication crisis in behavioral science, widely discussed since the 2010s, refers to a failure to find similar results when studies are repeated (i.e., failure to reproduce findings; Psychology Today Staff, 2022; e.g., Camerer et al., 2018). Namely, statistically significant results are not consistently observed across repeated studies. One key contributor to the replication crisis is the use of null hypothesis significance testing, derived from the frequentist school of statistics, with arbitrary significance thresholds, the most prominent being p-values < .05. The frequentist approach has three important limitations. First, the conceptual underpinnings of null hypothesis significance testing are not well understood, and the approach is often incorrectly deployed (e.g., use of the phrase “trend toward significance” for a binary test of the null hypothesis; Attanasio, 1994; Wasserstein & Lazar, 2016). Second, null hypothesis significance testing is intended to reject the null hypothesis that no effect is present (e.g., the absence of group differences in performance), rather than to accept an alternative hypothesis, typically the hypothesis of interest, that an effect is present (e.g., the presence of a group difference). Third, there is a bias toward extreme results rather than “true” or meaningful results, especially for small, heterogeneous participant samples (Button et al., 2013; Wasserstein, Schirm, & Lazar, 2019). These limitations are relevant to the speech, language, and hearing sciences because accurate and credible research practices are necessary for developing evidence-based diagnostic and therapeutic practices.
Evidence-based practice is informed by research that involves statistical analysis of clinically relevant hypotheses (i.e., rather than null hypotheses), oftentimes with small (e.g., n = 20), heterogeneous samples (e.g., the behavioral phenotype of participants with autism spectrum disorder; Georgiades et al., 2013; Geschwind, 2009). One analytical approach that addresses these limitations is Bayesian statistics. This manuscript will describe and demonstrate how Bayesian statistics is more intuitive and less likely to be incorrectly deployed than frequentist statistics because the conceptual underpinnings of Bayesian statistics align more closely with how researchers engage in the scientific process. Bayesian statistics offers the advantage of drawing upon the cumulative evidence base, combined with new data, to make clinically meaningful decisions. It involves testing the degree to which the evidence supports clinically relevant hypotheses, and it may be more effectively implemented for small, heterogeneous samples than frequentist statistics (Kaplan, 2014; van de Schoot et al., 2014). Here, we demonstrate the use of Bayesian statistics with an example dataset related to autism and language. We argue that the Bayesian framework provides an intuitive, informative, and robust alternative to frequentist statistics, thereby promoting credible, reproducible, and clinically meaningful research in the speech, language, and hearing sciences.
Null Hypothesis Significance Testing versus Bayesian Statistics
Probability Concepts. Null hypothesis significance testing and Bayesian statistics are grounded in distinct conceptual underpinnings. Null hypothesis significance testing is derived from the frequentist school of statistics and founded upon the idea of the long-run frequency of events. Under this view, the probability of an event is its relative frequency across an infinite number of identical repeated experiments.
A statistical test, such as the t-test, provides a probability (the p-value) of observing data that are at least as extreme as the data observed, assuming the null hypothesis is true and assuming a fixed probability of a false positive (the Type I error rate, or alpha). The null hypothesis assumes chance findings (i.e., no effect) when identical experiments are repeated an infinite number of times. One salient instantiation of this idea is the confidence interval, which tells us that when an experiment is repeated identically an infinite number of times, a percentage (e.g., 95%) of confidence intervals constructed the same way will contain the “true” effect.
Bayesian statistics is founded on the idea of epistemic probability. Epistemic probability is an expression of the assumptions we make about a given event (e.g., the presence of a group difference) based on available information before collecting new data, with that information drawn from prior research, expert opinion, or both (Kaplan, 2014). The statistical test of epistemic probability involves updating our prior assumptions with new data (e.g., experimental task performance) using Bayes’ theorem (i.e., how available information and new data are combined; e.g., likelihood functions) to yield a posterior distribution (van de Schoot et al., 2014). The posterior distribution represents updated knowledge about an effect (e.g., the degree of group difference in experimental task performance). One salient instantiation of Bayes’ theorem is the posterior probability interval. The posterior probability interval gives the probability (e.g., 95%) that the “true” effect falls within that interval, or within any constructed interval of clinical relevance (Kaplan, 2014; van de Schoot et al., 2014).
Reflecting these distinct conceptual underpinnings, Bayesian statistics is more intuitive and less likely to be incorrectly deployed than frequentist statistics.
The confidence interval, for instance, is often incorrectly ascribed the interpretation of a posterior probability interval. A confidence interval is not the 95% probability that the “true” effect lies within a range of values; instead, it tells us that a percentage of intervals constructed the same way will contain the true effect. There are many examples of null hypothesis significance testing being incorrectly deployed, such as interpreting a statistically significant effect as evidence for the alternative hypothesis or interpreting a statistically nonsignificant effect as “trending toward significance” (i.e., a lack of statistical significance represents a lack of sufficient evidence to reject the null hypothesis; Attanasio, 1994; Wasserstein & Lazar, 2016; see Nead, Wehner, & Mitra, 2018 for a related discussion in the biomedical literature). Such errors are due in part to the unintuitive conceptual underpinnings of this statistical approach.
In contrast, the conceptual underpinnings of Bayesian statistics align more closely with how speech, language, and hearing researchers engage in the scientific process. Researchers do not often engage in the scientific process expecting to observe chance (i.e., null) results, per the underlying assumption in frequentist statistics. Rather, we engage in the scientific process as Bayesians, using available information to build upon prior work and to test clinically meaningful hypotheses. Using this more intuitive statistical approach may, therefore, promote reproducibility in speech, language, and hearing research by increasing the accuracy and credibility of statistical findings and by better reflecting research goals of the field.
Statistical Results.
Whereas null hypothesis significance testing only tests the absence of an effect (i.e., whether the null hypothesis can be rejected), Bayesian statistics provides the full posterior distribution of an effect (e.g., the degree to which language is related to social skills in autism spectrum disorder, ASD). This is the result of another major difference between frequentist and Bayesian statistics; namely, that frequentist statistics treats the data as random and the parameters of interest (e.g., the relationship between language and social skills in ASD) as fixed, whereas Bayesian statistics treats the data as fixed and the parameters as random and described by prior distributions. These prior distributions encode our assumptions and knowledge about the parameters of interest, drawing on available information from prior literature (e.g., previous research on the relationship between language and social skills in ASD). The end result is a distribution of plausible values of the parameters of interest (e.g., lower and upper thresholds of the degree to which language is related to social skills in ASD) and not a single binary determination of whether the effect is significant or not.
One outcome of a Bayesian analysis is the so-called posterior probability interval (PPI). Unlike the frequentist confidence interval, which provides precisely the same binary information as the significance test, the PPI provides the range of plausible values that the treatment effect can take. Thus, a PPI of, say, [-1, 15] describing the relationship between standardized measures of language and social skills does contain zero as a plausible value. A confidence interval that contains zero provides the binary result that there is insufficient evidence to reject the null hypothesis. A PPI that contains zero, in contrast, may or may not provide evidence of an effect, depending on the clinically meaningful interpretation of the effect.
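As a rough numerical illustration of how a PPI such as [-1, 15] can be read, the sketch below assumes (hypothetically) that the posterior distribution is approximately normal and that [-1, 15] is its central 95% interval. Neither assumption comes from the registration, and the function names are illustrative only. Under those assumptions, the probability that the effect exceeds zero, or any other bound of clinical interest, follows directly from the posterior.

```python
from statistics import NormalDist


def normal_from_central_interval(lower, upper, mass=0.95):
    """Build the normal distribution whose central `mass` interval is [lower, upper].

    Assumption for illustration only: the posterior is symmetric and normal.
    """
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)  # about 1.96 when mass = 0.95
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / (2.0 * z)
    return NormalDist(mean, sd)


def prob_effect_above(posterior, bound):
    """Posterior probability that the effect is greater than `bound`."""
    return 1.0 - posterior.cdf(bound)


# Hypothetical posterior matching the [-1, 15] example in the text.
posterior = normal_from_central_interval(-1.0, 15.0)

p_above_zero = prob_effect_above(posterior, 0.0)
p_above_five = prob_effect_above(posterior, 5.0)

print(f"P(effect > 0) = {p_above_zero:.3f}")
print(f"P(effect > 5) = {p_above_five:.3f}")
```

Under these assumptions the first probability is about 0.96 and the second about 0.69, so the same interval that contains zero can still carry substantial evidence for a positive effect while leaving a larger bound less certain.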
For instance, one can easily compute the probability that the effect is greater than (or less than) zero, which might be quite large in the case of the PPI of [-1, 15] (e.g., >90% probability). Note that this type of calculation is not possible within the frequentist approach. Indeed, it may be the case that in speech, language, and hearing research, other bounds of interest may be more important than simply determining if zero is in the interval. For instance, a 90% probability that a standard score increase of one point on a language assessment is associated with a standard score increase of five or more points on a social skills assessment (i.e., a 90% probability that the effect is ≥5) is likely to be interpreted as sufficient justification to consider language when providing social supports to individuals with ASD. This interpretation is not possible within the frequentist approach.
Furthermore, Bayesian results provide more of a basis for reproducibility than null hypothesis significance testing results. Reproducing a rejection of the absence of an effect is minimally informative in general, and arguably uninformative for speech, language, and hearing research in particular. Alternatively, reproducing an effect based on overlapping distributions of plausible values for that effect yields clinically relevant information that can be used to guide diagnosis and treatment of clinical conditions, such as the degree to which targeting language is important when providing social interaction support for individuals with ASD.
Sample Size Issues. It is difficult to detect significant effects with small sample sizes using null hypothesis significance testing (Button et al., 2013). It is also easy to detect significant effects with small sample sizes. This apparent contradiction is the case because all statistical tests contain a factor N representing the sample size.
Thus, simply increasing the sample size could lead to statistical significance regardless of the clinical importance of the effect. Bayesian statistics is more suited to small sample sizes than frequentist statistics (van de Schoot et al., 2014) for the main reason that, for Bayesian statistics, there is no reliance on large-sample theory on which conventional significance tests are based. However, the sample size does carry information; if sample sizes are small and we are not in possession of precise prior information about treatment effects of interest, we can expect that our PPIs will be quite wide. Note that this is exactly what one would expect if we were not in possession of much information from prior research (e.g., exploratory studies). The opposite will also be true in the Bayesian framework; small sample sizes can lead to greater precision in the presence of precise (and accurate) prior information. Thus, under the Bayesian framework, the sensitivity of results to prior information, along with posterior probability intervals, can be used to determine the degree to which current evidence supports clinically relevant hypotheses.
The problem of small, heterogeneous samples is particularly relevant to the speech, language, and hearing sciences. Small samples (e.g., n = 20) are more heterogeneous than large samples (Button et al., 2013; IntHout, Ioannidis, Borm, & Goeman, 2015); examples of heterogeneity include the behavioral phenotype of a clinical sample (e.g., participants with ASD; Georgiades et al., 2013; Geschwind, 2009) and between-study variance in coefficient and effect size estimates (IntHout et al., 2015). Sample heterogeneity, even in large samples (e.g., n = 300), can lead to biased effect estimates that are not reproducible (Liu, Zhao, Shaffer, Icitovic, & Case, 2005). Samples in speech, language, and hearing research are predominantly small (due in part to low prevalence, the costs of recruitment and assessment, etc.) and characterized by phenotypic heterogeneity, such as in ASD (Georgiades et al., 2013; Geschwind, 2009). These small samples provide a basis for larger-scale studies (e.g., meta-analyses, epidemiological studies) that inform diagnosis and treatment of clinical conditions. The limitations of frequentist statistics and of using p-values have implications for evidence-based practices in our field, specifically that the integrity of the cumulative evidence base directly impacts diagnosis and treatment (Attanasio, 1994). In contrast, the Bayesian framework allows us to maximize the impact of participant samples and prior research to yield clinically meaningful research findings.
One critical challenge to reproducibility, as well as to speech, language, and hearing research, has been equitable inclusion of the diverse populations the field seeks to serve and accounting for heterogeneity in these populations. Black, Indigenous, and People of Color (BIPOC) and other marginalized individuals (e.g., autistic individuals with intellectual disability) may be disproportionately affected by the limitations of frequentist statistics because samples of racially and ethnically diverse individuals are exceedingly small (i.e., due in part to systematic exclusion from research; e.g., Durkin et al., 2015; Rivera-Figueroa, Marfo, & Eigsti, 2022; Russell et al., 2019). Therefore, BIPOC are subject to greater bias in frequentist-based results than their white counterparts. Furthermore, BIPOC experience more barriers to participation in research than their white counterparts, such as distrust in researchers and transportation difficulty (Woodall et al., 2010). Disproportionate demands on BIPOC to increase their participation in research could be coercive and an inappropriate response to the problem of underrepresentation (Jones & Mandell, 2020).
In parallel, BIPOC researchers have advocated for community partnership, which, by nature, involves investing time and energy into building a few rich relationships (Maye et al., 2021; Girolamo, Rice, & Ghali, in revision). Though community partnerships may be slower than relying on convenience sampling and may yield smaller sample sizes (Henrich et al., 2010), such time and energy leads to socio-ecologically valid knowledge and to the development of services aligned with community priorities (Maye et al., 2021).
Bayesian statistics may be optimal relative to frequentist statistics for addressing these issues because it does not rely on large-sample theory to yield meaningful results that are reproducible. Bayesian statistics can yield clinically meaningful and reproducible information with small samples, whereby the benefits of participation are maximized to a greater degree than when implementing frequentist statistics. Thus, consideration of Bayesian statistical approaches is critically important to increasing the credibility and relevance of research findings for heterogeneous populations, and, therefore, critically important to the future of the field of speech, language, and hearing.
Reproducibility in Speech, Language, and Hearing Research – The Current Demonstration
Reproducible evidence has real-world implications for the speech, language, and hearing sciences. Evidence guides diagnosis and treatment decisions in practice; thus, accurate, meaningful, and generalizable findings are necessary for evidence-based practices. The Bayesian statistical approach promotes reproducibility to a greater degree than frequentist statistics for three primary reasons: (1) it is founded on probability theory that more intuitively aligns with the scientific process; (2) it draws on the cumulative evidence base, combined with new data, to make clinically meaningful decisions; (3) it may be effectively implemented for small, heterogeneous samples.
Bayesian statistics may also represent a useful step forward in addressing issues related to underrepresentation of individuals of diverse backgrounds. On one hand, Bayesian approaches maximize the benefits of participation while minimizing undue burden on BIPOC. On the other hand, Bayesian approaches allow researchers to incrementally increase representation of underrepresented racial and ethnic groups in the scientific base, which may mitigate harm to BIPOC communities and increase the generalizability of scientific findings. The current demonstration will employ a Bayesian statistical approach to show how Bayesian statistics can be used effectively for small, heterogeneous samples in speech, language, and hearing research. The goals of the current demonstration focus on Bayesian statistical methods, and we will embed this demonstration within two common research questions pertaining to a clinical population that is characterized by heterogeneity: autism spectrum disorder (ASD). Specifically, in an ASD group, we will test the relationship between behavior that reflects the ASD phenotype (i.e., social and adaptive skills measured by the Vineland Adaptive Behavior Scales, VABS; Sparrow, Balla, & Cicchetti, 1985) and language performance on an experimental task (grammaticality judgment; e.g., Eigsti & Bennetto, 2009). To reflect the complexity of real-world research, we will also test the degree to which these relationships vary when accounting for cognitive ability (Penn Matrix Reasoning; Kurtz et al., 2004), an area of phenotypic heterogeneity in ASD with a strong evidence base (Georgiades et al., 2013; Geschwind, 2009), and sociodemographic factors (e.g., race, maternal education, household income), an area of heterogeneity in ASD with a more limited evidence base (Jones & Mandell, 2020; see also Woodall et al., 2010). 
To reflect the distinction between these two research questions and the goals of the Bayesian statistical analysis demonstration, we will refer to the two research questions as “research questions” and the goals of the demonstration as “goals.” We will elaborate on the research questions and relevant prior literature in the methods section of this project given that the research questions represent a component of our methods for demonstrating a Bayesian statistical approach. Here, we further outline the goals of the demonstration.
Demonstration Goals
The current demonstration will show how Bayesian statistics can be effectively implemented with small, heterogeneous samples. We will examine measures that reflect relationships among sample size, prior distributions (e.g., information from prior research), and results under the Bayesian framework. In particular, we will examine the sensitivity of results to the choice of prior distributions, and we will examine goodness-of-fit metrics that compare model predictions with the actual data. These measures provide guidance as to the degree to which the model, including prior distributions, fits the data, and therefore, how reliable and clinically meaningful results may be. For instance, a small sample combined with imprecise prior distributions will yield a wide PPI, whereas a small sample combined with precise prior distributions will yield a narrower PPI (i.e., as would be expected if analyzing a well-specified versus exploratory hypothesis). This wider PPI will have a higher probability of containing values that are not clinically meaningful than a narrower PPI. A small sample combined with imprecise information will also yield poorer model fit than a small sample combined with precise information. Poorer model fit will yield less reliable results given that the model predicts the data with a higher degree of error than a model with better fit.
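The relationship between prior precision and PPI width can be sketched with a deliberately simplified model. The example below is not the registered analysis: it assumes a single effect parameter with a normal prior and normally distributed data with a known standard deviation, so the posterior has a closed form. The sample is simulated, and every numerical value (prior means, prior standard deviations, the data standard deviation of 10, n = 20) is hypothetical.

```python
import random
from statistics import NormalDist, mean


def normal_posterior(prior_mean, prior_sd, data, data_sd):
    """Conjugate update: normal prior on a mean, normal data with known sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * mean(data)
    )
    return NormalDist(post_mean, post_var ** 0.5)


def central_interval(dist, mass=0.95):
    """Central posterior probability interval containing `mass` probability."""
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Simulated small sample (n = 20); these are not real participant data.
rng = random.Random(1)
DATA_SD = 10.0
sample = [rng.gauss(7.0, DATA_SD) for _ in range(20)]

# Imprecise ("noninformative") prior versus precise ("informative") prior.
imprecise = normal_posterior(0.0, 100.0, sample, DATA_SD)
precise = normal_posterior(6.0, 2.0, sample, DATA_SD)

imprecise_lo, imprecise_hi = central_interval(imprecise)
precise_lo, precise_hi = central_interval(precise)
imprecise_width = imprecise_hi - imprecise_lo
precise_width = precise_hi - precise_lo

print(f"imprecise prior: 95% PPI width = {imprecise_width:.2f}")
print(f"precise prior:   95% PPI width = {precise_width:.2f}")
```

In this conjugate setup the interval width depends only on the prior standard deviation, the data standard deviation, and n, so the precise prior always yields the narrower PPI. The posterior mean is also pulled toward the prior mean, which is why the accuracy of precise prior information matters and why sensitivity to the prior is worth examining.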
By examining these measures, we will demonstrate how Bayesian statistics can be effectively implemented with small, heterogeneous samples. We will also demonstrate how this information can be used to obtain clinically meaningful results for the speech, language, and hearing sciences.
Goal 1 – Precise versus Imprecise Prior Information. Our first goal will be to compare models that include precise, “informative” prior knowledge (e.g., strong, relevant evidence base) versus imprecise, “noninformative” prior knowledge (e.g., no evidence base). These two models will test this research question: What is the relationship between behavior that reflects the ASD phenotype (social and adaptive skills as measured by the VABS) and language performance on an experimental task (grammaticality judgment) in individuals with ASD? These two models will also test this follow-up research question: How does this relationship vary when accounting for cognitive ability (Penn Matrix Reasoning)?
Goal 2 – Weakly-Precise versus Imprecise Prior Information. Our second goal will be to compare models that include weakly-precise prior knowledge (e.g., moderate, somewhat relevant evidence base) versus noninformative prior knowledge. These two models will re-test this research question: What is the relationship between behavior that reflects the ASD phenotype (social and adaptive skills as measured by the VABS) and language performance on an experimental task (grammaticality judgment) in individuals with ASD? These two models will also test this follow-up research question: How does this relationship vary when accounting for sociodemographic factors (e.g., race, maternal education, household income)?
Summary. This demonstration will show how clinically meaningful decisions can be made using a Bayesian statistical approach with a small, heterogeneous sample.
In particular, we will show how examining sensitivity of results and model fit to prior information informs the degree to which the data support the hypothesis. This demonstration will show how a Bayesian statistical approach can be effectively implemented in speech, language, and hearing sciences with small samples, highlighting how Bayesian statistics promotes reproducibility.
“Bayesian Statistics As An Alternative To Null Hypothesis Significance Testing For The Speech, Language, And Hearing Sciences” Metadata:
- Title: Bayesian Statistics As An Alternative To Null Hypothesis Significance Testing For The Speech, Language, And Hearing Sciences
- Authors: Caroline Larson, Inge-Marie Eigsti, Teresa Girolamo, David Kaplan, Sara Kover
Edition Identifiers:
- Internet Archive ID: osf-registrations-yx3f4-v1
Downloads Information:
The book is available for download in "data" format. The file size is 85.56 MB, the files have been downloaded 4 times, and the files went public on Tue May 31 2022.
Available formats:
Archive BitTorrent, Metadata, ZIP