Week 12: previewing the "down-slope" of your one-article review
Good morning. Sunny and somewhat warm, with a chance of rain on Wednesday; then Thursday promises to be chilly. COMPLETE YOUR ER REVIEWING TASK TONIGHT!
Where are we going? We're off to see the wizard, the wonderful wizard of Oz. But, we have some lions and tigers and bears, oh my, left.
Let the lions be your body paragraphs. These are hard. I am glad to see that most of you are wrestling with that content. Now, the next risk to vanquish-->
Let the tiger (no s; it is one paragraph!) be your stats paragraph, which combines your report of the statistical vetting your researcher used AND your own commentary.
Let the bear(s) be your critique paragraph. I ask you to write at least ONE comment (why the bear might be plural) where you critique the findings of your researcher. Hint: they typically self-critique, and you can report that.
You can see that we focus on the end of your review. Therefore, do a self-check: does your review document look like a lemon? Does the reader slide through your critique into a few, closely related conclusory ideas?
Or does your document look more like a pear? Here, the reader moves through your critique into several caveats about conclusions, complexity within those conclusions, policy context, etc.
Hint: most people will rely on the lemon shape, as we offer two analysis paragraphs -- stats and general critique -- then your conclusion paragraph. The order of the stats and general critique paragraphs is your choice.
Good news: I do not GRADE THE CONTENT OF YOUR STATS PARA. Aren't you relieved?
Now, on to the hardest analysis piece: evaluating the statistics used to vet the arguments made about data inference. Statistics overview in class this week. I urge you to talk about statistics/logos critical thinking with your science and math professors. To warm up, the ManU Phrasebank includes a "Describe quantities" section. Then, check out the "Reporting results" section, which will help you read your paper's use of statistics or the logos of numbers.
You will get better at this critical thinking as you mature as a scientist: promise! For example, in my field of ecology and environmental science, we are in a quiet riot over frequentist, multivariate, and Bayesian statistics. This was an assigned reading for me in one of my classes. Here is another.
For biomedical researchers: you may appreciate this analysis of the limits of p-values in biomedical research.
Please look at your research articles for Wednesday, noting the type of statistics tool/logos of numbers used (web exhibit with short definitions). Look these up in some way so that you have a working definition for yourself. Common tools or tests from student papers over the last 15 years include:
- p-values
- confidence intervals
- Student's t test (and corrections)
- analysis of variance (ANOVA); one-way, two-way
- power
- sample size
- type of study/limits -- observational study, case report, double-blind
I recommend using the link above to warm up your brain with a short working definition (remember this critical analysis tool from the rain garden memo?) and then going to Wikipedia or even a textbook to read about your selected term(s) in more detail.
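If code is how your brain warms up, here is an optional, purely illustrative sketch (not part of the assignment, and the numbers are made up) of what a p-value is doing under the hood: a permutation test asks, "If group labels were assigned by chance alone, how often would we see a difference this large?"

```python
# Illustration only (made-up data): a p-value as "how often would chance
# alone produce a group difference at least this large?"
import random

random.seed(42)  # reproducible shuffles

control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2]
treated = [4.8, 5.1, 4.6, 5.0, 4.7, 4.9]

observed = (sum(treated) / len(treated)) - (sum(control) / len(control))

pooled = control + treated
n_extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)                 # chance-only relabeling of subjects
    fake_treated = pooled[:len(treated)]
    fake_control = pooled[len(treated):]
    diff = (sum(fake_treated) / len(fake_treated)
            - sum(fake_control) / len(fake_control))
    if abs(diff) >= abs(observed):         # two-tailed: "as extreme" either way
        n_extreme += 1

p_value = n_extreme / trials
print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

Because the two made-up groups barely overlap, very few chance shuffles match the observed gap, so the p-value comes out tiny. Note what the number does and does not say: it describes surprise under a chance-only model, nothing about importance.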
The pre-reading activity will help you enter into the complexity. Cognitive wedge is also your thinking friend.
I simply want you to know about this area within science articles, even if you do not understand the statistics now. You would not be alone among scientists if you don't. I don't, in many cases. However, I want you to leave this class with an understanding of this important piece of critical thinking for your field.
One key idea I can wax on about, though, is caution about the (very limited) meaning of significance testing and p-values. For fun, enjoy this comic.
More generally, your critical analysis can comment on the findings, drawing on your own ideas or your close reading of the authors' self-critique. The Manchester University phrasebank is really helpful. Here are a few selections that I copy/paste here for you. From the "Being critical" section, see these categories-->
Introducing problems and limitations: theory or argument
Introducing problems and limitations: method or practice
Using evaluative adjectives to comment on research
Introducing general criticism
Introducing the critical stance of particular writers
Practical note on dividing your critique: use separate paragraphs -- one for specific discussion of the stats/logos-of-numbers vetting, one for your more general critique. For this class, you can pick one limitation to comment on, even though in real life you would look at more than one weakness. In some ways, focusing on one mimics a short presentation at a conference; in a seminar setting, you would present more than one weakness. Again, the ManU Phrasebank is so helpful. From "Discussing findings"-->
Advising cautious interpretation of the findings
Another source of uncertainty is …
A note of caution is due here since …
These findings may be somewhat limited by …
These findings cannot be extrapolated to all patients.
These data must be interpreted with caution because …
It could be argued that the positive results were due to …
These results therefore need to be interpreted with caution.
In observational studies, there is a potential for bias from …
It is important to bear in mind the possible bias in these responses.
Although exclusion of X did not …, these results should be interpreted with caution.
However, with a small sample size, caution must be applied, as the findings might not be …
It is possible that these results …
- are due to …
- are limited to …
- do not represent …
- have been confounded by …
- were influenced by the lack of …
- may underestimate the role of …
- are biased, given the self-reported nature of …
- may not be reproducible on a wide scale across …
Wednesday serves up some much-needed rain! More this evening. Gardener, tree take-down crew, and plant scientist rejoice. OK, more on writing about statistics and the logos of numbers.
I WILL POST THE OPTIONAL (yet highly recommended) ER WRITING TASK TODAY, circa noon. You must complete it by Monday evening, 11:45. Then I open up the OPTIONAL ER REVIEWING TASK. What this means is that you can see what others do but are not required to respond to others. How is that for going into the T-G week?
Using a p-value test (conditions): The primary and surprisingly common mistake here is p-values that do not fit your experimental design or the underlying distribution of the data. Another way to say it: your data set may not fit the related statistical model, and a poor fit can mean you picked the wrong test for your study design. Fix? Consult a statistician in the study design phase and perhaps in the data analysis phase. For example, applying a Student's t test, which assumes roughly normal data, to strongly skewed data can produce misleading p-values.
What is a p-value/significance testing anyway? (If you focus on p-values in your final document, I would place this key definition FIRST.)
P-values do not measure the probability that the studied hypothesis is true (though thinking of them that way can be a helpful first approximation), or the probability that the data were produced by random chance alone. Instead, a p-value tells you how surprising your data would be if the null hypothesis were true.
In class, we will talk a bit about scale of vision. At high altitude, we can think of p-values and this testing in that loose way. Technically, however, we have the step of accepting or rejecting the null hypothesis.
Human judgement matters more than p-values. P-values are only part of an exacting critical analysis. Scientific conclusions, by researchers and readers, as well as business and/or policy decisions, should not be based solely on desired (i.e., low) p-values.
Ethics matter! Proper, robust, and intellectually responsible inference-making requires full reporting and transparency. P-hacking slips through when researchers are not fully honest about their data set choices and the timing of those choices.
Pause between p-values and power: Statistics help us make meaning, but meaningfulness is not assured by significance testing. A p-value rooted in significance testing does not signal or confirm the importance of a result. Related: a p-value does not measure the size of an effect.
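A little arithmetic (optional, with made-up numbers) makes this concrete: hold a tiny standardized effect size fixed and the test statistic still grows with the square root of the sample size, so a large enough study can stamp "significant" on an effect too small to matter.

```python
# Sketch (illustrative numbers): the SAME tiny effect becomes "significant"
# once the sample is large enough, so a small p-value is not a big effect.
import math

d = 0.05  # fixed standardized effect size (Cohen's d): tiny by any convention

for n_per_group in (50, 500, 50_000):
    # two-sample t statistic with equal groups: t = d * sqrt(n/2)
    t = d * math.sqrt(n_per_group / 2)
    verdict = "significant at ~0.05" if t > 1.96 else "not significant"
    print(f"n per group = {n_per_group:>6}: t = {t:5.2f} -> {verdict}")
```

The effect size d never changes; only the sample size does. That is why your critique should ask about effect size and practical importance, not just whether p crossed 0.05.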
By itself, a p-value does not adequately or responsibly measure the quality of evidence; likewise, a p-value cannot confirm the intellectual integrity of a study design, its supporting model/theory, or even the research hypothesis.
Let's talk about power. Many of us look at sample size and judge the robustness of a finding based in part on a larger sample size. What is large, anyway? It depends on the research context and even the discipline. In the future, after you look at sample size, ask this question: how does power work here? Did the researchers even report this important statistical quality? From Editage, this short piece pairs nicely with this definition of statistical power (a three-minute YouTube explainer by a biostatistician).
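If it helps, power has a very hands-on meaning you can simulate (again, optional; all numbers here are illustrative assumptions): run many imaginary studies where a real effect exists, and count how often the study actually detects it.

```python
# Sketch: estimating statistical power by simulation. Power = the chance a
# study detects an effect that is really there. Illustrative numbers only.
import random
import statistics

random.seed(1)

def simulate_power(effect, n_per_group, z_crit=1.96, trials=2000):
    """Fraction of simulated two-group studies whose z statistic clears the
    significance threshold, given a true effect (in standard-deviation units)."""
    hits = 0
    for _ in range(trials):
        control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [random.gauss(effect, 1.0) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        z = diff / ((2 / n_per_group) ** 0.5)   # toy model: SD known to be 1
        if abs(z) > z_crit:
            hits += 1
    return hits / trials

# A medium effect (0.5 SD) with ~64 per group is the textbook ~80%-power case;
# the same effect with only 16 per group is badly underpowered.
print(f"n=16 per group:  power ~ {simulate_power(0.5, 16):.2f}")
print(f"n=64 per group:  power ~ {simulate_power(0.5, 64):.2f}")
```

The takeaway for your critique: an underpowered study that reports "no significant effect" has not shown the effect is absent; it may simply have had little chance of finding it.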
Bottom line: I want you to think about these ideas. Write in the way that you can. I will NOT assess the content for you. Hint: if you plan to use this piece as a writing sample for grad school, either take the stats analysis paragraph out or consult with a mentor in your field.