  
Understanding Psychology Researchers

Locate an empirical article from a recent peer-reviewed journal (2014 or newer). Your chosen article should report a study that uses one of the research methods discussed in this week’s readings and that describes the participants and the study’s outcome; in particular, it should include a hypothesis, method, results, and conclusion. Once you have located an appropriate study/article, develop a 2-page paper that provides a brief introduction to the study and the question the researcher(s) attempted to address. This introduction should also state the hypotheses of the study.

Then describe:

· the research design used

· what the researchers discovered

· any limitations in the study

· why the researchers think their findings are important

· your critical assessment of the strengths and weaknesses of the study

Be sure to justify your responses by relating the method or findings to material you learned from this week’s readings.

BA Psychology Written Assignment Rubric

Research Question/Approach to Research Question (20 pts) – The extent to which the focus of the essay is expressed and specified (this need not be in the form of a question; an alternative form is a hypothesis), and the extent to which the essay appropriately addresses and develops the specific research question, including the collection of any relevant information.

· Superior (20–16 points): The research question is clearly and precisely stated early in the essay and is sharply focused, making it susceptible to effective treatment within the word limit. The approach used is well chosen and highly appropriate to the research question.

· Good (15–11 points): The research question is stated early in the essay, but one of the following problems is present: the question is not clearly and precisely stated; it is too broad in scope to be treated effectively within the word limit; or the approach used is only generally appropriate to the research question.

· Needs Improvement (10–6 points): The research question is stated early in the essay, but two or more of the following problems are present: the question is not clearly and precisely stated; it is too broad in scope to be treated effectively within the word limit; or the approach used is generally inappropriate to the research question.

· Does Not Meet Standards (5–0 points): The research question is not stated early in the essay or does not lend itself to systematic investigation in the context of an extended essay. The approach used is completely inappropriate to the research question.

Application (20 pts) – The extent to which relevant materials, sources, data, and evidence are considered appropriately in the essay.

· Superior (20–16 points): The approach used is well chosen and highly appropriate to the research question.

· Good (15–11 points): The approach used is generally appropriate to the research question.

· Needs Improvement (10–6 points): The approach used is generally inappropriate to the research question.

· Does Not Meet Standards (5–0 points): The approach used is completely inappropriate to the research question.

Argument/Evaluation (20 pts) – The extent to which the essay develops an argument relevant to the research question from the materials/information considered. Where the research question does not lend itself to systematic investigation in the context of an extended essay, the maximum level that can be awarded is 2.

· Superior (20–16 points): A convincing argument that addresses the research question is well developed, well organized, and clearly expressed. Where an evaluation is appropriate, it is fully substantiated.

· Good (15–11 points): An argument that addresses the research question is competently developed. Where an evaluation is appropriate, some attempt has been made to substantiate it.

· Needs Improvement (10–6 points): An argument that addresses the research question is developed but incomplete. Where an evaluation is appropriate, it is likely to be subjective, with little attempt at substantiation.

· Does Not Meet Standards (5–0 points): There is a limited or superficial attempt to formulate an argument relevant to the research question, or there is no argument relevant to the research question.

Conclusion (20 pts) – The extent to which the essay incorporates a conclusion consistent with its argument, not necessarily in the form of a separate section.

· Superior (20–16 points): A strong conclusion is clearly stated, is relevant to the research question, and is consistent with the argument or explanation presented in the essay. Where appropriate, the conclusion clearly indicates unresolved questions and new questions that have emerged from the research.

· Good (15–11 points): A conclusion is clearly stated, is relevant to the research question, and is consistent with the argument or explanation presented in the essay.

· Needs Improvement (10–6 points): Some conclusion is attempted that is consistent with the argument presented in the essay.

· Does Not Meet Standards (5–0 points): Little or no attempt has been made to provide a conclusion consistent with the argument presented in the essay.

APA Style (5 pts)

· No revision needed (5 points): APA style is used properly throughout the document.

· Some isolated revision needed (4–3 points): Typical APA errors are present.

· Moderate revision needed in some areas (2–1 points): APA style is used but inconsistently.

· Substantial revision needed (0 points): No evidence of APA style awareness.

Grammar (5 pts)

· No revision needed (5 points): No grammatical errors are present and the document is in good shape grammatically.

· Some isolated revision needed (4–3 points): Only a few grammatical errors appear.

· Moderate revision needed in some areas (2–1 points): Several grammatical errors appear throughout the document.

· Substantial revision needed (0 points): Major grammatical errors are present throughout the entire document.

Clarity (5 pts)

· No revision needed (5 points): The text is well organized and easy to follow, with a logical order of information and clear transitions.

· Some isolated revision needed (4–3 points): The narrative is clear overall, but at least one element or section needs revision.

· Moderate revision needed in some areas (2–1 points): A logical order of ideas is present, but the text lacks overall coherence.

· Substantial revision needed (0 points): The text is difficult to follow.

Required Elements (5 pts)

· No revision needed (5 points): The paper includes all required elements seamlessly.

· Some isolated revision needed (4–3 points): Required elements are included, but the paper could be developed more fully.

· Moderate revision needed in some areas (2–1 points): The paper is missing a required element.

· Substantial revision needed (0 points): The paper is incomplete and is missing most of the required elements.

Total Points: 100

Estimating publication bias in meta-analyses of peer-reviewed studies: A meta-meta-analysis across disciplines and journal tiers

Maya B. Mathur1, Tyler J. VanderWeele2

1Quantitative Sciences Unit, Stanford University, Palo Alto, California

2Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston,
Massachusetts

Abstract

Selective publication and reporting in individual papers compromise the scientific record, but are meta-analyses as compromised as their constituent studies? We systematically sampled 63 meta-analyses (each comprising at least 40 studies) in PLoS One, top medical journals, top psychology journals, and Metalab, an online, open-data database of developmental psychology meta-analyses. We empirically estimated publication bias in each, including only the peer-reviewed studies. Across all meta-analyses, we estimated that “statistically significant” results in the expected direction were only 1.17 times more likely to be published than “nonsignificant” results or those in the unexpected direction (95% CI: [0.93, 1.47]), with a confidence interval substantially overlapping the null. Comparable estimates were 0.83 for meta-analyses in PLoS One, 1.02 for top medical journals, 1.54 for top psychology journals, and 4.70 for Metalab. The severity of publication bias did differ across individual meta-analyses; in a small minority (10%; 95% CI: [2%, 21%]), publication bias appeared to favor “significant” results in the expected direction by more than threefold. We estimated that for 89% of meta-analyses, the amount of publication bias that would be required to attenuate the point estimate to the null exceeded the amount of publication bias estimated to be actually present in the vast majority of meta-analyses from the relevant scientific discipline (exceeding the 95th percentile of publication bias). Study-level measures (“statistical significance” with a point estimate in the expected direction and point estimate size) did not indicate more publication bias in higher-tier versus lower-tier journals, nor in the earliest studies published on a topic versus later studies. Overall, we conclude that the mere act of performing a meta-analysis with a large number of studies (at least 40) and that includes non-headline results may largely mitigate publication bias in meta-analyses, suggesting optimism about the validity of meta-analytic results.

Correspondence: Maya B. Mathur, Quantitative Sciences Unit, Stanford University, Palo Alto, CA, [email protected]

AUTHOR CONTRIBUTIONS
Maya B. Mathur and Tyler J. VanderWeele conceptualized the research. Maya B. Mathur oversaw data collection, conducted statistical analyses, and led manuscript writing. Tyler J. VanderWeele contributed critical intellectual content to the manuscript.

CONFLICT OF INTEREST
The authors reported no conflict of interest.

Published in final edited form as: Res Synth Methods. 2021;12(2):176–191. doi:10.1002/jrsm.1464.


Keywords

meta-analysis; publication bias; reproducibility; scientific method; selective reporting

1 | INTRODUCTION

Publication bias—that is, the selective publication of “statistically significant” results1—has compromised the integrity of the scientific record.2 Empirical results often replicate at lower than expected rates (e.g., References 3-7), “p-hacking” (i.e., intentionally or unintentionally rerunning analyses to attain “statistically significant” results) appears widespread,8,9 and results in some top social sciences journals exhibit severe publication bias.10,11 Most attention on publication bias and scientific credibility to date has focused on individual published papers, often those in higher-tier journals. In contrast, meta-analyses represent an arguably higher standard of scientific evidence, and the implications of publication bias in individual papers for meta-analyses are not clear. Are meta-analyses of biased literatures simply “garbage in, garbage out”, or are meta-analyses more robust to publication bias than are their constituent studies?

Some existing work has investigated the prevalence of “small-study effects” (i.e., systematically different point estimates in small vs. large studies) in meta-analyses by testing for funnel plot asymmetry12,13 and estimating the percentage of systematically sampled meta-analyses with “statistically significant” funnel plot asymmetry; these estimates include 7% to 18% among Cochrane Database meta-analyses,14 13% among meta-analyses in Psychological Bulletin and the Cochrane Database,15 and 27% among medical meta-analyses.16 However, the purpose of these existing studies was not to provide a pure assessment of publication bias, as many of the asymmetry tests they used detect small-study effects that can reflect heterogeneity in addition to publication bias.13,16 Other investigators have reported strong publication bias in meta-analyses by applying the excess significance test,17,18 but this method may substantially overestimate publication bias if population effects are heterogeneous,19,20 which is the case in many meta-analyses.21 Other methods that have been used to empirically assess publication bias often require population effects to be homogeneous.15

We built upon prior work by conducting a new meta-analysis of meta-analyses that we systematically collected from four sources, which spanned a range of journals and disciplines. We used a selection model22,23 to estimate publication bias severity across all the meta-analyses, within sources, and within disciplines. Additionally, to explore hypothesized study-level contributors to publication bias, we assessed whether studies published in higher-tier journals exhibit more publication bias than those in lower-tier journals24,25 and whether the chronologically first few studies published on a topic exhibit more publication bias than later studies (the “Proteus effect”26,27).



2 | METHODS

2.1 | Systematic search methods

We systematically searched for meta-analyses from four sources: (1) PLoS One; (2) four top medical journals:i New England Journal of Medicine, Journal of the American Medical Association, Annals of Internal Medicine, and Lancet; (3) three top psychology journals: Psychological Bulletin, Psychological Science, and Perspectives on Psychological Science; and (4) Metalab, an online, unpublished repository of meta-analyses on developmental psychology. Metalab is a database of meta-analyses on developmental psychology whose datasets are made publicly available and are continuously updated; these meta-analyses are often released online prior to publication in peer-reviewed journals.28,29 We selected these sources in order to represent a range of disciplines, particularly via the inclusion of PLoS One meta-analyses. Additionally, because selection pressures on meta-analyses themselves may differ by journal tier, we chose sources representing higher-tier journals, a middle-tier journal with an explicit focus on publishing all methodologically sound papers regardless of results (PLoS One), and a source that is not a standard peer-reviewed journal (Metalab). We chose these specific medical and psychology journals because they are among the highest-impact journals in these disciplines that publish original research, including meta-analyses.

For the three published sources, we reverse-chronologically reviewed each meta-analysis published after 2013 until we had obtained data suitable for reanalysis to fulfill or surpass prespecified sample sizes (Supporting Information). We considered meta-analyses published after 2013 because we had first searched PLoS One reverse-chronologically until we reached prespecified sample sizes, which resulted in meta-analyses published after 2013. Then, when searching the other sources, we also considered only meta-analyses published after 2013 for consistency with the PLoS One sample. Our inclusion criteria were: (1) the meta-analysis comprised at least 40 studies to enable reasonable power and asymptotic properties to estimate publication bias;22,23 (2) the meta-analyzed studies tested hypotheses (e.g., they were not purely descriptive); and (3) we could obtain study-level point estimates and standard errors as described in Section 2.2. Regarding the 40-study criterion, for articles that reported on more than one meta-analysis (e.g., because they performed meta-analyses by subgroup), we considered only the meta-analysis with the largest number of studies. For PLoS One, we defined three disciplinary categories (social sciences, natural sciences, and medicine) and searched until we had obtained at least 10 usable meta-analytic estimates per discipline.

Because relatively few meta-analyses were published in the top medical and top psychology journals, we included all eligible meta-analyses published after 2013.ii For the unpublished source, Metalab, we used publicly available data to include the meta-analyses30-34 meeting the above inclusion criteria. We conducted the searches on December 20, 2018 (PLoS One), May 13, 2019 (the top medical journals), May 4, 2019 (the top psychology journals), and May 26, 2019 (Metalab). For PLoS One, we used PubMed to search “meta analysis[Title] AND ‘PLoS One’[Journal],” restricting the search to 2013 onward. For the top medical and top psychology journals, we either used comparable PubMed search strings provided online (https://osf.io/cz8tr/) or we directly searched the journal’s website for papers with “meta-analysis” in the title or abstract. For Metalab, we used Table 1 from Tsuji et al.35 to screen 10 existing Metalab meta-analyses using our inclusion criterion for the number of point estimates.

iUltimately, no meta-analyses in New England Journal of Medicine met inclusion criteria, so this journal was not represented in analyses.

iiWe prespecified that we would search these sources until we reached 20 medical and 20 psychology meta-analyses, but anticipated correctly that fewer than 20 would actually have been published in the specified time frame.

SUPPORTING INFORMATION: Additional supporting information may be found online in the Supporting Information section at the end of this article.

2.2 | Data extraction

We extracted study-level data using publicly available datasets, datasets we obtained by contacting authors, or data we manually extracted from published forest plots or tables. We also excluded studies from the grey literature, which we defined as those that were not published in a peer-reviewed journal or peer-reviewed conference proceeding. Grey literature therefore included, for example, dissertations, book chapters, and statistical estimates that the meta-analysts obtained by contacting other investigators. We excluded grey literature for several reasons. First, we were primarily interested in the specific selection pressures that shape the peer-reviewed literature, the cornerstone of the scientific canon. The selection pressures affecting the grey literature may differ from those affecting the peer-reviewed literature, for example, if the preferences of peer reviewers and journal editors contribute strongly to publication bias. If we had included grey literature, this could give an impression of less publication bias than actually affects the canonical, peer-reviewed literature. Additionally, we speculated that disciplinary norms regarding the inclusion of grey literature in meta-analyses may differ substantially, potentially complicating our comparisons of publication bias severity across disciplines. For example, as of the year 2000, the majority of medical meta-analyses did not include grey literature,36 and this seemed to remain true in our more recent sample of medical meta-analyses. On the other hand, our impression is that recent meta-analyses in experimental psychology usually do involve grey literature searches, perhaps reflecting recently heightened attention within this discipline to publication bias and the “replication crisis.”3

To minimize data entry errors, we used independent dual coding by a team of six research assistants (Acknowledgments) and the first author, and we used stringent quality checks to verify data entry. Details of the data extraction process appear in the Supporting Information, and the final corpus of meta-analyses is publicly available (excluding those for which we could obtain data only by contacting the authors) and is documented for use in future research (https://osf.io/cz8tr/).

For each meta-analysis in the top medical and top psychology groups, we coded each meta-analyzed study by journal, publication year, and the journal’s Scimago impact rating.37 Scimago ratings are conceptually similar to impact factors, but weight a journal’s citations by the impact of the citing articles rather than treating all citations equally. Additionally, unlike impact factors, Scimago ratings are available in a single, standardized online database.37 We coded each study by its journal’s Scimago rating in 2019 or the most recent available rating regardless of the study’s publication year in order to avoid conflating overall secular trends in scientific citations with relative journal rankings. We defined “higher-tier” journals as those surpassing a Scimago rating of 3.09 for psychology (chosen such that the lowest-ranked “higher-tier” journal was Journal of Experimental Psychology: General and all specialty journals were considered “lower-tier”) or 7.33 for medicine (chosen such that the lowest-ranked “higher-tier” journal was Annals of Internal Medicine).iii All other journals were defined as “lower-tier.”

iiiWe set these thresholds based on the discipline of the meta-analysis’ journal, not that of the study’s journal, because we did not have fine-grained data on each study’s disciplinary category. Therefore, in principle, a study published in a medical journal but included in a psychology meta-analysis might be spuriously coded as “higher-tier” because it was compared to the lower threshold for psychology. However, the impact on analysis would likely be minimal. Of the 84% of unique journals in our dataset that were included in journal tier analyses and that also had a topic categorization available in the Scimago database, only three journals with the string “medic*” in the Scimago categorization were published in psychology meta-analyses, and manual review indicated these journals were genuinely interdisciplinary rather than purely medical. Additionally, these journals would have been coded as “lower-tier” regardless of which threshold was applied. No journals with “psych*” in the Scimago categorization were included in medical meta-analyses.

To assess whether publication bias was more severe for the first few studies published on a topic compared to later studies, we coded studies as being published “early” vs “later” as follows. For each meta-analysis, we considered the first chronological year in which any study was published; if multiple studies were published that year, then all point estimates from those studies were coded as “early.” If instead only one study was published during the first year, then all point estimates from all studies published during the chronologically first 2 years were coded as “early.” All point estimates not coded as “early” were coded as “later.”
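A minimal sketch of this coding rule in Python follows; the function name and the calendar-year reading of “first 2 years” are our own assumptions for illustration, not the authors’ code.

```python
def code_early_vs_later(years):
    """Label each study in one meta-analysis as 'early' or 'later'.

    years: publication year of each study. Interprets the "first 2 years" in the
    rule above as the first two calendar years; this is our reading, not a detail
    confirmed by the authors.
    """
    first_year = min(years)
    if sum(1 for y in years if y == first_year) > 1:
        cutoff = first_year          # several studies in year 1: only they are "early"
    else:
        cutoff = first_year + 1      # a lone first study: first two years are "early"
    return ["early" if y <= cutoff else "later" for y in years]


# One study appeared in 2001, so studies from 2001-2002 are coded "early".
print(code_early_vs_later([2001, 2002, 2002, 2005, 2007]))
# -> ['early', 'early', 'early', 'later', 'later']
```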

2.3 | Primary statistical analyses

2.3.1 | Estimates of publication bias severity—We estimated publication bias using selection models (e.g., References 22,23,38), a class of statistical methods that assume that publication bias selects for studies with statistically “significant” results in the expected direction, such that these results (which we term “affirmative”) are more likely to be published than statistically “nonsignificant” results or results in the unexpected direction (which we term “nonaffirmative”) by an unknown ratio. This selection ratio represents the severity of publication bias: for example, a ratio of 30 would indicate severe publication bias in which affirmative results are 30 times more likely to be published than nonaffirmative results, whereas a ratio of 1 would indicate no publication bias, in which affirmative results are no more likely to be published than nonaffirmative results. This operationalization of publication bias, in which “statistically significant” results are more likely to be published, conforms well to empirical evidence regarding how publication bias operates in practice8,39 and provides an intuitively tractable estimate of the actual severity of publication bias itself.

Selection models essentially detect the presence of nonaffirmative results arising from analyses that were conducted but not reported; these results are therefore missing from the published and meta-analyzed studies. Specifically, we used a selection model that specifies a normal distribution for the population effect sizes, weights each study’s contribution to the likelihood by its inverse probability of publication based on its affirmative or nonaffirmative status, and uses maximum likelihood to estimate the selection ratio.22,23 The normal distribution of population effects could reflect heterogeneity arising because, for example, studies recruit different populations or use different doses of a treatment; even if these moderators are not measured, selection models can still unbiasedly estimate the severity of publication bias as long as the type of heterogeneity that is present produces approximately normal population effects.22,23
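As an illustration of this weighting scheme, the sketch below (a simplified Python illustration, not the authors’ implementation, which used the R package weightr) down-weights each nonaffirmative study’s normal density by 1/η, renormalizes by the expected publication weight, and maximizes the likelihood over the mean effect μ, heterogeneity τ, and selection ratio η; all names are our own.

```python
import numpy as np
from scipy import optimize, stats

Z_CRIT = stats.norm.ppf(0.975)  # critical value for two-sided p < .05

def fit_selection_model(y, se):
    """Crude maximum-likelihood fit of a one-step selection model (illustrative).

    y, se: study point estimates and standard errors, with signs already
    synchronized so that positive values lie in the expected direction.
    Returns (mu_hat, tau_hat, eta_hat), where eta_hat is the estimated ratio by
    which affirmative results are favored for publication over nonaffirmative ones.
    The ratio is only identified if some nonaffirmative results were published.
    """
    y, se = np.asarray(y, float), np.asarray(se, float)
    affirm = (y / se) > Z_CRIT  # "significant" and in the expected direction

    def neg_loglik(params):
        mu, log_tau, log_eta = params
        tau, eta = np.exp(log_tau), np.exp(log_eta)
        sigma = np.sqrt(tau**2 + se**2)                    # marginal SD of each estimate
        logf = stats.norm.logpdf(y, mu, sigma)             # normal likelihood, no selection
        p_affirm = stats.norm.sf(Z_CRIT * se, mu, sigma)   # P(a new result is affirmative)
        log_w = np.where(affirm, 0.0, -np.log(eta))        # nonaffirmative results get weight 1/eta
        log_norm = np.log(p_affirm + (1.0 - p_affirm) / eta)  # expected publication weight
        return -np.sum(log_w + logf - log_norm)

    start = [np.mean(y), np.log(np.std(y) + 1e-3), 0.0]
    fit = optimize.minimize(neg_loglik, start, method="Nelder-Mead")
    mu_hat, log_tau_hat, log_eta_hat = fit.x
    return mu_hat, np.exp(log_tau_hat), np.exp(log_eta_hat)
```

Under this parameterization, an estimated η near 1 is consistent with little publication bias, while values well above 1 suggest that affirmative results are over-represented among published studies.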

As in standard meta-analysis, selection models assume that studies’ point estimates are independent, but this assumption may be violated when some studies contribute multiple point estimates to a meta-analysis (e.g., estimates of a single intervention’s effect on different subject populations). To minimize the possibility of non-independence, we randomly selected one point estimate per study within each meta-analysis and then fit the selection model to only these independent estimates. Because the “expected” effect direction differed across meta-analyses, we first synchronized the signs of all point estimates so that positive effects represented the expected effect direction. To this end, we first reanalyzed all point estimates using restricted maximum likelihood estimation and the R package metafor and, treating the sign of the resulting pooled point estimate as the expected effect direction, reversed the sign of all point estimates for any meta-analysis with a negative pooled point estimate. We fit a selection model to estimate the inverse of the selection ratio and its standard error.22,23 We then used robust methods40 to meta-analyze the log-transformed estimates of the selection ratio, approximating their variances via the delta method. We used the R packages weightr41 and robumeta,42 respectively, to fit the selection model and robust meta-analysis.
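The last two steps can be sketched as follows (a simplified illustration only: we substitute a basic DerSimonian-Laird random-effects pool for the robust robumeta estimator, and all names and inputs are hypothetical). Each meta-analysis supplies its estimated selection ratio and standard error; the delta method converts the variance to the log scale via Var(log η̂) ≈ Var(η̂)/η̂², and the pooled log ratio is exponentiated back to the ratio scale.

```python
import numpy as np

def pool_log_selection_ratios(eta_hat, se_eta):
    """Pool selection-ratio estimates across meta-analyses (illustrative).

    eta_hat, se_eta: one estimated selection ratio and its standard error per
    meta-analysis. Returns the pooled ratio and a 95% CI on the ratio scale,
    using a DerSimonian-Laird random-effects model (the paper used robust methods).
    """
    eta_hat, se_eta = np.asarray(eta_hat, float), np.asarray(se_eta, float)
    log_eta = np.log(eta_hat)
    v = (se_eta / eta_hat) ** 2              # delta method: Var(log eta) ~ Var(eta) / eta^2

    w = 1.0 / v                              # fixed-effect weights for the heterogeneity step
    mu_fe = np.sum(w * log_eta) / np.sum(w)
    q = np.sum(w * (log_eta - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(log_eta) - 1)) / c) if c > 0 else 0.0

    w_re = 1.0 / (v + tau2)                  # random-effects weights
    mu_re = np.sum(w_re * log_eta) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    lo, hi = np.exp(mu_re - 1.96 * se_re), np.exp(mu_re + 1.96 * se_re)
    return np.exp(mu_re), (lo, hi)


# Example with made-up inputs; a pooled ratio near 1 would suggest little overall bias.
ratio, ci = pool_log_selection_ratios([0.8, 1.1, 1.5, 4.0], [0.2, 0.3, 0.5, 1.5])
print(round(ratio, 2), (round(ci[0], 2), round(ci[1], 2)))
```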

To characterize variability across individual meta-analyses in publication bias severity, we calculated non-parametric calibrated estimates of the true selection ratio in