Conclusory Pleading on the U.S. Courts of Appeals After Iqbal: An Empirical Study

Volume 114 | June 2026 | Sean Farhang | Symposium

Jun 30

This Article presents the first systematic empirical study of “conclusory pleading” as a form of plausibility analysis in the U.S. courts of appeals following Ashcroft v. Iqbal. Critics argued that Iqbal would harm plaintiffs by demanding excessive information before discovery and by increasing the role of judicial subjectivity and ideology in Rule 12(b)(6) decisions, particularly in civil rights litigation. Contrary to the canonical two-step account of plausibility pleading, courts of appeals almost never evaluate whether pleadings are conclusory fact by fact before proceeding to assess plausibility. Instead, in a one-step analysis, they assess whether the allegations, taken as a whole, are too conclusory to be plausible, or they render that judgment as to a particular key assertion in the context of the full complaint, without rejecting any other pleaded fact. Conclusory pleading analyses appeared in approximately 6% of Rule 12(b)(6) cases appealed from 2002 through Twombly and in 12% of cases in the decade following Iqbal, and they were heavily concentrated in civil rights cases. Unexpectedly, statistical models indicate that the presence of a conclusory pleading issue is associated with a higher likelihood of plaintiff success on appeal after Iqbal. Also contrary to prevailing expectations, such issues are not associated with greater ideological or identity-based voting by judges than other 12(b)(6) issues. However, in a large set of civil rights claims, judges’ race and gender were insignificant predictors of outcomes before Twombly, whereas after Iqbal the presence of women and non-White judges on panels was positively associated with plaintiff wins. 
These findings suggest that any increased judicial subjectivity after Twiqbal likely stems from aspects of plausibility doctrine other than conclusory pleading, shifts in case composition triggered by Twiqbal, or the broader politicization of pleading standards.


Introduction

This Article undertakes an empirical investigation of the role of “conclusory pleading” on the U.S. courts of appeals after Ashcroft v. Iqbal.[1] In Bell Atlantic Corp. v. Twombly[2] and Iqbal,[3] the Supreme Court invoked discovery costs as a reason to change the federal pleading requirements. The Court’s reinterpretations of the pleading requirements introduced a new gatekeeping strategy, rooted in (1) denying a presumption of truth to allegations deemed “conclusory,” and (2) evaluating only the remaining (non-conclusory) allegations, dismissing claims regarded as “implausible” based on “judicial experience and common sense.”[4] Critics of Twombly and Iqbal worried that the new 12(b)(6) standard would hurt plaintiffs by imposing unwarranted fact-pleading requirements before discovery and by injecting excessive judicial subjectivity and ideology into the disposition of 12(b)(6) motions, with particular concern for civil rights cases. This is the first empirical investigation of conclusory pleading on the courts of appeals after Iqbal (or, as far as I am aware, on any court).

It is important to be clear at the outset about what I mean by “conclusory pleading” in the plausibility framework, for the phrase lacks consistent usage, even by courts. The standard understanding that plausibility pleading requires a two-step analysis is false as a description of judicial practice on the courts of appeals. To the contrary, when court-of-appeals panels engage the issue of conclusory pleading, they almost never assess whether allegations are well-pleaded or conclusory on a fact-by-fact basis before turning to an evaluation of plausibility with respect to only well-pleaded facts. Instead, panels set out material factual allegations and reach a general judgment about whether allegations are too conclusory or render that judgment as to some particular key assertion in the context of the full complaint without rejecting any other pleaded fact. A determination of conclusoriness is a determination that the allegations fail to undergird a reasonable inference to the ultimate fact(s) that the plaintiff must later prove, like discriminatory intent or conspiratorial agreement.

When the court determines that the allegations are conclusory, that entails a determination that the claim is not “plausible” and should be dismissed. Thus, the two steps are collapsed into one. There was no instance in the data described below in which a court deemed a plaintiff’s pleadings to be conclusory and then proceeded to find that the plaintiff stated a (plausible) claim. In this sense, this study is about the issue of “conclusory pleading” as a form of plausibility analysis, not as a first step in the analysis that precedes an assessment of plausibility. I provide concrete examples of what this analysis looks like on the courts of appeals below.[5]

I find that conclusory pleading analyses were present, though rare, prior to Twombly and roughly doubled in frequency following Iqbal. In 12(b)(6) appeals, a conclusory pleading issue arose in 6% of cases from 2002 to Twombly, and in 12% of cases in the decade after Iqbal. Such issues were heavily concentrated in civil rights claims, where they arise at nearly triple their rate in non-civil rights claims, with a particularly heavy concentration in employment discrimination and police abuse cases.

Unexpectedly, however, when the conclusory pleading variable is included in statistical models of outcomes, it is statistically significant and associated with a higher likelihood that the plaintiff prevails on appeal. Also contrary to the expectations of many, the presence of a conclusory pleading issue is not associated with greater influence of judicial identity and ideology on outcomes as compared to cases that lack such an issue. Finally, I compare decision-making in a large and important category of civil rights claims and find that judge identity characteristics were uniformly insignificant predictors of outcomes pre-Twombly, but judges’ race and gender emerged as significant and substantively important predictors after Iqbal. Thus, while I do not find evidence that conclusory pleading issues, in particular, were a pathway to a heightened role for judicial subjectivity in determining outcomes on 12(b)(6) appeals, judicial identity did, in fact, emerge as significant after Iqbal in civil rights cases. This may have resulted from facets of plausibility pleading other than conclusory pleading issues, changes in the caseload after Twiqbal, or the divisive politicization of the pleading issue after Twiqbal.

I. The Supreme Court’s Pleading Decisions in Twombly and Iqbal, and Critics’ Concerns

A. The Twombly and Iqbal Decisions

The Supreme Court cited widespread discovery abuse, exorbitant discovery costs, and the claimed inability of federal judges to control discovery as grounds for altering federal pleading standards. It did so by deploying the distinction between “facts” and “conclusions” and requiring dismissal of complaints that, once stripped of conclusions, fail to allege enough facts to render the plaintiff’s claim “plausible.”[6] In order that defendants in a massive antitrust case might be spared “impositional discovery,”[7] in Twombly the Supreme Court made it more difficult for plaintiffs to survive a motion to dismiss for failure to state a claim under Rule 12(b)(6).

Twombly concerned a class action antitrust conspiracy complaint under section 1 of the Sherman Act against the regional telecommunications service providers left after the breakup of AT&T. The Court “retired” Conley v. Gibson’s statement that “a complaint should not be dismissed . . . unless it appears beyond doubt that the plaintiff can prove no set of facts . . . which would entitle him to relief.”[8] Agreeing nonetheless with Conley that a complaint must provide “fair notice of what the . . . claim is and the grounds upon which it rests,”[9] and emphasizing Rule 8’s requirement that the statement of claim “show[] that the pleader is entitled to relief,”[10] the Court construed the former as demanding “more than labels and conclusions,”[11] and the latter as requiring that its “[f]actual allegations must be enough to raise a right to relief above the speculative level.”[12] The Court then held that, for a section 1 Sherman Act claim, these standards “require[d] a complaint with enough factual matter (taken as true) to suggest that an agreement was made.”[13] Treating direct allegations of conspiracy as conclusory, the Court found the plaintiffs’ claims implausible because they were based on allegations of “parallel conduct and not on any independent allegation of actual agreement among [defendants].”[14]

Two years later, in Ashcroft v. Iqbal, the Court again voiced concern over discovery costs, this time focusing on the burden of diverting the time and attention of high-ranking government officials asserting qualified immunity.[15] The case concerned claims by a Pakistani citizen who was arrested by federal officials after the 9/11 attacks and later transferred to the federal Metropolitan Detention Center (MDC) in Brooklyn, New York, pending trial on immigration-related charges. The complaint alleged that Iqbal’s seven-month confinement in highly restrictive and egregiously abusive MDC conditions stemmed from unlawful racial and religious discrimination. It further claimed that FBI Director Robert Mueller and U.S. Attorney General John Ashcroft adopted and/or approved policies and directives under which Iqbal was confined and abused—policies that intentionally discriminated based on religion and race.[16]

Applying Twombly, the Court assessed the plausibility of the inferential basis for the plaintiff’s case.[17] It disregarded his allegations that defendants “knew of, condoned, and willfully and maliciously agreed to subject [him]” to harsh confinement “as a matter of policy, solely on account of [his] religion, race, and/or national origin,” that Ashcroft was the “principal architect” of this discriminatory policy, and that Mueller was “instrumental” in adopting and executing the policy.[18] The Court did so on the ground that the allegations were “conclusory”—“bare assertions,” mere “formulaic recitation[s] of the elements”—and thus not entitled to a presumption of truth without “further factual enhancement.”[19] Relying on “judicial experience and common sense,”[20] the Court found the complaint implausible. Because the Federal Rules are trans-substantive, the Court made clear that its approach applies to all types of claims.[21]

B. Normative Criticisms

There is a vast normative literature on the Court’s decisions in Twombly and Iqbal. One theme criticized the Court’s distinctions among “facts,” “conclusions,” “legal conclusions,” and “threadbare allegations” as arbitrary and as unfairly impeding access to justice.[22] Another was “that when the defendant controls critical private information, Iqbal creates an apparent catch-22 for plaintiffs, requiring them to plead information they do not know but denying them a means of discovering that information.”[23] Yet another questioned the legitimacy of changing pleading law via judicial decision, given the requirements of the Rules Enabling Act as interpreted in past decisions, and, relatedly, pointed to the lack of an adequate empirical basis for the logic of Twombly and Iqbal.[24]

The present inquiry focuses on a further widespread critique: that the decisions injected a great deal of subjectivity into the standard for deciding 12(b)(6) motions. Scholars argued that this subjectivity is inherent in the well-known elusiveness of the distinction between “facts” and “conclusions,” which amplifies judicial discretion.[25] Moreover, critics of Iqbal were skeptical of the notion that “judicial experience and common sense” can provide meaningful, objective legal constraints on judges evaluating whether a complaint states a plausible claim, further exacerbating the problem of subjectivity.[26] Given the lack of a factual record, critics expressed concern that the new pleading standard, as compared to the one it replaced, created ample space for unconscious bias to determine outcomes. Civil rights claims, including claims of employment discrimination, were the focus of particular concern in this regard.[27] One reason for this may be that in some civil rights claims observers will “interpret th[e] facts against the background of competing subcommunity understandings of social reality,”[28] making them strong candidates for the operation of cognitive biases.[29] For example, a judge with no personal experience of employment discrimination, or of police abuse, may be less likely to regard such allegations as plausible.

C. Previous Empirical Study of Twiqbal’s Effects

Twiqbal (the pleading regime established by Twombly and Iqbal) spawned a large empirical literature on its effects at the trial court level.[30] Several studies were executed in a manner that seriously grappled with data and selection problems, particularly the work of William Hubbard[31] and Jonah Gelbach.[32] Those studies report results ranging from statistical insignificance to relatively modest anti-plaintiff effects.[33] Taken together, this work may fairly be characterized as finding an impact smaller than Twiqbal’s critics anticipated. An arguable lesson from the work is that excessive focus on doctrine-changing Supreme Court opinions, and qualitative doctrinal analysis of their impact, may present a misleading picture of what is happening in the federal civil justice system in trial courts.

Trial court studies that analyze large, complete datasets lack written opinions, so they cannot tailor their empirical analyses to focus on the issue types most affected by Twiqbal. Trial court studies using a random sample of cases, or the full universe, are based on docket sheets or Administrative Office data files that simply record the filing of motions and their outcomes. These studies estimate effects on 12(b)(6) motion frequency or grant rates in general, across a wide range of issue types, whether or not they are linked to the doctrinal changes wrought by Twiqbal. Their virtue is complete data, but the data does not include written opinions for each case that would allow identifying the subset of cases affected by Twiqbal’s doctrinal changes. The smaller the share of the 12(b)(6) docket affected by Twiqbal, the harder it will be to detect an effect using such data.

Other trial court studies have evaluated the impact of Twiqbal based on written opinions.[34] They have the virtue of giving researchers the opportunity to try to identify the set of cases that implicate Twiqbal’s doctrinal changes, allowing for a more targeted analysis. However, trial court studies based on the content of opinions suffer from problems of incomplete data since many such opinions are neither published nor on electronic databases.[35] As a result of this selection threat, it is not evident how representative such analyses are. These challenges are perennial ones that are inherent to trial court studies, where it is exceedingly difficult and costly to assemble random samples in which the researcher has access to written opinions associated with each decision.

Studying pleading decisions on the courts of appeals offers opportunities absent in trial court studies, while also facing distinctive and significant limitations in the inferences it can yield. In contrast to trial court studies, a courts-of-appeals study has the benefit of relatively complete data, with ready access to opinions in each case, making the issues raised and the basis of the court’s decision discernible. However, a courts-of-appeals study also has important limitations. It cannot be assumed that patterns on the courts of appeals regarding conclusory pleadings mirror those in the district courts, and the nature of the selection processes that generate the appellate data is unknown. We simply do not know, for example, if certain types of issues (e.g., conclusory fact pleading versus legal sufficiency) are appealed at different rates than they appear in trial court opinions, and we cannot establish the relationship between the trial and appellate dockets at the issue level because we lack complete data on trial court opinions.

Thus, what is possible here is mostly (but not entirely) limited to a descriptive exercise characterizing aspects of how conclusory pleading has played out on the courts of appeals, though it may also sharpen questions and generate puzzles that can be investigated at the trial court level in the future. This descriptive characterization does not, however, extend to the results concerning the influence of judge identity characteristics on case outcomes. As discussed below, random assignment of cases to panels allows confident identification of the impact of judge identity characteristics, allowing a strong test of the relationship between judicial subjectivity and case outcomes when conclusory pleading issues are presented in the plausibility framework.

II. Data and Methodology

Recognizing those limitations, I turn to novel data on the analysis of conclusory pleading issues by the courts of appeals, seeking to explore these questions: (1) How frequently were conclusory pleading issues evaluated in federal appellate-court pleading decisions prior to Twombly (if they were present at all), and how frequently were these analyses present after Iqbal, which is pertinent to assessing the scope of Twiqbal’s impact? (2) How did the presence of conclusory pleading issues in federal appellate-court decisions vary across policy area, and were they more frequent in civil rights cases? (3) Was the presence of a conclusory pleading issue associated with the likelihood of a plaintiff losing, and did any such relationship vary across civil rights versus non-civil rights cases? (4) Is the presence of a conclusory pleading issue associated with greater influence of judge identity characteristics on case outcomes, as predicted by the hypothesis that conclusory pleading issues could amplify judicial subjectivity? (5) Do we find higher levels of influence of judge identity characteristics on case outcomes in civil rights pleading cases after Iqbal than we do before Twombly, also as predicted by the hypothesis that Twiqbal’s elevation of the conclusory pleading issue would amplify judicial subjectivity?

A. The Data

In this study, I begin with data Professor Stephen B. Burbank and I collected for Politics, Identity, and Pleading Decisions on the U.S. Courts of Appeals,[36] which analyzed the extent to which judges’ ideology, gender, and race are associated with federal appellate-court decisions applying Federal Rule of Civil Procedure 12(b)(6) after Iqbal. That project did not code or investigate the issue of conclusory pleading. For this paper, all cases were coded for the presence of a conclusory pleading issue, and I also collected pleading decisions from 2002 until Twombly to allow several pre- and post-Twiqbal comparisons.

The data consists of cases in which federal appellate panels reviewed district court rulings on motions to dismiss for failure to state a claim under Rule 12(b)(6). We excluded pro se matters and cases applying a heightened pleading standard imposed by rule or statute. From the full universe of cases decided between the Iqbal decision and the end of 2019, we randomly selected a sample of 700.[37] In this sample, 36% of cases involved a civil rights claim. To ensure sufficient data for separate analyses of discrimination claims and other civil rights claims, we drew an additional random oversample of 206 civil rights cases (covering both discrimination and non-discrimination claims) and 130 additional cases specifically asserting discrimination claims.

Only 35% of the random sample cases were precedential, so to allow for separate analysis of precedential cases, we collected an additional 942 precedential cases (with no policy area restrictions) over the same time period. These comprised all precedential cases with general Westlaw headnotes for Rule 12(b)(6) motions that were not already included in our random samples. In total, the dataset contains 1,978 cases from Iqbal through 2019.

I extended the random sample back in time to 2002, adding 393 cases from 2002 until the Iqbal decision in 2009 and yielding a full random sample of 1,093 cases with pleading decisions spanning 2002 to 2019. My goal with this pre-Twiqbal random sample is simply to allow comparison of the frequency of conclusory pleading issues before and after Twiqbal.[38] I also collected all courts-of-appeals “other civil rights” cases decided between 2002 and Twombly, yielding 367 cases containing 608 causes of action subject to a 12(b)(6) decision. Each cause of action was fully coded for all variables described below and in the Appendix. Collecting these pre-Twiqbal “other civil rights” cases allows a pre- and post-Twiqbal comparison of judicial behavior under Conley versus Twiqbal in one particularly important policy area that concerned Twiqbal’s critics.
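The case counts in this Section can be tallied as a quick arithmetic check. The Python sketch below only restates figures given in the text; the variable names are illustrative, and the 367 pre-Twombly “other civil rights” cases, a separate collection, are left out of both totals here.

```python
# Post-Iqbal case counts (Iqbal through 2019), as reported in the text.
post_iqbal_counts = {
    "random sample": 700,
    "civil rights oversample": 206,
    "discrimination oversample": 130,
    "precedential oversample": 942,
}

# Random-sample cases added for the period from 2002 until Iqbal.
pre_iqbal_random = 393

post_iqbal_total = sum(post_iqbal_counts.values())
full_random_sample = post_iqbal_counts["random sample"] + pre_iqbal_random

print(post_iqbal_total)    # 1978
print(full_random_sample)  # 1093
```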

The unit of analysis is the claim, not the case. A single case often contains multiple claims; motions to dismiss address individual claims, and both district courts and courts of appeals frequently grant such motions for some claims while denying them for others in the same case. To code the data, coders read the court’s Rule 12(b)(6) analyses in each decision in full. The random sample of 700 cases included evaluations of 1,136 claims under Rule 12(b)(6) (about 1.6 claims per case). The oversample of 942 precedential cases included evaluation of 1,794 claims under Rule 12(b)(6) (about 1.9 claims per case).
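The per-case averages follow directly from the reported counts. A minimal Python sketch, in which the helper name is hypothetical:

```python
def claims_per_case(claims: int, cases: int) -> float:
    """Average number of Rule 12(b)(6) claim evaluations per case."""
    return claims / cases


random_sample_ratio = claims_per_case(1136, 700)
precedential_ratio = claims_per_case(1794, 942)

print(f"{random_sample_ratio:.2f}")  # 1.62
print(f"{precedential_ratio:.2f}")   # 1.90
```

Rounded to one decimal place, these give the 1.6 and 1.9 claims per case stated above.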

The dependent variable indicates whether a decision is pro- or anti-plaintiff. We coded a decision as anti-plaintiff (=0) if the court of appeals affirmed the trial court’s grant of a motion to dismiss or reversed the trial court’s denial of such a motion. Conversely, we coded a decision as pro-plaintiff (=1) if the court of appeals reversed the trial court’s grant of a motion to dismiss or affirmed the trial court’s denial.
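The coding rule for the dependent variable maps two inputs, the district court’s ruling on the motion and the appellate disposition, to a binary outcome. The following is a minimal Python sketch of that mapping. It assumes a claim-level record with a clean affirm-or-reverse disposition; the enum and function names are hypothetical, and the sketch does not handle mixed dispositions, which the text does not address here.

```python
from enum import Enum


class DistrictRuling(Enum):
    GRANTED = "granted"  # district court granted the motion to dismiss
    DENIED = "denied"    # district court denied the motion to dismiss


class AppellateDisposition(Enum):
    AFFIRMED = "affirmed"
    REVERSED = "reversed"


def code_outcome(ruling: DistrictRuling, disposition: AppellateDisposition) -> int:
    """Code a claim as pro-plaintiff (1) or anti-plaintiff (0)."""
    if ruling is DistrictRuling.GRANTED:
        # Reversing a dismissal lets the claim proceed; affirming it ends the claim.
        return 1 if disposition is AppellateDisposition.REVERSED else 0
    # Affirming a denial lets the claim proceed; reversing it means dismissal.
    return 1 if disposition is AppellateDisposition.AFFIRMED else 0


print(code_outcome(DistrictRuling.GRANTED, AppellateDisposition.AFFIRMED))  # 0
print(code_outcome(DistrictRuling.DENIED, AppellateDisposition.AFFIRMED))   # 1
```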

While I believe the dependent variable captures many important aspects of the development and application of the law governing dismissal under Rule 12(b)(6), I acknowledge its limitations. Assessing how a judge’s or panel’s characteristics affect lawmaking is a complex task. The most evident indication of influence is a shift in the chance that the appellant will prevail. Yet, much of judges’ negotiations and deliberations may center on how to frame or justify the decision once the outcome is already determined.[39] Decisions about framing and justification can have significant implications for the policy consequences of an opinion in future cases. These limitations represent a trade-off inherent in large-N empirical studies, which focus primarily on outcomes rather than the broader scope and implications of judicial reasoning explored in qualitative research.

This measurement constraint limits the conclusions that can be drawn from the data. If the data reveals that some panel composition—with respect to party, gender, or race—is not associated with either pro- or anti-plaintiff decisions, one cannot conclude that the characteristic has no directional influence on the content of opinions. However, reversing an outcome is a strong form of influence, so to the extent that some panel composition is significantly associated with dismissal rates, that panel composition is likely also shaping opinion content in more subtle ways in the same direction.

For each case, we identified the party of the appointing president,[40] gender, and race of each judge using the Federal Judicial Center’s biographical database.[41] With respect to race, we compare non-White judges to White judges.[42] The inferences that I will draw in this paper from the party, gender, and race variables are based on the assumption that case assignment to panels is random, or “as-if” random, regarding the relationship between panel composition and the merits of the motion to dismiss.[43] I incorporate a battery of controls that are detailed in the Appendix. The models also contain circuit fixed effects and year fixed effects, the significance of which is also discussed in the Appendix.

Finally, the data discussed and analyzed throughout the Article is limited to cases in which at least one plaintiff is an individual (or a class of individuals) and at least one defendant is a business or governmental entity. In our random sample of 700 cases, 78% meet this criterion—95% of civil rights claims and 68% of non-civil rights claims. Cases that do not meet the criterion are most often business-versus-business or business-versus-government disputes. The chief concerns about Twombly and Iqbal have centered on the pleading hurdles faced by individuals suing businesses (as in Twombly) or government entities (as in Iqbal). This party structure limitation also offers a more plausible basis for testing preferences along a liberal-conservative spectrum. For example, commercial disputes between businesses are rarely thought to turn on judges’ ideology, gender, or race.
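As a rough consistency check, weighting the civil rights and non-civil rights rates by the 36% civil rights share reported for the random sample approximately reproduces the overall 78% figure. This is an approximation under stated assumptions, not a reconstruction of how the figure was computed: the 36% share is stated for cases, while the 95% and 68% rates are stated for claims, and the sketch treats the two groups as partitioning the sample.

```python
# Shares reported for the random sample of 700 cases.
civil_rights_share = 0.36     # cases involving a civil rights claim
meets_if_civil_rights = 0.95  # reported rate for civil rights claims
meets_if_other = 0.68         # reported rate for non-civil rights claims

# Weighted average, assuming (as a simplification) that both rates apply per case.
overall = (civil_rights_share * meets_if_civil_rights
           + (1 - civil_rights_share) * meets_if_other)

print(f"{overall:.1%}")  # 77.7%
```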

B. Conclusory Pleading in the Plausibility Framework: A One-Step Analysis

There is no consensus among civil-procedure-doctrine scholars on how, exactly, Twiqbal changed doctrine, or what empirical features identify cases that would be affected by the change.[44] The standard understanding is that, post-Iqbal, plausibility pleading requires a two-step analysis: (1) the court parses the complaint’s factual assertions, determines which are conclusory and which are well-pleaded, and disregards conclusory pleadings; and (2) it then renders a judgment about whether the remaining well-pleaded allegations state a plausible claim for relief, guided by judicial experience and common sense.

I focus on conclusory pleading and regard it as the most significant form of plausibility analysis. Application of Twiqbal in the courts of appeals does not track the textbook account of a two-step test. When appellate-court panels engage the issue of conclusory pleading in practice, they almost never parse material factual allegations fact by fact and state which are conclusory and which are well-pleaded before then proceeding to a separate plausibility analysis focused on the allegations deemed well-pleaded.

Instead, in their analyses, which are often quite short (65% are nonprecedential), panels overwhelmingly do one of two things: (1) set out material factual allegations and render a general judgment, on balance, about whether the allegations are factually sufficient or too conclusory; or (2) render that judgment as to some particular key assertion in the context of the full complaint, without rejecting any other pleaded fact. In determining whether the pleadings are “conclusory,” the court evaluates whether they are too “sparse,” “vague,” “speculative,” or “nonspecific” to undergird a reasonable inference to the ultimate fact that the plaintiff must later prove, like discriminatory intent or conspiratorial agreement. When the court undertakes such an analysis as to any cause of action, the cause of action is coded as containing a conclusory pleading issue.

In this context, when the court determines that the allegations are conclusory, that entails a determination that the claim is not “plausible,”[45] or more commonly, that the claim must be dismissed for failure to state a claim, without use of any variant of the word “plausible.”[46] In this sense, the two steps are collapsed into one, and a determination of conclusoriness is a determination that the claim will be dismissed for failure to state a claim, sometimes using plausibility language, and sometimes not.[47]

Thus, what is captured by the “conclusory” variable is analysis by courts of whether fact pleadings are too conclusory to support a reasonable inference to some ultimate fact that plaintiff must prove. The purpose of the analysis is to decide whether to accept pleaded facts as true, where a determination that the pleadings are conclusory is a determination that the plaintiff has failed to state a plausible claim. There was no instance in the data in which a court deemed a plaintiff’s pleadings to be generally conclusory, or deemed a key allegation to be conclusory, and then proceeded to find that the plaintiff stated a (plausible) claim.
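The distinction between the coded issue and the court’s finding matters for interpreting the outcome models: the variable records that a panel undertook a conclusory pleading analysis, not that it found the allegations conclusory. The Python sketch below expresses that distinction, and the one-step pattern just described, as a check on a single claim record. The record fields and function name are hypothetical and do not reproduce the study’s codebook.

```python
from dataclasses import dataclass


@dataclass
class ClaimRecord:
    """Hypothetical claim-level fields (not the study's actual codebook)."""
    conclusory_issue: bool  # panel analyzed whether the allegations were too conclusory
    found_conclusory: bool  # panel concluded that they were
    stated_claim: bool      # panel held that the plaintiff stated a (plausible) claim


def fits_one_step_pattern(record: ClaimRecord) -> bool:
    """True if the record matches the one-step pattern described in the text."""
    # A conclusoriness finding presupposes that the analysis was undertaken.
    if record.found_conclusory and not record.conclusory_issue:
        return False
    # A conclusoriness finding is itself a holding that no plausible claim was stated.
    if record.found_conclusory and record.stated_claim:
        return False
    return True


dismissed = ClaimRecord(conclusory_issue=True, found_conclusory=True, stated_claim=False)
sufficient = ClaimRecord(conclusory_issue=True, found_conclusory=False, stated_claim=True)
contradictory = ClaimRecord(conclusory_issue=True, found_conclusory=True, stated_claim=True)

print(fits_one_step_pattern(dismissed))      # True
print(fits_one_step_pattern(sufficient))     # True
print(fits_one_step_pattern(contradictory))  # False
```

Under this reading, a claim can carry the issue flag and still end in a pro-plaintiff outcome, namely when the panel finds the allegations sufficient.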

Interestingly, the appellate courts’ deviation from Iqbal’s fact-by-fact approach to evaluating conclusory pleading issues, and their collapse of the two steps into one, conform closely to how Professor Robert G. Bone understood Twombly, and to why he regarded it as better for plaintiffs than Iqbal. Bone critiqued Iqbal’s fact-by-fact approach, arguing that Twombly’s holistic approach was superior:

[I]t is the complaint as a whole that must meet the standard, not each individual allegation taken separately. The fact that one allegation is extremely general should not matter as long as other allegations fill in the necessary detail. Moreover, the complaint is not just a list of individual allegations. The complaint is supposed to give a coherent account of the relevant events and transactions involved in the dispute. Therefore, it must be interpreted as a coherent whole, and the sufficiency of its allegations must be evaluated in a holistic way.[48]

Under this approach, Bone observes, Iqbal’s two steps collapse into one:

It follows from the holistic nature of pleading analysis that there is no conceptual distinction between the two parts of Iqbal’s two-pronged approach. The second prong is all there is to a pleading analysis. It makes no sense first to exclude certain allegations as conclusory on account of their generality and then to subject the remaining allegations to the pleading standard. The reason certain allegations are conclusory is that the complaint, interpreted with them in it, does not meet the pleading standard for the legal element the defective allegations are meant to support. For example, if the key allegations in the Iqbal complaint are conclusory, it is not because of some defect in the allegations themselves, but because the complaint that includes them, when interpreted as a whole, tells a story that does not plausibly support Ashcroft and Mueller having a discriminatory purpose.[49]

Empirically speaking, pleading decisions on the courts of appeals reflect Bone’s understanding of Twombly, not the Iqbal two-step analysis that is taught to law students in Civil Procedure. This more pro-plaintiff approach may be an example of “narrowing from below” by courts of appeals, whereby they circumscribe, in implementation, Supreme Court cases with which they disagree.[50]

C. Policy Distribution of the Claims

The policy distribution data is drawn from the post-Iqbal period. For all policy areas representing 2% or more of the data in that period, Table 1 presents the policy areas of the 1,136 claims underlying motions to dismiss in my random sample of 700 cases, as well as the 2,184 claims underlying motions to dismiss in precedential cases from the random sample and the oversample combined. The table excludes claims from the civil rights and discrimination oversamples so that it accurately reflects the courts’ 12(b)(6) docket. Because percentages are rounded, the policy categories do not sum precisely to 100%.

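Table 1. Policy Areas of Claims Underlying Motions to Dismiss, Post-Iqbal
Policy Area	Random Sample (% of 1,136 claims)	Precedential Cases (% of 2,184 claims)
All Anti-Discrimination (race, gender, age, etc.)
Employment discrimination	10	6
Education discrimination	–	2
Other discrimination (housing, voting, etc.)	2	2
All Other Civil Rights
Policing	9	11
Public Employment	5	3
Prisoner	2	5
Other (other civil rights)	9	12
Non-Civil Rights
Consumer	15	10
Contract	12	8
Labor	7	9
Personal Injury	7	6
Antitrust	2	3
Securities	–	3
Insurance	2	2
Intellectual Property	2	2
Other (non-civil rights)	14	14
Note: A dash (–) indicates that the policy area fell below the 2% reporting threshold in that sample.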

“Other civil rights” claims (excluding discrimination) play an important role in the empirical analysis below, so I provide additional details on them. They make up 25% of the claims in the random sample and 32% of claims in precedential cases. When aggregated at the case level, 25% of cases in the random sample include an “other civil rights” claim, and 28% of published cases do as well. These claims are predominantly constitutional challenges against the government. The three most common types in this category are (1) policing, (2) public employment, and (3) prisoner claims. In the large residual category under other civil rights, the next largest seven areas are (4) judicial or prosecutorial misconduct, (5) education, (6) guns, (7) speech and religion, (8) family relations (primarily constitutional claims to parental rights), (9) voting and elections, and (10) privacy.[51] These ten policy areas account for 89% of claims within the other civil rights category. Seventy-one percent of these claims are Section 1983 damages actions. This represents a large, cross-cutting, and significant segment of the courts of appeals’ 12(b)(6) docket.

D. Panel Effects

The statistical models presented in the next Part focus on the relationship between panel characteristics and claim outcomes, rather than on individual judges’ characteristics and their votes. Since the dissent rate in this data is only 2%, outcomes are very closely aligned with votes. The literature shows that when judges’ party, gender, and race influence votes, their primary explanatory power lies at the panel level.[52] A panel’s composition often explains more variation in a judge’s votes than the judge’s own individual characteristics. For instance, in many policy areas, a Democratic appointee votes more liberally when sitting with two other Democratic appointees than when sitting with two Republican appointees. The key point is that judges’ preferences, as indicated by their characteristics, may affect outcomes by shaping the votes of their co-panelists.

The theoretical literature explaining panel effects is grounded in the empirical fact that federal appellate-court panels are overwhelmingly unanimous, even though there is significant variation in case outcomes linked to panel composition.[53] One explanation for this phenomenon is that unanimity results from dissent avoidance by panel-minority judges who disagree with the majority but choose not to dissent due to workload pressures, strong norms against dissent, or isolation associated with dissenting. These factors may suppress dissents even when there is genuine disagreement, allowing the panel-majority view to prevail without influence from the panel minority.[54] By “panel minority,” I refer to a minority position on a panel with divided preferences, regardless of whether the judge belongs to a majority or minority group within the circuit.

Alternatively, the literature suggests that unanimity may result from panel minorities choosing not to dissent because they can influence decisions. Through deliberation and bargaining, panel minorities can shape the preferences and votes of the panel majority.[55] Applied to judges from groups historically underrepresented on the bench (such as women and non-White judges), this view carries more positive normative implications than the idea that they suppress dissents. This hypothesis suggests that the preferences of such judges, when systematically different from those of the majority, can influence the application and development of the law even when they are in the panel minority. Several studies focused on civil rights cases have found that a single woman or non-White judge can influence the votes of male and White judges.[56] A recent study of class certification, however, shows that this is not always the case. It found that the presence of one woman on a panel had no statistically discernible effect on the likelihood of a pro-certification outcome, but having two women on the panel significantly increased the likelihood of a pro-certification decision, showing that the decision process can be more majoritarian in some issue areas.[57]

To implement this panel-effects framework in the statistical models of outcomes below, for each case I measure panel effects using dichotomous variables indicating whether the panel included zero, one, two, or three Republicans; zero, one, two, or three women; and zero, one, two, or three racial minorities. Panels composed entirely of Democrats, men, or White judges serve as the reference categories for the party, gender, and race variables, respectively. This setup allows me to assess, for example, whether panels with one, two, or three Republicans have a statistically different probability of pro-plaintiff outcomes compared to an all-Democratic panel (the reference) and, if so, by how much.
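The coding scheme just described can be sketched in a few lines. What follows is a minimal Python illustration under stated assumptions, not the study’s actual code: the judge attributes (republican, woman, nonwhite) and the helper names count_dummies and panel_indicators are hypothetical. Each characteristic yields three 0/1 indicators (one, two, or three judges with that attribute), so a panel with none of them, such as an all-Democratic panel, has all three indicators set to zero and functions as the reference category.

```python
# Hypothetical sketch of the panel-composition indicators; not the study's code.

def count_dummies(panel, attribute, prefix):
    """Return 0/1 indicators for panels with 1, 2, or 3 judges having `attribute`.

    A panel with zero such judges gets all zeros, which makes it the
    omitted reference category in a regression.
    """
    count = sum(1 for judge in panel if judge[attribute])
    return {f"{prefix}_{k}": int(count == k) for k in (1, 2, 3)}


def panel_indicators(panel):
    """Build party, gender, and race indicators for a three-judge panel."""
    indicators = {}
    indicators.update(count_dummies(panel, "republican", "republicans"))
    indicators.update(count_dummies(panel, "woman", "women"))
    indicators.update(count_dummies(panel, "nonwhite", "nonwhite"))
    return indicators


# Example: a panel with one Republican, one woman, and one non-White judge.
panel = [
    {"republican": True, "woman": False, "nonwhite": False},
    {"republican": False, "woman": True, "nonwhite": False},
    {"republican": False, "woman": False, "nonwhite": True},
]
print(panel_indicators(panel))
```

Because the indicators for each characteristic are mutually exclusive and omit the zero category, each regression coefficient compares panels with that count to the reference panel, which is the kind of comparison (for example, one-woman panels versus all-male panels) reported in the models.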

III. Models and Analysis

A. Conclusory Pleading Analyses, Case Outcomes, and Number of Appeals, 2002–2019

I evaluate patterns in the frequency of conclusory pleading issues over time with the random sample of 1,093 federal appellate-court cases spanning 2002–2019. This allows me to assess whether conclusory pleading analysis predated Twombly and, if so, to what extent. Justice Souter expressed the view in Twombly that the conclusory pleading analysis, now widely associated with Twiqbal, was already being practiced in some lower federal courts. He wrote:

On . . . a focused and literal reading of Conley’s “no set of facts,” a wholly conclusory statement of claim would survive a motion to dismiss whenever the pleadings left open the possibility that a plaintiff might later establish some “set of [undisclosed] facts” to support recovery. . . . It seems fair to say that this approach to pleading would dispense with any showing of a “‘reasonably founded hope’” that a plaintiff would be able to make a case, see Dura, 544 U.S., at 347, 125 S.Ct. 1627 (quoting Blue Chip Stamps, 421 U.S., at 741, 95 S.Ct. 1917) . . . . Seeing this, a good many judges and commentators have balked at taking the literal terms of the Conley passage as a pleading standard.[58]

This raises empirical questions. Were conclusory pleading analyses prior to Twombly aberrant, widespread, or something in between? Did their frequency increase after Twombly and Iqbal and, if so, by how much?

In fact, such conclusory pleading analyses were present, though infrequent, in appellate-court pleading decisions during the approximately five and a half years preceding Twombly. About 6% of the cases in the random sample of appeals for this period addressed a conclusory pleading issue, nearly always using the words “conclusory” or “conclusional” in analyses that sound identical to post-Iqbal conclusory pleading analyses.[59] The share of cases in which the court analyzed a conclusory pleading issue rose by less than a percentage point from Twombly to Iqbal, but then grew to an average of 12% in the decade between Iqbal and 2019. Thus, the post-Iqbal rate was about double the pre-Twombly rate.

Figure 1: Probability of Conclusory Pleading Issue in Cases with Motions to Dismiss for Failure to State a Claim

Figure 1 depicts the estimated probability that a case gave rise to a conclusory pleading issue from 2002 to 2019 (using Lowess regression with .6 bandwidth). Interestingly, the probability declined from 10 to 5% from 2002 to 2005. In 2002, the Supreme Court decided Swierkiewicz v. Sorema N.A., which rejected application of a heightened pleading standard in cases governed by Rule 8(a)’s “short and plain statement” standard; the case was followed by a sharp decline in the appearance of conclusory pleading analyses in appellate court cases.[60] Twombly and Iqbal were associated with a reversal of that trend—a reversal that peaked at a probability of 13% in 2014 and then declined to 9% in 2019. Obviously, in the post-Twiqbal years the patterns may be shaped not just by Twiqbal, but also by lower court interpretations, litigant behavior, and changing caseload composition in response to the new pleading standard.

Figure 2 shows the estimated probability of a plaintiff win in appeals of decisions on 12(b)(6) motions to dismiss for failure to state a claim. Plaintiffs’ likelihood of prevailing declined dramatically from 2002 to 2005, from 27 to 18%. However, the scatter plot of annual averages makes clear that the decline is driven entirely by an unusually high win rate in 2002, and thus I attribute no significance to it. In any event, from the start of 2005 (about two and a half years before Twombly) to 2019, plaintiffs’ win rate was fairly stable, ranging between 15 and 18%. As with the frequency of conclusory pleading issues, in the post-Twiqbal years the patterns may be shaped not just by Twiqbal but also by lower court interpretations and litigant behavior in response to the new pleading standard. Because the quality of cases flowing into the system may have changed pre- and post-Twiqbal in ways that are difficult to predict, a 15% win rate before and after may mask important changes in how plaintiffs are actually faring.

Figure 2: Probability of Pro-Plaintiff Outcome in Motions to Dismiss for Failure to State a Claim

Finally, changes in the frequency of appeals aid in judging whether Twiqbal changed the content of the appellate docket. After Iqbal, the frequency of 12(b)(6) appeals grew steeply. In the sample of 700 cases, there were 71 causes of action subject to appeal in 2010 (the first full year after Iqbal), and the number nearly doubled to 131 causes of action in 2014, after which it roughly stabilized at about 120. This is depicted in figure 3. This dramatic growth did not merely reflect an increasing federal civil appellate docket, which actually declined slightly from 2010 to 2014.[61] Any comparison of pre- and post-Twiqbal pleading decisions must be mindful that this dramatic post-Twiqbal growth is likely also associated with changes in the nature of the caseload.

Figure 3: Number of Causes of Action Per Year in Random Sample of 12(b)(6) Appeals

B. Conclusory Pleading Analyses and Policy Domains

As discussed in the Introduction, critics of Twiqbal were especially concerned that the newly prominent role of conclusory pleading analyses in pleading doctrine posed a distinctive threat to civil rights cases. It is now possible to assess how the frequency with which such issues were implicated in appeals varies across policy areas, because my data contain a measure of appellate-court conclusory pleading analyses under Twiqbal, coded by policy area. Conclusory pleading issues arise in 9% of causes of action: 5% in non-civil rights claims, and 16% in civil rights claims.

I examine a logistic regression model that compares the probability of a conclusory pleading issue arising across three broad domains: civil rights claims alleging discrimination, other civil rights claims, and non-civil rights claims, conditional on the battery of controls discussed above and in the Appendix. The results are presented in table A1, model A, where non-civil rights claims are the reference category. Both types of civil rights claims are statistically significantly more likely to be subject to conclusory pleading analyses than non-civil rights claims, and the effect size is substantial. Because logistic regression coefficients are expressed in log-odds and are not directly interpretable as probabilities, I compute predicted probabilities. Conclusory pleading issues arise with a probability of 7% in non-civil rights claims. The probability increases to 18% in discrimination claims, and to 21% in other civil rights claims (triple the figure for non-civil rights). Simply put, conclusory pleading issues are heavily concentrated in civil rights claims.
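The reason for translating coefficients into probabilities can be seen with a short calculation. The Python sketch below is illustrative only: it treats the gap between the reported 7% and 21% predicted probabilities as a single log-odds coefficient, as if all controls were held at fixed values, which is a simplifying assumption and not the specification in table A1. The 40% baseline is likewise hypothetical. The same log-odds shift that moves a 7% baseline to 21% (a 14-point change) moves a 40% baseline to roughly 70% (a 30-point change), so a coefficient alone does not convey a fixed change in probability.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Predicted probabilities reported in the text: non-civil rights claims
# (the reference category) 7%, other civil rights claims 21%.
baseline = 0.07
other_civil_rights = 0.21

# Simplifying assumption: express the gap as one log-odds coefficient,
# as if every control variable were held at a fixed value.
implied_coef = logit(other_civil_rights) - logit(baseline)

# The same coefficient applied to a hypothetical 40% baseline yields a
# much larger percentage-point change.
shifted = inverse_logit(logit(0.40) + implied_coef)

print(f"implied log-odds coefficient: {implied_coef:.2f}")
print(f"40% baseline shifted to: {shifted:.2f}")
```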

To take a more granular look, I break the three broad policy areas into nine separate policy variables containing more than 5% of the random sample listed in table 1. In the area of civil rights, these are employment discrimination, policing, public employment, and all other civil rights claims. In the area of non-civil rights, these are contracts, labor, personal injury, consumer, and all other non-civil rights claims. I leave consumer claims, the most numerous type of non-civil rights claim, out as the reference category. The results are presented in table A1, model B. Conclusory pleading issues arose with a probability of 7% in consumer claims. Among non-civil rights claims, only personal injury claims were statistically significantly more likely to give rise to a conclusory pleading issue, at a probability of 22%. The remaining non-civil rights policy variables (contract, labor, and other non-civil rights claims) were statistically indistinguishable from consumer claims’ 7% probability.

With respect to discrimination claims, the results make clear that the strong association between discrimination claims and conclusory pleading issues is driven by employment discrimination claims. These claims comprise 82% of the discrimination claims in the random sample and have a 33% likelihood of triggering a conclusory pleading analysis. The significance and magnitude of the policing variable are also particularly large, with a predicted probability of 40% that a conclusory pleading issue will arise. Civil rights claims asserted by public employees are not statistically distinguishable from consumer claims, while the large residual civil rights category differs significantly, with a 25% predicted probability of implicating a conclusory pleading issue. Examining the data in smaller policy categories makes clear that conclusory pleading issues are especially prevalent in the two biggest civil rights classifications: policing and employment discrimination. It also indicates that, among non-civil rights policy classifications that are greater than 5% of the data, personal injury is the area in which conclusory pleading arises most frequently.

C. Statistical Models

In this Section, I turn to a series of statistical models that explore multiple dimensions of the conclusory pleading issue. Part III.C.1 evaluates whether and, if so, how the presence of a conclusory pleading issue is associated with plaintiffs’ probability of prevailing on their appeal. Part III.C.2 examines whether the impact of judicial preferences (measured by party, gender, and race) on outcomes in 12(b)(6) appeals is more pronounced in the presence of a conclusory pleading issue, as critics of Twiqbal feared. Part III.C.3 switches the dependent variable from the case outcome to the variable measuring whether a conclusory pleading issue was analyzed. This model explores whether the identity characteristics of the panel are associated with the probability that it will undertake a conclusory pleading analysis in the opinion. Finally, Part III.C.4 analyzes the relationship between judge identity characteristics and outcomes in pleading decisions in civil rights cases before and after Twiqbal with the goal of determining whether pleading decisions in general (not only those focused on conclusory pleading) became more ideological after Twiqbal, as some of its critics feared.

All logistic regression models of pro-plaintiff outcomes reported in the Appendix include the complete set of party, gender, and race panel variables, along with all control variables listed there. I present separate models for: (1) all policy areas combined; (2) discrimination claims; (3) other civil rights claims; and (4) non-civil rights claims. In each model, I pool all the cases—those from the random sample and from the oversamples—for joint analysis and use probability weights to adjust both point estimates and standard errors to make the sample represent the population. This pooling is necessary to allow analyses of the conclusory pleading issue, which is infrequent in the data.
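The role of the probability weights in pooling can be illustrated with a toy example. The sketch below uses invented counts and sampling probabilities; it does not reproduce the study’s data or its actual weighting scheme, and it shows only point estimates, not the standard-error adjustment. Each claim is weighted by the inverse of its probability of entering the sample, so heavily oversampled claims count for less and the pooled estimate reflects the population rather than the oversample. The function name weighted_win_rate is hypothetical.

```python
# Toy illustration of inverse-probability weighting when pooling a random
# sample with an oversample. All counts and probabilities are hypothetical.

def weighted_win_rate(claims):
    """Weighted share of pro-plaintiff outcomes.

    `claims` is a list of (won, sampling_probability) pairs; each claim
    is weighted by 1 / sampling_probability.
    """
    total_weight = sum(1.0 / p for _, p in claims)
    won_weight = sum(1.0 / p for won, p in claims if won)
    return won_weight / total_weight


# Hypothetical strata: 20 random-sample claims drawn with probability 0.1
# (3 plaintiff wins) and 20 oversampled claims drawn with probability 1.0
# (8 plaintiff wins).
random_sample = [(i < 3, 0.1) for i in range(20)]
oversample = [(i < 8, 1.0) for i in range(20)]
pooled = random_sample + oversample

unweighted = sum(won for won, _ in pooled) / len(pooled)
weighted = weighted_win_rate(pooled)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this invented example the unweighted pooled rate (27.5%) is pulled toward the oversample, while the weighted rate (about 17%) stays close to the random-sample stratum that stands for the larger population.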

Since below I report the marginal effects of independent variables on the probability that plaintiffs prevail, it is useful to first provide the baseline likelihood at a descriptive bivariate level. Table 2 reports plaintiffs’ win rates in 12(b)(6) appeals involving an individual suing a business or government entity in each of the four policy groupings. In the random sample, the plaintiff is the appellant in 97% of cases but prevails only 15% of the time. Plaintiff victories are significantly more likely in precedential decisions, where the win rate is roughly double. Within the random sample, win rates show little variation across discrimination, other civil rights, and non-civil rights claims. In precedential cases, however, plaintiffs achieve their highest success rate in discrimination claims.

Table 2: Claim-Level Plaintiff Win Rate in 12(b)(6) Appeals, with Individual Suing Business or Government

	Random Sample (%)	Precedential Only (%)
All Cases	15	31
Civil Rights, Discrimination	16	37
Civil Rights, Other	15	30
Non-Civil Rights	14	28

Before turning to the results, it is important to note a key limitation. In many models, the data is insufficient to draw confident conclusions about outcomes in cases decided by panels with a majority of women or non-White judges. This limitation is particularly acute in smaller policy subsets. However, several models, especially those pooling all policy areas, contain enough observations to evaluate such panels, and this will be noted in the course of model interpretation. All models provide a solid basis for inferences about panels with a single woman or a single non-White judge, as these configurations are common: One-woman panels decide 47% of claims in the random sample and 45% in precedential cases, while one-non-White panels decide 43% of random-sample claims and 38% of precedential claims.

1. The Association of Conclusory Pleading Issues with Case Outcomes

To evaluate the association of conclusory pleading issues and case outcomes, in addition to all of the independent variables discussed above and in the Appendix, I add the conclusory pleading variable as an explanatory variable. If the presence of a conclusory pleading issue is associated with plaintiff losses, its coefficient will be negative and statistically significant.[62] To the contrary, the conclusory pleading variable is significant and positive in three of the four models (table A2, models A, C, & D). In the model pooling across all policy areas, the presence of a conclusory pleading issue is associated with an increase of 6 percentage points, from 17 to 23%, in the probability of a pro-plaintiff outcome. The positive association in other civil rights cases is 5 percentage points (from 18 to 23%), and in non-civil rights cases it is 9 percentage points (from 15 to 24%). The variable is insignificant in the model of discrimination claims (table A2, model B).

This does not mean, of course, that the introduction or elevation of the conclusory pleading issue was not detrimental to plaintiffs. A positive sign on the conclusory pleading variable means that plaintiffs have a higher probability of prevailing in its presence than in its absence. But because Twiqbal added or elevated a ground to dismiss relative to the Conley framework, it disadvantages plaintiffs even if that ground has a lesser probability of leading to dismissal than other grounds, on average. Put differently, it added a ground for dismissal that succeeds less than other grounds, on average, without taking away or mitigating other grounds for dismissal.

These arguably counterintuitive results, relative to conventional expectations, have several possible explanations. One is that defendants pushed the envelope on appeal, advancing weak arguments in the face of great uncertainty on how broadly the lower federal courts would read the new and highly ambiguous plausibility pleading standard, leading them to lose those arguments at a higher rate on appeal.[63] Another is that plaintiffs’ lawyers responded to Twiqbal by elevating the quality of their fact pleading, reducing defendants’ chances of success in motions to dismiss on grounds of conclusory pleading.[64] Yet another is that, as I have argued elsewhere, the conservative coalition that decided Iqbal was clearly to the right of the lower federal courts,[65] pointing to the possibility that many panels interpreted and applied the notoriously indeterminate plausibility pleading standard to limit its anti-plaintiff impact—“narrowing from below.”[66] All three pathways could have contributed to the positive association between the conclusory pleading issue and plaintiff wins. All three have also been cited as possible explanations for the unexpected growth in rates of pro-certification outcomes in class certification appeals after Comcast Corp. v. Behrend and Wal-Mart Stores, Inc. v. Dukes, at the very time that many commentators were pronouncing the death of the class action.[67] Of course, these explanations for the results are highly speculative.

2. Judge Identity and Outcomes of Conclusory Pleading Issues

The models above reveal only how the presence of a conclusory pleading issue is directly related to outcomes, controlling for other variables in the model. This Section examines whether the conclusory pleading issue’s relationship to outcomes is conditional on panel identity characteristics, meaning that identity characteristics have a different relationship to conclusory pleading issues as compared to other 12(b)(6) issues.

Many argued after Iqbal, quite logically, that the nature of the conclusory pleading issue in the plausibility framework widened the scope of judicial discretion in a manner likely to increase subjective, identity-based decision-making.[68] The logic of their critique anticipated that in the subset of civil rights causes of action presenting a conclusory pleading issue, as contrasted with those not presenting such an issue, panels composed entirely of White, male, and Republican judges would be more likely to find plaintiffs’ pleadings to be conclusory and therefore implausible as compared to panels composed of non-White, women, and Democratic judges. For instance, in a discrimination claim by a woman or non-White plaintiff based on allegations of harassment, judges who are White, male, and Republican would be less likely to perceive that the pleaded facts support a reasonable inference of discriminatory intent necessary to make the claim plausible.

Before testing that expectation empirically, I first note the results for the main effects of the panel variables without their interactions with the conclusory pleading variable (table A2). They are an inconsistent patchwork across models, as they were in my work with Professor Burbank.[69] Panels with one non-White judge, and panels with one woman, are again more likely to decide for the plaintiff in other civil rights cases (table A2, model C). As compared to an all-male panel, a panel with a single woman increases the probability of a pro-plaintiff ruling from 15 to 23%. As compared to an all-White panel, a panel with a single non-White judge increases the probability of a pro-plaintiff ruling from 16 to 23%. This broad and varied civil rights category amounts to 25% of 12(b)(6) appeals in our data, overwhelmingly made up of constitutional claims against governmental actors, commonly arising in such areas as policing, prisons, and public employment. In non-civil rights claims, panels with two non-White judges (of which there are eighty-three) achieve significance and are associated with a 25% probability of a pro-plaintiff outcome, as compared to 15% for all-White panels.

There are a few differences in the discrimination model (table A2, model B) relative to my earlier work with Professor Burbank. In our previous work we found that all panel composition variables were insignificant in the discrimination model, for which we had limited data when analyzing the random sample and precedential cases separately. Here, where I pool the random sample and precedential oversample, and adjust the point estimates and standard errors with probability weights to reflect the true population, there is a much larger volume of data in the discrimination model. The three-Republican variable becomes significant. An all-Democratic panel decides in favor of discrimination plaintiffs with an estimated probability of 19%, which drops to 7% for an all-Republican panel. All-Democratic panels have nearly three times the likelihood of ruling for the plaintiff in discrimination claims.

Further, in the discrimination model, the variable for panels with two non-White judges approaches but does not achieve significance, and the variable for panels with three is statistically significant but rests on an even smaller number of such panels. I thus examined an identical specification of the discrimination model, with the two- and three-non-White panel variables combined into a single non-White majority variable. The non-White majority variable in that model is statistically significant (p=.03), with forty-six such panels in the data, still not an ample number but strongly suggestive nevertheless. An all-White panel in discrimination claims has an estimated 16% probability of deciding in favor of the plaintiff, which grows to 27% with a non-White majority.[70]

I now turn to the question of whether these results on judicial identity are amplified when a conclusory pleading issue is being decided, that is, whether conclusory pleading issues have a distinctive relationship to judge identity, different from other pleading issues. To evaluate this, I interacted each of the judge identity characteristic variables with the conclusory pleading variable, isolating whether panels with more Democratic, women, or non-White judges decide causes of action differently when a conclusory pleading issue is present. For example, a statistically significant and positive interaction between the conclusory pleading variable and the variable measuring the presence of two Democrats on a panel would indicate that when addressing a conclusory pleading issue, such panels are more likely to rule for the plaintiff than they are in cases that lack a conclusory pleading issue.
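The interaction test can be written compactly. The specification below is a simplified sketch, not the exact model in table A3: Y is the claim outcome (1 for a pro-plaintiff result), C indicates a conclusory pleading issue, P stands for any single panel-composition indicator (for example, a one-woman panel), and x collects the controls. The actual models include the full set of panel variables and an interaction term for each.

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_i = 1) &= \beta_0 + \beta_1 C_i + \beta_2 P_i + \beta_3\,(C_i \times P_i) + \mathbf{x}_i^{\top}\boldsymbol{\gamma},\\[4pt]
\text{log-odds difference associated with } P_i &=
\begin{cases}
\beta_2 & \text{if } C_i = 0,\\
\beta_2 + \beta_3 & \text{if } C_i = 1.
\end{cases}
\end{aligned}
```

On this formulation, asking whether a panel type behaves differently when a conclusory pleading issue is presented amounts to testing whether the interaction coefficient beta_3 differs from zero.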

In all four models—all policy areas, discrimination, other civil rights, and non-civil rights—all interactions with respect to all judge identity characteristics were insignificant (table A3, models A-D). The presence of more Democratic, women, or non-White judges on a panel never has a significantly different association with outcomes in the presence of a conclusory pleading issue, as compared to claims without one. This does not mean that such panel types decide pleading issues indistinguishably from all White, male, and Republican panels. As just discussed in this Section, and in my prior work with Professor Burbank, some such differences do exist, including in some civil rights models, where there is evidence that non-White, women, and Democratic judges are more likely to rule for the plaintiff. Rather, the insignificance of the interactions between judicial identity and the conclusory pleading issue indicates that, whatever those patterns may be, the influence of judge identity on outcomes is not greater (or lesser) when a conclusory pleading issue is presented.[71]

These results seem inconsistent with the expectations of many, including me. Those expectations were that conclusory pleading issues in the plausibility framework would be distinctly susceptible to influence by judges’ subjective preferences because they entail both elusive distinctions between fact and conclusion as well as judgments of “plausibility” based on judicial wisdom and common sense. Potential explanations for the null results are numerous. The first and most obvious is that the judicial behavior assumptions underpinning the hypothesis being tested are simply wrong: White, male, and Republican judges are not more likely to deem pleadings conclusory in claims by individuals against business or government, including civil rights claims. Further, the reasons noted above for the positive association between the conclusory pleading issue and plaintiff wins have the potential to dampen the role of ideology and identity in resolving conclusory pleading issues. If, in fact, defendants have responded to Twiqbal by pushing weaker arguments, plaintiffs have elevated the quality of their pleading, and appellate panels have been inclined to “narrow from below” when applying Twiqbal, such factors may have diluted any distinctively ideological features of conclusory pleading analysis in the plausibility framework.

3. Issue Steering?

In the series of post-Iqbal statistical analyses above, I treated the conclusory pleading variable as a case feature—determined by the parties’ arguments and the trial court’s reasoning—whose appearance in the case is exogenous to the party, gender, and racial composition of the panel. If panels of certain party, gender, or racial composition are more likely to steer analyses toward conclusory pleading issues, this would complicate interpretation of the results above. For example, if certain types of judges are more likely than others to engineer the appearance of conclusory pleading issues into panel opinions, they may do so strategically in light of the anticipated outcome, undermining the ability to test whether the impact of judges’ identity on outcomes is amplified by the conclusory pleading issue. The case outcome could be the cause, and the conclusory pleading issue its effect.

To evaluate this possibility, I ran the full regression specifications described above (and in the Appendix), but I substituted the conclusory pleading variable for case outcome as the dependent variable. The results are reported in table A4 of the Appendix. All of the panel composition variables were insignificant. The results indicate that no panel type was more likely than another to decide cases coded as containing a conclusory pleading issue, which is consistent with the notion that the presence of a conclusory pleading issue is a substantially exogenous case feature.

4. Other Civil Rights Cases Before and After Twiqbal

In my earlier work with Professor Burbank, which evaluated the relationship between judge identity characteristics and case outcomes in pleading decisions on appeal, we found that judge identity characteristics were sporadically associated with outcomes. The main area in which judge identity mattered in the random sample cases (distinct from the precedential cases) was among other civil rights cases. This category comprised all civil rights claims not alleging discrimination, which were overwhelmingly constitutional claims against public officials. This broad category of civil rights claims constituted 25% of the federal appellate docket of motions to dismiss for failure to state a claim, and it is described in detail above.

In this important and large set of cases, we found that panels with one woman judge and panels with one non-White judge were significantly more likely to allow plaintiffs to proceed to discovery than all-male and all-White panels, respectively. These results are consistent with the notion that the post-Twiqbal plausibility pleading standard would widen judicial discretion in disposing of motions to dismiss, increasing the space for identity characteristics to influence outcomes. However, while consistent with that notion, the results are also consistent with a scenario in which the same voting patterns existed prior to Twiqbal and merely continued unaffected after it.

Investigating this question required performing the same analysis of the relationship between panel identity characteristics and 12(b)(6) outcomes in other civil rights cases decided before the plausibility pleading standard introduced in Twombly, and then comparing the results to the post-Iqbal results. I collected and coded all “other civil rights” claims from 2002 until the Twombly decision in 2007. Each civil rights cause of action was coded for all the variables used in the post-Iqbal models described above, including the presence of the conclusory pleading issue. I applied the same statistical models to the pre-Twombly other civil rights cases as I applied to the post-Iqbal cases. There were 367 such cases containing 608 causes of action. The results are reported in table A5 in the Appendix.

All of the party, race, and gender panel composition variables were insignificant prior to Twombly. Thus, there was a marked difference between the cases decided before Twombly, under the Conley “any set of facts” standard, and those decided after Iqbal, under the “plausibility pleading” standard. Judicial identity was not a significant predictor of outcomes before Twombly, but it was after Iqbal. This is consistent with the concern of many of Twiqbal’s critics that the conclusory pleading issue in the plausibility framework would enlarge the role of judges’ subjective preferences in determining the factual sufficiency of a plaintiff’s claim in civil rights cases. However, it is difficult to conclude that this mechanism is at play. The empirical analysis above of the interactions between the conclusory pleading variables and the panel composition variables, including in other civil rights cases,[72] failed to find support for the view that conclusory pleading issues are distinctively associated with judicial subjectivity.

It is certainly possible that other aspects of “plausibility pleading”—distinct from the conclusory pleading issue—explain the change. As noted above, some scholars maintain that Twiqbal wrought changes in the 12(b)(6) standard with implications for all merits decisions, including pure legal sufficiency issues.[73] If this is true, plausibility pleading may explain the growth in the influence of judges’ identity on outcomes in other civil rights cases after Iqbal, notwithstanding the result that conclusory pleading issues are not associated with different outcome voting behavior for any panel type. However, there are clearly other rival or supplemental mechanisms that I cannot rule out.

Changes in the nature of the issues presented on appeal after Iqbal may have stimulated more identity-based voting by judges. Twiqbal likely triggered changes in plaintiffs’, defendants’, and trial courts’ decision-making, which resulted in changes in the composition of the appellate 12(b)(6) caseload. As already noted, that caseload grew dramatically after Iqbal, highlighting the possibility that certain qualities of the cases, in addition to their quantity, changed.[74] It also may be that the enormous attention paid to Twiqbal in the legal community, the political controversy surrounding it, and the widespread view that it was an aggressively anti-plaintiff product of the conservative coalition on an ideologically divided Supreme Court, stimulated new racial and gender divisions on panels addressing pleading issues in civil rights cases.

Thus, the data is consistent with, but does not establish, the theory that plausibility pleading drove the emergence of judges’ race and gender as significant explanatory variables after Iqbal. Plausibility pleading is one candidate explanation among others, but the data does not allow me to adjudicate among competing causal hypotheses. Nevertheless, the fact remains that material racial and gender divisions emerged among judges deciding 12(b)(6) appeals in civil rights cases following Iqbal.

Finally, in the model of pre-Twombly other civil rights cases, the conclusory pleading variable was statistically significant and negative: plaintiffs were substantially more likely to lose when such issues arose in the pre-Twombly period. In the presence of a conclusory pleading issue, a plaintiff’s predicted probability of prevailing fell by 21 percentage points. This negative relationship is the opposite of that found in three of the four post-Iqbal models, where the presence of a conclusory pleading issue was associated with a higher probability of plaintiff success. A possible explanation for this negative association is that, prior to Twombly, conclusory pleading analyses appeared to contradict a “literal” reading of Conley, as Justice Souter put it. Such analyses were infrequent, and judges may therefore have reserved these legally suspect analyses for the weakest cases from the standpoint of factual sufficiency, which would naturally produce a strong association with plaintiff losses.
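A percentage-point figure like the 21-point drop above translates a logit coefficient into a change in predicted probability. The Article does not state exactly how its figure was computed, so the following is only a minimal sketch assuming the common "average discrete change" approach: for each observation, compute the predicted probability with the indicator switched on and off, holding the other covariates fixed, and average the differences. The function names and all numeric inputs are hypothetical illustrations, not the Article's estimates.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def average_discrete_change(
    intercept: float,
    coef_indicator: float,
    other_coefs: Sequence[float],
    rows: Sequence[Sequence[float]],
) -> float:
    """Hypothetical helper: average change in predicted probability when a
    binary indicator (e.g., a conclusory pleading issue) flips from 0 to 1,
    holding each observation's other covariates at their observed values."""
    if not rows:
        raise ValueError("need at least one observation")
    total = 0.0
    for row in rows:
        # Linear predictor without the indicator's contribution.
        base = intercept + sum(b * x for b, x in zip(other_coefs, row))
        # Probability with the indicator on minus probability with it off.
        total += logistic(base + coef_indicator) - logistic(base)
    return total / len(rows)


if __name__ == "__main__":
    # Hypothetical coefficients and covariate rows, for illustration only.
    change = average_discrete_change(0.0, -1.0, [0.5], [[0.0], [1.0]])
    print(f"change in predicted probability: {100 * change:.1f} percentage points")
```

Multiplying the returned value by 100 expresses it in percentage points. A negative coefficient on the indicator always yields a negative average change, which is the direction reported for the pre-Twombly model; the magnitude, however, depends on where the other covariates place each observation on the logistic curve.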

Conclusion

Conclusory pleading analyses, present but rare before Twombly, roughly doubled in frequency after Iqbal. Even then, they arose in only 12% of cases and 9% of causes of action; 91% of causes of action reviewed did not implicate a conclusory pleading issue. If the appellate docket reflects a similar infrequency of the issue in trial court adjudications post-Iqbal, this may help explain why the effects of Twiqbal that scholars have detected are more limited than critics expected. This is not to say that the introduction or elevation of the conclusory pleading issue lacked significant consequences. Twelve percent of cases and 9% of causes of action constitute a significant segment of the 12(b)(6) appellate caseload.

Moreover, it is important to stress that the aggregate figures above understate the presence of conclusory pleading issues in civil rights claims, where, precisely as critics feared, such issues are concentrated. Models show that conclusory pleading issues arise with a predicted probability of 7% in non-civil rights claims, 18% in discrimination claims, and 21% in other civil rights claims. Further, the statistical models show that they are especially likely to arise in two very heavily litigated areas of civil rights, with a predicted probability of 33% in employment discrimination claims and 40% in police abuse claims.

However, unexpectedly, the presence of conclusory pleading issues is significantly and positively correlated with plaintiff wins on appeal. At first glance, these results seem contrary to the expectations of many, but there are several potential (and admittedly speculative) explanations. It is possible that in the aftermath of Iqbal, defendants overreached and pushed weak conclusory pleading arguments, plaintiffs responded to Iqbal with more careful pleading, and many lower federal courts sought to “narrow [Twiqbal] from below.”[75] These forces could explain why conclusory pleading issues are positively associated with plaintiff wins.

Also contrary to some critics’ expectations, conclusory pleading issues in the context of plausibility analyses were not associated with a greater impact of judicial ideology or identity on outcomes. Critics had anticipated such an effect because of the elusiveness of distinguishing fact from conclusion, a difficulty embedded within the plausibility framework. The same potential reasons offered for the positive association between conclusory pleading and plaintiff wins post-Iqbal may also have dampened identity-based voting.

In “other civil rights” cases, comprising a quarter of 12(b)(6) appeals, panels with women and non-White judges were more likely to rule for the plaintiff after Iqbal, whereas all judge identity characteristics were insignificant prior to Twombly. Although conclusory pleading issues themselves do not appear to be the pathway to an enlarged role for judicial subjectivity, judge identity characteristics took on a larger role in determining outcomes post-Iqbal in a broad array of civil rights claims. It may be that the increased impact of judicial identity was activated by other aspects of the doctrinal changes associated with Twiqbal’s plausibility pleading innovation, or by changes in the composition of the appellate caseload that were triggered by changes in litigant behavior, or by the impact on judges of the increasing politicization of the pleading debate. We know that the change did occur, but we cannot pin down why.

In practice, courts of appeals diverge sharply from the canonical two-step model of plausibility pleading. That model envisions courts first winnowing out conclusory allegations on a fact-by-fact basis, then assessing whether the remaining well-pleaded facts state a plausible claim. Courts of appeals, however, almost never engage in this two-step exercise. They instead collapse the two steps into one and assess plausibility holistically—either assessing whether the complaint is generally too conclusory or evaluating key assertions in the context of the entire pleading without formally rejecting other facts as conclusory.

Appendix

A. Claim-Level Model Specifications

The unit of analysis in our primary models, discussed in the body of the Article, is the claim. We ran logit models with standard errors clustered on the case because multiple claims within the same case are not independent of one another. All of the statistical models reported below include the following control variables. These are independent variables that existing empirical literature suggests may be associated with case outcomes on the courts of appeals.[76] As a result, the variables must be accounted for in the statistical model to effectively estimate the contributions, if any, of the key independent variables (judge identity characteristics and conclusory pleading) to case outcome.

· Trial court outcome: Indicator variable reflecting whether the trial court granted or denied the motion to dismiss on the claim that is under consideration by the court of appeals.

· Trial judge sitting by designation: Indicator variable recording whether there was a trial judge sitting by designation on the panel.

· Defendant type: Non-mutually exclusive indicator variables measuring whether there was a federal defendant, state defendant, business defendant, or other type of defendant.

· Law type: Mutually exclusive indicator variables measuring whether the claim was under federal law, state law, or both.

· Policy area: Mutually exclusive indicator variables reflecting policy areas comprising 2% or more of the data. Policy areas comprising less than 2% of the data were aggregated into an “other” policy category.

· Conclusory: Indicator variable measuring whether the court analyzed a conclusory pleading issue.

· Circuit-fixed effects: Dichotomous variables for each circuit accounting for any covariates that vary across circuits and take the same value for each judge on a panel within the circuit.

· Year-fixed effects: Dichotomous variables for each year accounting for any time-varying covariates that take the same value for each judge on a panel within the year.

Circuit-fixed effects account for any variables that change across circuits and that would take the same value for each judge on a panel within that circuit. Examples include circuit doctrines that may have a pro- or anti-dismissal slant and variations in the size and content of caseloads across circuits. Year-fixed effects account for any variables that change over time and that would take the same value for each judge on a panel within that year. Examples include national trends in caseload, the evolution of Supreme Court doctrine, the changing composition of the Supreme Court, changes in the Federal Rules of Civil Procedure, and salient features of the partisan or political environment, such as an anti-litigation posture in a party agenda. Year-fixed effects also account for trends over time in attitudes among male and White judges toward co-panelists who are women and non-White. These trends in attitudes may affect the extent to which the latter panelists influence the former. The circuit- and year-fixed effects approach leverages only variation in the relationship between panel characteristics and outcomes within a circuit and year. This approach strengthens our estimates of the effects of panel characteristics because it controls for the influence of any variables that would take the same value for each panel in the same circuit and each panel in the same year.
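For readers who find notation helpful, the claim-level specification described above can be written schematically as follows. The notation is ours and is an assumption, not the Article's: i indexes claims and c indexes cases; the outcome is taken to be a plaintiff win on the claim; P_c collects the panel composition variables (party, gender, race); X_ic collects the control variables listed above other than Conclusory; and α and τ are the circuit- and year-fixed effects. The sketch omits the interaction terms used in some models, and the exact coding of the panel variables is not specified here.

```latex
\Pr(\text{PlaintiffWin}_{ic} = 1)
  = \Lambda\!\left(\beta_0
    + \beta_1\,\text{Conclusory}_{ic}
    + \boldsymbol{\delta}^{\top}\mathbf{P}_{c}
    + \boldsymbol{\gamma}^{\top}\mathbf{X}_{ic}
    + \alpha_{\text{circuit}(c)}
    + \tau_{\text{year}(c)}\right),
\qquad
\Lambda(z) = \frac{1}{1 + e^{-z}}
```

Because multiple claims in the same case are not independent, standard errors in this setup are clustered on c rather than computed as if every claim were a separate draw.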

B. Tables

Table A1: Logistic Regression Predicting Presence of Conclusory Pleading Issue by Policy Area, 2009–2019

Note: Entries are logit coefficients with standard errors in parentheses. Standard errors are clustered on case. Model A reference category is non-civil rights claims. Model B reference category is consumer claims. All models include circuit-fixed effects, year-fixed effects, and independent variables measuring policy area, direction of the trial court outcome, trial judge sitting by designation, defendant type (federal government, state government, business, other), law type (federal law, state law, both), and whether the case was published. The models contain only claims in which an individual (or class of them) sues a business or government defendant.