
Yes, We Can Define, Teach, and Assess Critical Thinking Skills


Jeff Heyck-Williams (He, His, Him), Director of the Two Rivers Learning Institute in Washington, DC


Today’s learners face an uncertain present and a rapidly changing future that demand far different skills and knowledge than were needed in the 20th century. We also know so much more about enabling deep, powerful learning than we ever did before. Our collective future depends on how well young people prepare for the challenges and opportunities of 21st-century life.


While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the advent of the term “21st century skills” and discussions of deeper learning. There is increasing agreement among education reformers that critical thinking is an essential ingredient for long-term success for all of our students.

However, there are still those in the education establishment and in the media who argue that critical thinking isn’t really a thing, or that these skills aren’t well defined and, even if they could be defined, they can’t be taught or assessed.

I have to disagree with those naysayers. Critical thinking is a thing: we can define it, we can teach it, and we can assess it. In fact, as part of a multi-year Assessment for Learning Project, Two Rivers Public Charter School in Washington, DC, has done just that.

Before I dive into what we have done, I want to acknowledge that some of the criticism has merit.

First, there are those who argue that critical thinking can only exist when students have a vast fund of knowledge: a student cannot think critically if they don’t have something substantive about which to think. I agree. Students do need a robust foundation of core content knowledge to think critically effectively, and schools still have a responsibility for building students’ content knowledge.

However, I would argue that students don’t need to wait to think critically until after they have mastered some arbitrary amount of knowledge. They can start building critical thinking skills when they walk in the door. All students come to school with experience and knowledge which they can immediately think critically about. In fact, some of the thinking that they learn to do helps augment and solidify the discipline-specific academic knowledge that they are learning.

The second criticism is that critical thinking skills are always highly contextual. In this argument, the critics make the point that the types of thinking that students do in history are categorically different from the types of thinking students do in science or math. Thus, they conclude, teaching broadly defined, content-neutral critical thinking skills is impossible. I agree that there are domain-specific thinking skills that students should learn in each discipline. However, I also believe that there are several generalizable skills that elementary school students can learn that have broad applicability to their academic and social lives. That is what we have done at Two Rivers.

Defining Critical Thinking Skills

We began this work by defining what we mean by critical thinking. After reviewing the literature and the practice at other schools, we identified five constructs that encompass a set of broadly applicable skills: schema development and activation; effective reasoning; creativity and innovation; problem solving; and decision making.


We then created rubrics to provide a concrete vision of what each of these constructs looks like in practice. Working with the Stanford Center for Assessment, Learning and Equity (SCALE), we refined these rubrics to capture clear and discrete skills.

For example, we defined effective reasoning as the skill of creating an evidence-based claim: students need to construct a claim, identify relevant support, link their support to their claim, and identify possible questions or counter claims. Rubrics provide an explicit vision of the skill of effective reasoning for students and teachers. By breaking the rubrics down for different grade bands, we have been able not only to describe what reasoning is but also to delineate how the skills develop in students from preschool through 8th grade.


Before moving on, I want to freely acknowledge that in narrowly defining reasoning as the construction of evidence-based claims, we have disregarded some elements of reasoning that students can and should learn. For example, the difference between constructing claims through deductive versus inductive means is not highlighted in our definition. However, by privileging a definition that has broad applicability across disciplines, we are able to gain traction in developing the roots of critical thinking: in this case, the ability to formulate well-supported claims or arguments.

Teaching Critical Thinking Skills

The definitions of critical thinking constructs were only useful to us insofar as they translated into practical skills that teachers could teach and students could learn and use. Consequently, we found that to teach a set of cognitive skills, we needed thinking routines that defined the regular application of these critical thinking and problem-solving skills across domains. Building on Harvard’s Project Zero Visible Thinking work, we have named routines aligned with each of our constructs.

For example, with the construct of effective reasoning, we aligned the Claim-Support-Question thinking routine to our rubric. Teachers then were able to teach students that whenever they were making an argument, the norm in the class was to use the routine in constructing their claim and support. The flexibility of the routine has allowed us to apply it from preschool through 8th grade and across disciplines from science to economics and from math to literacy.


Kathryn Mancino, a 5th grade teacher at Two Rivers, has deliberately taught three of our thinking routines to students using anchor charts. Her charts name the components of each routine and have a place for students to record when they’ve used it and what they have figured out about the routine. By using this structure with a chart that can be added to throughout the year, students see the routines as broadly applicable across disciplines and are able to refine their application over time.

Assessing Critical Thinking Skills

By defining specific constructs of critical thinking and building thinking routines that support their implementation in classrooms, we have operated under the assumption that students are developing skills that they will be able to transfer to other settings. However, we recognized both the importance and the challenge of gathering reliable data to confirm this.

With this in mind, we have developed a series of short performance tasks around novel discipline-neutral contexts in which students can apply the constructs of thinking. Through these tasks, we have been able to provide an opportunity for students to demonstrate their ability to transfer the types of thinking beyond the original classroom setting. Once again, we have worked with SCALE to define tasks where students easily access the content but where the cognitive lift requires them to demonstrate their thinking abilities.

These assessments demonstrate that it is possible to capture meaningful data on students’ critical thinking abilities. They are not intended to be high-stakes accountability measures. Instead, they are designed to give students, teachers, and school leaders discrete formative data on hard-to-measure skills.

While it is clearly difficult, and we have not solved all of the challenges to scaling assessments of critical thinking, we can define, teach, and assess these skills. In fact, knowing how important they are for the economy of the future and our democracy, it is essential that we do.

Jeff Heyck-Williams (He, His, Him)

Director of the Two Rivers Learning Institute.

Jeff Heyck-Williams is the director of the Two Rivers Learning Institute and a founder of Two Rivers Public Charter School. He has led work around creating school-wide cultures of mathematics, developing assessments of critical thinking and problem-solving, and supporting project-based learning.


Critical Thinking Testing and Assessment

The purpose of assessment in instruction is improvement. The purpose of assessing instruction for critical thinking is improving the teaching of discipline-based thinking (historical, biological, sociological, mathematical, etc.). It is to improve students’ abilities to think their way through content using disciplined skill in reasoning. The more particular we can be about what we want students to learn about critical thinking, the better we can devise instruction with that particular end in view.


The Foundation for Critical Thinking offers assessment instruments which share in the same general goal: to enable educators to gather evidence relevant to determining the extent to which instruction is teaching students to think critically (in the process of learning content). To this end, the Fellows of the Foundation recommend:

  • that academic institutions and units establish an oversight committee for critical thinking, and
  • that this oversight committee utilize a combination of assessment instruments (the more the better) to generate incentives for faculty, by providing them with as much evidence as feasible of the actual state of instruction for critical thinking.

The following instruments are available to generate evidence relevant to critical thinking teaching and learning:

Course Evaluation Form: Provides evidence of whether, and to what extent, students perceive faculty as fostering critical thinking in instruction (course by course). Machine-scoreable.

Online Critical Thinking Basic Concepts Test: Provides evidence of whether, and to what extent, students understand the fundamental concepts embedded in critical thinking (and hence tests student readiness to think critically). Machine-scoreable.

Critical Thinking Reading and Writing Test: Provides evidence of whether, and to what extent, students can read closely and write substantively (and hence tests students' abilities to read and write critically). Short-answer.

International Critical Thinking Essay Test: Provides evidence of whether, and to what extent, students are able to analyze and assess excerpts from textbooks or professional writing. Short-answer.

Commission Study Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Based on the California Commission Study. Short-answer.

Protocol for Interviewing Faculty Regarding Critical Thinking: Provides evidence of whether, and to what extent, critical thinking is being taught at a college or university. Can be adapted for high school. Short-answer.

Protocol for Interviewing Students Regarding Critical Thinking: Provides evidence of whether, and to what extent, students are learning to think critically at a college or university. Can be adapted for high school. Short-answer.

Criteria for Critical Thinking Assignments: Can be used by faculty in designing classroom assignments, or by administrators in assessing the extent to which faculty are fostering critical thinking.

Rubrics for Assessing Student Reasoning Abilities: A useful tool in assessing the extent to which students are reasoning well through course content.

All of the above assessment instruments can be used as part of pre- and post-assessment strategies to gauge development over various time periods.

Consequential Validity

All of the above assessment instruments, when used appropriately and graded accurately, should lead to a high degree of consequential validity. In other words, the use of the instruments should cause teachers to teach in such a way as to foster critical thinking in their various subjects. In this light, for students to perform well on the various instruments, teachers will need to design instruction so that students can perform well on them. Students cannot become skilled in critical thinking without learning (first) the concepts and principles that underlie critical thinking and (second) applying them in a variety of forms of thinking: historical thinking, sociological thinking, biological thinking, etc. Students cannot become skilled in analyzing and assessing reasoning without practicing it. However, when they have routine practice in paraphrasing, summarizing, analyzing, and assessing, they will develop skills of mind requisite to the art of thinking well within any subject or discipline, not to mention thinking well within the various domains of human life.


Christopher Dwyer Ph.D.

Critical Thinking About Measuring Critical Thinking

A list of critical thinking measures.

Posted May 18, 2018


In my last post, I discussed the nature of engaging the critical thinking (CT) process and mentioned individuals who draw a conclusion and wind up being correct. But just because they’re right, it doesn’t mean they used CT to get there. I exemplified this through an observation made in recent years regarding extant measures of CT, many of which assess CT via multiple-choice questions (MCQs). In the case of CT MCQs, you can guess the “right” answer 20–25% of the time, without any need for CT. So, the question is: are these CT measures really measuring CT?
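As a rough illustration of that guessing concern (the 50-item test length below is hypothetical and not tied to any specific instrument discussed here), blind guessing on an MCQ test yields an expected score of one over the number of response options:

```python
# Rough illustration only: expected score from blind guessing on an MCQ-format test.
# The 50-item length is an invented example, not any particular CT test.
def expected_guessing_score(n_items: int, n_options: int) -> float:
    """Expected number of correct answers when every item is answered at random."""
    return n_items / n_options

n_items = 50
for n_options in (4, 5):
    expected = expected_guessing_score(n_items, n_options)
    print(f"{n_options} options per item: ~{expected:.0f}/{n_items} correct "
          f"({100 / n_options:.0f}%) with no critical thinking involved")
```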

As my previous articles explain, CT is a metacognitive process consisting of a number of sub-skills and dispositions that, when applied through purposeful, self-regulatory, reflective judgment, increase the chances of producing a logical solution to a problem or a valid conclusion to an argument (Dwyer, 2017; Dwyer, Hogan & Stewart, 2014). Most definitions, though worded differently, tend to agree with this perspective: CT consists of certain dispositions, specific skills, and a reflective sensibility that governs the application of those skills. That’s how it’s defined; however, it’s not necessarily how it’s been operationally defined.

Operationally defining something means specifying the process or measure used to determine the nature and properties of a phenomenon. Simply, it is defining the concept with respect to how it can be done, assessed, or measured. If the manner in which you measure something does not match or assess the parameters set out in the way you define it, then you have not been successful in operationally defining it.

Though most theoretical definitions of CT are similar, the manner in which they vary often impedes the construction of an integrated theoretical account of how best to measure CT skills. As a result, researchers and educators must consider the wide array of CT measures available, in order to identify the best and the most appropriate measures, based on the CT conceptualisation used for training. There are various extant CT measures – the most popular amongst them include the Watson-Glaser Critical Thinking Assessment (WGCTA; Watson & Glaser, 1980), the Cornell Critical Thinking Test (CCTT; Ennis, Millman & Tomko, 1985), the California Critical Thinking Skills Test (CCTST; Facione, 1990a), the Ennis-Weir Critical Thinking Essay Test (EWCTET; Ennis & Weir, 1985) and the Halpern Critical Thinking Assessment (Halpern, 2010).

It has been noted by some commentators that these different measures of CT ability may not be directly comparable (Abrami et al., 2008). For example, the WGCTA consists of 80 MCQs that measure the ability to draw inferences; recognise assumptions; evaluate arguments; and use logical interpretation and deductive reasoning (Watson & Glaser, 1980). The CCTT consists of 52 MCQs which measure skills of critical thinking associated with induction; deduction; observation and credibility; definition and assumption identification; and meaning and fallacies. Finally, the CCTST consists of 34 multiple-choice questions (MCQs) and measures CT according to the core skills of analysis, evaluation and inference, as well as inductive and deductive reasoning.

As addressed above, the MCQ-format of these three assessments is less than ideal – problematic even, because it allows test-takers to simply guess when they do not know the correct answer, instead of demonstrating their ability to critically analyse and evaluate problems and infer solutions to those problems (Ku, 2009). Furthermore, as argued by Halpern (2003), the MCQ format makes the assessment a test of verbal and quantitative knowledge rather than CT (i.e. because one selects from a list of possible answers rather than determining one’s own criteria for developing an answer). The measurement of CT through MCQs is also problematic given the potential incompatibility between the conceptualisation of CT that shapes test construction and its assessment using MCQs. That is, MCQ tests assess cognitive capacities associated with identifying single right-or-wrong answers and as a result, this approach to testing is unable to provide a direct measure of test-takers’ use of metacognitive processes such as CT, reflective judgment, and disposition towards CT.

Instead of using MCQ items, a better measure of CT might ask open-ended questions, which would allow test-takers to demonstrate whether or not they spontaneously use a specific CT skill. One commonly used CT assessment, mentioned above, that employs an open-ended format is the Ennis-Weir Critical Thinking Essay Test (EWCTET; Ennis & Weir, 1985). The EWCTET is an essay-based assessment of the test-taker’s ability to analyse, evaluate, and respond to arguments and debates in real-world situations (Ennis & Weir, 1985; see Ku, 2009 for a discussion). The authors of the EWCTET provide what they call a “rough, somewhat overlapping list of areas of critical thinking competence”, measured by their test (Ennis & Weir, 1985, p. 1). However, this test, too, has been criticised – for its domain-specific nature (Taube, 1997), the subjectivity of its scoring protocol and its bias in favour of those proficient in writing (Adams, Whitlow, Stover & Johnson, 1996).

Another, more recent CT assessment that utilises an open-ended format is the Halpern Critical Thinking Assessment (HCTA; Halpern, 2010). The HCTA consists of 25 open-ended questions based on believable, everyday situations, followed by 25 specific questions that probe for the reasoning behind each answer. The multi-part nature of the questions makes it possible to assess the ability to use specific CT skills when the prompt is provided (Ku, 2009). The HCTA’s scoring protocol also provides comprehensible, unambiguous instructions for how to evaluate responses by breaking them down into clear, measurable components. Questions on the HCTA represent five categories of CT application: hypothesis testing (e.g. understanding the limits of correlational reasoning and how to know when causal claims cannot be made), verbal reasoning (e.g. recognising the use of pervasive or misleading language), argumentation (e.g. recognising the structure of arguments, how to examine the credibility of a source and how to judge one’s own arguments), judging likelihood and uncertainty (e.g. applying relevant principles of probability, how to avoid overconfidence in certain situations) and problem-solving (e.g. identifying the problem goal, generating and selecting solutions among alternatives).

Up until the development of the HCTA, I would have recommended the CCTST for measuring CT, despite its limitations. What’s nice about the CCTST is that it assesses the three core skills of CT: analysis, evaluation, and inference, which other scales do not (explicitly). So, if you were interested in assessing students’ sub-skill ability, this would be helpful. However, as we know, though CT skill performance is a sequence, it is also a collation of these skills, meaning that for any given problem or topic, each skill is necessary. Administering an analysis problem, an evaluation problem, and an inference problem on which a student scores top marks does not guarantee that the student will apply all three to a broader problem that requires them together. That is, these questions don’t measure CT skill ability per se; rather, they measure analysis skill, evaluation skill, and inference skill in isolation. Simply, scores may predict CT skill performance, but they don’t measure it.


What may be a better indicator of CT performance is assessment of CT application. As addressed above, there are five general applications of CT: hypothesis testing, verbal reasoning, argumentation, problem-solving, and judging likelihood and uncertainty – all of which require a collation of analysis, evaluation, and inference. Though the sub-skills of analysis, evaluation, and inference are not directly measured in this case, their collation is measured through five distinct applications, which, as I see it, provides a ‘truer’ assessment of CT. In addition to assessing CT via an open-ended, short-answer format, the HCTA measures CT according to these five applications; thus, I recommend its use for measuring CT.

However, that’s not to say that the HCTA is perfect. Though it consists of 25 open-ended questions, followed by 25 specific questions that probe for the reasoning behind each answer, when I first used it to assess a sample of students, I found that in setting up my data file there were actually 165 opportunities for scoring across the test. Past research suggests that the assessment takes roughly 45 to 60 minutes to complete; however, many of my participants reported that it took closer to two hours (sometimes longer). It’s a long assessment – thorough, but long. Fortunately, adapted, shortened versions are now available, and it’s an adapted version that I currently administer to assess CT. Another limitation is that, despite the rationale above, it would be nice to have some indication of how participants get on with the sub-skills of analysis, evaluation, and inference, as I do think there’s a potential predictive element in the relationship between the individual skills and the applications. With that said, I suppose it is feasible to administer both the HCTA and the CCTST to assess such hypotheses.

Though it’s obviously important to consider how assessments actually measure CT and the ways in which each is limited, the broader, macro-level problem still requires thought. Just as conceptualisations of CT vary, so too do the reliability and validity of the different CT measures, which has led Abrami and colleagues (2008, p. 1104) to ask: “How will we know if one intervention is more beneficial than another if we are uncertain about the validity and reliability of the outcome measures?” Abrami and colleagues add that, even when researchers explicitly declare that they are assessing CT, there remains the major challenge of ensuring that measured outcomes are related, in some meaningful way, to the conceptualisation and operational definition of CT that informed the teaching practice in interventional research. Often, the relationship between the concepts of CT that are taught and those that are assessed is unclear, and a large majority of studies in this area include no theory to help elucidate these relationships.

In conclusion, solving the problem of consistency across CT conceptualisation, training, and measurement is no easy task. I think recent advancements in CT scale development (e.g., the HCTA and its adapted versions) have eased the problem, given that they bridge the gap between current theory and practical assessment. However, such advances need to be made clearer to interested audiences. As always, I’m very interested in hearing from any readers who may have insights or suggestions!

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., & Zhang, D. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78(4), 1102–1134.

Adams, M.H., Whitlow, J.F., Stover, L.M., & Johnson, K.W. (1996). Critical thinking as an educational outcome: An evaluation of current tools of measurement. Nurse Educator, 21, 23–32.

Dwyer, C.P. (2017). Critical thinking: Conceptual perspectives and practical guidelines. Cambridge, UK: Cambridge University Press.

Dwyer, C.P., Hogan, M.J. & Stewart, I. (2014). An integrated critical thinking framework for the 21st century. Thinking Skills & Creativity, 12, 43-52.

Ennis, R.H., Millman, J., & Tomko, T.N. (1985). Cornell critical thinking tests. CA: Critical Thinking Co.

Ennis, R.H., & Weir, E. (1985). The Ennis-Weir critical thinking essay test. Pacific Grove, CA: Midwest Publications.

Facione, P. A. (1990a). The California critical thinking skills test (CCTST): Forms A and B; The CCTST test manual. Millbrae, CA: California Academic Press.

Facione, P.A. (1990b). The Delphi report: Committee on pre-college philosophy. Millbrae, CA: California Academic Press.

Halpern, D. F. (2003). The “how” and “why” of critical thinking assessment. In D. Fasko (Ed.), Critical thinking and reasoning: Current research, theory and practice. Cresskill, NJ: Hampton Press.

Halpern, D.F. (2010). The Halpern critical thinking assessment: Manual. Vienna: Schuhfried.

Ku, K.Y.L. (2009). Assessing students’ critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4(1), 70–76.

Taube, K.T. (1997). Critical thinking ability and disposition as factors of performance on a written critical thinking test. Journal of General Education, 46, 129-164.

Watson, G., & Glaser, E.M. (1980). Watson-Glaser critical thinking appraisal. New York: Psychological Corporation.

Christopher Dwyer Ph.D.

Christopher Dwyer, Ph.D., is a lecturer at the Technological University of the Shannon in Athlone, Ireland.


Supplement to Critical Thinking

How can one assess, for purposes of instruction or research, the degree to which a person possesses the dispositions, skills and knowledge of a critical thinker?

In psychometrics, assessment instruments are judged according to their validity and reliability.

Roughly speaking, an instrument is valid if it measures accurately what it purports to measure, given standard conditions. More precisely, the degree of validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (American Educational Research Association 2014: 11). In other words, a test is not valid or invalid in itself. Rather, validity is a property of an interpretation of a given score on a given test for a specified use. Determining the degree of validity of such an interpretation requires collection and integration of the relevant evidence, which may be based on test content, test takers’ response processes, a test’s internal structure, relationship of test scores to other variables, and consequences of the interpretation (American Educational Research Association 2014: 13–21). Criterion-related evidence consists of correlations between scores on the test and performance on another test of the same construct; its weight depends on how well supported is the assumption that the other test can be used as a criterion. Content-related evidence is evidence that the test covers the full range of abilities that it claims to test. Construct-related evidence is evidence that a correct answer reflects good performance of the kind being measured and an incorrect answer reflects poor performance.

An instrument is reliable if it consistently produces the same result, whether across different forms of the same test (parallel-forms reliability), across different items (internal consistency), across different administrations to the same person (test-retest reliability), or across ratings of the same answer by different people (inter-rater reliability). Internal consistency should be expected only if the instrument purports to measure a single undifferentiated construct, and thus should not be expected of a test that measures a suite of critical thinking dispositions or critical thinking abilities, assuming that some people are better in some of the respects measured than in others (for example, very willing to inquire but rather closed-minded). Otherwise, reliability is a necessary but not a sufficient condition of validity; a standard example of a reliable instrument that is not valid is a bathroom scale that consistently under-reports a person’s weight.
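As a small numerical illustration of two of these reliability notions (the scores below are invented; this is a sketch, not an analysis of any instrument discussed here), test-retest reliability can be estimated as the correlation between two administrations to the same people, and inter-rater reliability as the correlation, or exact agreement, between two raters scoring the same responses:

```python
# Illustrative sketch only: made-up scores for a hypothetical critical thinking test.
from statistics import correlation  # Pearson's r; requires Python 3.10+

# Test-retest reliability: the same ten students take the same test twice.
scores_time1 = [12, 18, 25, 9, 30, 22, 15, 27, 11, 20]
scores_time2 = [14, 17, 27, 10, 29, 21, 16, 25, 13, 19]
test_retest_r = correlation(scores_time1, scores_time2)

# Inter-rater reliability: two raters score the same ten essay responses (1-5 scale).
rater_a = [3, 4, 2, 5, 3, 4, 1, 5, 2, 4]
rater_b = [3, 4, 3, 5, 2, 4, 1, 5, 2, 4]
inter_rater_r = correlation(rater_a, rater_b)
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"test-retest r = {test_retest_r:.2f}")
print(f"inter-rater r = {inter_rater_r:.2f}, exact agreement = {exact_agreement:.0%}")
```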

Assessing dispositions is difficult if one uses a multiple-choice format with known adverse consequences of a low score. It is pretty easy to tell what answer to the question “How open-minded are you?” will get the highest score and to give that answer, even if one knows that the answer is incorrect. If an item probes less directly for a critical thinking disposition, for example by asking how often the test taker pays close attention to views with which the test taker disagrees, the answer may differ from reality because of self-deception or simple lack of awareness of one’s personal thinking style, and its interpretation is problematic, even if factor analysis enables one to identify a distinct factor measured by a group of questions that includes this one (Ennis 1996). Nevertheless, Facione, Sánchez, and Facione (1994) used this approach to develop the California Critical Thinking Dispositions Inventory (CCTDI). They began with 225 statements expressive of a disposition towards or away from critical thinking (using the long list of dispositions in Facione 1990a), validated the statements with talk-aloud and conversational strategies in focus groups to determine whether people in the target population understood the items in the way intended, administered a pilot version of the test with 150 items, and eliminated items that failed to discriminate among test takers or were inversely correlated with overall results or added little refinement to overall scores (Facione 2000). They used item analysis and factor analysis to group the measured dispositions into seven broad constructs: open-mindedness, analyticity, cognitive maturity, truth-seeking, systematicity, inquisitiveness, and self-confidence (Facione, Sánchez, and Facione 1994). The resulting test consists of 75 agree-disagree statements and takes 20 minutes to administer. A repeated disturbing finding is that North American students taking the test tend to score low on the truth-seeking sub-scale (on which a low score results from agreeing to such statements as the following: “To get people to agree with me I would give any reason that worked”. “Everyone always argues from their own self-interest, including me”. “If there are four reasons in favor and one against, I’ll go with the four”.) Development of the CCTDI made it possible to test whether good critical thinking abilities and good critical thinking dispositions go together, in which case it might be enough to teach one without the other. Facione (2000) reports that administration of the CCTDI and the California Critical Thinking Skills Test (CCTST) to almost 8,000 post-secondary students in the United States revealed a statistically significant but weak correlation between total scores on the two tests, and also between paired sub-scores from the two tests. The implication is that both abilities and dispositions need to be taught, that one cannot expect improvement in one to bring with it improvement in the other.
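The item-screening step described above, eliminating statements that fail to discriminate or that are inversely correlated with overall results, can be illustrated with a corrected item-total correlation. The data, response scale, and cutoff below are invented for the sketch and are not the CCTDI's actual procedure:

```python
# Hypothetical sketch of item screening via corrected item-total correlation.
# Scale, cutoff, and responses are invented; not the CCTDI's actual method.
from statistics import correlation  # Python 3.10+

def corrected_item_total(responses, item, min_r=0.2):
    """Correlate one item's scores with the total of all *other* items;
    flag the item for retention only if the correlation clears the cutoff."""
    item_scores = [person[item] for person in responses]
    rest_totals = [sum(person) - person[item] for person in responses]
    r = correlation(item_scores, rest_totals)
    return r, r >= min_r

# Five respondents answering four agree-disagree items on a 1-6 scale (made up).
responses = [
    [6, 5, 2, 6],
    [5, 6, 3, 5],
    [2, 3, 5, 2],
    [3, 2, 6, 3],
    [6, 6, 1, 5],
]
for item in range(4):
    r, keep = corrected_item_total(responses, item)
    print(f"item {item}: corrected item-total r = {r:+.2f} -> {'keep' if keep else 'drop'}")
```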

A more direct way of assessing critical thinking dispositions would be to see what people do when put in a situation where the dispositions would reveal themselves. Ennis (1996) reports promising initial work with guided open-ended opportunities to give evidence of dispositions, but no standardized test seems to have emerged from this work. There are however standardized aspect-specific tests of critical thinking dispositions. The Critical Problem Solving Scale (Berman et al. 2001: 518) takes as a measure of the disposition to suspend judgment the number of distinct good aspects attributed to an option judged to be the worst among those generated by the test taker. Stanovich, West and Toplak (2011: 800–810) list tests developed by cognitive psychologists of the following dispositions: resistance to miserly information processing, resistance to myside thinking, absence of irrelevant context effects in decision-making, actively open-minded thinking, valuing reason and truth, tendency to seek information, objective reasoning style, tendency to seek consistency, sense of self-efficacy, prudent discounting of the future, self-control skills, and emotional regulation.

It is easier to measure critical thinking skills or abilities than to measure dispositions. The following eight currently available standardized tests purport to measure them: the Watson-Glaser Critical Thinking Appraisal (Watson & Glaser 1980a, 1980b, 1994), the Cornell Critical Thinking Tests Level X and Level Z (Ennis & Millman 1971; Ennis, Millman, & Tomko 1985, 2005), the Ennis-Weir Critical Thinking Essay Test (Ennis & Weir 1985), the California Critical Thinking Skills Test (Facione 1990b, 1992), the Halpern Critical Thinking Assessment (Halpern 2016), the Critical Thinking Assessment Test (Center for Assessment & Improvement of Learning 2017), the Collegiate Learning Assessment (Council for Aid to Education 2017), the HEIghten Critical Thinking Assessment (https://territorium.com/heighten/), and a suite of critical thinking assessments for different groups and purposes offered by Insight Assessment (https://www.insightassessment.com/products). The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students’ critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level certificates in critical thinking on the basis of an examination (OCR 2011). Many of these standardized tests have received scholarly evaluations at the hands of, among others, Ennis (1958), McPeck (1981), Norris and Ennis (1989), Fisher and Scriven (1997), Possin (2008, 2013a, 2013b, 2013c, 2014, 2020) and Hatcher and Possin (2021). Their evaluations provide a useful set of criteria that such tests ideally should meet, as does the description by Ennis (1984) of problems in testing for competence in critical thinking: the soundness of multiple-choice items, the clarity and soundness of instructions to test takers, the information and mental processing used in selecting an answer to a multiple-choice item, the role of background beliefs and ideological commitments in selecting an answer to a multiple-choice item, the tenability of a test’s underlying conception of critical thinking and its component abilities, the set of abilities that the test manual claims are covered by the test, the extent to which the test actually covers these abilities, the appropriateness of the weighting given to various abilities in the scoring system, the accuracy and intellectual honesty of the test manual, the interest of the test to the target population of test takers, the scope for guessing, the scope for choosing a keyed answer by being test-wise, precautions against cheating in the administration of the test, clarity and soundness of materials for training essay graders, inter-rater reliability in grading essays, and clarity and soundness of advance guidance to test takers on what is required in an essay. Rear (2019) has challenged the use of standardized tests of critical thinking as a way to measure educational outcomes, on the grounds that  they (1) fail to take into account disputes about conceptions of critical thinking, (2) are not completely valid or reliable, and (3) fail to evaluate skills used in real academic tasks. He proposes instead assessments based on discipline-specific content.

There are also aspect-specific standardized tests of critical thinking abilities. Stanovich, West and Toplak (2011: 800–810) list tests of probabilistic reasoning, insights into qualitative decision theory, knowledge of scientific reasoning, knowledge of rules of logical consistency and validity, and economic thinking. They also list instruments that probe for irrational thinking, such as superstitious thinking, belief in the superiority of intuition, over-reliance on folk wisdom and folk psychology, belief in “special” expertise, financial misconceptions, overestimation of one’s introspective powers, dysfunctional beliefs, and a notion of self that encourages egocentric processing. They regard these tests along with the previously mentioned tests of critical thinking dispositions as the building blocks for a comprehensive test of rationality, whose development (they write) may be logistically difficult and would require millions of dollars.

A superb example of assessment of an aspect of critical thinking ability is the Test on Appraising Observations (Norris & King 1983, 1985, 1990a, 1990b), which was designed for classroom administration to senior high school students. The test focuses entirely on the ability to appraise observation statements and in particular on the ability to determine in a specified context which of two statements there is more reason to believe. According to the test manual (Norris & King 1985, 1990b), a person’s score on the multiple-choice version of the test, which is the number of items that are answered correctly, can justifiably be given either a criterion-referenced or a norm-referenced interpretation.

On a criterion-referenced interpretation, those who do well on the test have a firm grasp of the principles for appraising observation statements, and those who do poorly have a weak grasp of them. This interpretation can be justified by the content of the test and the way it was developed, which incorporated a method of controlling for background beliefs articulated and defended by Norris (1985). Norris and King synthesized from judicial practice, psychological research and common-sense psychology 31 principles for appraising observation statements, in the form of empirical generalizations about tendencies, such as the principle that observation statements tend to be more believable than inferences based on them (Norris & King 1984). They constructed items in which exactly one of the 31 principles determined which of two statements was more believable. Using a carefully constructed protocol, they interviewed about 100 students who responded to these items in order to determine the thinking that led them to choose the answers they did (Norris & King 1984). In several iterations of the test, they adjusted items so that selection of the correct answer generally reflected good thinking and selection of an incorrect answer reflected poor thinking. Thus they have good evidence that good performance on the test is due to good thinking about observation statements and that poor performance is due to poor thinking about observation statements. Collectively, the 50 items on the final version of the test require application of 29 of the 31 principles for appraising observation statements, with 13 principles tested by one item, 12 by two items, three by three items, and one by four items. Thus there is comprehensive coverage of the principles for appraising observation statements. Fisher and Scriven (1997: 135–136) judge the items to be well worked and sound, with one exception. The test is clearly written at a grade 6 reading level, meaning that poor performance cannot be attributed to difficulties in reading comprehension by the intended adolescent test takers. The stories that frame the items are realistic, and are engaging enough to stimulate test takers’ interest. Thus the most plausible explanation of a given score on the test is that it reflects roughly the degree to which the test taker can apply principles for appraising observations in real situations. In other words, there is good justification of the proposed interpretation that those who do well on the test have a firm grasp of the principles for appraising observation statements and those who do poorly have a weak grasp of them.

To get norms for performance on the test, Norris and King arranged for seven groups of high school students in different types of communities and with different levels of academic ability to take the test. The test manual includes percentiles, means, and standard deviations for each of these seven groups. These norms allow teachers to compare the performance of their class on the test to that of a similar group of students.
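A norm-referenced interpretation of this kind amounts to locating a raw score within the distribution of a comparison group. A minimal sketch (with an invented norm group, not Norris and King's published norms) is:

```python
# Illustrative only: converting a raw score to a percentile rank within a norm group.
# The norm-group scores below are invented for the example.
def percentile_rank(score, norm_group):
    below = sum(s < score for s in norm_group)
    equal = sum(s == score for s in norm_group)
    return 100 * (below + 0.5 * equal) / len(norm_group)

norm_group = [22, 25, 28, 30, 31, 33, 35, 36, 38, 40, 41, 43, 44, 46, 48]
print(f"A raw score of 38 falls at about the {percentile_rank(38, norm_group):.0f}th percentile")
```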

Copyright © 2022 by David Hitchcock <hitchckd@mcmaster.ca>


Original Research Article

Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation


  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic—and most credible—approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program that involve various refinements and extensions of the assessment framework, a number of empirical studies, along with linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic—and most credible—approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment.

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis, is the most complex of the three. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical (deduction, inclusion, etc.) knowledge. The second level is Critical Reflection, which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one needs not only to apply analytic reasoning but also to adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of various actors involved in the dilemma of interest. The third level, Critical Alertness, involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) categorized higher-order skills, such as CT, into two types: (i) those used when solving problems and making decisions in professional and everyday life, for instance, related to civic affairs and the environment; and (ii) those used in situations where various mental processes (e.g., comparing, evaluating, and justifying) are developed through formal instruction, usually in a discipline. Hence, in both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments (Nagel et al., 2020). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question or reach a conclusion) ( Shavelson et al., 2019 ). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of actions; and communicating clearly and concisely decisions and actions. The order in which activities are carried out can vary among individuals and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills ( Kegan, 1994 ; Tessier-Lavigne, 2020 ). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018) .

In this paper, we have singled out performance assessment as it offers important advantages to measuring CT. Extant tests of CT typically employ response formats such as forced-choice or short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014 ). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items ( Messick, 1994 ; Braun, 2019 ). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments ( Lane and Stone, 2006 ; Braun, 2019 ; Shavelson et al., 2019 ). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. Evidence of the reliability, validity, and interpretative challenges of performance assessment (PA) are extensively detailed in Davey et al. (2015) .

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PT) (For an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council for Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication ( Klein et al., 2007 ; Shavelson, 2010 ). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PT’s: one in which students critique an argument and one in which they propose a solution to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The CLA+ added the SRQ section (which is not linked substantively to the PT scenario) to increase the number of student responses and thereby obtain more reliable estimates of performance at the student level than could be achieved with a single PT ( Zahner, 2013 ; Davey et al., 2015 ).

iPAL Assessment Framework

Methodological foundations.

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007) . It was also informed by the results from the AHELO pilot study ( Organisation for Economic Co-operation and Development [OECD], 2012 , 2013 ), as well as the KoKoHs research program in Germany (for an overview see, Zlatkin-Troitschanskaia et al., 2017 , 2020 ). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) ( Mislevy et al., 2003 ; Mislevy and Haertel, 2006 ; Haertel and Fujii, 2017 ).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct ( Mislevy et al., 2003 ). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to PT’s developers. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973 ), as well as the intended use(s) (for an example, see Shavelson et al., 2019 ). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument ( Messick, 1994 ).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model , which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rests on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010 , 2013 ). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019 ).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: A storyline , a challenge , a document library , and a scoring rubric . Table 1 displays these aspects, brief descriptions of each, and the corresponding examples drawn from an iPAL performance assessment (Version adapted from original in Hyytinen and Toom, 2019 ). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto , which adapted for purposes of the assessment, and which to be discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library , the amount of scaffolding included, as well as the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivational reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.
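To make these four aspects concrete, the following minimal sketch shows how a task built to this framework might be represented as a data structure. It is illustrative only; the class and field names are assumptions, not the consortium’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    """One item in the document library."""
    title: str
    source: str        # e.g., newspaper article, blog post, official report
    trustworthy: bool  # seeded property: is the source reliable?
    biased: bool       # does it invite fast thinking or motivated reasoning?

@dataclass
class RubricDimension:
    """One scoring dimension with behaviorally anchored levels."""
    name: str            # e.g., "evaluating the trustworthiness of evidence"
    anchors: List[str]   # short descriptions of typical behavior at each level

@dataclass
class PerformanceTask:
    """The storyline, challenge, document library, and scoring rubric."""
    storyline: str                 # real-world scenario, e.g., a policy dilemma
    challenge: str                 # what the respondent is asked to produce
    library: List[Document] = field(default_factory=list)
    rubric: List[RubricDimension] = field(default_factory=list)
```

A task developer could then audit, for example, how many documents in the library are deliberately untrustworthy or biased before finalizing the challenge.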

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model . More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ) as indicated by interrater agreement coefficients.
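Interrater agreement can be monitored with standard indices. The sketch below (plain Python, not part of any iPAL tooling) computes exact agreement and Cohen’s kappa for two raters who scored the same set of responses; for ordinal rubric scores a weighted kappa or an intraclass correlation would usually be preferred, and the toy data are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same responses."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Two raters scoring ten responses on a six-point scale (toy data).
rater_1 = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
rater_2 = [3, 4, 3, 5, 4, 3, 2, 4, 5, 2]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.74
```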

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” ( Shavelson et al., 2019 , p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019 , p. 478) three score profiles were identified (low-, middle-, and high-performer) for students. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores such as poor CT skills, a lack of a disposition to engage with the challenge, or the two attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PT’s are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
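As a rough plausibility check on these figures (the reported values come from a full generalizability analysis, not from this shortcut), the Spearman-Brown prophecy formula predicts the gain from doubling the number of raters:

$$
\rho_{\text{new}} \;=\; \frac{m\,\rho_{\text{old}}}{1+(m-1)\,\rho_{\text{old}}},
\qquad
\frac{2 \times 0.74}{1 + 0.74} \;\approx\; 0.85,
$$

where $m = 2$ is the factor by which the number of raters is increased and $\rho_{\text{old}} = 0.74$ is the two-rater coefficient; the result is close to the reported four-rater value of 0.84.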

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

At present, preliminary field trials employing the omnibus framework (i.e., a full set of documents) indicated that 60 min was generally an inadequate amount of time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019 ). Accordingly, it would be helpful to develop modified frameworks for PT’s that require substantially less time. For an example, see a short performance assessment of civic online reasoning, requiring response times from 10 to 50 min ( Wineburg et al., 2016 ). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and specifying the characteristics of the complementary documents to be included – or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PT’s, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short answer and/or multiple choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT taking 90 min or more could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014 ), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative 2 (retrieved 5/7/2020) that has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels. 3 Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to interpretation and implementation.

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010 ; Shavelson et al., 2019 ). The attraction mainly rests on the assumption that elaborated PT’s are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability ( Davey et al., 2015 ).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence ( American Educational Research Association et al., 2014 ). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PT’s. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PT’s must be subjected to validation studies as recommended in the Standards for Psychological and Educational Testing by American Educational Research Association et al. (2014) . With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or do not fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those drawn from the Internet, can introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance ( Messick, 1994 ; Davey et al., 2015 ). Indeed, it is well known that more than one performance task is needed to obtain high reliability ( Shavelson, 2013 ). This is due to both student-task interactions and variability in scoring. Sources of student-task interactions are differential familiarity with the topic ( Hyytinen and Toom, 2019 ) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For use in formative assessment as part of an instructional program, reliability can be lower than is required for summative purposes. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.
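The dependence of reliability on the numbers of tasks and raters can be made explicit with the standard generalizability coefficient for a fully crossed person × task × rater design (a textbook expression, not a result from the iPAL studies):

$$
E\rho^{2} \;=\; \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \dfrac{\sigma^{2}_{pt}}{n_{t}} + \dfrac{\sigma^{2}_{pr}}{n_{r}} + \dfrac{\sigma^{2}_{ptr,e}}{n_{t}\,n_{r}}},
$$

where $\sigma^{2}_{p}$ is the variance among students, $\sigma^{2}_{pt}$ and $\sigma^{2}_{pr}$ are the student-task and student-rater interaction variances, $\sigma^{2}_{ptr,e}$ is the residual, and $n_{t}$ and $n_{r}$ are the numbers of tasks and raters. Adding tasks or raters shrinks the corresponding error terms, which is why a single PT rarely yields high reliability.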

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).
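To illustrate the kind of model this implies, a standard compensatory multidimensional two-parameter formulation (offered purely as an illustration, not as a model the iPAL program has adopted) gives the probability that student $i$ succeeds on a dichotomously scored component $j$ as

$$
P\!\left(X_{ij}=1 \mid \boldsymbol{\theta}_{i}\right) \;=\; \frac{1}{1+\exp\!\left[-\left(\mathbf{a}_{j}^{\top}\boldsymbol{\theta}_{i}+d_{j}\right)\right]},
$$

where $\boldsymbol{\theta}_{i}$ collects the student’s standing on several CT facets, $\mathbf{a}_{j}$ is the component’s vector of discrimination (loading) parameters, and $d_{j}$ is an intercept. Polytomous analogues (e.g., a multidimensional graded response model) would apply to the six-point rubric scores, and testlet or bifactor extensions could absorb local dependence among components of the same PT.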

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation ( Zlatkin-Troitschanskaia et al., 2019 ). In addition, more intensive study of response processes through cognitive laboratories and the like are needed to strengthen the evidential argument for construct validity ( Leighton, 2019 ). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models of the new literacies that attempt to capture some key characteristics of these activities. A prominent example is the model proposed by Leu et al. (2020) . The model frames online reading as a process of problem-based inquiry that calls on five practices during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, Supplementary Materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment ( Mullis et al., 2017 ).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Therefore, taking advantage of the online environment to incorporate new types of supplementary documents, and perhaps to introduce new response formats as well, should be a high priority. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.
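As a toy illustration of the kind of automated scoring mentioned above (emphatically not a production scoring engine), a rule-based scorer for a hypothetical supplementary source-selection item might look like the sketch below; the source names and scoring key are invented for the example.

```python
# Toy rule-based scoring for a supplementary item that asks students to select
# which provided sources they relied on. The key below is entirely hypothetical.
TRUSTWORTHY = {"government health report", "peer-reviewed study"}
UNTRUSTWORTHY = {"anonymous blog post", "sponsored advertisement"}

def score_source_selection(selected_sources):
    """Award 1 point per trustworthy source cited, minus 1 per untrustworthy one."""
    selected = {s.strip().lower() for s in selected_sources}
    score = len(selected & TRUSTWORTHY) - len(selected & UNTRUSTWORTHY)
    return max(score, 0)

print(score_source_selection(["Peer-reviewed study", "anonymous blog post"]))      # 0
print(score_source_selection(["peer-reviewed study", "government health report"])) # 2
```

Scores like these could feed directly into the automated feedback to students and faculty described above.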

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

  • ^ https://www.ipal-rd.com/
  • ^ https://www.aacu.org/value
  • ^ When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced”. This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced”.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, D.C: American Educational Research Association.


Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274


Braun, H. I., Kirsch, I., and Yamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019


Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills , eds M. Murtonen and K. Balloo (London: Palgrave MacMillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MAL: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement , 4th Edn, ed. R. L. Brennan (Lanham, MA: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Annual Meeting , Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.”. Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO: Feasibility Study Report, Vol. 1: Design and Implementation. Paris: OECD.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO: Feasibility Study Report, Vol. 2: Data Analysis and National Experiences. Paris: OECD.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for life and work: Developing Transferable Knowledge and Skills in the 21st Century. Washington DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity–CLA+. Council for Aid to Education. Available online at: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf.

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Received: 30 May 2020; Accepted: 04 August 2020; Published: 08 September 2020.


Copyright © 2020 Braun, Shavelson, Zlatkin-Troitschanskaia and Borowiec. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Henry I. Braun, [email protected]

This article is part of the Research Topic

Assessing Information Processing and Online Reasoning as a Prerequisite for Learning in Higher Education

Promoting and Assessing Critical Thinking

Critical thinking is a high-priority outcome of higher education – critical thinking skills are crucial for independent thinking and problem solving in both our students’ professional and personal lives. But what does it mean to be a critical thinker, and how do we promote and assess critical thinking in our students? Critical thinking can be defined as the ability to examine an issue by breaking it down and evaluating it in a conscious manner, while providing arguments and evidence to support the evaluation. Below are some suggestions for promoting and assessing critical thinking in our students.

Thinking through inquiry

Asking questions and using the answers to understand the world around us is what drives critical thinking. In inquiry-based instruction, the teacher asks students leading questions to draw from them information, inferences, and predictions about a topic. Below are some generic question stems that can serve as prompts to aid in generating critical thinking questions; a short sketch after the list shows one way to turn such stems into topic-specific prompts. Consider providing prompts such as these to students to facilitate their ability to ask these questions of themselves and others. If we want students to generate good questions on their own, we need to teach them how to do so by providing them with the structure and guidance of example questions, whether in written form or through our use of questions in the classroom.

Generic question stems

  • What are the strengths and weaknesses of …?
  • What is the difference between … and …?
  • Explain why/how …?
  • What would happen if …?
  • What is the nature of …?
  • Why is … happening?
  • What is a new example of …?
  • How could … be used to …?
  • What are the implications of …?
  • What is … analogous to?
  • What do we already know about …?
  • How does … affect …?
  • How does … tie in with what we have learned before?
  • What does … mean?
  • Why is … important?
  • How are … and … similar/different?
  • How does … apply to everyday life?
  • What is a counterargument for …?
  • What is the best …and why?
  • What is a solution to the problem of …?
  • Compare … and … with regard to …?
  • What do you think causes …? Why?
  • Do you agree or disagree with this statement? What evidence is there to support your answer?
  • What is another way to look at …?
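The following small sketch shows one way to turn generic stems like those above into topic-specific prompts for a particular course; the stems included and the example topic are illustrative assumptions.

```python
# Illustrative templating of generic question stems into topic-specific prompts.
STEMS = [
    "What are the strengths and weaknesses of {topic}?",
    "What would happen if {topic} no longer held?",
    "What is a counterargument for {topic}?",
    "How does {topic} apply to everyday life?",
]

def generate_prompts(topic, stems=STEMS):
    """Fill each stem with the given course topic."""
    return [stem.format(topic=topic) for stem in stems]

for prompt in generate_prompts("the law of supply and demand"):
    print(prompt)
```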

Critical thinking through writing

Another essential ingredient in critical thinking instruction is the use of writing. Writing converts students from passive to active learners and requires them to identify issues and formulate hypotheses and arguments. The act of writing requires students to focus and clarify their thoughts before putting them down on paper, hence taking them through the critical thinking process. Writing requires that students make important critical choices and ask themselves (Gocsik, 2002):

  • What information is most important?
  • What might be left out?
  • What is it that I think about this subject?
  • How did I arrive at what I think?
  • What are my assumptions? Are they valid?
  • How can I work with facts, observations, and so on, in order to convince others of what I think?
  • What do I not yet understand?

Consider providing the above questions to students so that they can evaluate their own writing as well. Some suggestions for critical thinking writing activities include:

  • Give students raw data and ask them to write an argument or analysis based on the data.
  • Have students explore and write about unfamiliar points of view or “what if” situations.
  • Think of a controversy in your field, and have the students write a dialogue between characters with different points of view.
  • Select important articles in your field and ask the students to write summaries or abstracts of them. Alternatively, you could ask students to write an abstract of your lecture.
  • Develop scenarios that place students in realistic situations relevant to your discipline, where they must reach a decision to resolve a conflict.

See the Centre for Teaching Excellence (CTE) teaching tip “ Low-Stakes Writing Assignments ” for critical thinking writing assignments.

Critical thinking through group collaboration

Opportunities for group collaboration could include discussions, case studies, task-related group work, peer review, or debates. Group collaboration is effective for promoting critical thought because:

  • An effective team has the potential to produce better results than any individual,
  • Students are exposed to different perspectives while clarifying their own ideas,
  • Collaborating on a project or studying with a group for an exam generally stimulates interest and increases the understanding and knowledge of the topic.

See the CTE teaching tip “ Group Work in the Classroom: Types of Small Groups ” for suggestions for forming small groups in your classroom.

Assessing critical thinking skills

You can also use students’ responses from the activities that promote critical thinking to assess whether they are, indeed, reaching your critical thinking goals. It is important to establish clear criteria for evaluating critical thinking. Even though many of us may be able to identify critical thinking when we see it, explicitly stated criteria help both students and teachers know the goal toward which they are working. An effective set of criteria indicates which skills are present, to what extent, and which require further development. The following are characteristics of work that may demonstrate effective critical thinking (a small scoring sketch follows the list):

  • Accurately and thoroughly interprets evidence, statements, graphics, questions, literary elements, etc.
  • Asks relevant questions.
  • Analyses and evaluates key information and alternative points of view clearly and precisely.
  • Fair-mindedly examines beliefs, assumptions, and opinions and weighs them against facts.
  • Draws insightful, reasonable conclusions.
  • Justifies inferences and opinions.
  • Thoughtfully addresses and evaluates major alternative points of view.
  • Thoroughly explains assumptions and reasons.
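One lightweight way to turn such characteristics into explicit criteria is an analytic rubric in which each characteristic is rated on a short scale and the ratings are summed. The sketch below is illustrative only; the criterion labels and the 0-3 scale are assumptions rather than an established instrument.

```python
# Illustrative analytic rubric: each criterion is rated 0-3, then summed.
CRITERIA = [
    "Interprets evidence accurately and thoroughly",
    "Asks relevant questions",
    "Evaluates alternative points of view",
    "Weighs beliefs and assumptions against facts",
    "Draws insightful, reasonable conclusions",
    "Justifies inferences and explains assumptions",
]

def total_score(ratings):
    """ratings maps each criterion to a 0-3 rating; missing criteria count as 0."""
    if any(not 0 <= r <= 3 for r in ratings.values()):
        raise ValueError("ratings must be between 0 and 3")
    return sum(ratings.get(c, 0) for c in CRITERIA)

example = {c: 2 for c in CRITERIA}
example["Asks relevant questions"] = 3
print(total_score(example), "out of", 3 * len(CRITERIA))  # 13 out of 18
```

Sharing the filled-in rubric with students also serves as feedback on their thinking.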

It is also important to note that assessment is a tool that can be used throughout a course, not just at the end. It is more useful to assess students throughout a course, so you can see if criteria require further clarification and students can test out their understanding of your criteria and receive feedback. Also consider distributing your criteria with your assignments so that students receive guidance about your expectations. This will help them to reflect on their own work and improve the quality of their thinking and writing.

See the CTE teaching tip sheets “ Rubrics ” and “ Responding to Writing Assignments: Managing the Paper Load ” for more information on rubrics.

If you would like support applying these tips to your own teaching, CTE staff members are here to help. View the CTE Support page to find the most relevant staff member to contact.

  • Gocsik, K. (2002). Teaching Critical Thinking Skills. UTS Newsletter, 11(2):1-4
  • Facione, P.A. and Facione, N.C. (1994). Holistic Critical Thinking Scoring Rubric. Millbrae, CA: California Academic Press. www.calpress.com/rubric.html (retrieved September 2003)
  • King, A. (1995). Inquiring minds really do want to know: using questioning to teach critical thinking. Teaching of Psychology, 22(1): 13-17
  • Wade, C. and Tavris, C. (1987). Psychology (1st ed.) New York: Harper. IN: Wade, C. (1995). Using Writing to Develop and Assess Critical Thinking. Teaching of Psychology, 22(1): 24-28.


Measuring, Assessing and Evaluating Thinking Skills in Educational Settings: A Necessity for Twenty-First Century

  • First Online: 02 January 2023


  • Yalçın Dilekli (ORCID: orcid.org/0000-0003-0264-0231) &
  • Erdoğan Tezci (ORCID: orcid.org/0000-0003-2055-0192)

Part of the book series: Integrated Science ((IS,volume 13))


The twenty-first century differs markedly from earlier ones: people are far more mobile, and knowledge spreads at unprecedented speed, driving rapid change in every discipline. Under these circumstances, traditional educational philosophy cannot meet the century’s needs. The modern approach holds that education should aim to raise thinking generations, that is, individuals who are competent in analytical, critical, and creative thinking, decision-making, and problem-solving. The transition from the traditional educational approach to the modern one has encountered obstacles, often called barriers, to teaching thinking skills, which can largely be traced to the attitudes of school administrators, teachers, and parents. A major reason for these attitudes is the difficulty of measuring and assessing such skills: they cannot be measured and assessed in traditional ways that reward remembering and memorizing. Twenty-first-century skills instead involve higher-order analysis, evaluation, and creation, and they are measured and assessed using multiple-choice and open-ended questions as well as performance-based work. Furthermore, the evaluation process draws on multiple perspectives, namely self-evaluation, peer evaluation, and teacher evaluation, rather than a single one. This chapter focuses on how teachers can measure and evaluate students’ thinking skills, using concrete examples.
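As a concrete, purely hypothetical illustration of combining the multiple perspectives mentioned above, teacher, peer, and self ratings of the same piece of work could be averaged with weights reflecting how much each perspective counts; the weights and the 0-100 scale below are arbitrary assumptions.

```python
# Hypothetical weighted combination of teacher, peer, and self evaluations,
# each expressed on the same 0-100 scale.
WEIGHTS = {"teacher": 0.6, "peer": 0.25, "self": 0.15}

def combined_score(scores):
    """Weighted average of the three perspectives; assumes all keys are present."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

print(combined_score({"teacher": 80, "peer": 70, "self": 90}))  # 79.0
```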

Graphical Abstract/Art Performance


Thinking skills.

Education is not learning of facts, but training of mind to think. (Albert Einstein)



About this chapter

Dilekli, Y., Tezci, E. (2022). Measuring, Assessing and Evaluating Thinking Skills in Educational Settings: A Necessity for Twenty-First Century. In: Rezaei, N. (eds) Integrated Education and Learning. Integrated Science, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-031-15963-3_22

Measuring Critical Thinking: Can It Be Done?

Why Should We Measure Critical Thinking?

Critical Thinking is an Objective of Education

An important reason for measuring critical thinking is that it is a key part of education, whether it is explicitly mentioned within the curriculum or not. The UK’s Framework for Higher Education Qualifications (FHEQ) includes descriptors that are specific to critical thinking, including:

“Critically evaluate arguments, assumptions, abstract concepts and data (that may be incomplete), to make judgements, and to frame appropriate questions to achieve a solution – or identify a range of solutions – to a problem.”

This illustrates the importance of critical thinking for academic success in higher education.

In the USA, critical thinking is actively embedded within the curriculum at a secondary school level in the ‘Common Core’ learning objectives. Students are required to learn to make strong arguments and to be able to structure them in a way that leads smoothly to support a conclusion. Including critical thinking in the secondary curriculum is important, since experts have found that critical thinking should be taught prior to higher education in order to be effective. This makes it important to include within learning objectives and assessment criteria, as a measure of academic achievement.

Employers Look for Critical Thinking Skills

Critical thinking is also essential for professional success. As a result, it is no surprise that a number of prominent employers test critical thinking skills as part of their recruitment process. The Watson Glaser Test, discussed in our article about measuring critical thinking , has been used by the Bank of England and the Government Legal Service among other employers. Given that critical thinking has an increasing role in employability, it makes sense that critical thinking should be taught and tested among students in education.

However, critical thinking is an abstract and complex term, which makes it more difficult to measure and assess than other essential skills like reading and writing. The recognition that critical thinking is an important skill has led to the development of a number of assessment methods, although all of these methods have considerable limitations.

How is Critical Thinking Measured?

Traditional Measurement of Critical Thinking in Education

Traditionally, a student’s ability to think critically has been measured through essay-based assessment within humanities, arts and social sciences subjects. As a result, critical thinking ability is only an aspect of the assessment: students are simultaneously marked on their subject knowledge, reading and writing skills. Therefore, providing sufficient feedback about each aspect of the assessment is incredibly time intensive for teachers. With limited feedback from teachers, students may experience a cycle of writing essays and receiving negative feedback, without fully understanding why their argument is weak or how it can be improved.

This can lead to confusion and result in limited improvement in later essays as a result. For students who struggle with reading and writing, this challenge is only exacerbated. Furthermore, this method fails to produce a separate, useful measurement of critical thinking that teachers can use to identify problems and drive improvement.

Critical Thinking as an A Level Subject

The momentum for assessing critical thinking skills reached a peak in the UK with the introduction of school subjects dedicated to critical thinking. A number of exam boards introduced a dedicated A Level subject in the early 2000s, but only Cambridge International’s “Thinking Skills” curriculum remains. For this A Level, students are examined using a familiar exam format, with a mixture of short-answer and essay-style questions. Two papers, “Critical Thinking” and “Applied Reasoning”, assess critical thinking, while the others focus on problem solving, with a greater emphasis on numerical questions.

Undoubtedly, students have more time to develop critical thinking skills through studying a dedicated course. However, there are problems with this approach. Despite the use of shorter questions, strong reading and writing abilities are still necessary, and this heavily influences the criteria for achieving high marks. There is also a high workload for the teacher. Additionally, choosing to teach or study this subject means choosing against more traditional or accessible subjects. Moreover, UK universities do not accept it as part of their entrance requirements, as they want to see students demonstrate critical thinking within the traditional subjects. For these reasons, critical thinking as a subject has failed to make it into the curriculum in most schools.

Critical Thinking Tests for Employment

Some critical thinking tests are designed as a way of evaluating job applicants or of testing employee skills, rather than to test structured learning of critical thinking. The Watson Glaser test is perhaps the most well-known critical thinking test of this kind and is used most commonly by law employers.

The Watson Glaser test uses short multiple choice questions, which reduces verbal load and makes marking easier. The results can be transformed easily into statistics for the quantitative measure of critical thinking among a sample group. However, this means the questions are simple, so the test does not measure proficiency in thinking critically about longer, more complex arguments. The questions are also based on scenarios that may not be familiar to exam takers. Other employment tests use a similar format, although many limit critical thinking to a small section of the test.
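
To illustrate why a short multiple-choice format is easy to mark and to turn into group statistics, here is a minimal sketch with an invented answer key and invented responses; it is not Watson Glaser content, only a generic scoring routine.

```python
from statistics import mean, stdev

# Invented answer key and candidate responses, for illustration only.
answer_key = ["B", "D", "A", "C", "B"]
responses = {
    "candidate_1": ["B", "D", "A", "A", "B"],
    "candidate_2": ["C", "D", "A", "C", "B"],
    "candidate_3": ["B", "B", "A", "C", "D"],
}

def raw_score(answers):
    """Number of answers matching the key."""
    return sum(a == k for a, k in zip(answers, answer_key))

scores = {name: raw_score(ans) for name, ans in responses.items()}
print(scores)                                            # per-candidate scores
print(f"mean={mean(scores.values()):.2f}, sd={stdev(scores.values()):.2f}")
```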

Summarising the Strengths and Weaknesses of Critical Thinking Tests

The table above summarises the strengths and weaknesses of each critical thinking test discussed in this article. To learn more about measuring critical thinking, see our longer article on the topic.

Look out for our next blog post, where we discuss how to improve the measurement of critical thinking.

Yes, We Can Define, Teach, and Assess Critical Thinking Skills

By Jeff Heyck-Williams, Director of Curriculum and Instruction at Two Rivers Public Charter School

Defining Critical Thinking Skills

We began this work by first defining what we mean by critical thinking. After a review of the literature and looking at the practice at other schools, we identified five constructs that encompass a set of broadly applicable skills: schema development and activation; effective reasoning; creativity and innovation; problem solving; and decision making.

We then created rubrics to provide a concrete vision of what each of these constructs look like in practice. Working with the Stanford Center for Assessment, Learning and Equity (SCALE) , we refined these rubrics to capture clear and discrete skills.

For example, we defined effective reasoning as the skill of creating an evidence-based claim: students need to construct a claim, identify relevant support, link their support to their claim, and identify possible questions or counter claims. Rubrics provide an explicit vision of the skill of effective reasoning for students and teachers. By breaking the rubrics down for different grade bands, we have been able not only to describe what reasoning is but also to delineate how the skills develop in students from preschool through 8th grade.
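
The Two Rivers rubric itself is not reproduced here, so the sketch below is only a hypothetical illustration of the general idea: treating effective reasoning as the discrete components named above (claim, relevant support, link to the claim, counterclaims) and recording which components a response demonstrates. The 0/1 checklist scoring is an assumption, not the school’s actual rubric.

```python
# Hypothetical checklist sketch of an effective-reasoning rubric, loosely
# following the components named in the text. The scoring is illustrative only.
COMPONENTS = [
    "constructs a clear claim",
    "identifies relevant support",
    "links the support to the claim",
    "identifies questions or counterclaims",
]

def score_response(observed_components):
    """Mark each rubric component as present (1) or absent (0)."""
    observed = set(observed_components)
    return {component: int(component in observed) for component in COMPONENTS}

result = score_response(["constructs a clear claim", "identifies relevant support"])
print(result, "total:", sum(result.values()))  # total: 2 of 4 components
```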

Before moving on, I want to freely acknowledge that in narrowly defining reasoning as the construction of evidence-based claims we have disregarded some elements of reasoning that students can and should learn. For example, the difference between constructing claims through deductive versus inductive means is not highlighted in our definition. However, by privileging a definition that has broad applicability across disciplines, we are able to gain traction in developing the roots of critical thinking: in this case, the ability to formulate well-supported claims or arguments.

Teaching Critical Thinking Skills

The definitions of critical thinking constructs were only useful to us in as much as they translated into practical skills that teachers could teach and students could learn and use. Consequently, we have found that to teach a set of cognitive skills, we needed thinking routines that defined the regular application of these critical thinking and problem-solving skills across domains. Building on Harvard’s Project Zero Visible Thinking work, we have named routines aligned with each of our constructs.

For example, with the construct of effective reasoning, we aligned the Claim-Support-Question thinking routine to our rubric. Teachers then were able to teach students that whenever they were making an argument, the norm in the class was to use the routine in constructing their claim and support. The flexibility of the routine has allowed us to apply it from preschool through 8th grade and across disciplines from science to economics and from math to literacy.

Kathryn Mancino, a 5th grade teacher at Two Rivers, has deliberately taught three of our thinking routines to students using anchor charts. Her charts name the components of each routine and have a place for students to record when they’ve used it and what they have figured out about the routine. By using this structure with a chart that can be added to throughout the year, students see the routines as broadly applicable across disciplines and are able to refine their application over time.

Assessing Critical Thinking Skills

By defining specific constructs of critical thinking and building thinking routines that support their implementation in classrooms, we have operated under the assumption that students are developing skills that they will be able to transfer to other settings. However, we recognized both the importance and the challenge of gathering reliable data to confirm this.

With this in mind, we have developed a series of short performance tasks around novel discipline-neutral contexts in which students can apply the constructs of thinking. Through these tasks, we have been able to provide an opportunity for students to demonstrate their ability to transfer the types of thinking beyond the original classroom setting. Once again, we have worked with SCALE to define tasks where students easily access the content but where the cognitive lift requires them to demonstrate their thinking abilities.

These assessments demonstrate that it is possible to capture meaningful data on students’ critical thinking abilities. They are not intended to be high-stakes accountability measures. Instead, they are designed to give students, teachers, and school leaders discrete formative data on hard-to-measure skills.

While it is clearly difficult, and we have not solved all of the challenges to scaling assessments of critical thinking, we can define, teach, and assess these skills. In fact, knowing how important they are for the economy of the future and our democracy, it is essential that we do.

The opinions expressed in Next Gen Learning in Action are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.

Critical Thinking Models: A Comprehensive Guide for Effective Decision Making

Critical Thinking Models

Critical thinking models are valuable frameworks that help individuals develop and enhance their critical thinking skills . These models provide a structured approach to problem-solving and decision-making by encouraging the evaluation of information and arguments in a logical, systematic manner. By understanding and applying these models, one can learn to make well-reasoned judgments and decisions.

Various critical thinking models exist, each catering to different contexts and scenarios. These models offer a step-by-step method to analyze situations, scrutinize assumptions and biases, and consider alternative perspectives. Ultimately, the goal of critical thinking models is to enhance an individual’s ability to think critically, ultimately improving their reasoning and decision-making skills in both personal and professional settings.

Key Takeaways

  • Critical thinking models provide structured approaches for enhancing decision-making abilities
  • These models help individuals analyze situations, scrutinize assumptions, and consider alternative perspectives
  • The application of critical thinking models can significantly improve one’s reasoning and judgment skills.

Fundamentals of Critical Thinking

Definition and Importance

Critical thinking is the intellectual process of logically, objectively, and systematically evaluating information to form reasoned judgments, utilizing reasoning , logic , and evidence . It involves:

  • Identifying and questioning assumptions,
  • Applying consistent principles and criteria,
  • Analyzing and synthesizing information,
  • Drawing conclusions based on evidence.

The importance of critical thinking lies in its ability to help individuals make informed decisions, solve complex problems, and differentiate between true and false beliefs .

Core Cognitive Skills

Several core cognitive skills underpin critical thinking:

  • Analysis : Breaking down complex information into smaller components to identify patterns or inconsistencies.
  • Evaluation : Assessing the credibility and relevance of sources, arguments, and evidence.
  • Inference : Drawing conclusions by connecting the dots between analyzed information.
  • Synthesis : Incorporating analyzed information into a broader understanding and constructing one’s argument.
  • Logic and reasoning : Applying principles of logic to determine the validity of arguments and weigh evidence.

These skills enable individuals to consistently apply intellectual standards in their thought process, which ultimately results in sound judgments and informed decisions.

Influence of Cognitive Biases

A key aspect of critical thinking is recognizing and mitigating the impact of cognitive biases on our thought processes. Cognitive biases are cognitive shortcuts or heuristics that can lead to flawed reasoning and distort our understanding of a situation. Examples of cognitive biases include confirmation bias, anchoring bias, and availability heuristic.

To counter the influence of cognitive biases, critical thinkers must be aware of their own assumptions and strive to apply consistent and objective evaluation criteria in their thinking process. The practice of actively recognizing and addressing cognitive biases promotes an unbiased and rational approach to problem-solving and decision-making.

The Critical Thinking Process

Stages of Critical Thinking

The critical thinking process starts with gathering and evaluating data . This stage involves identifying relevant information and ensuring it is credible and reliable. Next, an individual engages in analysis by examining the data closely to understand its context and interpret its meaning. This step can involve breaking down complex ideas into simpler components for better understanding.

The next stage focuses on determining the quality of the arguments, concepts, and theories present in the analyzed data. Critical thinkers question the credibility and logic behind the information while also considering their own biases and assumptions. They apply consistent standards when evaluating sources, which helps them identify any weaknesses in the arguments.

Values play a significant role in the critical thinking process. Critical thinkers assess the significance of moral, ethical, or cultural values shaping the issue, argument, or decision at hand. They determine whether these values align with the evidence and logic they have analyzed.

After thorough analysis and evaluation, critical thinkers draw conclusions based on the evidence and reasoning gathered. This step includes synthesizing the information and presenting a clear, concise argument or decision. It also involves explaining the reasoning behind the conclusion to ensure it is well-founded.

Application in Decision Making

In decision making, critical thinking is a vital skill that allows individuals to make informed choices. It enables them to:

  • Analyze options and their potential consequences
  • Evaluate the credibility of sources and the quality of information
  • Identify biases, assumptions, and values that may influence the decision
  • Construct a reasoned, well-justified conclusion

By using critical thinking in decision making, individuals can make more sound, objective choices. The process helps them to avoid pitfalls like jumping to conclusions, being influenced by biases, or basing decisions on unreliable data. The result is more thoughtful, carefully-considered decisions leading to higher quality outcomes.

Critical Thinking Models

Critical thinking models are frameworks that help individuals develop better problem-solving and decision-making abilities. They provide strategies for analyzing, evaluating, and synthesizing information to reach well-founded conclusions. This section will discuss four notable models: The RED Model, Bloom’s Taxonomy, Paul-Elder Model, and The Halpern Critical Thinking Assessment.

The RED Model

The RED Model stands for Recognize Assumptions, Evaluate Arguments, and Draw Conclusions. It emphasizes the importance of questioning assumptions, weighing evidence, and reaching logical conclusions.

  • Recognize Assumptions: Identify and challenge assumptions that underlie statements, beliefs, or arguments.
  • Evaluate Arguments: Assess the validity and reliability of evidence to support or refute claims.
  • Draw Conclusions: Make well-reasoned decisions based on available information and sound reasoning.

The RED Model helps individuals become more effective problem solvers and decision-makers by guiding them through the critical thinking process.

Bloom’s Taxonomy

Bloom’s Taxonomy is a hierarchical model that classifies cognitive skills into six levels of complexity. These levels are remembering, understanding, applying, analyzing, evaluating, and creating. By progressing through these levels, individuals can develop higher-order thinking skills.

  • Remembering: Recall information or facts.
  • Understanding: Comprehend the meaning of ideas, facts, or problems.
  • Applying: Use knowledge in different situations.
  • Analyzing: Break down complex topics or problems into sub-parts.
  • Evaluating: Assess the quality, relevance, or credibility of information, ideas, or solutions.
  • Creating: Combine elements to form a new whole, generate new ideas, or solve complex issues.
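
One practical use of the taxonomy is tagging assessment items by the highest level they appear to target. The sketch below is illustrative only; the verb lists are common associations with each level, not an authoritative mapping, and the naive substring matching is just for demonstration.

```python
# Illustrative mapping from Bloom's levels to typical question verbs, used to
# tag an assessment item by the highest level whose verbs it contains.
# The verb lists are common associations, not an official standard.
BLOOM_VERBS = {
    "remembering":   {"list", "define", "recall"},
    "understanding": {"explain", "summarize", "classify"},
    "applying":      {"solve", "demonstrate", "calculate"},
    "analyzing":     {"compare", "contrast", "break down"},
    "evaluating":    {"judge", "critique", "justify"},
    "creating":      {"design", "compose", "invent"},
}
LEVEL_ORDER = list(BLOOM_VERBS)  # lowest to highest

def highest_level(item: str) -> str:
    """Return the highest Bloom level whose verbs appear in the item text
    (defaults to 'remembering' if no verb matches)."""
    text = item.lower()
    hit = "remembering"
    for level in LEVEL_ORDER:
        if any(verb in text for verb in BLOOM_VERBS[level]):
            hit = level
    return hit

print(highest_level("Compare the two arguments and justify which is stronger."))
# -> "evaluating" ("justify" outranks "compare")
```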

Paul-Elder Model

The Paul-Elder Model introduces the concept of “elements of thought,” focusing on a structured approach to critical thinking. This model promotes intellectual standards, such as clarity, accuracy, and relevance. It consists of three stages:

  • Critical Thinking: Employ the intellectual standards to problem-solving and decision-making processes.
  • Elements of Thought: Consider purpose, question at issue, information, interpretation and inference, concepts, assumptions, implications, and point of view.
  • Intellectual Traits: Develop intellectual traits, such as intellectual humility, intellectual empathy, and intellectual perseverance.

This model fosters a deeper understanding and appreciation of critical thinking.

The Halpern Critical Thinking Assessment

The Halpern Critical Thinking Assessment is a standardized test developed by Diane Halpern to assess critical thinking skills. The evaluation uses a variety of tasks to measure abilities in core skill areas, such as verbal reasoning, argument analysis, and decision making. Pearson, a leading publisher of educational assessments, offers this test as a means to assess individuals’ critical thinking skills.

These four critical thinking models can be used as frameworks to improve and enhance cognitive abilities. By learning and practicing these models, individuals can become better equipped to analyze complex information, evaluate options, and make well-informed decisions.

Evaluating Information and Arguments

In this section, we will discuss the importance of evaluating information and arguments in the process of critical thinking, focusing on evidence assessment, logic and fallacies, and argument analysis.

Evidence Assessment

Evaluating the relevance, accuracy, and credibility of information is a vital aspect of critical thinking. In the process of evidence assessment, a thinker should consider the following factors:

  • Source reliability : Research and understand the expertise and credibility of the source to ensure that biased or inaccurate information is not being considered.
  • Currency : Check the date of the information to make sure it is still relevant and accurate in the present context.
  • Objectivity : Analyze the information for potential bias and always cross-reference it with other credible sources.

When practicing critical thinking skills, it is essential to be aware of your own biases and make efforts to minimize their influence on your decision-making process.

Logic and Fallacies

Logic is crucial for deconstructing and analyzing complex arguments, while identifying and avoiding logical fallacies helps maintain accurate and valid conclusions. Some common fallacies to watch out for in critical thinking include:

  • Ad Hominem : Attacking the person making the argument instead of addressing the argument itself.
  • Strawman : Misrepresenting an opponent’s argument to make it easier to refute.
  • False Dilemma : Presenting only two options when there may be multiple viable alternatives.
  • Appeal to Authority : Assuming a claim is true simply because an authority figure supports it.

Being aware of these fallacies enables a thinker to effectively evaluate the strength of an argument and make sound judgments accordingly.

Argument Analysis

Analyzing an argument is the process of evaluating its structure, premises, and conclusion while determining its validity and soundness. To analyze an argument, follow these steps:

  • Identify the premises and conclusion : Determine the main point being argued, the reasons offered in its support, and how they relate to one another.
  • Evaluate the validity : Assess whether the conclusion logically follows from the premises and if the argument’s structure is sound.
  • Test the soundness : Evaluate the truth and relevance of the premises. This may require verifying the accuracy of facts and evidence, as well as assessing the reliability of sources.
  • Consider counter-arguments : Identify opposing viewpoints and counter-arguments, and evaluate their credibility to gauge the overall strength of the original argument.

By effectively evaluating information and arguments, critical thinkers develop a solid foundation for making well-informed decisions and solving problems.

Enhancing Critical Thinking

Strategies for Improvement

To enhance critical thinking, individuals can practice different strategies, including asking thought-provoking questions, analyzing ideas and observations, and being open to different perspectives. One effective technique is the Critical Thinking Roadmap , which breaks critical thinking down into four measurable phases: execute, synthesize, recommend, and communicate. It’s important to use deliberate practice in these areas to develop a strong foundation for problem-solving and decision-making. In addition, cultivating a mindset of courage , fair-mindedness , and empathy will support critical thinking development.

Critical Thinking in Education

In the field of education, critical thinking is an essential component of effective learning and pedagogy. Integrating critical thinking into the curriculum encourages student autonomy, fosters innovation, and improves student outcomes. Teachers can use various approaches to promote critical thinking, such as:

  • Employing open-ended questions to stimulate ideas
  • Incorporating group discussions or debates to facilitate communication and evaluation of viewpoints
  • Assessing and providing feedback on student work to encourage reflection and improvement
  • Utilizing real-world scenarios and case studies for practical application of concepts

Developing a Critical Thinking Mindset

To truly enhance critical thinking abilities, it’s important to adopt a mindset that values integrity , autonomy , and empathy . These qualities help to create a learning environment that encourages open-mindedness, which is key to critical thinking development. To foster a critical thinking mindset:

  • Be curious : Remain open to new ideas and ask questions to gain a deeper understanding.
  • Communicate effectively : Clearly convey thoughts and actively listen to others.
  • Reflect and assess : Regularly evaluate personal beliefs and assumptions to promote growth.
  • Embrace diversity of thought : Welcome different viewpoints and ideas to foster innovation.

Incorporating these approaches can lead to a more robust critical thinking skillset, allowing individuals to better navigate and solve complex problems.

Critical Thinking in Various Contexts

The Workplace and Beyond

Critical thinking is a highly valued skill in the workplace, as it enables employees to analyze situations, make informed decisions, and solve problems effectively. It involves a careful thinking process directed towards a specific goal. Employers often seek individuals who possess strong critical thinking abilities, as they can add significant value to the organization.

In the workplace context, critical thinkers are able to recognize assumptions, evaluate arguments, and draw conclusions, following models such as the RED model . They can also adapt their thinking to suit various scenarios, allowing them to tackle complex and diverse problems.

Moreover, critical thinking transcends the workplace and applies to various aspects of life. It empowers an individual to make better decisions, analyze conflicting information, and engage in constructive debates.

Creative and Lateral Thinking

Critical thinking encompasses both creative and lateral thinking. Creative thinking involves generating novel ideas and solutions to problems, while lateral thinking entails looking at problems from different angles to find unique and innovative solutions.

Creative thinking allows thinkers to:

  • Devise new concepts and ideas
  • Challenge conventional wisdom
  • Build on existing knowledge to generate innovative solutions

Lateral thinking, on the other hand, encourages thinkers to:

  • Break free from traditional thought patterns
  • Combine seemingly unrelated ideas to create unique solutions
  • Utilize intuition and intelligence to approach problems from a different perspective

Both creative and lateral thinking are essential components of critical thinking, allowing individuals to view problems in a holistic manner and generate well-rounded solutions. These skills are highly valued by employers and can lead to significant personal and professional growth.

In conclusion, critical thinking is a multifaceted skill that comprises various thought processes, including creative and lateral thinking. By embracing these skills, individuals can excel in the workplace and in their personal lives, making better decisions and solving problems effectively.

Overcoming Challenges

Recognizing and Addressing Bias

Cognitive biases and thinking biases can significantly affect the process of critical thinking . One of the key components of overcoming these challenges is to recognize and address them. It is essential to be aware of one’s own beliefs, as well as the beliefs of others, to ensure fairness and clarity throughout the decision-making process. To identify and tackle biases, one can follow these steps:

  • Be self-aware : Understand personal beliefs and biases, acknowledging that they may influence the interpretation of information.
  • Embrace diverse perspectives : Encourage open discussions and invite different viewpoints to challenge assumptions and foster cognitive diversity.
  • Reevaluate evidence : Continuously reassess the relevance and validity of the information being considered.

By adopting these practices, individuals can minimize the impact of biases and enhance the overall quality of their critical thinking skills.

Dealing with Information Overload

In today’s world, information is abundant, and it can become increasingly difficult to demystify and make sense of the available data. Dealing with information overload is a crucial aspect of critical thinking. Here are some strategies to address this challenge:

  • Prioritize information : Focus on the most relevant and reliable data, filtering out unnecessary details.
  • Organize data : Use tables, charts, and lists to categorize information and identify patterns more efficiently.
  • Break down complex information : Divide complex data into smaller, manageable segments to simplify interpretation and inferences.

By implementing these techniques, individuals can effectively manage information overload, enabling them to process and analyze data more effectively, leading to better decision-making.

In conclusion, overcoming challenges such as biases and information overload is essential in the pursuit of effective critical thinking. By recognizing and addressing these obstacles, individuals can develop clarity and fairness in their thought processes, leading to well-informed decisions and improved problem-solving capabilities.

Measuring Critical Thinking

Assessment Tools and Criteria

There are several assessment tools designed to measure critical thinking, each focusing on different aspects such as quality, depth, breadth, and significance of thinking. One example of a widely used standardized test is the Watson-Glaser Critical Thinking Appraisal , which evaluates an individual’s ability to interpret information, draw conclusions, and make assumptions. Another test is the Cornell Critical Thinking Tests Level X and Level Z , which assess an individual’s critical thinking skills through multiple-choice questions.

Furthermore, criteria for assessing critical thinking often include precision, relevance, and the ability to gather and analyze relevant information. Some assessors utilize the Halpern Critical Thinking Assessment , which measures the application of cognitive skills such as deduction, observation, and induction in real-world scenarios.

The Role of IQ and Tests

It’s important to note that intelligence quotient (IQ) tests and critical thinking assessments are not the same. While IQ tests aim to measure an individual’s cognitive abilities and general intelligence, critical thinking tests focus specifically on one’s ability to analyze, evaluate, and form well-founded opinions. Therefore, having a high IQ does not necessarily guarantee strong critical thinking skills, as critical thinking requires additional mental processes beyond basic logical reasoning.

To build and enhance critical thinking skills, individuals should practice and develop higher-order thinking, such as critical alertness, critical reflection, and critical analysis. Using a Critical Thinking Roadmap, such as the four-phase framework of execute, synthesize, recommend, and communicate described above, individuals can continuously work to improve their critical thinking abilities.

Frequently Asked Questions

What are the main steps involved in the Paul-Elder critical thinking model?

The Paul-Elder Critical Thinking Model is a comprehensive framework for developing critical thinking skills. The main steps include: identifying the purpose, formulating questions, gathering information, identifying assumptions, interpreting information, and evaluating arguments. The model emphasizes clarity, accuracy, precision, relevance, depth, breadth, logic, and fairness throughout the critical thinking process. By following these steps, individuals can efficiently analyze and evaluate complex ideas and issues.

Can you list five techniques to enhance critical thinking skills?

Here are five techniques to help enhance critical thinking skills:

  • Ask open-ended questions : Encourages exploration and challenges assumptions.
  • Engage in active listening: Focus on understanding others’ viewpoints before responding.
  • Reflect on personal biases: Identify and question any preconceived notions or judgments.
  • Practice mindfulness: Develop self-awareness and stay present in the moment.
  • Collaborate with others: Exchange ideas and learn from diverse perspectives.

What is the RED Model of critical thinking and how is it applied?

The RED Model of critical thinking consists of three key components: Recognize Assumptions, Evaluate Arguments, and Draw Conclusions. To apply the RED Model, begin by recognizing and questioning underlying assumptions, being aware of personal biases and stereotypes. Next, evaluate the strengths and weaknesses of different arguments, considering evidence, logical consistency, and alternative explanations. Lastly, draw well-reasoned conclusions that are based on the analysis and evaluation of the information gathered.

How do the ‘3 C’s’ of critical thinking contribute to effective problem-solving?

The ‘3 C’s’ of critical thinking – Curiosity, Creativity, and Criticism – collectively contribute to effective problem-solving. Curiosity allows individuals to explore various perspectives and ask thought-provoking questions, while Creativity helps develop innovative solutions and unique approaches to challenges. Criticism, or the ability to evaluate and analyze ideas objectively, ensures that the problem-solving process remains grounded in logic and relevance.

What characteristics distinguish critical thinking from creative thinking?

Critical thinking and creative thinking are two complementary cognitive skills. Critical thinking primarily focuses on analyzing, evaluating, and reasoning, using objectivity and logical thinking. It involves identifying problems, assessing evidence, and drawing sound conclusions. Creative thinking, on the other hand, is characterized by the generation of new ideas, concepts, and approaches to solve problems, often involving imagination, originality, and out-of-the-box thinking.

What are some recommended books to help improve problem-solving and critical thinking skills?

There are several books that can help enhance problem-solving and critical thinking skills, including:

  • “Thinking, Fast and Slow” by Daniel Kahneman: This book explores the dual process theory of decision-making and reasoning.
  • “The 5 Elements of Effective Thinking” by Edward B. Burger and Michael Starbird: Offers practical tips and strategies for improving critical thinking skills.
  • “Critique of Pure Reason” by Immanuel Kant: A classic philosophical work that delves into the principles of reason and cognition.
  • “Mindware: Tools for Smart Thinking” by Richard E. Nisbett: Presents a range of cognitive tools to enhance critical thinking and decision-making abilities.
  • “The Art of Thinking Clearly” by Rolf Dobelli: Explores common cognitive biases and errors in judgment that can affect critical thinking.

National Research Council (US) Committee on the Assessment of 21st Century Skills. Assessing 21st Century Skills: Summary of a Workshop. Washington (DC): National Academies Press (US); 2011.

2 Assessing Cognitive Skills

As described in Chapter 1 , the steering committee grouped the five skills identified by previous efforts ( National Research Council, 2008 , 2010 ) into the broad clusters of cognitive skills, interpersonal skills, and intrapersonal skills. Based on this grouping, two of the identified skills fell within the cognitive cluster: nonroutine problem solving and systems thinking. The definition of each, as provided in the previous report ( National Research Council, 2010 , p. 3), appears below:

  • Nonroutine problem solving: A skilled problem solver uses expert thinking to examine a broad span of information, recognize patterns, and narrow the information to reach a diagnosis of the problem. Moving beyond diagnosis to a solution requires knowledge of how the information is linked conceptually and involves metacognition—the ability to reflect on whether a problem-solving strategy is working and to switch to another strategy if it is not working ( Levy and Murnane, 2004 ). It includes creativity to generate new and innovative solutions, integrating seemingly unrelated information, and entertaining possibilities that others may miss ( Houston, 2007 ).
  • Systems thinking: The ability to understand how an entire system works; how an action, change, or malfunction in one part of the system affects the rest of the system; adopting a “big picture” perspective on work ( Houston, 2007 ). It includes judgment and decision making, systems analysis, and systems evaluation as well as abstract reasoning about how the different elements of a work process interact ( Peterson et al., 1999 ).

After considering these definitions, the committee decided a third cognitive skill, critical thinking, was not fully represented. The committee added critical thinking to the list of cognitive skills, since competence in critical thinking is usually judged to be an important component of both skills ( Mayer, 1990 ). Thus, this chapter focuses on assessments of three cognitive skills: problem solving, critical thinking, and systems thinking.

DEFINING THE CONSTRUCT

One of the first steps in developing an assessment is to define the construct and operationalize it in a way that supports the development of assessment tasks. Defining some of the constructs included within the scope of 21st century skills is significantly more challenging than defining more traditional constructs, such as reading comprehension or mathematics computational skills. One of the challenges is that the definitions tend to be both broad and general. To be useful for test development, the definition needs to be specific so that there can be a shared conception of the construct for use by those writing the assessment questions or preparing the assessment tasks.

This set of skills also generates debate about whether they are domain general or domain specific. A predominant view in the past has been that critical thinking and problem-solving skills are domain general: that is, that they can be learned without reference to any specific domain and, further, once they are learned, can be applied in any domain. More recently, psychologists and learning theorists have argued for a domain-specific conception of these skills, maintaining that when students think critically or solve problems, they do not do it in the absence of subject matter: instead, they think about or solve a problem in relation to some topic. Under a domain-specific conception, the learner may acquire these skills in one domain as he or she acquires expertise in that domain, but acquiring them in one domain does not necessarily mean the learner can apply them in another.

At the workshop, Nathan Kuncel, professor of psychology at the University of Minnesota, and Eric Anderman, professor of educational psychology at Ohio State University, discussed these issues. The sections below summarize their presentations and include excerpts from their papers, dealing first with the domain-general and domain-specific conceptions of critical thinking and problem solving and then with the issue of transferring skills from one domain to another.

Critical Thinking: Domain-Specific or Domain-General

It is well established, Kuncel stated, that foundational cognitive skills in math, reading, and writing are of central importance and that students need to be as proficient as possible in these areas. Foundational cognitive abilities, such as verbal comprehension and reasoning, mathematical knowledge and skill, and writing skills, are clearly important for success in learning in college as well as in many aspects of life. A recent study documents this. Kuncel and Hezlett (2007) examined the body of research on the relationships between traditional measures of verbal and quantitative skills and a variety of outcomes. The measures of verbal and quantitative skills included scores on six standardized tests—the GRE, MCAT, LSAT, GMAT, MAT, and PCAT. The outcomes included performance in graduate school settings ranging from Ph.D. programs to law school, medical school, business school, and pharmacy programs. Figure 2-1 shows the correlations between scores on the standardized tests and the various outcome measures, including (from bottom to top) first-year graduate GPA (1st GGPA), cumulative graduate GPA (GGPA), qualifying or comprehensive examination scores, completion of the degree, estimate of research productivity, research citation counts, faculty ratings, and performance on the licensing exam for the profession. For instance, the top bar shows a correlation between performance on the MCAT and performance on the licensing exam for physicians of roughly .65, the highest of the correlations reported in this figure. The next bar indicates the correlation between performance on the LSAT and performance on the licensing exam for lawyers is roughly .35. Of the 34 correlations shown in the figure, all but 11 are over .30. Kuncel characterized this information as demonstrating that verbal and quantitative skills are important predictors of success based on a variety of outcome measures, including performance on standardized tests, whether or not people finish their degree program, how their performance is evaluated by faculty, and their contribution to the field.

Figure 2-1. Correlations between scores on standardized tests and academic and job outcome measures. SOURCE: Kuncel and Hezlett (2007). Reprinted with permission of the American Association for the Advancement of Science.

Kuncel has also studied the role that broader abilities have in predicting future outcomes. A more recent review ( Kuncel and Hezlett, 2010 ) examined the body of research on the relationships between measures of general cognitive ability (historically referred to as IQ) and job outcomes, including performance in high, medium, and low complexity jobs; training success in civilian and military settings; how well leaders perform on objective measures; and evaluations of the creativity of people’s work. Figure 2-2 shows the correlations between performance on a measure of general cognitive ability and these outcomes. All of the correlations are above .30, which Kuncel characterized as demonstrating a strong relationship between general cognitive ability and job performance across a variety of performance measures. Together, Kuncel said, these two reviews present a body of evidence documenting that verbal and quantitative skills along with general cognitive ability are predictive of college and career performance.

Figure 2-2. Correlations between measures of cognitive ability and job performance. SOURCE: Kuncel and Hezlett (2011). Copyright 2010 by Sage Publications. Reprinted with permission of Sage Publications.

Kuncel noted that other broader skills, such as critical thinking or analytical reasoning, may also be important predictors of performance, but he characterizes this evidence as inconclusive. In his view, the problems lie both with the conceptualization of the constructs as domain-general (as opposed to domain-specific) as well as with the specific definition of the construct. He finds the constructs are not well defined and have not been properly validated. For instance, a domain-general concept of the construct of “critical thinking” is often indistinguishable from general cognitive ability or general reasoning and learning skills. To demonstrate, Kuncel presented three definitions of critical thinking that commonly appear in the literature:

  • “[Critical thinking involves] cognitive skills or strategies that increase the probability of a desirable outcome—in the long run, critical thinkers will have more desirable outcomes than ‘noncritical’ thinkers. . . . Critical thinking is purposeful, reasoned, and goal-directed. It is the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions” ( Halpern, 1998 , pp. 450–451).
  • “Critical thinking is reflective and reasonable thinking that is focused on deciding what to believe or do” ( Ennis, 1985 , p. 45).
  • “Critical thinking [is] the ability and willingness to test the validity of propositions” ( Bangert-Drowns and Bankert, 1990 , p. 3).

He characterized these definitions as both very general and very broad. For instance, Halpern’s definition essentially encompasses all of problem solving, judgment, and cognition, he said. Others are more specific and focus on a particular class of tasks (e.g., Bangert-Drowns and Bankert, 1990). He questioned the extent to which critical thinking, so conceived, is distinct from general cognitive ability (or general intelligence).

Kuncel conducted a review of the literature for empirical evidence of the validity of the construct of critical thinking. The studies in the review examined the relationships between various measures of critical thinking and measures of general intelligence and expert performance. He looked for two types of evidence: convergent validity evidence (different measures of the same construct should correlate strongly with one another) and discriminant validity evidence (measures of the construct should be distinguishable from measures of other constructs, such as general cognitive ability).

Kuncel found several analyses of the relationships among different measures of critical thinking (see Bondy et al., 2001 ; Facione, 1990 ; and Watson and Glaser, 1994 ). The assessments that were studied included the Watson-Glaser Critical Thinking Appraisal (WGCTA), the Cornell Critical Thinking Test (CCTT), the California Critical Thinking Skills Test (CCTST), and the California Critical Thinking Disposition Inventory (CCTDI). The average correlation among the measures was .41. Considering that all of these tests purport to be measures of the same construct, Kuncel judged this correlation to be low. For comparison, he noted a correlation of .71 between two subtests of the SAT intended to measure critical thinking (the SAT-critical reading test and the SAT-writing test).

With regard to discriminant validity, Kuncel conducted a literature search that yielded 19 correlations between critical-thinking skills and traditional measures of cognitive abilities, such as the Miller Analogies Test and the SAT ( Adams et al., 1999 ; Bauer and Liang, 2003 ; Bondy et al., 2001 ; Cano and Martinez, 1991 ; Edwards, 1950 ; Facione et al., 1995 , 1998 ; Spector et al., 2000 ; Watson and Glaser, 1994 ). He separated the studies into those that measured critical-thinking skills and those that measured critical-thinking dispositions (i.e., interest and willingness to use one’s critical-thinking skills). The average correlation between general cognitive ability measures and critical-thinking skills was .48, and the average correlation between general cognitive ability measures and critical-thinking dispositions was .21.

Kuncel summarized these results as demonstrating that different measures of critical thinking show lower correlations with each other (i.e., an average of .41) than they do with traditional measures of general cognitive ability (i.e., an average of .48). Kuncel judged that these findings provide little support for critical thinking as a domain-general construct distinct from general cognitive ability. Given this relatively weak evidence of convergent and discriminant validity, Kuncel argued, it is important to determine whether critical thinking is correlated differently than cognitive ability with important outcome variables like grades or job performance. That is, do measures of critical-thinking skills show incremental validity beyond the information provided by measures of general cognitive ability?
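
To make the incremental-validity question concrete, the sketch below contrasts the variance in an outcome explained by a cognitive ability score alone with the variance explained when a critical-thinking score is added. The data are simulated and the coefficients are chosen only for illustration; this is not a reanalysis of any study Kuncel reviewed.

```python
# Minimal sketch of the incremental-validity question: does a critical-thinking
# score add predictive power beyond general cognitive ability? All data here
# are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 500
cognitive = rng.normal(size=n)                         # general cognitive ability (standardized)
critical = 0.7 * cognitive + 0.7 * rng.normal(size=n)  # critical thinking, correlated ~.7 with g
grades = 0.4 * cognitive + 0.1 * critical + rng.normal(size=n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_g = r_squared(cognitive.reshape(-1, 1), grades)
r2_both = r_squared(np.column_stack([cognitive, critical]), grades)
print(f"R^2 (cognitive ability only):      {r2_g:.3f}")
print(f"R^2 (adding critical thinking):    {r2_both:.3f}")
print(f"Incremental validity (delta R^2):  {r2_both - r2_g:.3f}")
```

If the gain in R² (the incremental validity) is near zero, the critical-thinking measure adds little information beyond general cognitive ability, which is the pattern Kuncel suggested is plausible given the correlations reported above.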

Kuncel looked at two outcome measures: grades in higher education and job performance. With regard to higher education, he examined data from 12 independent samples with 2,876 subjects ( Behrens, 1996 ; Gadzella et al., 2002 , 2004 ; Kowalski and Taylor, 2004 ; Taube, 1997 ; Williams, 2003 ). Across these studies, the average correlation between critical-thinking skills and grades was .27, and the average correlation between critical-thinking dispositions and grades was .24. To put these correlations in context, the SAT has an average correlation with 1st-year college GPA of between .26 and .33 for the individual scales and .35 when the SAT scales are combined ( Kobrin et al., 2008 ). 5

There are very limited data that quantify the relationship between critical-thinking measures and subsequent job performance. Kuncel located three studies with the Watson-Glaser Appraisal ( Facione and Facione, 1996 , 1997 ; Giancarlo, 1996 ). They yielded an average correlation of .32 with supervisory ratings of job performance (N = 293).

Kuncel described these results as “mixed” but not supporting a conclusion that assessments of critical thinking are better predictors of college and job performance than other available measures. Taken together with the convergent and discriminant validity results, he concluded, the evidence to support critical thinking as an independent construct distinct from general cognitive ability is weak.

Kuncel believes these correlational results do not tell the whole story, however. First, he noted, a number of artifactual issues may have contributed to the relatively low correlation among different assessments of critical thinking, such as low reliability of the measures themselves, restriction in range, different underlying definitions of critical thinking, overly broad definitions that are operationalized in different ways, different kinds of assessment tasks, and different levels of motivation in test takers.

Second, he pointed out, even though two tests correlate highly with each other, they may not measure the same thing. That is, although the critical-thinking tests correlate .48, on average, with cognitive ability measures, it does not mean that they measure the same thing. For example, a recent study ( Kuncel and Grossbach, 2007 ) showed that ACT and SAT scores are highly predictive of nursing knowledge. But, obviously, individuals who score highly on a college admissions test do not have all the knowledge needed to be a nurse. The constructs may be related but not overlap entirely.

Kuncel explained that one issue with these studies is they all conceived of critical thinking in its broadest sense and as a domain-general construct. He said this conception is not useful, and he summarized his meta-analysis findings as demonstrating little evidence that critical thinking exists as a domain-general construct distinct from general cognitive ability. He highlighted the fact that some may view critical thinking as a specific skill that, once learned, can be applied in many situations. For instance, many in his field of psychology mention the following as specific critical-thinking skills that students should acquire: understanding the law of large numbers, understanding what it means to affirm the consequent, being able to make judgments about sample bias, understanding control groups, and understanding Type I versus Type II errors. However, Kuncel said many tasks that require critical thinking would not make use of any of these skills.

In his view, the stronger argument is for critical thinking as a domain-specific construct that evolves as the person acquires domain-specific knowledge. For example, imagine teaching students general critical-thinking skills that can be applied across all reasoning situations. Is it reasonable, he asked, to think a person can think critically about arguments for different national economic policies without understanding macroeconomics or even the current economic state of the country? At one extreme, he argued, it seems clear that people cannot think critically about topics for which they have no knowledge, and their reasoning skills are intimately tied to the knowledge domain. For instance, most people have no basis for making judgments about how to conduct or even prioritize different experiments for CERN’s Large Hadron Collider. Few people understand the topic of particle physics sufficiently to make more than trivial arguments or decisions. On the other hand, perhaps most people could try to make a good decision about which among a few medical treatments would best meet their needs.

Kuncel also talked about the kinds of statistical and methodological reasoning skills learned in different disciplines. For instance, chemists, engineers, and physical scientists learn to use these types of skills in thinking about the laws of thermodynamics that deal with equilibrium, temperature, work, energy, and entropy. On the other hand, psychologists learn to use these skills in thinking about topics such as sample bias and self-selection in evaluating research findings. Psychologists who are adept at thinking critically in their own discipline would have difficulty thinking critically about problems in the hard sciences, unless they have specific subject matter knowledge in the discipline. Likewise, it is difficult to imagine that a scientist highly trained in chemistry could solve a complex problem in psychology without knowing some subject matter in psychology.

Kuncel said it is possible to train specific skills that aid in making good judgments in some situations, but the literature does not demonstrate that it is possible to train universally effective critical thinking skills. He noted, “I think you can give people a nice toolbox with all sorts of tools they can apply to a variety of tasks, problems, issues, decisions, citizenship questions, and learning those things will be very valuable, but I dissent on them being global and trainable as a global skill.”

Transfer from One Context to Another

There is a commonplace assumption, Eric Anderman noted in his presentation, that learners readily transfer the skills they have learned in one course or context to situations and problems that arise in another. Anderman argued research on human learning does not support this assumption. Research suggests such transfer seldom occurs naturally, particularly when learners need to transfer complex cognitive strategies from one domain to another ( Salomon and Perkins, 1989 ). Transfer is only likely to occur when care is taken to facilitate that transfer: that is, when students are specifically taught strategies that facilitate the transfer of skills learned in one domain to another domain ( Gick and Holyoak, 1983 ).

For example, Anderman explained, students in a mathematics class might be taught how to solve a problem involving the multiplication of percentages (e.g., 4.79% × 0.25%). The students then might encounter a problem in their social studies courses that involves calculating compounded interest (such as to solve a problem related to economics or banking). Although the same basic process of multiplying percentages might be necessary to solve both problems, it is unlikely that students will naturally, on their own, transfer the skills learned in the math class to the problem encountered in the social studies class.

In the past, Anderman said, there had been some notion that critical-thinking and problem-solving skills could be taught independent of context. For example, teaching students a complex language such as Latin, a computer programming language such as LOGO, or other topics that require complex thinking might result in an overall increase in their ability to think critically and problem solve.

Both Kuncel and Anderman maintained that the research does not support this idea. Instead, the literature better supports a narrower definition in which critical thinking is considered a finite set of specific skills. These skills are useful for effective decision making for many, but by no means all, tasks or situations. Their utility is further curtailed by task-specific knowledge demands. That is, a decision maker often has to have specific knowledge to make more than trivial progress with a problem or decision.

Anderman highlighted four important messages emerging from recent research. First, research documents that it is critical that students learn basic skills (such as basic arithmetic skills like times tables) so the skills become automatic. Mastery of these skills is required for the successful learning of more complex cognitive skills. Second, the use of general practices intended to improve students’ thinking is not usually successful as a means of improving their overall cognitive abilities. The research suggests students may become more adept in the specific skill taught, but this does not transfer to an overall increase in cognitive ability. Third, when general problem-solving strategies are taught, they should be taught within meaningful contexts and not simply as rote algorithms to be memorized. Finally, educators need to actively teach students to transfer skills from one context to another by helping students to recognize that the solution to one type of problem may be useful in solving a problem with similar structural features ( Mayer and Wittrock, 1996 ).

He noted that instructing students in general problem-solving skills can be useful but more elaborate scaffolding and domain-specific applications of these skills are often necessary. Whereas general problem-solving and critical-thinking strategies can be taught, research indicates these skills will not automatically or naturally transfer to other domains. Anderman stressed that educators and trainers must recognize that 21st century skills should be taught within specific domains; if they are taught as general skills, he cautioned, then extreme care must be taken to facilitate the transfer of these skills from one domain to another.

ASSESSMENT EXAMPLES

The workshop included examples of four different types of assessments of critical-thinking and problem-solving skills—one that will be used to make international comparisons of achievement, one used to license lawyers, and two used for formative purposes (i.e., intended to support instructional decision making). The first example was the computerized problem-solving component of the Programme for International Student Assessment (PISA). This assessment is still under development but is scheduled for operational administration in 2012. 6 Joachim Funke, professor of cognitive, experimental, and theoretical psychology at Heidelberg University in Germany, discussed this assessment.

The second example was the Multistate Bar Exam, a paper-and-pencil test that consists of both multiple-choice and extended-response components. This test is used to qualify law students for practice in the legal profession. Susan Case, director of testing with the National Conference of Bar Examiners, made this presentation.

The two formative assessments both make use of intelligent tutors, with assessments embedded into instruction modules. The “Auto Tutor” described by Art Graesser, professor of psychology with the University of Memphis, is used in instructing high school and higher education students in critical thinking skills in science. The Auto Tutor is part of a system Graesser has developed called Operation ARIES! (Acquiring Research Investigative and Evaluative Skills). The “Packet Tracer,” described by John Behrens, director of networking academy learning systems development with Cisco, is intended for individuals learning computer networking skills.

Problem Solving on PISA

For the workshop, Joachim Funke supplied the committee with the draft framework for PISA (see Organisation for Economic Co-operation and Development, 2010 7 ) and summarized this information in his presentation. 8 The summary below is based on both documents.

PISA, Funke explained, defines problem solving as an individual’s capacity to engage in cognitive processing to understand and resolve problem situations where a solution is not immediately obvious. The definition includes the willingness to engage with such situations in order to achieve one’s potential as a constructive and reflective citizen ( Organisation for Economic Co-operation and Development, 2010 , p. 12). Further, the PISA 2012 assessment of problem-solving competency will not test simple reproduction of domain-based knowledge, but will focus on the cognitive skills required to solve unfamiliar problems encountered in life and lying outside traditional curricular domains. While prior knowledge is important in solving problems, problem-solving competency involves the ability to acquire and use new knowledge or to use old knowledge in a new way to solve novel problems. The assessment is concerned with nonroutine problems, rather than routine ones (i.e., problems for which a previously learned solution procedure is clearly applicable). The problem solver must actively explore and understand the problem and either devise a new strategy or apply a strategy learned in a different context to work toward a solution. Assessment tasks center on everyday situations, with a wide range of contexts employed as a means of controlling for prior knowledge in general.

The key domain elements for PISA 2012 are as follows:

  • The problem context: whether it involves a technological device or not, and whether the focus of the problem is personal or social
  • The nature of the problem situation: whether it is interactive or static (defined below)
  • The problem-solving processes: the cognitive processes involved in solving the problem

The PISA 2012 framework (pp. 18–19) defines four processes that are components of problem solving. The first involves information retrieval. This process requires the test taker to quickly explore a given system to find out how the relevant variables are related to each other. The test taker must explore the situation, interact with it, consider the limitations or obstacles, and demonstrate an understanding of the given information. The objective is for the test taker to develop a mental representation of each piece of information presented in the problem. In the PISA framework, this process is referred to as exploring and understanding.

The second process is model building, which requires the test taker to make connections between the given variables. To accomplish this, the examinee must sift through the information, select the information that is relevant, mentally organize it, and integrate it with relevant prior knowledge. This requires the test taker to represent the problem in some way and formulate hypotheses about the relevant factors and their interrelationships. In the PISA framework, this dimension is called representing and formulating.

The third process is called forecasting and requires the active control of a given system. The framework defines this process as setting goals, devising a strategy to carry them out, and executing the plan. In the PISA framework, this dimension is called planning and executing.

The fourth process is monitoring and reflecting. The framework defines this process as checking the goal at each stage, detecting unexpected events, taking remedial action if necessary, and reflecting on solutions from different perspectives by critically evaluating assumptions and alternative solutions.

Each of these processes requires the use of reasoning skills, which the framework describes as follows ( Organisation for Economic Co-operation and Development, 2010 , p. 19):

In understanding a problem situation, the problem solver may need to distinguish between facts and opinion; in formulating a solution, the problem solver may need to identify relationships between variables; in selecting a strategy, the problem solver may need to consider cause and effect; and in communicating the results, the problem solver may need to organize information in a logical manner. The reasoning skills associated with these processes are embedded within problem solving. They are important in the PISA context since they can be taught and modeled in classroom instruction (e.g., Adey et al., 2007 ; Klauer and Phye, 2008 ).

For any given test taker, the test lasts for 40 minutes. PISA is a survey-based assessment that uses a balanced rotation design. A total of 80 minutes of material is organized into four 20-minute clusters, with each student taking two clusters.
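
The sketch below illustrates one way a balanced rotation of this kind can be laid out: four 20-minute clusters rotated into two-cluster forms so that each cluster appears equally often and in each position. The cluster labels, number of forms, and assignment scheme are illustrative assumptions, not the actual PISA 2012 booklet design.

```python
# A minimal sketch of a balanced rotation (booklet) design: 80 minutes of
# material in four 20-minute clusters, each student seeing two of them.
from itertools import cycle

clusters = ["P1", "P2", "P3", "P4"]

# Rotate the clusters so every cluster appears in the same number of forms
# and in each position (first vs. second 20 minutes) equally often.
forms = [(clusters[i], clusters[(i + 1) % len(clusters)]) for i in range(len(clusters))]

students = [f"student_{k:03d}" for k in range(10)]
assignment = {s: f for s, f in zip(students, cycle(forms))}

for student, form in assignment.items():
    print(student, "->", form)
```

Rotating each cluster through both positions in this way helps separate cluster difficulty from position or fatigue effects when results are aggregated across students.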

The items are grouped into units around a common stimulus that describes the problem. Reading and numeracy demands are kept to a minimum. The tasks all consist of authentic stimulus items, such as refueling a moped, playing on a handball team, mixing a perfume, feeding cats, mixing elements in a chemistry lab, taking care of a pet, and so on. Funke noted that the different contexts for the stimuli are important because test takers might be motivated differentially and might be differentially interested depending on the context. The difficulty of the items is manipulated by increasing the number of variables or the number of relations that the test taker has to deal with.

PISA 2012 is a computer-based test in which items are presented by computer and test takers respond on the computer. Approximately three-quarters of the items are in a format that the computer can score (simple or complex multiple-choice items). The remaining items are constructed-response, and test takers enter their responses into text boxes.

Scoring of the items is based on the processes that the test taker uses to solve the problem and involves awarding points for the use of certain processes. For information retrieval, the focus is on identifying the need to collect baseline data (referred to in PISA terminology as identifying the “zero round”) and the method of manipulating one variable at a time (referred to in PISA terminology as “varying one thing at a time” or VOTAT). Full credit is awarded if the subject uses the VOTAT strategy and makes use of zero rounds. Partial credit is given if the subject uses VOTAT but does not make use of zero rounds.

For model building, full credit is awarded if the generated model is correct. If one or two errors are present in the model, partial credit is given. If more than two errors are present, then no credit is awarded.

For forecasting, full credit is given if the target goals are reached. Partial credit is given if some progress toward the target goals can be registered, and no credit is given if there is no progress toward target goals at all.
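
Read as a rubric, these rules amount to a simple partial-credit function for each process. The sketch below restates them in code; the function names, inputs, and point values are hypothetical conveniences, not PISA’s operational scoring logic.

```python
# A minimal sketch of the partial-credit rules described above.
FULL, PARTIAL, NONE = 2, 1, 0

def score_information_retrieval(used_votat: bool, used_zero_round: bool) -> int:
    """Full credit for VOTAT plus a zero (baseline) round; partial for VOTAT alone."""
    if used_votat and used_zero_round:
        return FULL
    if used_votat:
        return PARTIAL
    return NONE

def score_model_building(model_errors: int) -> int:
    """Full credit for a correct model, partial for one or two errors."""
    if model_errors == 0:
        return FULL
    if model_errors <= 2:
        return PARTIAL
    return NONE

def score_forecasting(goals_reached: bool, some_progress: bool) -> int:
    """Full credit if target goals are reached, partial for measurable progress."""
    if goals_reached:
        return FULL
    if some_progress:
        return PARTIAL
    return NONE

print(score_information_retrieval(used_votat=True, used_zero_round=False))  # 1 (partial)
print(score_model_building(model_errors=2))                                 # 1 (partial)
print(score_forecasting(goals_reached=True, some_progress=True))            # 2 (full)
```

A real implementation would derive flags such as used_votat from the logged interaction data rather than take them as inputs.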

PISA items are classified as static versus interactive. In static problems, all the information the test taker needs to solve the problem is presented at the outset. In contrast, interactive problems require the test taker to explore the problem to uncover important relevant information ( Organisation for Economic Co-operation and Development, 2010 , p. 15). Two sample PISA items appear in Box 2-1 .

BOX 2-1 Sample Problem-Solving Items for PISA 2012. Digital Watch–interactive: A simulation of a digital watch is presented. The watch is controlled by four buttons, the functions of which are unknown to the student at the outset of the problems. …

Funke and his colleagues have conducted analyses to evaluate the construct validity of the assessment. They have examined the internal structure of the assessment using structural equation modeling, which evaluates the extent to which the items measure the dimensions they are intended to measure. The results indicate the three dimensions are correlated with each other: Model Building and Forecasting correlate at .77; Forecasting and Information Retrieval correlate at .71; and Information Retrieval and Model Building correlate at .75. Funke said that the results also document that the items “load on” the three dimensions in the way the test developers hypothesized. He indicated that there was some misfit related to the items that measure Forecasting, which he attributed to the fact that the Forecasting items have a skewed distribution. However, the fit of the model does not change when these items are removed.

Funke reported results from studies of the relationship between test performance and other variables, including school achievement and two measures of problem solving on the PISA German National Extension on Complex Problem Solving. The latter assessment, called HEIFI, measures knowledge about a system and the control of the system separately. Scores on the PISA Model Building dimension are significantly (p < .05) related to school achievement (r = .64) and to scores on the HEIFI knowledge component (r = .48). Forecasting is significantly (p < .05) related to both of the HEIFI scores (r = .48 for HEIFI knowledge and r = .36 for HEIFI control). Information Retrieval is significantly (p < .05) related to HEIFI control (r = .38). The studies also show that HEIFI scores are not related to school achievement.

Funke closed by discussing the costs associated with the assessment. He noted it is not easy to specify the costs because in a German university setting, many costs are absorbed by the department and its equipment. Funke estimates that development costs run about $13 per unit, 9 plus $6.50 for the cognitive labs used to pilot test and refine the items. 10 The license for the Computer Based Assessment (CBA) Item-builder and the execution environment is provided free for scientific use by DIPF 11 in Frankfurt.

The Bar Examination for Lawyers 12

The Bar examination is administered by each jurisdiction in the United States as one step in the process to license lawyers. The National Conference of Bar Examiners (NCBE) develops a series of three exams for use by the jurisdictions. Jurisdictions may use any or all of these three exams or may administer locally developed exam components if they wish. The three major components developed by the NCBE include the Multistate Bar Exam (MBE), the Multistate Essay Exam (MEE), and the Multistate Performance Test (MPT). All are paper-and-pencil tests. Examinees pay to take the test, and the costs are $54 for the MBE, $20 for the MEE, and $20 for the MPT.

Susan Case, who has spent her career working on licensing exams—first the medical licensing exam for physicians and then the bar exam for lawyers—noted the Bar examination is like other tests used to award professional licensure. The focus of the test is on the extent to which the test taker has the knowledge and skills necessary to be licensed in the profession on the day of the test. The test is intended to ensure the newly licensed professional knows what he/she needs to know to practice law. The test is not designed to measure the curriculum taught in law schools, but what licensed professionals need to know. When they receive the credential, lawyers are licensed to practice in all fields of law. This is analogous to medical licensing in which the licensed professional is eligible to practice any kind of medicine.

The Bar exam includes both multiple-choice and constructed-response components. Both require examinees to be able to gather and synthesize information and apply their knowledge to the given situation. The questions generally follow a vignette that describes a case or problem and asks the examinee to determine the issues to resolve before advising the client or to determine other information needed in order to proceed. For instance, what questions should be asked next? What is the best strategy to implement? What is the best defense? What is the biggest obstacle to relief? The questions may require the examinee to synthesize the law and the facts to predict outcomes. For instance, is the ordinance constitutional? Should a conviction be overturned?

The purpose of the MBE is to assess the extent to which an examinee can apply fundamental legal principles and legal reasoning to analyze a given pattern of facts. The questions focus on the understanding of legal principles rather than memorization of local case or statutory law. The MBE consists of 200 multiple-choice questions and lasts a full day.

A sample question follows:

A woman was told by her neighbor that he planned to build a new fence on his land near the property line between their properties. The woman said that, although she had little money, she would contribute something toward the cost. The neighbor spent $2,000 in materials and a day of his time to construct the fence. The neighbor now wants her to pay half the cost of the materials. Is she liable for this amount?

The purpose of the MEE is to assess the examinee’s ability to (1) identify legal issues raised by a hypothetical factual situation; (2) separate material that is relevant from that which is not; (3) present a reasoned analysis of the relevant issues in a clear, concise, and well-organized composition; and (4) demonstrate an understanding of the fundamental legal principles relevant to the probable resolution of the issues raised by the factual situation.

The MEE lasts for 6 hours and consists of nine 30-minute questions. An excerpt from a sample question follows:

The CEO/chairman of the 12-member board of directors (the Board) of a company plus three other members of the Board are senior officers of the company. The remaining eight members of the Board are wholly independent directors. Recently, the Board decided to hire a consulting firm to market a new product . . . The CEO disclosed to the Board that he had a 25% partnership interest in the consulting firm. The CEO stated that he would not be involved in any work to be performed by the consulting firm. He knew but did not disclose to the Board that the consulting firm’s proposed fee for this consulting assignment was substantially higher than it normally charged for comparable work . . . The Board discussed the relative merits of the two proposals for 10 minutes. The Board then voted unanimously (CEO abstaining) to hire the consulting firm . . . Did the CEO violate his duty of loyalty to his company? Explain. Assuming the CEO breached his duty of loyalty to his company, does he have any defense to liability? Explain. Did the other directors violate their duty of care? Explain.

The purpose of the MPT is to assess fundamental lawyering skills in realistic situations by asking the candidate to complete a task that a beginning lawyer should be able to accomplish. The MPT requires applicants to sort detailed factual materials; separate relevant from irrelevant facts; analyze statutory, case, and administrative materials for relevant principles of law; apply relevant law to the facts in a manner likely to resolve a client’s problem; identify and resolve ethical dilemmas; communicate effectively in writing; and complete a lawyering task within time constraints.

Each task is completely self-contained and includes a file, a library, and a task to complete. The task might deal with a car accident, for example, and therefore might include a file with pictures of the accident scene and depositions from the various witnesses, as well as a library with relevant case law. Examinees are given 90 minutes to complete each task.

For example, in a case involving a slip and fall in a store, the task might be to prepare an initial draft of an early dispute resolution for a judge. The draft should candidly discuss the strengths and weaknesses of the client’s case. The file would contain the instructional memo from the supervising attorney, the local rule, the complaint, an investigator’s report, and excerpts of the depositions of the plaintiff and a store employee. The library would include a jury instruction concerning the premises liability with commentary on contributory negligence.

The MBE is a multiple-choice test and thus scored by machine. However, the other two components require human scoring. The NCBE produces the questions and the grading guidelines for the MEE and MPT, but the essays and performance tests are scored by the jurisdictions themselves. The scorers are typically lawyers who are trained during grading seminars held at the NCBE offices, after the exam is administered. At this time, they review sample papers and receive training on how to apply the scoring guidelines in a consistent fashion.

Each component of the Bar examination (MBE, MEE, MPT) is intended to assess different skills. The MBE focuses on breadth of knowledge, the MEE focuses on depth of knowledge, and the MPT focuses on the ability to demonstrate practical skills. Together, the three formats cover the different types of tasks that a new lawyer needs to do.

Determinations about weighting the three components are left to the jurisdictions; however, the NCBE urges them to weight the MBE score by 50 percent and the MEE and MPT by 25 percent each. The recommendation is an attempt to balance a number of concerns, including authenticity, psychometric considerations, logistical issues, and economic concerns. The recommendation is to award the highest weight to the MBE because it is the most psychometrically sound. The reliability of scores on the MBE is generally over .90, much higher than scores on the other portions, and the MBE is scaled and equated across time. The recommended weighting helps to ensure high decision consistency and comparability of pass/fail decisions across administrations.
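
As a concrete illustration of the recommended weighting, the sketch below combines the three components at 50/25/25 after rescaling the written scores. The score values, and the choice to rescale the written components to the MBE distribution before combining, are assumptions made for illustration rather than NCBE’s operational procedure.

```python
# A minimal sketch of the recommended 50/25/25 weighting, with the written
# components linearly rescaled to the MBE score distribution before combining.
# All numbers are hypothetical.
import statistics

def scale_to_mbe(written_scores, mbe_scores):
    """Linearly rescale written scores to the mean and spread of the MBE scores."""
    w_mean, w_sd = statistics.mean(written_scores), statistics.stdev(written_scores)
    m_mean, m_sd = statistics.mean(mbe_scores), statistics.stdev(mbe_scores)
    return [(s - w_mean) / w_sd * m_sd + m_mean for s in written_scores]

mbe = [132.0, 145.5, 150.2, 128.7]   # hypothetical scaled MBE scores
mee = [3.8, 4.5, 5.1, 3.2]           # hypothetical raw essay averages
mpt = [3.5, 4.9, 4.4, 3.0]           # hypothetical raw performance-test averages

mee_scaled = scale_to_mbe(mee, mbe)
mpt_scaled = scale_to_mbe(mpt, mbe)

composites = [0.50 * b + 0.25 * e + 0.25 * p
              for b, e, p in zip(mbe, mee_scaled, mpt_scaled)]
print([round(c, 1) for c in composites])
```

Because the MBE is scaled and equated across administrations, anchoring the written components to it before weighting is one way to keep composite scores comparable across test dates; the source recommends the 50/25/25 weights but does not detail the scaling step, which is assumed here.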

Currently the MBE is used by all but three jurisdictions (Louisiana, Washington, and Puerto Rico). The essay exam is used by 27 jurisdictions, and the performance test is used by 34 jurisdictions.

Test Development

Standing test development committees that include practicing lawyers, judges, and lawyers on the staffs of law schools write the test questions. The questions are reviewed by outside experts, pretested on appropriate populations, analyzed and revised, and professionally edited before operational use. Case said the test development procedures for the Bar exam are analogous to those used for the medical licensure exams.

Operation ARIES! (Acquiring Research Investigative and Evaluative Skills)

The summary below is based on materials provided by Art Graesser, including his presentation 13 and two background papers he supplied to the committee ( Graesser et al., 2010 ; Millis et al., in press ).

Operation ARIES! is a tutorial system with a formative assessment component intended for high school and higher education students, Graesser explained. It is designed to teach and assess critical thinking about science. The program operates in a game environment intended to be engaging to students. The system includes an “Auto Tutor,” which makes use of animated characters that converse with students. The Auto Tutor is able to hold conversations with students in natural language, interpret the student’s response, and respond in a way that is adaptive to the student’s response. The designers have created a science fiction setting in which the game and exercises operate. In the game, alien creatures called “Fuaths” are disguised as humans. The Fuaths disseminate bad science through various media outlets in an attempt to confuse humans about the appropriate use of the scientific method. The goal for the student is to become a “special agent of the Federal Bureau of Science (FBS), an agency with a mission to identify the Fuaths and save the planet” ( Graesser et al., 2010 , p. 328).

The system addresses scientific inquiry skills, developing research ideas, independent and dependent variables, experimental control, the sample, experimenter bias, and relation of data to theory. The focus is on use of these skills in the domains of biology, chemistry, and psychology. The system helps students to learn to evaluate evidence intended to support claims. Some examples of the kinds of research questions/claims that are evaluated include the following:

From Biology

  • Do chemical and organic pesticides have different effects on food quality?
  • Does milk consumption increase bone density?

From Chemistry

  • Does a new product for winter roads prevent water from freezing?
  • Does eating fish increase blood mercury levels?

From Psychology

  • Does using cell phones hurt driving?
  • Is a new cure for autism effective?

The system includes items in real-life formats, such as articles, advertisements, blogs, and letters to the editor, and makes use of different types of media where it is common to see faulty claims.

Through the system, the student encounters a story told by video, combined with communications received by e-mail, text message, and updates. The student is engaged through the Auto Tutor, which involves a “tutor agent” that serves as a narrator, and a “student agent” that serves in different roles, depending on the skill level of the student.

The system makes use of three kinds of modules—interactive training, case studies, and interrogations. The interactive training exchanges begin with the student reading an e-book, which provides the requisite information used in later modules. After each chapter, the student responds to a set of multiple-choice questions intended to assess the targeted skills. The text is interactive in that it involves “trialogs” (three-way conversations) between the primary agent, the student agent, and the actual (human) student. It is adaptive in that the strategy used is geared to the student’s performance. If the student is doing poorly, the two auto-tutor agents carry on a conversation that promotes vicarious learning: that is, the tutor agent and the student agent interact with each other, and the human student observes. If the student is performing at an intermediate level, normal tutoring occurs in which the student carries on a conversational exchange with the tutor agent. If the student is doing very well, he or she may be asked to teach the student agent, under the notion that the act of teaching can help to perfect one’s skills.
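
The adaptive branching just described (vicarious learning for struggling students, standard tutoring for intermediate students, and teaching the student agent for strong students) can be summarized as a simple mode-selection rule. The thresholds and mode names in the sketch below are illustrative assumptions, not the actual Operation ARIES! logic.

```python
# A minimal sketch of the adaptive "trialog" logic described above: the tutoring
# mode shifts with the student's measured performance level.
def select_trialog_mode(proportion_correct: float) -> str:
    if proportion_correct < 0.4:
        # Struggling students watch the tutor agent and student agent converse
        # (vicarious learning) before being drawn back into the exchange.
        return "vicarious"
    if proportion_correct < 0.8:
        # Intermediate students hold a normal tutorial exchange with the tutor agent.
        return "normal_tutoring"
    # High performers are asked to teach the student agent ("learning by teaching").
    return "teach_the_agent"

for score in (0.25, 0.60, 0.95):
    print(score, "->", select_trialog_mode(score))
```

The point of the sketch is only that the mode is a function of measured performance; the actual system conditions its behavior on richer evidence drawn from the student’s natural-language responses.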

In the case study modules, the student is expected to apply what he or she has learned. The case study modules involve some type of flawed science, and the student is to identify the flaws by applying information learned from the interactive text in the first module. The student responds by verbally articulating the flaws, and the system makes use of advances in computational linguistics to analyze the meaning of the response. The researchers adopted the case study approach because it “allows learners to encode and discover the rich source of constraints and interdependencies underlying the target elements (flaws) within the cases. [Prior] cases provide a knowledge base for assessing new cases and help guide reasoning, problem solving, interpretation and other cognitive processes” ( Millis et al., in press , p. 17).

In the interrogation modules, insufficient information is provided, so students must ask questions. Research is presented in an abbreviated fashion, such as through headlines, advertisements, or abstracts. The student is expected to identify the relevant questions to ask and to learn to discriminate good research from flawed research. The storyline is advanced by e-mails, dialogues, and videos that are interspersed among the learning activities.

Through the three kinds of modules, the system interweaves a variety of key principles of learning that Graesser said have been shown to increase learning. These include

  • Self-explanation (where the learner explains the material to another student, such as the automated student)
  • Immediate feedback (through the tutoring system)
  • Multimedia effects (which tend to engage the student)
  • Active learning (in which students actually participate in solving a problem)
  • Dialog interactivity (in which students learn by engaging in conversations and tutorial dialogs)
  • Multiple, real-life examples (intended to help students transfer what they learn in one context to another context and to real world situations)

Graesser closed by saying that he and his colleagues are beginning to collect data from evaluation studies to examine the effects of the Auto Tutor. Research has focused on estimating changes in achievement before and after use of the system, and, to date, the results are promising.

Packet Tracer

The summary below is based on materials provided by John Behrens, including his presentation 14 and a background paper he forwarded in preparation for the workshop ( Behrens et al., in press ).

To help countries around the world train their populations in networking skills, Cisco created the Networking Academy. The academy is a public/private partnership through which Cisco provides free online curricula and assessments. Behrens pointed out that in order to become adept with networking, students need both a conceptual understanding of networking and the skills to apply this knowledge to real situations. Thus, hands-on practice and assessment on real equipment are important components of the academy’s instructional program. Cisco also wants to provide students with time for out-of-class practice and opportunities to explore on their own using online equipment that is not typically available in the average classroom setting. In the Networking Academy, students work with an online instructor, and they proceed through an established curriculum that incorporates numerous interactive activities.

Behrens talked specifically about a new program Cisco has developed called “Packet Tracer,” a computer package that uses simulations to provide instruction and includes an interactive and adaptable assessment component. Cisco has incorporated Packet Tracer activities into the curricula for training networking professionals. Through this program, instructors and students can construct their own activities, and students can explore problems on their own. In Cisco’s Networking Academy, assessments can be student-initiated or instructor-initiated. Student-initiated assessments are primarily embedded in the curriculum and include quizzes, interactive activities, and “challenge labs,” which are a feature of Packet Tracer. The student-initiated assessments are designed to provide feedback to the student to help his or her learning. They use a variety of technologies ranging from multiple-choice questions (in the quizzes) to complex simulations (in the challenge labs). Before the development of Packet Tracer, the instructor-initiated assessments consisted either of hands-on exams with real networking equipment or multiple-choice exams in the online assessment system. Packet Tracer provides more simulation-based options, and also includes detailed reporting and grade-book integration features.

Each assessment consists of one extensive network configuration or troubleshooting activity that may require up to 90 minutes to complete. Access to the assessment is associated with a particular curricular unit, and it may be re-accessed repeatedly based on instructor authorization. The system provides simulations of a broad range of networking devices and networking protocols, including features set around the Cisco IOS (Internet Operating System). Instructions for tasks can be presented through HTML-formatted text boxes that can be preauthored, stored, and made accessible by the instructor at the appropriate time.

Behrens presented an example of a simulated networking problem in which the student needs to obtain the appropriate cable. To complete this task, the student must determine what kind of cable is needed, where on the computer to plug it in, and how to connect it. The student’s performance is scored, and his or her interactions with the problem are tracked in a log. The goal is not to simply assign a score to the student’s performance but to provide detailed feedback to enhance learning and to correct any misinterpretations. The instructors can receive and view the log in order to evaluate how well the student understands the tasks and what needs to be done.

Packet Tracer can simulate a broad range of devices and networking protocols, including a wide range of PC facilities covering communication cards, power functionality, web browsers, and operating system configurations. The particular devices, configurations, and problem states are determined by the author of the task (e.g., the instructor) in order to address whatever proficiencies the chapter, course, or instruction targets. When icons of the devices are touched in the simulator, more detailed pictures are presented with which the student can interact. The task author can program scoring rules into the system. Students can be observed trying and discarding potential solutions based on feedback from the game, resulting in new understandings. The game encourages students to engage in problem-solving steps (such as problem identification, solution generation, and solution testing). Common incorrect strategies can be seen across recordings.
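
The description above, in which author-defined scoring rules are applied to a log of the student’s interactions, maps onto a small rule-matching routine. The sketch below is one plausible rendering under that reading; the device names, rule keys, and log format are hypothetical and are not Packet Tracer’s actual data model.

```python
# A minimal sketch of instructor-authored scoring rules applied to a logged
# sequence of student actions, in the spirit of the description above.
from dataclasses import dataclass, field

@dataclass
class ActivityLog:
    actions: list = field(default_factory=list)

    def record(self, action: str, detail: str):
        self.actions.append((action, detail))

# Author-defined rules: each maps a required (action, detail) pair to points.
scoring_rules = {
    ("select_cable", "crossover"): 1,
    ("connect_port", "PC0:FastEthernet0 -> Switch0:FastEthernet0/1"): 1,
    ("configure_ip", "192.168.1.10/24"): 1,
}

log = ActivityLog()
log.record("select_cable", "straight-through")   # incorrect first attempt
log.record("select_cable", "crossover")          # corrected after feedback
log.record("connect_port", "PC0:FastEthernet0 -> Switch0:FastEthernet0/1")

score = sum(points for step, points in scoring_rules.items() if step in log.actions)
print(f"Score: {score} / {sum(scoring_rules.values())}")
print("Logged actions for instructor review:", log.actions)
```

Keeping the raw action log alongside the score mirrors the stated goal of providing detailed feedback rather than only a grade.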

For Kuncel’s presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Kuncel.pdf . For Kuncel’s paper, see http://www7.national-academies.org/bota/21st_Century_Workshop_Kuncel_Paper.pdf . For Anderman’s presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Anderman.pdf . For Anderman’s paper, see http://nrc51/xpedio/groups/dbasse/documents/webpage/060387~1.pdf [August 2011].

Respectively, the Graduate Record Exam, Medical College Admission Test, Law School Admission Test, Graduate Management Admission Test, Miller Analogies Test, and Pharmacy College Admission Test.

Convergent validity indicates the degree to which an operationalized construct is similar to other operationalized constructs that it theoretically should be similar to. For instance, to show the convergent validity of a test of critical thinking, the scores on the test can be correlated with scores on other tests that are also designed to measure critical thinking. High correlations between the test scores would be evidence of convergent validity.

Discriminant validity evaluates the extent to which a measure of an operationalized construct differs from measures of other operationalized constructs that it should differ from. In the present context, the interest is in verifying that critical thinking is a construct distinct from general intelligence and expert performance. Thus, discriminant validity would be examined by evaluating the patterns of correlations between and among scores on tests of critical thinking and scores on tests of the other two constructs (general intelligence and expert performance).

It is important to note that when corrected for restriction in range, these coefficients increase to .47 to .51 for individual scores and .51 for the combined score.

For a full description of the PISA program, see http://www.oecd.org/pages/0,3417,en_32252351_32235731_1_1_1_1_1,00.html [August 2011].

Available at http://www.oecd.org/dataoecd/8/42/46962005.pdf [August 2011].

Available at http://www7.national-academies.org/bota/21st_Century_Workshop_Funke.pdf [August 2011].

A unit consists of stimulus materials, instructions, and the associated questions.

Costs are in American dollars.

DIPF stands for the Deutsches Institut für Internationale Pädagogische Forschung, which translates to the German Institute for Educational Research and Educational Information.

The summary is based on a presentation by Susan Case, see http://www7.nationalacademies.org/bota/21st_Century_Workshop_Case.pdf [August 2011].

For Graesser’s presentation, see http://nrc51/xpedio/groups/dbasse/documents/webpage/060267~1.pdf [August 2011].

For Behrens’ presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Behrens.pdf [August 2011].

SOURCE: National Research Council (US) Committee on the Assessment of 21st Century Skills. Assessing 21st Century Skills: Summary of a Workshop. Washington, DC: National Academies Press; 2011. Chapter 2, Assessing Cognitive Skills.

International Study Reveals Measuring and Developing Critical-Thinking Skills as an Essential Best Practice in Higher Education

Opportunities exist for higher education institutions worldwide to increase critical-thinking skills among graduates through explicit instruction, practice, and measurement of the skills employers most seek in today’s innovation economy.

NEW YORK, October 18, 2023 | Source: GlobeNewswire

The Council for Aid to Education, Inc. (CAE), a leader in designing innovative performance tasks for measurement and instruction of higher-order skills, recently co-authored an article on a six-year international study in the European Journal of Education. Key findings shared in “Assessing and Developing Critical-Thinking Skills in Higher Education” include that it is feasible to reliably and validly measure higher-order skills in a cross-cultural context and that assessment of these skills is necessary for colleges and universities to ensure that their programs are graduating students with the skills needed for career success after graduation.

Between 2015 and 2020, 120,000 students from higher education institutions in six different countries — Chile, Finland, Italy, Mexico, the UK, and the US — were administered CAE’s Collegiate Learning Assessment (CLA+), a performance-based assessment that measures proficiency in critical thinking, problem solving, and written communication. Analysis of the data shows that students entering a higher education program on average performed at the Developing mastery level of the test, while exiting students on average performed at the Proficient mastery level. The amount of growth is relatively small (d = 0.10) but significant. However, half of exiting students perform at the two lowest levels of proficiency, indicating that higher education degrees do not necessarily mean students have gained the higher-order skills needed for innovation-oriented workplaces.
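
For readers unfamiliar with the effect-size metric, d here refers to Cohen’s d, the standardized mean difference between the exiting and entering cohorts; the conventional pooled-standard-deviation form is given below. The release does not state which estimator CAE used, so this is the standard textbook definition rather than a detail drawn from the study.

```latex
% Cohen's d: standardized mean difference between exiting and entering cohorts.
\[
d = \frac{\bar{x}_{\text{exit}} - \bar{x}_{\text{enter}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^{2} + (n_2 - 1)\,s_2^{2}}{n_1 + n_2 - 2}}
\]
% With d = 0.10, exiting students score about one-tenth of a standard deviation
% higher than entering students, which is why the release calls the growth small.
```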

“In response to employer concerns about graduate employability, assessing and developing students’ higher-order skills is an essential component of best practices in higher education,” said Doris Zahner, Ph.D., CAE’s chief academic officer. “The ability to measure these skills in a cross-cultural context addresses a current gap between the skills that higher education graduates possess and the skills that are required by hiring managers for success in the workplace.”

This study reinforces the findings of OECD’s 2013 Assessment of Higher Education Learning Outcomes (AHELO) Feasibility Study and is based upon a recently published 2022 OECD report, Does Higher Education Teach Students to Think Critically? Since this original study, CAE has further improved CLA+ through lessons learned from its implementation, analytical research on the data gathered, and international collaboration.

The research discussed in “Assessing and Developing Critical-Thinking Skills in Higher Education” reinforces the need for policymakers, researchers, and higher education leaders to have valid and reliable internationally comparative assessments of the skills that are needed for today’s knowledge economy. “The results outlined in this report show the power of assessing critical-thinking skills and how such assessments can feed into the higher education policy agenda at the national and international level,” said article co-author Dirk Van Damme, former head of the Centre for Educational Research and Innovation at OECD and current senior research fellow at the Centre for Curriculum Redesign.

CAE, in collaboration with the Finland Ministry of Education and Culture, will continue to study the impact of higher education on the development of critical-thinking skills. Starting in 2023 and continuing through 2025, a cohort of students from 18 Finnish higher education institutions will use CLA+ to measure their growth with critical thinking, adding a longitudinal component to this ongoing research.

To learn more about this study, CAE’s other research, and CAE’s performance-based assessments and critical thinking instruction, visit  cae.org .

About CAE

As a nonprofit whose mission is to help improve the academic and career outcomes of secondary and higher education students, CAE is the leader in designing innovative performance tasks for measurement and instruction of higher order skills and within subject areas.

Over the past 20 years, CAE has helped over 825,000 students globally understand and improve their proficiency in critical thinking, problem solving and effective written communication. Additionally, CAE’s subject area assessments have helped millions of K12 students across the US. Supported by best practices in assessment development, administration and psychometrics, CAE’s performance-based assessments include the Collegiate Learning Assessment (CLA+) and College and Career Readiness Assessment (CCRA+). To learn more, please visit  cae.org  and connect with us on  LinkedIn  and   YouTube .


How to Evaluate a Job Candidate’s Critical Thinking Skills in an Interview

  • Christopher Frank,
  • Paul Magnone,
  • Oded Netzer


It’s not about how they answer your questions — it’s about the kind of questions they ask you.

The oldest and still the most powerful tactic for fostering critical thinking is the Socratic method, developed over 2,400 years ago by Socrates, one of the founders of Western philosophy. The Socratic method uses thought-provoking question-and-answer probing to promote learning. It focuses on generating more questions than answers, where the answers are not a stopping point but the beginning of further analysis. Hiring managers can apply this model to create a different dialogue with candidates in a modern-day organization.

Hiring is one of the most challenging competencies to master, yet it is one of the most strategic and impactful managerial functions. A McKinsey study quantified that superior talent is up to eight times more productive, showing that the relationship between talent quality and business performance is dramatic. Organizations seeking growth or simply survival during difficult times must successfully recruit A-list talent, thought leaders, and subject matter experts. This is often done under time constraints as you must quickly fill a key position. Essentially you are committing to a long-term relationship after a few very short dates.


  • CF Christopher Frank is the coauthor of “ Decisions Over Decimals: Striking the Balance between Intuition and Information ” (Wiley) and “ Drinking from the Fire Hose: Making Smarter Decisions Without Drowning in Information ” (Portfolio). He is the Vice President of research and analytics at American Express.
  • PM Paul Magnone is the coauthor of “ Decisions Over Decimals: Striking the Balance between Intuition and Information ” (Wiley) and “ Drinking from the Fire Hose: Making Smarter Decisions Without Drowning in Information ” (Portfolio). He currently serves as the head of global strategic alliances for Google.
  • ON Oded Netzer is the coauthor of “ Decisions Over Decimals: Striking the Balance between Intuition and Information ” (Wiley). He is the Vice Dean for Research and the Arthur J. Samberg Professor of Business at Columbia Business School, an affiliate of the Columbia Data Science Institute, and an Amazon Scholar.


COMMENTS

  1. Teaching, Measuring & Assessing Critical Thinking Skills

    Yes, We Can Define, Teach, and Assess Critical Thinking Skills. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the ...

  2. Assessing Critical Thinking in Higher Education: Current State and

    Critical thinking is one of the most frequently discussed higher order skills, believed to play a central role in logical thinking, decision making, and problem solving (Butler, 2012; Halpern, 2003).It is also a highly contentious skill in that researchers debate about its definition; its amenability to assessment; its degree of generality or specificity; and the evidence of its practical ...

  3. Critical Thinking Testing and Assessment

    The purpose of assessing instruction for critical thinking is improving the teaching of discipline-based thinking (historical, biological, sociological, mathematical, etc.) It is to improve students' abilities to think their way through content using disciplined skill in reasoning. The more particular we can be about what we want students to ...

  4. Critical Thinking About Measuring Critical Thinking

    The Ennis-Weir critical thinking essay test. Pacific Grove, CA: Midwest Publications. Facione, P. A. (1990a). The California critical thinking skills test (CCTST): Forms A and B;The CCTST test manual.

  5. What Are Critical Thinking Skills and Why Are They Important?

    According to the University of the People in California, having critical thinking skills is important because they are [ 1 ]: Universal. Crucial for the economy. Essential for improving language and presentation skills. Very helpful in promoting creativity. Important for self-reflection.

  6. Critical Thinking > Assessment (Stanford Encyclopedia of Philosophy)

    The Critical Thinking Assessment Test (CAT) is unique among them in being designed for use by college faculty to help them improve their development of students' critical thinking skills (Haynes et al. 2015; Haynes & Stein 2021). Also, for some years the United Kingdom body OCR (Oxford Cambridge and RSA Examinations) awarded AS and A Level ...

  7. A Short Guide to Building Your Team's Critical Thinking Skills

    A Short Guide to Building Your Team's Critical Thinking Skills. by. Matt Plummer. October 11, 2019. twomeows/Getty Images. Summary. Most employers lack an effective way to objectively assess ...

  8. Frontiers

    Enhancing students' critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves ...

  9. Promoting and Assessing Critical Thinking

    An effective criterion measures which skills are present, to what extent, and which skills require further development (a minimal scoring sketch along these lines follows this list). The following are characteristics of work that may demonstrate effective critical thinking: ... Teaching Critical Thinking Skills. UTS Newsletter, 11(2):1-4; Facione, P.A. and Facione, N.C. (1994). Holistic Critical Thinking ...

  10. Measuring critical thinking skills.

    Critical thinking remains one of the most important skills identified as an outcome of a college degree. This chapter reviews critical thinking measures that are specific to psychology as well as broad-based general measures. Utilizing the techniques available from the scholarship of teaching and learning (SoTL) literature, scholars can measure and document the effectiveness of planned ...

  11. A Systematic Review on Instruments to Assess Critical Thinking

    Critical Thinking and Problem Solving (CTPS) are soft skills that students are expected to be equipped with, according to 21st-century learning. Several instruments have been developed to measure ...

  12. Measuring student success skills

    measuring-student-success-skills-a-review-of-the-literature-on-critical-thinking/. How can critical thinking be measured and assessed? The primary assessment tools for critical thinking are standardized tests and high-quality performance-based assessments. Standardized tests face criticism due to weaknesses related to, for example, construct ...

  13. Measuring, Assessing and Evaluating Thinking Skills in ...

    However, the most significant problem in teaching thinking skills remains the issue of measuring, assessing, and evaluating students' thinking skills. Additionally, the issue of teaching thinking skills requires a practical approach rather than a theoretical dwelling. ... Fostering the skills of critical thinking and question-posing in a ...

  14. Measuring Critical Thinking: Can It Be Done?

    However, critical thinking is an abstract and complex term, which makes it more difficult to measure and assess than other essential skills like reading and writing. The recognition that critical thinking is an important skill has led to the development of a number of assessment methods, although all of these methods have considerable limitations.

  15. Yes, We Can Define, Teach, and Assess Critical Thinking Skills

    These assessments demonstrate that it is possible to capture meaningful data on students' critical thinking abilities. They are not intended to be high stakes accountability measures. Instead ...

  16. Measuring What Matters: Assessing Creativity, Critical Thinking, and

    If creative and critical thinking are both inherently important in developing global problem solvers and further represent the goals of gifted curriculum, then classroom assessments must be designed to measure student development of these process skills. Many assessment rubrics emphasize the end product or superficially address process skills.

  17. A Logical Basis for Measuring Critical Thinking Skills

    A Logical Basis for Measuring Critical Thinking Skills. We must go beyond Bloom's taxonomy to consider specific dispositions and abilities characteristic of critical thinkers. The recent explosion of interest in critical thinking has occasioned an accompanying interest in assessing it on a large scale. This ...

  18. Critical Thinking Models: A Comprehensive Guide for Effective Decision

    The Halpern Critical Thinking Assessment is a standardized test developed by Diane Halpern to assess critical thinking skills. The evaluation uses a variety of tasks to measure abilities in core skill areas, such as verbal reasoning, argument analysis, and decision making.

  19. Assessing Cognitive Skills

    After considering these definitions, the committee decided a third cognitive skill, critical thinking, was not fully represented. The committee added critical thinking to the list of cognitive skills, since competence in critical thinking is usually judged to be an important component of both skills (Mayer, 1990). Thus, this chapter focuses on assessments of three cognitive skills: problem ...

  20. International Study Reveals Measuring and Developing Critical-Thinking

    Opportunities exist for higher education institutions worldwide to increase critical-thinking skills among higher education graduates through explicit instruction, practice, and measurement of the skills employers are most seeking in today's innovation economy. NEW YORK, October 18, 2023 | Source: GlobeNewswire. The Council for Aid to Education, Inc. (CAE), a leader in designing innovative ...

  21. Critical thinking. Can it be measured?

    In this, the test development started with a careful analysis of the skills that were central in critical thinking, leading to an operational description (Al-Osaimi et al. 2014). Multiple choice ...

  22. How to Evaluate a Job Candidate's Critical Thinking Skills in an Interview

    The oldest and still the most powerful tactic for fostering critical thinking is the Socratic method, developed over 2,400 years ago by Socrates, one of the founders of Western philosophy. The ...

  23. Critical Thinking Performance Increases in Psychology Undergraduates

    ... critical thinking skills are expected to develop is relatively small, ranging between 12 and 16 weeks. One of the main ... measure critical thinking skills over a longer period of time. Using a cross-sectional design and students from different American colleges and different disciplines, Liu, Mao, et al.'s ...

  24. Boost Critical Thinking with a Learning Plan

    Critical thinking is an essential skill that allows you to analyze information objectively and make reasoned judgments. It involves the evaluation of sources, such as data or factual evidence ...
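
To make the rubric-style criteria in items 8 and 9 above a bit more concrete, here is a minimal scoring sketch in Python. It is purely illustrative and is not drawn from any of the instruments cited in this list: the four dimensions are borrowed from the mental processes named in item 8, while the 1-4 rating scale, the RubricScore class, and the threshold used to flag skills that need further development are assumptions made only for this example.

    from dataclasses import dataclass
    from statistics import mean

    # Hypothetical rubric dimensions, taken from the mental processes listed in
    # item 8 above. The 1-4 scale and the plain-average weighting are
    # illustrative assumptions, not drawn from any cited instrument.
    DIMENSIONS = [
        "identifying and analyzing a problem",
        "interpreting information",
        "synthesizing evidence",
        "reporting a conclusion",
    ]

    @dataclass
    class RubricScore:
        student: str
        ratings: dict[str, int]  # dimension -> rating on a 1-4 scale

        def validate(self) -> None:
            # Ensure every dimension has a rating and that it is in range.
            for dim in DIMENSIONS:
                rating = self.ratings.get(dim)
                if rating is None or not 1 <= rating <= 4:
                    raise ValueError(f"missing or out-of-range rating for '{dim}'")

        def holistic_score(self) -> float:
            # Average the dimension ratings into a single holistic score.
            self.validate()
            return mean(self.ratings[dim] for dim in DIMENSIONS)

        def needs_development(self, threshold: int = 2) -> list[str]:
            # Dimensions rated at or below the threshold are flagged as
            # requiring further development (item 9's phrasing).
            self.validate()
            return [dim for dim in DIMENSIONS if self.ratings[dim] <= threshold]

    if __name__ == "__main__":
        score = RubricScore(
            student="Student A",
            ratings={
                "identifying and analyzing a problem": 3,
                "interpreting information": 4,
                "synthesizing evidence": 2,
                "reporting a conclusion": 3,
            },
        )
        print(f"holistic score: {score.holistic_score():.2f}")     # 3.00
        print(f"needs development: {score.needs_development()}")   # ['synthesizing evidence']

A real assessment would, of course, anchor each rating in observable descriptors and combine judgments from multiple raters; the point of the sketch is only that a holistic score and a "requires further development" flag fall directly out of per-dimension ratings.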