How to Design Better Tests, Based on the Research

A review of a dozen recent studies reveals that teachers need to consider bias, rigor, and mindset to design good tests.


Claire Longmoor didn’t expect her math problem to go viral. 

“An orchestra of 120 players takes 40 minutes to play Beethoven’s 9th symphony,” the question read. “How long would it take for 60 players to play the symphony?”

As with so many puzzles that find their way to the internet, the responses were radically split and mostly wrong: One group of people, who were perhaps reading too fast, confidently declared the answer to be 20 minutes. The second camp reasoned that half as many musicians would have to work twice as hard, so the answer must be 80 minutes.

Yet a third group was stupefied, questioning the teacher’s ability to write good questions. “Think the person who came up with that question really doesn’t know how an orchestra works!” the Wexford Sinfonia Orchestra tweeted.

It’s a trick question, Longmoor admitted, one designed to keep her students on their toes. Her reasoning echoes a common sentiment among test makers: that such questions force students to read carefully, ensuring that they attend to substantive questions later on. But do trick questions actually work as intended?

Andrew Butler, a professor of psychological and brain sciences at Washington University, doesn’t think so. Trick questions are not “productive for learning” and can easily backfire, he says. The result: confused students, artificially reduced test performance, and a murkier picture of what students actually know.

Other research on test design suggests that all too often, we’re not just assessing what students know, but also getting a peek into the psychological and cognitive eddies that disrupt a student’s thinking—a high-stakes test that causes anxiety can become a barometer of a student’s poise, rather than their knowledge. A well-designed test is rigorous and keeps implicit bias in check while being mindful of the role that confidence, mindset, and anxiety play in test-taking. Here are eight tips to create effective tests, based on a review of more than a dozen recent studies.


Students often overestimate how prepared they are for an upcoming test, which can result in unexpected low performance, according to a 2017 study. Consider asking students to make and show you a study plan involving productive study strategies like self-quizzing, teaching the major concepts to peers, or spacing out their studying into multiple sessions instead of cramming the night before.

To help address test anxiety, researchers recommend setting aside a little time for simple writing or self-talk exercises before the test—they allow students to shore up their confidence, recall their test-taking strategies, and put the exam into perspective. In a 2019 study, for example, elementary students who spent a few minutes before a test and “silently spoke words of encouragement to themselves that were focused on effort” saw their math scores rise. And in a 2019 study of ninth graders, researchers found that a simple 10-minute expressive writing activity that reframed test anxiety as “a beneficial and energizing force” led to course failure rates being cut in half for vulnerable students.


Design tests so they’re at an appropriate level of difficulty for your students. Overly difficult tests sap students’ motivation and increase the likelihood that students will remember the wrong answers, according to a 2018 study.

In the end, “tests that are extremely easy or difficult are essentially useless for both assessment and learning,” the study concludes. Students who study moderately should get roughly 70 to 80 percent of the questions correct. 


Don’t start a test with challenging questions; let students ease into a test. Asking difficult questions to probe for deep knowledge is important, but remember that confidence and mindset can dramatically affect outcomes—and therefore muddy the waters of your assessment. 

A 2021 study found that students were more likely to do worse on a test if difficult questions were at the beginning instead of nearer to the middle or the end of the test. “Students might be disheartened by seeing a hard question early in the test, as a signal of the general difficulty of the rest of the test,” the researchers explain.


Question format matters. In a 2018 study, researchers analyzed test scores for 8 million students and discovered that boys tend to outperform girls on multiple-choice questions, a difference that accounts for roughly 25 percent of the gender achievement gap. Girls, by contrast, performed significantly better than boys on open-ended questions. Consider the mix of your testing formats: Combine traditional formats (multiple choice, short answer, and essay questions) with creative, open-ended assessments that can elicit different strengths and interests.

Be mindful, as well, of how cultural or racial bias and background knowledge can infiltrate the language and framing of test questions. In an infamous example, an SAT analogy question required students to select “oarsman:regatta” in response to the word pair “runner: marathon,” an expectation that was fraught with classist, racial, and geographic overtones. 

Other studies reveal that without a threshold of background knowledge, students fail to grasp the intent of their reading—an incorrect answer on a test may signify the failure to determine the meaning of the question, rather than measure the student’s understanding of the material. Keep test questions free of unnecessary jargon, revise tests to simplify questions, and consider allowing students to ask for clarification before you start the test. 



While it may be tempting to include trick questions to make sure that students are paying attention, they can get stuck or confused, wasting precious time and compromising the rest of the test as a result, a 2018 study concludes. 

Tests aren’t just tools to evaluate learning; they can also alter a student’s understanding of a topic. If students try to recall information they’re unsure about, they may reconstruct it incorrectly, increasing the likelihood that they will retain false information. For example, if you asked, “What was George Washington’s goal with writing the Emancipation Proclamation?” some students may commit the false premise to memory and connect the wrong president to the seminal historical document.


Instead of a single high-stakes test, consider breaking it into smaller low-stakes tests that you can spread throughout the school year. That strategy alleviated test anxiety for 72 percent of middle and high school students, according to a 2014 study. 

The likely reason? When students take high-stakes tests, their levels of cortisol, a biological marker for stress, rise dramatically, impeding their ability to concentrate and artificially lowering test scores, a 2018 study found. Stress is a normal part of test-taking, but some kinds of stress should be avoided, such as a student’s worry about whether they’ll be able to finish.


Time limits are unavoidable, but you can mitigate their pernicious effects on anxiety levels. “Evidence strongly suggests that timed tests cause the early onset of math anxiety for students across the achievement range,” explains Jo Boaler, a mathematics professor at Stanford. This extends to other subjects as well, according to a 2020 study, which also found that timed tests disproportionately harm students with disabilities.

If a student aces most of the test but then gets the last few questions wrong or leaves them blank, it’s possible that they panicked as the time limit approached—or knew the information intimately but simply couldn’t finish the test. It may be helpful to time yourself taking the test and cut a few questions so that it’s clearly shorter than your class period. 


Sometimes, less design is better: The research suggests that one effective strategy, at least periodically, is to ask students to write their own test questions. 

In a 2020 study, students who generated test questions scored 14 percentage points higher than students who simply reviewed the material. “Question generation promotes a deeper elaboration of the learning content,” explains psychology professor Mirjam Ebersbach. “One has to reflect what one has learned and how an appropriate knowledge question can be inferred from this knowledge.” Model question-asking for students—highlighting your own examples first—and then teach them how to ask good questions. They may start with simple factual questions, but with enough practice, they can propose questions that start with “Explain” or that dig deeper into a topic with how and why questions. 


Beyond test design, there’s the important question of what happens after a test. All too often, students receive a test, glance at the grade, and move on. But that deprives them, and the teacher, of a valuable opportunity to address misconceptions and gaps in knowledge. Don’t think of tests as an endpoint to learning. Follow up with feedback, and consider strategies like “exam wrappers,” short metacognitive writing activities that ask students to review their performance on the test and think about ways they could improve in future testing scenarios.

You might also rethink your policy around test retakes. While students can certainly take unfair advantage of some test-retaking policies, there are innovative approaches that preserve the integrity of the initial test while allowing students to recover partial credit for material they haven’t successfully learned. Teachers recommend setting clear limits, posing a different set of questions (or allowing partial credit for demonstrating deep knowledge of questions students missed on the test), or asking students to reflect on why they missed earlier questions and what they can do to improve in the future.

By Youki Terada