What You'll Learn This Hour
- How Evaluate-the-Argument questions work in GMAT Focus Edition Critical Reasoning (Verbal Reasoning section)
- How to isolate the unstated assumption in any argument and test it with the "negate" technique
- The "two-way test": a correct answer must be able to both strengthen AND weaken the conclusion depending on outcome
- Common traps for causal, statistical, analogy, and policy arguments — and how to avoid them under time pressure
Core Concepts
What Are Evaluate-the-Argument Questions?
In the GMAT Focus Edition, Evaluate-the-Argument questions are a Critical Reasoning question type in the Verbal Reasoning section. Classic Data Sufficiency (the "Statement 1/Statement 2" format) was not replaced; it now appears in the Data Insights section. In an Evaluate question you read a short argument, then choose which piece of additional information would be most useful for evaluating that argument's conclusion.
The key insight: you are not asked to prove or disprove the conclusion. You are asked which information gives you the most leverage to assess the argument — meaning the answer must be relevant to the argument's core assumption.
The Three-Step Strategy
Find the Gap
Identify what the argument assumes but never states. The gap between evidence and conclusion is where the answer lives.
Apply the Two-Way Test
For each answer choice ask: "Is there one possible answer to this question that would strengthen the argument, and another that would weaken it?" Both must be possible for the correct answer.
Eliminate Irrelevant Choices
Choices that only strengthen, only weaken, or address topics outside the argument's chain of reasoning can be eliminated immediately.
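The three steps work like a filter applied to every choice. The Python sketch below makes the elimination rule explicit under one assumption: each choice has already been judged by hand on three yes/no questions. The `Choice` class and its field names are hypothetical, and the judgments for (A) to (E) are taken from the streetlight worked example below; the code only shows that a choice survives when all three answers are yes.

```python
from dataclasses import dataclass


@dataclass
class Choice:
    label: str
    touches_gap: bool     # bears on the gap between evidence and conclusion
    can_strengthen: bool  # some possible finding would strengthen the conclusion
    can_weaken: bool      # some other possible finding would weaken it


def survives(choice: Choice) -> bool:
    """Keep a choice only if it targets the gap AND cuts both ways."""
    return choice.touches_gap and choice.can_strengthen and choice.can_weaken


def evaluate(choices: list[Choice]) -> list[str]:
    """Return the labels of the choices that pass the two-way test."""
    return [c.label for c in choices if survives(c)]


# Judgments for the streetlight example, entered by hand from the worked solution.
streetlight_choices = [
    Choice("A", touches_gap=False, can_strengthen=False, can_weaken=False),  # energy costs
    Choice("B", touches_gap=True, can_strengthen=True, can_weaken=True),     # comparable areas
    Choice("C", touches_gap=False, can_strengthen=False, can_weaken=False),  # funding source
    Choice("D", touches_gap=False, can_strengthen=False, can_weaken=False),  # feelings, not crime data
    Choice("E", touches_gap=False, can_strengthen=False, can_weaken=False),  # LED vs. fluorescent
]

# A hypothetical choice that is on-topic but can only push in one direction.
one_way = Choice("X", touches_gap=True, can_strengthen=True, can_weaken=False)

print(evaluate(streetlight_choices))  # ['B']
print(survives(one_way))              # False
```

The judgment calls are the real work; writing them as three separate yes/no questions simply keeps you from accepting a choice that is relevant but one-directional.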
Argument Types You Will Encounter
Causal Arguments
A caused B. Gap: Could something else have caused B? Could B have caused A instead (reverse causation)? Could A occur without B?
Statistical Arguments
Data supports a trend. Gap: Is the sample representative? Are base rates comparable?
Analogy Arguments
What worked in X will work in Y. Gap: Are X and Y sufficiently similar in relevant respects?
Policy Arguments
Policy P will achieve goal G. Gap: Will P actually cause G? Are there side effects that undermine G?
Worked Examples
"After the city installed new streetlights along the downtown corridor last year, crime rates in that area dropped by 22%. The mayor concluded that the streetlights caused the crime reduction, and proposed expanding the program citywide."
Which of the following would be most useful for evaluating the mayor's conclusion?
- (A) Whether the new streetlights also reduced energy costs
- (B) Whether crime rates in comparable downtown areas without new streetlights also declined during the same period
- (C) Whether the mayor plans to fund the program through taxes or bonds
- (D) Whether residents in the area reported feeling safer after installation
- (E) Whether the streetlights use LED or fluorescent technology
Step 1 — Find the Gap: The argument assumes the streetlights — not some other concurrent change — caused the crime drop. This is a classic alternative cause gap.
Step 2 — Two-Way Test on (B): If crime ALSO dropped in similar areas without new streetlights, the lights may not be the cause (weakens the mayor's conclusion). If crime did NOT drop in those areas, the streetlights look more likely causal (strengthens). Both directions are relevant.
Step 3 — Eliminate: (A) is about cost, irrelevant to causation. (C) is about funding, not whether the lights work. (D) is feelings, not actual crime data. (E) is technology type, not effectiveness.
Correct Answer: (B). It directly tests the alternative-cause gap.
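Choice (B) works as a control group. A minimal Python sketch of that rough difference-in-differences reasoning, with two hypothetical answers for how much crime fell in comparable areas (only the 22% figure comes from the passage):

```python
def attributable_drop(treated_drop_pct: int, comparison_drop_pct: int) -> int:
    """Rough difference-in-differences, in percentage points: the part of the
    treated area's crime drop that comparable areas did not also experience."""
    return treated_drop_pct - comparison_drop_pct


CORRIDOR_DROP_PCT = 22  # from the passage: crime fell 22% after the streetlights went in

# Two hypothetical answers to choice (B).
for comparison_drop_pct in (20, 3):
    remaining = attributable_drop(CORRIDOR_DROP_PCT, comparison_drop_pct)
    print(f"comparison areas fell {comparison_drop_pct}%: "
          f"about {remaining} points not explained by the area-wide trend")
```

If comparable areas fell 20%, only about 2 points remain for the lights (weakens); if they fell 3%, about 19 points remain (strengthens). Treat the subtraction as a rough gauge: differences of percentage changes are only approximately comparable, and whatever remains could still come from other local changes.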
"A survey of 400 MBA applicants found that 68% who used a professional admissions consultant were admitted to a top-20 business school. Only 41% of applicants who did not use a consultant were admitted. Therefore, using a professional consultant significantly increases one's chance of admission."
Which of the following would be most useful for evaluating the conclusion above?
- (A) The average fee charged by professional admissions consultants
- (B) Whether the 400 applicants were surveyed online or in person
- (C) Whether applicants who used consultants had stronger underlying credentials than those who did not
- (D) How long each consultant had been working in the field
- (E) Whether the schools surveyed have increased their class sizes recently
Step 1 — Find the Gap: The comparison of admission rates assumes the two groups (consultant vs. no consultant) had equivalent qualifications. If consultant users were already stronger candidates, their higher admission rate may reflect those credentials rather than the consultant's help. Selection bias is the gap.
Step 2 — Two-Way Test on (C): If consultant users HAD stronger credentials, credentials alone could explain the difference in admission rates (weakens). If credentials were similar across both groups, the consultant looks more impactful (strengthens). Both directions are relevant.
Eliminate: (A) is about cost — irrelevant to whether consultants help. (B) is methodology of survey, but does not address the comparison bias. (D) consultant experience tells us nothing about the argument's claim. (E) class sizes affect absolute numbers but not the rate comparison's validity.
Correct Answer: (C). It addresses selection bias — the core assumption gap.
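To see why (C) matters, here is a hypothetical split of the 400 applicants, invented for illustration because the passage gives only the two overall rates. By construction, consultant use makes no difference within either credential tier, yet the overall rates still come out at 68% and 41%. A minimal Python sketch:

```python
from fractions import Fraction

# HYPOTHETICAL split of the 400 surveyed applicants: tier -> (applicants, admitted).
# Strong candidates are admitted at 80% and weaker ones at 20%, with or without a consultant.
survey = {
    "consultant": {"strong": (160, 128), "weaker": (40, 8)},
    "no consultant": {"strong": (70, 56), "weaker": (130, 26)},
}


def pct(admitted: int, applicants: int) -> Fraction:
    """Exact admission rate as a percentage."""
    return Fraction(admitted, applicants) * 100


def summarize(tiers: dict[str, tuple[int, int]]) -> tuple[Fraction, dict[str, Fraction]]:
    """Return (overall admission %, admission % within each credential tier)."""
    applicants = sum(n for n, _ in tiers.values())
    admitted = sum(a for _, a in tiers.values())
    by_tier = {tier: pct(a, n) for tier, (n, a) in tiers.items()}
    return pct(admitted, applicants), by_tier


for group, tiers in survey.items():
    overall, by_tier = summarize(tiers)
    print(group, "overall:", str(overall) + "%",
          {tier: str(rate) + "%" for tier, rate in by_tier.items()})
```

The consultant group's higher overall rate comes entirely from having more strong candidates (160 of 200 vs. 70 of 200), which is exactly the possibility choice (C) probes.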
"The city of Maplewood reduced its public transit operating deficit by 30% after implementing a dynamic pricing system on bus fares. City officials in Riverside, which faces a similar transit deficit, propose implementing the same dynamic pricing system and expect to achieve comparable savings."
Which of the following would be most useful for evaluating whether Riverside will achieve comparable savings?
- (A) Whether Riverside's mayor personally supports dynamic pricing
- (B) Whether Maplewood's transit authority won any awards for the pricing initiative
- (C) Whether the demographic and ridership patterns in Riverside are similar to those in Maplewood
- (D) Whether bus fares in both cities are currently regulated by state law
- (E) Whether Riverside has more bus routes than Maplewood
Step 1 — Find the Gap: This is an analogy argument. It assumes Riverside is similar enough to Maplewood that what worked there will work here. The gap: Are the two cities comparable in ways relevant to the policy's success?
Step 2 — Two-Way Test on (C): If demographics and ridership ARE similar, the analogy holds and Riverside can expect similar savings (strengthens). If they are NOT similar (e.g., Riverside riders are more price-sensitive and will stop riding entirely), the policy may not achieve comparable results (weakens). Both directions are relevant.
Eliminate: (A) is political opinion, doesn't affect financial outcome. (B) awards are irrelevant to whether Riverside replicates results. (D) regulatory context could matter, but is less directly tied to ridership outcomes than (C). (E) number of routes tells us scale but not whether riders will respond similarly to pricing changes.
Correct Answer: (C). The analogy gap is whether the two cities are comparable in relevant characteristics.
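Choice (C) cuts both ways because revenue under new pricing depends on how riders react. A minimal Python sketch, assuming hypothetical Riverside figures and simplifying dynamic pricing to a single 20% rise in the average fare (it also ignores any cost savings from carrying fewer riders):

```python
from fractions import Fraction

BASE_RIDES = 1_000_000  # hypothetical annual Riverside bus rides
BASE_FARE_CENTS = 200   # hypothetical average fare before the change ($2.00)
NEW_FARE_CENTS = 240    # hypothetical average fare after the change (+20%)


def fare_revenue(rides: int, fare_cents: int) -> int:
    """Fare revenue in cents."""
    return rides * fare_cents


def revenue_change_pct(ridership_drop_pct: int) -> Fraction:
    """Exact % change in fare revenue if the higher average fare drives away
    the given percentage of riders."""
    base = fare_revenue(BASE_RIDES, BASE_FARE_CENTS)
    rides = BASE_RIDES * (100 - ridership_drop_pct) // 100
    new = fare_revenue(rides, NEW_FARE_CENTS)
    return Fraction(new - base, base) * 100


# Riders who mostly keep riding vs. riders who are far more price-sensitive.
for drop in (10, 30):
    print(f"ridership falls {drop}%: fare revenue changes by {str(revenue_change_pct(drop))}%")
```

With a 10% ridership loss, revenue rises 8%, which would help close the deficit; with a 30% loss, revenue falls 16%, which would widen it. Which case applies depends on exactly the rider characteristics that (C) asks about.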
GMAT Traps to Avoid
A choice that only strengthens or only weakens the argument is NOT the answer to an Evaluate question. You need a choice that does both — depending on what the information reveals.
Some choices contain facts that are true and interesting but do not address the argument's core assumption. If the information does not touch the gap between evidence and conclusion, eliminate it.
Don't pick the choice that outright proves or disproves the conclusion. The question asks what helps EVALUATE (assess) the argument, not what settles it. If the answer already decides the question, it's too strong.
GMAT often includes choices that mention words from the passage but address a subtly different issue. Read carefully: does the choice probe the specific reasoning gap, or just sound related to the topic?
Practice Questions
12 Evaluate-the-Argument questions. Attempt each one before reading its answer.
"Since the organic food store opened on Main Street three years ago, annual visits to the local hospital's emergency room have declined by 18%. The store's owner argues that access to healthy food options has improved residents' health and reduced emergency situations."
Which of the following would be most useful for evaluating the store owner's argument?
- (A) Whether the organic store's prices are higher than conventional grocery stores
- (B) Whether the store owner holds a degree in nutrition
- (C) Whether ER visits in neighboring towns without the store also declined during the same period
- (D) Whether the hospital added a new urgent care clinic nearby that may have redirected some visits
- (E) Both C and D
Correct Answer: (E) — Both C and D
The owner claims the store caused better health. Two major alternative explanations exist: (C) a general regional trend unrelated to the store, and (D) a healthcare supply change (new clinic) that redirected visits rather than reducing emergencies. Both pieces of information directly test the causal claim from different angles. If neither alternative applies, the argument strengthens; if either does, it weakens.
"A study of 1,000 adults found that people who sleep fewer than six hours per night score 15% lower on standardized memory tests than those who sleep seven or more hours. The study's lead researcher concluded that insufficient sleep impairs memory function."
Which of the following would be most useful for evaluating this conclusion?
- (A) The average age of participants in the study
- (B) Whether the memory tests used were validated by independent researchers
- (C) Whether participants were randomly assigned to sleep durations or self-selected their sleep habits
- (D) Whether the researchers had previously published studies on sleep
- (E) The country in which the study was conducted
Correct Answer: (C)
The key gap is causation vs. correlation and selection bias. If participants self-selected their sleep (observational study), then people who sleep less might do so because of stress, anxiety, or illness — conditions that also impair memory. The sleep itself may not be the cause. If participants were randomly assigned, causation is much better supported. Both outcomes are directly relevant to the conclusion.
"To reduce distracted driving accidents, the state legislature passed a law banning handheld cell phone use while driving. The transportation commissioner predicted that the ban would reduce overall traffic fatalities by at least 10% within two years."
Which of the following would be most useful for evaluating the commissioner's prediction?
- (A) Whether handheld cell phone use while driving was a significant contributor to traffic fatalities in the state before the ban
- (B) Whether the ban applies to commercial trucks as well as passenger vehicles
- (C) Whether the legislature considered other distracted-driving measures
- (D) Whether the commissioner has a background in traffic safety
- (E) Whether similar bans in other states targeted pedestrian use as well
Correct Answer: (A)
The prediction assumes that cell phone use was a major cause of fatalities. If handheld cell phone use WAS responsible for a large share of fatalities, a ban could plausibly achieve a 10% reduction (strengthens). If cell phone use was only a minor factor, even perfect compliance won't cut fatalities by 10% (weakens). This is the core assumption that determines whether the policy can achieve its stated goal.
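A ban can only remove the fatalities that handheld phone use actually causes, so the answer to (A) sets a ceiling on the prediction. The Python sketch below computes that rough ceiling; the shares and compliance levels are hypothetical, and it generously assumes every attributable fatality among compliant drivers is prevented.

```python
from fractions import Fraction


def max_fatality_cut_pct(handheld_share_pct: int, compliance_pct: int = 100) -> Fraction:
    """Rough ceiling on the % drop in total fatalities: the ban removes at most the
    fatalities attributable to handheld phone use, and only among drivers who comply."""
    return Fraction(handheld_share_pct * compliance_pct, 100)


TARGET_PCT = 10  # the commissioner's prediction

# Hypothetical shares of fatalities attributable to handheld phone use.
for share in (25, 5):
    for compliance in (100, 50):
        ceiling = max_fatality_cut_pct(share, compliance)
        verdict = "not ruled out" if ceiling >= TARGET_PCT else "ruled out"
        print(f"share {share}%, compliance {compliance}%: "
              f"ceiling {float(ceiling):g}% -> 10% target {verdict}")
```

If handheld use accounts for only 5% of fatalities, no level of compliance can reach the 10% target, which is why (A) can weaken the prediction as well as strengthen it.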
"TechStart, a software startup in Austin, doubled its revenue after implementing a four-day work week. Horizon Consulting, a management consulting firm in New York, is considering the same policy and expects similar revenue growth."
Which of the following would be most useful for evaluating whether Horizon will achieve similar growth?
- (A) Whether Horizon's employees would prefer a four-day week
- (B) Whether TechStart's doubled revenue was primarily driven by a product launch that coincided with the policy change
- (C) The number of employees at TechStart versus Horizon
- (D) Whether consulting clients typically require availability five days a week, unlike software customers
- (E) Both B and D
Correct Answer: (E) — Both B and D
(B) tests whether TechStart's growth was actually due to the four-day week or a confounding factor — if a product launch drove the growth, the analogy collapses. (D) tests whether the industries are comparable: consulting clients may require five-day availability, making a four-day policy harmful rather than neutral. Both directly probe the analogy's validity from different angles.
"Employees at Varner Corp. who participated in the company's voluntary wellness program missed an average of 3 fewer sick days per year than employees who did not participate. Management concluded that the wellness program improves employee health."
Which of the following would be most useful for evaluating management's conclusion?
- (A) Whether the wellness program was designed by an outside vendor
- (B) Whether employees who joined the program were already healthier and more health-conscious than those who did not
- (C) Whether the company offered financial incentives for program participation
- (D) Whether 3 sick days represents a statistically significant difference
- (E) Whether the program includes mental health components
Correct Answer: (B)
The program is voluntary, creating a self-selection problem. If participants were already healthier people (who also happen to exercise, eat well, etc.), the lower sick days reflect their pre-existing health rather than the program's effect (weakens). If participants and non-participants had similar baseline health, the wellness program looks effective (strengthens). This is the central assumption gap.
"To increase voter turnout, the county election board proposes moving polling locations to shopping malls. They argue that placing polls in convenient, high-traffic areas will make voting easier and thereby increase participation."
Which of the following would be most useful for evaluating the election board's argument?
- (A) Whether shopping malls have adequate parking facilities
- (B) Whether the primary reason eligible voters do not vote is inconvenient polling location rather than apathy or distrust
- (C) Whether mall management has agreed to provide free space for polling
- (D) Whether other counties have recently moved polling locations
- (E) Whether electronic voting machines are compatible with mall infrastructure
Correct Answer: (B)
The policy assumes that inconvenient location is the primary barrier to voting. If convenience IS the main barrier, the mall policy could boost turnout (strengthens). If non-voters stay home due to apathy, distrust, or other reasons, moving polling locations won't help (weakens). This directly tests the policy's underlying premise.
"A manufacturing company reports that its defect rate dropped from 4.2% to 1.8% after deploying new quality control software. An industry analyst argues that the software is highly effective and recommends it to other manufacturers."
Which of the following would be most useful for evaluating the analyst's recommendation?
- (A) Whether the software was developed domestically or abroad
- (B) Whether the company also changed its raw material supplier or production process at the same time as the software deployment
- (C) Whether the analyst has a financial stake in the software company
- (D) Whether 1.8% is the industry average defect rate
- (E) Whether the software requires significant employee training
Correct Answer: (B)
The analyst's claim is that the software caused the defect rate to drop. If other changes (new supplier, new process) were made simultaneously, those could explain the improvement rather than the software (weakens the recommendation). If no other changes were made, the software is more likely the cause (strengthens). This tests the attribution of causation.
"A university in Canada found that students who took mandatory study skills workshops in their first semester earned GPAs 0.4 points higher on average than the national average for first-year students. A university in Brazil is considering making similar workshops mandatory, expecting the same GPA improvement."
Which of the following would be most useful for evaluating whether the Brazilian university will see a similar GPA improvement?
- (A) Whether the Canadian university is a public or private institution
- (B) Whether the Brazilian university's current first-year GPA is already above the national average, leaving little room for improvement
- (C) Whether the study skills workshops cover time management and note-taking
- (D) Whether Brazilian and Canadian academic grading systems are structured similarly enough to make GPA comparisons meaningful
- (E) Both B and D
Correct Answer: (E) — Both B and D
(B) addresses a possible ceiling effect: if students already score well above average, there may be little room left for a further 0.4-point gain (weakens). If there is room to improve, the workshops could achieve the target (strengthens). (D) addresses whether GPA means the same thing in both systems: if grading is structured differently, the comparison benchmark is invalid. Both are essential to evaluating the analogy.
"Sunflower Valley's property values increased 25% in the five years following the construction of a new light rail station in the neighborhood. A real estate developer concluded that proximity to light rail infrastructure directly increases property values."
Which of the following would be most useful for evaluating the developer's conclusion?
- (A) Whether Sunflower Valley is zoned for commercial or residential use
- (B) Whether property values in similar neighborhoods without new light rail stations also rose during the same five-year period
- (C) Whether the light rail station serves multiple lines or only one
- (D) Whether the developer owns property near the station
- (E) Whether the city's overall population grew during those five years
Correct Answer: (B)
The developer assumes light rail caused the price increase. A proper control comparison would show whether comparable neighborhoods without light rail also saw 25% growth (from broader market forces). If they did, the rail station is not the differentiating factor (weakens). If those neighborhoods grew much less, the rail station looks like the cause (strengthens). Note: (E) is tempting but only partially addresses the issue — (B) is the direct counterfactual test.
"A school district proposes eliminating standardized tests in grades 3–5 to reduce student stress, arguing that high-stakes testing is the primary source of anxiety in young students and that removal will measurably improve student well-being."
Which of the following would be most useful for evaluating the school district's argument?
- (A) Whether standardized tests in grades 3–5 are mandated by the state government
- (B) Whether research shows that standardized tests are among the most significant sources of anxiety for students in those grades, compared to other sources such as social pressures or family factors
- (C) Whether teachers in the district support or oppose the elimination of tests
- (D) Whether eliminating tests would affect the district's federal funding
- (E) Whether the tests are administered online or in paper format
Correct Answer: (B)
The policy argument assumes testing is the PRIMARY driver of anxiety. If research confirms tests are the dominant anxiety source for this age group, removing them would address the root cause (strengthens). If anxiety stems mainly from social pressures or home environments, eliminating tests may not improve well-being (weakens). This probes the central causal assumption of the policy.
"Hospital A's 30-day patient readmission rate is 12%, while Hospital B's is 19%. A healthcare reporter concluded that Hospital A provides significantly better care than Hospital B, based on readmission rates as a measure of quality."
Which of the following would be most useful for evaluating the reporter's conclusion?
- (A) Whether Hospital A is located in an urban or suburban area
- (B) Whether both hospitals serve patient populations with similar severity of illness and socioeconomic conditions
- (C) Whether the reporter has previously written about healthcare quality
- (D) Whether readmission rates are tracked consistently under the same definition at both hospitals
- (E) Both B and D
Correct Answer: (E) — Both B and D
(B) addresses case-mix adjustment: Hospital B may serve sicker, lower-income patients who are more likely to be readmitted regardless of care quality. Without adjusting for patient population severity, the comparison is invalid. (D) addresses whether the metric means the same thing at both hospitals — if one hospital counts readmissions differently, the numbers aren't comparable. Both are essential to evaluate whether the statistic reflects actual care quality differences.
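Case-mix adjustment can reverse a comparison. The Python sketch below uses hypothetical counts chosen so the crude rates match the passage (12% and 19%), then applies direct standardization: each hospital's severity-specific rates are weighted by the same hypothetical 50/50 reference mix.

```python
from fractions import Fraction

# HYPOTHETICAL counts: severity stratum -> (discharged patients, readmitted within 30 days).
hospitals = {
    "A": {"low": (900, 90), "high": (100, 30)},
    "B": {"low": (300, 15), "high": (700, 175)},
}

# Hypothetical common reference population: half low-severity, half high-severity.
REFERENCE_MIX = {"low": Fraction(1, 2), "high": Fraction(1, 2)}


def crude_rate(strata: dict[str, tuple[int, int]]) -> Fraction:
    """Readmission % over all patients, ignoring case mix."""
    patients = sum(n for n, _ in strata.values())
    readmitted = sum(r for _, r in strata.values())
    return Fraction(readmitted, patients) * 100


def standardized_rate(strata: dict[str, tuple[int, int]],
                      mix: dict[str, Fraction] = REFERENCE_MIX) -> Fraction:
    """Direct standardization: apply each stratum-specific rate to the same patient mix."""
    total = Fraction(0)
    for stratum, weight in mix.items():
        patients, readmitted = strata[stratum]
        total += Fraction(readmitted, patients) * 100 * weight
    return total


for name, strata in hospitals.items():
    print(f"Hospital {name}: crude {float(crude_rate(strata)):g}%, "
          f"standardized {float(standardized_rate(strata)):g}%")
```

Here Hospital B has the lower readmission rate within each stratum (5% vs. 10% for low severity, 25% vs. 30% for high severity) but the higher crude rate, because most of its patients are high severity. After standardization A comes out at 20% and B at 15%, the opposite of the reporter's conclusion.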
"BrightBean Coffee reduced customer wait times by 40% after switching to a mobile pre-ordering system. GourmetBite, a fast-casual restaurant chain, is planning to implement the same pre-ordering system and expects to reduce wait times by a similar margin."
Which of the following would be most useful for evaluating whether GourmetBite will achieve a similar reduction?
- (A) Whether GourmetBite's menu is more complex and customizable than BrightBean's, making pre-order preparation more time-consuming
- (B) Whether BrightBean's customers rated the pre-ordering system highly in satisfaction surveys
- (C) Whether GourmetBite has ever offered any digital ordering options before
- (D) Whether the mobile pre-ordering system requires a smartphone app or also works via web browser
- (E) Whether GourmetBite has more locations than BrightBean
Correct Answer: (A)
The analogy assumes the two businesses will respond similarly to pre-ordering. The most critical difference is menu complexity: if GourmetBite's customizable orders require longer kitchen preparation even when pre-ordered, the bottleneck shifts from ordering to preparation — and wait time savings would be minimal (weakens). If the menus are similarly simple and preparation times are comparable, GourmetBite could achieve similar wait time reductions (strengthens). This directly tests the analogy's core assumption about operational similarity.
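The bottleneck argument reduces to simple arithmetic. A minimal Python sketch, assuming hypothetical minute counts and a simplified model in which pre-ordering removes the in-store ordering step but leaves preparation time unchanged:

```python
from fractions import Fraction


def preorder_wait_cut_pct(ordering_min: int, prep_min: int) -> Fraction:
    """% cut in total wait if pre-ordering removes the in-store ordering step
    but leaves kitchen preparation time unchanged (a simplifying assumption)."""
    before = ordering_min + prep_min
    after = prep_min
    return Fraction(before - after, before) * 100


# Hypothetical minutes: (time spent queuing and ordering, time waiting on preparation).
print("BrightBean:", preorder_wait_cut_pct(4, 6))    # BrightBean: 40
print("GourmetBite:", preorder_wait_cut_pct(3, 12))  # GourmetBite: 20
```

Under these assumptions BrightBean's 40% cut shrinks to 20% for GourmetBite, because preparation, not ordering, dominates its wait.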
Quick Reference Card
# Evaluate-the-Argument: Rules & Formulas
## The Two-Way Test
Correct answer: One answer (yes or no) → strengthens conclusion
The opposite answer → weakens conclusion (in either order)
Wrong answer: Only strengthens OR only weakens
## Argument Gap by Type
Causal → Alternative cause? Reverse causation?
Stats → Sample bias? Apples-to-oranges comparison?
Analogy → Are the two situations relevantly similar?
Policy → Will the policy actually achieve the goal?
## Elimination Rules
KILL if: Choice is irrelevant to the reasoning gap
KILL if: Choice only goes one direction (strengthen/weaken)
KILL if: Choice proves/disproves conclusion outright
KILL if: Adjacent topic — uses passage words but wrong focus
## Time Budget
Read argument: ~45 sec
Identify gap: ~30 sec
Evaluate each choice: ~15 sec each (5 choices ≈ 75 sec)
Total target: ~2 min 30 sec per question
## Classic Assumption Signals
"Therefore" / "Thus" → conclusion word — find the gap before it
"Because" / "Since" → premise word — evidence side of the gap
"Should" / "Must" → policy/prescription — goal assumption gap