Upward cluster = positive correlation. Tight cluster = strong. Outliers sit far from the band. Points above the best-fit line = actual > predicted. Correlation never proves causation.
A scatter plot displays individual data points, each representing one observation on two variables. The x-axis shows one variable; the y-axis shows the other. The overall pattern of points reveals the relationship between them.
Outliers: points far from the main cluster or trend line. They may represent data errors, exceptional cases, or the most interesting observations in the dataset.
Clusters: groups of points that are close together. Multiple clusters may indicate sub-populations (e.g., small vs. large companies in the same dataset).
A positive correlation between ad spend and sales does NOT prove that advertising caused sales to increase.
A single extreme outlier can significantly pull the best-fit line in its direction, misrepresenting the bulk of the data.
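To make the outlier effect concrete, here is a minimal sketch using made-up numbers (the data, the helper name fit_line, and the outlier value are all illustrative assumptions). It fits an ordinary least-squares line to five points lying exactly on y = 2x, then refits after adding one extreme point.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: five points lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope_clean, intercept_clean = fit_line(xs, ys)

# Add one extreme point far above the trend and refit.
slope_out, intercept_out = fit_line(xs + [6], ys + [40])

print(f"without outlier: slope = {slope_clean:.2f}")
print(f"with outlier:    slope = {slope_out:.2f}")
```

On this toy data the slope moves from 2 to 6, so the refitted line describes the five well-behaved points poorly.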
Even with a strong correlation, individual points have significant scatter. The line predicts on average, not exactly.
A negative correlation simply means as X increases, Y decreases. It's not inherently a problem.
Interpolating (predicting within the data range) is generally reliable. Extrapolating (predicting outside the data range) is speculative and prone to error.
If data has two sub-groups, the combined scatter plot may show no correlation while each sub-group does have one.
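A small sketch of hidden sub-groups, using invented data. Here the two groups trend in opposite directions, which is an assumption chosen so that the cancellation is exact. The helper pearson_r is a hand-written Pearson correlation, not a library call.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical sub-groups: A rises with x, B falls with x.
group_a = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
group_b = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]

r_a = pearson_r(*zip(*group_a))
r_b = pearson_r(*zip(*group_b))
r_all = pearson_r(*zip(*(group_a + group_b)))

print(f"group A: r = {r_a:+.2f}")
print(f"group B: r = {r_b:+.2f}")
print(f"combined: r = {r_all:+.2f}")
```

Each group is perfectly correlated on its own, yet the combined correlation is zero, which is why plotting sub-groups separately (or with different markers) is worth trying when a scatter plot looks patternless.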
The y-intercept of the best-fit line is the predicted value when X=0, not necessarily the lowest data point.
An R² of 0.8 means 80% of Y's variation is explained by X. It does NOT mean 80% of predictions are correct.
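The distinction can be checked numerically. The sketch below uses invented data and an arbitrary tolerance of 0.5 for calling a prediction "close"; both are assumptions for illustration only. It computes R² as 1 minus the ratio of residual to total sum of squares, and separately counts how many predictions land near the actual value.

```python
# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in xs]

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

# A different quantity: how many predictions fall within an
# (arbitrarily chosen) tolerance of the actual value.
tolerance = 0.5
close = sum(1 for y, p in zip(ys, predicted) if abs(y - p) <= tolerance)

print(f"R^2 = {r_squared:.2f}")
print(f"predictions within {tolerance}: {close} of {n}")
```

Here R² is 0.60, but only 1 of the 5 predictions falls within the chosen tolerance. The two numbers measure different things.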
Even if A and B are causally linked, the direction may run from B to A rather than from A to B (reverse causation). A scatter plot cannot distinguish direction.
Random data can appear to show patterns purely by chance, and small samples are especially deceptive.
A scatter plot shows hours studied (x-axis) and exam score (y-axis). The points form a tight upward-sloping band. This indicates:
A scatter plot shows a point far above the line of best fit. This outlier represents:
A scatter plot of ice cream sales vs. drowning incidents shows a strong positive correlation. This means:
A scatter plot has x-axis: Price ($) and y-axis: Quantity Demanded. The points slope downward from left to right. This indicates:
A scatter plot has two distinct clusters: one in the lower-left and one in the upper-right. Within each cluster, there is no correlation. But overall, the data shows a positive correlation. This phenomenon is called:
On a scatter plot, all points lie exactly on a straight line sloping upward. The correlation coefficient (r) is:
A scatter plot shows employee satisfaction (x) vs. productivity (y). The line of best fit has a slope of 0.8. A point at x=50 has actual y=55. The predicted y at x=50 (assuming y-intercept = 10) is:
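The arithmetic in this question follows the line equation, predicted y = slope × x + intercept. A short check using the values given in the question (the variable names are only illustrative):

```python
# Values taken from the question: slope 0.8, y-intercept 10, point (50, 55).
slope = 0.8
intercept = 10
x, actual_y = 50, 55

predicted_y = slope * x + intercept   # 0.8 * 50 + 10
residual = actual_y - predicted_y     # positive means the point lies above the line

print(f"predicted y = {predicted_y:.1f}")
print(f"residual    = {residual:.1f}")
```

The predicted value is 50, and the residual of +5 places the point above the line, meaning the actual value exceeds the predicted value.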
A business uses a scatter plot to analyze whether advertising spend predicts sales. The R² value is 0.25. This means:
Two scatter plots show the same data but with different axis scales. Plot 1: x-axis spans 0-100, y-axis 0-1000. Plot 2: x-axis spans 0-10, y-axis 0-10000. The correlation between the variables is:
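Changing the axis ranges changes only how the plot is drawn, not the data. A slightly stronger fact can be checked in code: even rescaling the values themselves (for example, changing units) leaves r unchanged. The sketch below uses invented data and a hand-written pearson_r helper. Treating the two plots as a change of units is an assumption made for the demonstration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data on a 0-100 by 0-1000 scale.
xs = [10, 20, 40, 60, 90]
ys = [150, 300, 380, 700, 820]

# The same observations expressed on a 0-10 by 0-10000 scale.
xs_rescaled = [x / 10 for x in xs]
ys_rescaled = [y * 10 for y in ys]

r_original = pearson_r(xs, ys)
r_rescaled = pearson_r(xs_rescaled, ys_rescaled)

print(f"original scale: r = {r_original:.4f}")
print(f"rescaled:       r = {r_rescaled:.4f}")
```

Both calls give the same r up to floating-point rounding, because multiplying a variable by a positive constant scales the covariance and the standard deviation by the same factor.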
A scatter plot shows 15 companies with Operating Cost (x) and Net Profit (y). There is a negative correlation (r = −0.85). A new company has high operating costs. A reasonable prediction based on this data is:
The overall slope of the cluster tells you the sign of the correlation.
A tight, narrow cluster = strong correlation. A wide spread = weak correlation.
They have unusually high or low values for at least one variable relative to the trend.
Even a perfect correlation (r=1) only tells you variables move together — not why.