Quantitative Methods · Parametric and Non-Parametric Tests of Independence · LO 1 of 2

You found a correlation of 0.42 between two fund returns, but is it real, or just random noise?

Distinguish parametric and nonparametric hypothesis tests for correlation, apply the correct test to your data, and decide whether the relationship is statistically significant.

Why this LO matters

INSIGHT
A sample correlation of 0.42 might be real, reflecting a true relationship in the population, or it might be an accident of the sample you happened to draw. The hypothesis test is your tool to decide. You ask: "If there were actually no relationship in the population (ρ = 0), how often would I see a sample correlation this extreme just by chance?" If the answer is "very rarely," rarer than your significance level allows, you reject the assumption of no relationship and conclude the correlation is real. If the answer is "this happens all the time by chance," you fail to reject the null. The test is the only way to move from data to a defensible conclusion.

How to Test Whether a Correlation Is Real

Think about a coin. You flip it ten times and get seven heads. Does that prove it is a biased coin? Maybe. Or maybe a fair coin just happened to land heads seven times. The only way to decide is to ask: how likely is seven heads if the coin really is fair? If the answer is "unlikely enough," you conclude the coin is biased.

Correlation testing works the same way. You observe a sample correlation r and ask: how likely is this value if the true population correlation is actually zero? The t-statistic measures that probability. The critical value is your threshold.

Five things to hold in mind simultaneously

1. The parametric test of correlation. Assumes both variables are normally distributed. Uses the Pearson correlation coefficient (r) and a t-statistic to test whether the population correlation coefficient (rho, ρ) equals zero. Use this when your data come from a normal distribution with no extreme outliers or skewness.

2. The nonparametric test of correlation. Makes no assumption about the underlying distribution. Ranks the data within each variable before calculating correlation. Use this when data depart from normality, contain outliers, are already in rank form, or show strong skewness.

3. The null hypothesis structure. Every correlation test begins by assuming no relationship exists (H₀: ρ = 0) and then asks whether the sample evidence contradicts that assumption. The alternative hypothesis can be two-sided (ρ ≠ 0) or one-sided (ρ > 0 or ρ < 0), depending on your research question.

4. The decision rule. For both parametric and nonparametric tests: reject H₀ if |t_calculated| > |t_critical|. The rejection region is in the tails, beyond the critical value, not in the centre around zero. This direction is where candidates go wrong.

5. Degrees of freedom. When testing correlation, degrees of freedom = n − 2, where n is the number of observation pairs. The reduction by 2 reflects that correlation involves estimating two means, one for each variable.

The Parametric Test: Pearson Correlation

The wrong instinct here is to read the magnitude of r and decide whether it "looks significant." An r of 0.40 across 200 observations is highly significant. An r of 0.40 across 8 observations is not. The sample size is inseparable from the judgment.

The parametric test converts r into a t-statistic that accounts for both the magnitude of the correlation and the sample size. You then compare that t-statistic to a critical value.

Parametric t-statistic for correlation
t = r × √(n − 2) / √(1 − r²)


where:
r = the sample Pearson correlation coefficient (given in the question)
n = the number of paired observations
df = n − 2 (degrees of freedom for looking up the critical value)


Condition: applies when both variables are normally distributed.
Note: the numerator grows with √n, so larger samples make it easier
to reject H₀ at the same significance level.
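The formula is easy to sandbox. The sketch below is a hypothetical helper (not part of the curriculum) that computes the t-statistic and illustrates the point that the same r can be significant or insignificant depending on n:

```python
import math

def corr_t_stat(r: float, n: int) -> float:
    """t-statistic for H0: rho = 0, given sample correlation r and n pairs.

    t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2.
    """
    if n < 3:
        raise ValueError("need at least 3 observation pairs")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Same r, different n: sample size drives significance.
print(round(corr_t_stat(0.40, 200), 3))  # → 6.141
print(round(corr_t_stat(0.40, 8), 3))    # → 1.069
```

Against two-sided 5% critical values of roughly 1.97 (df = 198) and 2.447 (df = 6), only the large-sample case rejects H₀, exactly as the paragraph above warns.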

The Nonparametric Test: Spearman Rank Correlation

When the normality assumption fails, the t-statistic above is no longer valid. The fix is to replace raw values with their rank order within each variable. Ranks cannot have extreme outliers, because the highest value always gets rank 1 and the lowest gets rank n, regardless of how extreme the values are.

Spearman rank correlation coefficient
r_s = 1 − (6 × Σdᵢ²) / (n × (n² − 1))


where:
dᵢ = the difference in ranks for observation pair i
(rank of Xᵢ minus rank of Yᵢ)
n = the number of paired observations


Condition: applies when the normality assumption is violated,
data are ordinal, or extreme outliers are present.
Ties: assign the average of the tied ranks
(e.g., two observations tied for 3rd and 4th get rank 3.5).
Once you have r_s, test it using exactly the same t-statistic formula as the parametric test, substituting r_s for r. The decision rule and degrees of freedom are unchanged.
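The Spearman formula drops straight into code as well. A minimal sketch with hypothetical helper names, valid when ties are absent or rare:

```python
import math

def spearman_rs(sum_d_sq: float, n: int) -> float:
    """Spearman rank correlation from the sum of squared rank differences:
    r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)). Assumes no (or few) tied ranks."""
    return 1 - 6 * sum_d_sq / (n * (n * n - 1))

def rank_t_stat(r_s: float, n: int) -> float:
    """Identical t formula to the parametric test, with r_s substituted for r."""
    return r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s * r_s)

print(spearman_rs(0, 5))    # identical rankings, every d = 0 → 1.0
print(spearman_rs(40, 5))   # fully reversed rankings (max Σd² for n = 5) → -1.0
```

Once r_s is in hand, rank_t_stat(r_s, n) with df = n − 2 completes the test, exactly as the note above describes.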
FORWARD REFERENCE
Normal distribution, what you need for this LO only
A symmetric, bell-shaped probability distribution with no hard upper or lower boundary. For this LO, you only need to recognise when a question states or implies the data are not normally distributed. That is your signal to use the Spearman test instead of the Pearson test. You will study the normal distribution fully in Quantitative Methods Modules 1-2.
FORWARD REFERENCE
t-distribution, what you need for this LO only
A bell-shaped distribution with heavier tails than the normal distribution, used when the population standard deviation is unknown. For this LO, you only need to look up critical t-values in a table using the significance level and degrees of freedom (n − 2), then compare your calculated t to those values. You will study the t-distribution fully in Quantitative Methods Module 2.
Which test to apply
Ask one question first: are both variables plausibly normally distributed?

Use the parametric Pearson t-test when:
- The question states the data are approximately normally distributed.
- The variables are continuous with no natural floor or ceiling.
- There are no stated extreme outliers or strong skewness.

Use the nonparametric Spearman rank test when:
- The question states the data are non-normal, skewed, or have bounded ranges.
- The variables cannot go negative (like expense ratios, prices, or counts).
- The data are already expressed as ranks or ordered categories.
- Extreme outliers are present.
- The sample is very small and normality is implausible.

In both cases, the t-statistic formula and the decision rule are identical. Only the input correlation coefficient changes.
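The checklist can be encoded as a tiny dispatcher. This is purely illustrative; the function name and flags are mine and simply mirror the bullet points above:

```python
def choose_correlation_test(normal: bool, outliers: bool = False,
                            ordinal: bool = False) -> str:
    """Pick the correlation test per the checklist: any violation of the
    parametric conditions routes you to the Spearman rank test."""
    if not normal or outliers or ordinal:
        return "Spearman rank (nonparametric)"
    return "Pearson (parametric)"

print(choose_correlation_test(normal=True))                 # → Pearson (parametric)
print(choose_correlation_test(normal=False))                # → Spearman rank (nonparametric)
print(choose_correlation_test(normal=True, outliers=True))  # → Spearman rank (nonparametric)
```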

How to Apply the Tests: Worked Examples

The following examples build from a single straightforward case to a full matrix, then to the nonparametric decision. Work through each one in order.

Worked Example 1
Parametric t-test for correlation, one-sided test
Priya Menon is a research analyst at Vantara Capital, a mid-sized asset management firm in Singapore. She has collected 40 months of returns for one of the firm's actively managed equity funds and the regional benchmark index. She calculates a sample Pearson correlation of r = 0.418 between the two return series. She wants to determine whether there is a statistically significant positive correlation between them at the 1% significance level.
🧠Thinking Flow — One-sided parametric t-test for correlation
The question asks
Is there evidence of a real positive correlation in the population, or could r = 0.418 have appeared by chance even when the true population correlation is zero or negative?
Key concept needed
Parametric t-statistic for correlation. Note: many candidates write a two-sided alternative out of habit. The question says "positive correlation," which is a directional claim. That matters for the critical value lookup.
Step 1, State the hypotheses
The wrong default is H₀: ρ = 0 versus Hₐ: ρ ≠ 0. That is a two-sided test. The question specifies a positive relationship. The correct hypotheses are H₀: ρ ≤ 0 versus Hₐ: ρ > 0. This is a one-sided (right-sided) test.
Step 2, Apply the formula
t = r × √(n − 2) / √(1 − r²) Substituting r = 0.418 and n = 40: Denominator: 1 − (0.418)² = 1 − 0.174724 = 0.825276. Take the square root: √0.825276 = 0.90845. Numerator: 0.418 × √(40 − 2) = 0.418 × √38 = 0.418 × 6.1644 = 2.5767. t = 2.5767 / 0.90845 = 2.836
Step 3, Look up the critical value
Degrees of freedom = 40 − 2 = 38. For a one-sided test at 1% significance with 38 df, the critical value from a t-table is 2.429.
Step 4, Apply the decision rule
|t_calculated| = 2.836 > 2.429 = t_critical. Reject H₀.
Step 5, Sanity check
Direction: r is positive (0.418), so t must also be positive. ✓ Magnitude: r = 0.418 is a moderately strong correlation over 40 observations. Expecting t to sit comfortably above the 1% threshold is reasonable. 2.836 > 2.429. ✓ ✓ Answer: Reject H₀. The calculated t-statistic (2.836) exceeds the critical value (2.429). There is sufficient evidence at the 1% level to conclude that a statistically significant positive correlation exists between the fund returns and the index returns.
🧮 Clear first:
`2ND``0` (CLR WORK)
Clears any stored worksheet → 0

Calculate the denominator (√(1 − r²)):
`.418``x²`
Computes r² = 0.418² → 0.174724
`+/-``+``1``=`
Computes 1 − 0.174724 → 0.825276
`√x`
Square root of denominator → 0.90845
`STO``1`
Stores denominator in register 1 → 0.90845

Calculate the numerator (r × √(n − 2)):
`38``√x`
√38 = √(n − 2) → 6.16441
`×``.418``=`
Multiplies by r → 2.57672

Compute the t-statistic (numerator ÷ denominator):
`÷``RCL``1``=`
Divides numerator by stored denominator → **2.836**

⚠️ Forgetting to take the square root of (1 − r²), using 0.825276 instead of 0.90845 as the denominator, gives t = 3.122. That number also exceeds the critical value in this example, so the conclusion happens to be the same. But the test statistic is wrong, and in questions where the margin between t_calculated and t_critical is narrow, this error flips the conclusion.
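The same keystroke arithmetic can be replicated in Python to check your calculator work. The critical value below is taken from the example's t-table lookup, not computed:

```python
import math

r, n = 0.418, 40
t_crit_1pct_one_sided = 2.429   # from a t-table, df = 38 (given in the example)

denominator = math.sqrt(1 - r**2)   # √(1 − r²) ≈ 0.90845
numerator = r * math.sqrt(n - 2)    # r × √38 ≈ 2.57672
t = numerator / denominator

print(round(t, 3))                  # → 2.836
print(t > t_crit_1pct_one_sided)    # → True: reject H0
```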
Worked Example 2
Parametric t-test, two-sided test across a correlation matrix
Kwame Asante is a portfolio construction analyst at Meridian Fund Solutions in Accra. He has downloaded 36 months of returns for four equity funds (Growth, Balanced, Value, and Frontier) and a domestic stock index. He wants to test all pairwise correlations to identify which pairs show a statistically significant linear relationship. He uses a 5% significance level, a two-sided alternative hypothesis, and has been told the critical t-values for this test are ±2.032.

The correlation matrix:

Growth Balanced Value Frontier
Balanced 0.9516
Value 0.4621 0.3986
Frontier 0.7111 0.7238 0.3102
Index 0.8277 0.8223 0.5791 0.7515
🧠Thinking Flow — Two-sided parametric t-test applied to a correlation matrix
The question asks
For each pairwise correlation, does the sample r provide enough evidence to reject H₀: ρ = 0 at the 5% level?
Key concept needed
Two-sided parametric t-test. The decision rule is |t_calculated| > |t_critical|. The absolute value on both sides is what many candidates omit when the correlation is negative.
Step 1, Name the common mistake
Many candidates scan the correlation matrix and decide significance by eyeballing the magnitude of r. An r of 0.3102 looks "small," so they conclude it cannot be significant without testing. This intuition is backwards. A moderate correlation over a large sample can be highly significant. A weak correlation over a small sample can easily be insignificant. The only way to decide is to calculate t for every pair and compare to the critical value. Never skip the calculation step.
Step 2, State the decision rule
H₀: ρ = 0 versus Hₐ: ρ ≠ 0 for each pair. With n = 36, degrees of freedom = 36 − 2 = 34. Critical values are ±2.032. Reject H₀ for any pair where |t| > 2.032.
Step 3, Calculate t for the smallest correlation in the matrix (Value vs. Frontier, r = 0.3102)
Denominator: √(1 − 0.3102²) = √(1 − 0.09622) = √0.90378 = 0.95067. Numerator: 0.3102 × √(36 − 2) = 0.3102 × √34 = 0.3102 × 5.8310 = 1.8088. t = 1.8088 / 0.95067 = 1.903 |t| = 1.903 < 2.032. Fail to reject H₀ for this pair.
Step 4, Calculate t for a mid-range correlation (Value vs. Index, r = 0.5791) as a comparison
Denominator: √(1 − 0.5791²) = √(1 − 0.33536) = √0.66464 = 0.81526. Numerator: 0.5791 × √34 = 0.5791 × 5.8310 = 3.3767. t = 3.3767 / 0.81526 = 4.142 |t| = 4.142 > 2.032. Reject H₀ for Value vs. Index.
Step 5, Sanity check with the largest correlation (Growth vs. Balanced, r = 0.9516)
This must produce the largest t-statistic. Denominator: √(1 − 0.9516²) = √(1 − 0.90554) = √0.09446 = 0.30734. Numerator: 0.9516 × √34 = 0.9516 × 5.8310 = 5.5486. t = 5.5486 / 0.30734 = 18.05 18.05 >> 2.032. The highest r produces the highest t. ✓ ✓ Answer: The only non-significant correlation is Value Fund vs. Frontier Fund (t = 1.903 < 2.032). All other pairwise correlations are statistically significant at the 5% level. For every pair except Value-Frontier, reject H₀: ρ = 0.
🧮 Demonstrating Value vs. Frontier (r = 0.3102, n = 36):
`2ND``0`
Clear worksheet → 0
`.3102``x²`
Computes r² → 0.09622
`+/-``+``1``=`
Computes 1 − r² → 0.90378
`√x`
Square root of denominator term → 0.95067
`STO``1`
Stores denominator → 0.95067
`34``√x`
√(n − 2) = √34 → 5.83095
`×``.3102``=`
Multiplies by r → 1.80870
`÷``RCL``1``=`
Divides by stored denominator → **1.903**
⚠️ Using n = 36 instead of n − 2 = 34 inside the square root gives √36 = 6.000 in the numerator, producing t = 1.958 instead of 1.903. In this example, 1.958 is still below 2.032, so the conclusion is unchanged. But this is the most common arithmetic error in this formula, and it flips the conclusion in questions where the true t-statistic falls close to the critical boundary.
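A loop makes the full-matrix version of this check mechanical. A sketch using the correlation matrix from the example (the pair labels and helper name are mine):

```python
import math

def corr_t(r: float, n: int) -> float:
    """t = r·√(n−2)/√(1−r²), the same formula for every pair."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n, t_crit = 36, 2.032   # critical value given in the example (5%, two-sided, df = 34)
pairs = {               # lower triangle of the correlation matrix
    ("Balanced", "Growth"): 0.9516,
    ("Value", "Growth"): 0.4621,
    ("Value", "Balanced"): 0.3986,
    ("Frontier", "Growth"): 0.7111,
    ("Frontier", "Balanced"): 0.7238,
    ("Frontier", "Value"): 0.3102,
    ("Index", "Growth"): 0.8277,
    ("Index", "Balanced"): 0.8223,
    ("Index", "Value"): 0.5791,
    ("Index", "Frontier"): 0.7515,
}
for (a, b), r in pairs.items():
    t = corr_t(r, n)
    verdict = "reject H0" if abs(t) > t_crit else "fail to reject"
    print(f"{a:>8} vs {b:<8} r={r:.4f} t={t:+.3f} {verdict}")
```

Running this confirms the example's conclusion: only Frontier vs. Value (t ≈ 1.903) fails to reject.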
Worked Example 3
Choosing between parametric and nonparametric tests
Fatima Al-Rashidi is a risk analyst at the Doha office of a global commodities trading firm. She is investigating whether mutual fund management fees (expense ratios) are correlated with risk-adjusted excess returns (alpha) across nine emerging-market bond funds. Her colleague suggests using the standard Pearson correlation t-test. Fatima suspects that is inappropriate given the nature of the variables. She has the data below and must decide which test to apply and why.
Fund Alpha Expense Ratio
Indus Bond −0.52 1.34
Nile Growth −0.13 0.40
Tigris Income −0.50 1.90
Euphrates Core −1.01 1.50
Zambezi Flex −0.26 1.35
Congo Value −0.89 0.50
Volga Select −0.42 1.00
Mekong Blend −0.23 1.50
Ganges Stable −0.60 1.45
🧠Thinking Flow — Choosing the correct correlation test
The question asks
Is the parametric Pearson t-test appropriate here, or should Fatima use the Spearman rank test?
Key concept needed
The condition for the parametric test is that both variables are normally distributed. Bounded variables with natural floors violate this assumption structurally.
Step 1, Check the distributional assumptions
The parametric test requires both variables to be normally distributed. A normal distribution has no hard lower boundary. Expense ratios cannot be negative. They are bounded from below at zero. Alpha also cannot fall below −100%. Both variables have natural floors. With only 9 observations and bounded variables, the normality assumption is implausible. The nonparametric Spearman rank correlation is the appropriate test.
Step 2, Rank the observations
Rank each variable from highest (rank 1) to lowest (rank 9). For ties in Expense Ratio (two funds at 1.50), assign both the average of the ranks they occupy: positions 2 and 3 in the ranking, so both get rank 2.5. Note on ranking Alpha: −0.13 is the least negative, so it is the highest alpha and gets rank 1. −1.01 is the most negative, so it gets rank 9.
Fund Alpha Expense Rank(Alpha) Rank(Expense) d d²
Indus Bond −0.52 1.34 6 6 0 0
Nile Growth −0.13 0.40 1 9 −8 64
Tigris Income −0.50 1.90 5 1 4 16
Euphrates Core −1.01 1.50 9 2.5 6.5 42.25
Zambezi Flex −0.26 1.35 3 5 −2 4
Congo Value −0.89 0.50 8 8 0 0
Volga Select −0.42 1.00 4 7 −3 9
Mekong Blend −0.23 1.50 2 2.5 −0.5 0.25
Ganges Stable −0.60 1.45 7 4 3 9
Σd² 144.5
Step 3, Calculate r_s
r_s = 1 − (6 × Σd²) / (n × (n² − 1)) = 1 − (6 × 144.5) / (9 × (81 − 1)) = 1 − 867 / (9 × 80) = 1 − 867 / 720 = 1 − 1.20417 = −0.204
Step 4, Convert r_s to a t-statistic
Denominator: √(1 − (−0.204)²) = √(1 − 0.04162) = √0.95838 = 0.97897. Numerator: (−0.204) × √(9 − 2) = (−0.204) × √7 = (−0.204) × 2.64575 = −0.53973. t = −0.53973 / 0.97897 = −0.551
Step 5, Apply the decision rule
At the 5% level with df = 9 − 2 = 7, the critical values are ±2.365. |t| = 0.551 < 2.365. Fail to reject H₀.
Step 6, Sanity check
r_s = −0.204 is a very weak negative correlation. With only 9 observations, a correlation this weak cannot plausibly be significant. A t-statistic of −0.551 sitting far inside the bounds of ±2.365 matches intuition. ✓ The negative sign of r_s is preserved in the t-statistic. Both are negative. ✓ ✓ Answer: The Spearman rank correlation r_s = −0.204 is the correct test. The calculated t-statistic is −0.551, which falls within the bounds of ±2.365. Fail to reject H₀. There is insufficient evidence at the 5% level to conclude that expense ratios and alpha are correlated for these nine funds.
🧮 Calculating the Spearman t-statistic (r_s = −0.204, n = 9):
`2ND``0`
Clear worksheet → 0
`.204``x²`
Computes r_s² (use magnitude only) → 0.04162
`+/-``+``1``=`
Computes 1 − r_s² → 0.95838
`√x`
Square root of denominator term → 0.97897
`STO``1`
Stores denominator → 0.97897
`7``√x`
√(n − 2) = √7 → 2.64575
`×``.204``=`
Multiplies by r_s (magnitude only) → 0.53973
`+/-`
Apply negative sign (r_s is negative) → −0.53973
`÷``RCL``1``=`
Divides by stored denominator → **−0.551**
⚠️ Forgetting to apply the negative sign before the final division gives t = +0.551 instead of −0.551. The magnitude and conclusion are the same here. But for one-sided tests (H₀: ρ ≥ 0 versus Hₐ: ρ < 0), the sign of the t-statistic determines which tail you are in. Dropping the sign leads to the wrong conclusion in those cases.
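Fatima's full process, tie-averaged ranking included, can be sketched end-to-end (the helper name is mine; ranking runs from highest = 1 to lowest = n, as in the example):

```python
import math

def avg_ranks_desc(values):
    """Rank from highest (rank 1) to lowest (rank n); ties share the average rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1                 # first 1-based position of v
        last = len(order) - order[::-1].index(v)   # last 1-based position of v
        ranks.append((first + last) / 2)           # average covers tied values
    return ranks

alpha   = [-0.52, -0.13, -0.50, -1.01, -0.26, -0.89, -0.42, -0.23, -0.60]
expense = [1.34, 0.40, 1.90, 1.50, 1.35, 0.50, 1.00, 1.50, 1.45]

d2 = [(a - e) ** 2 for a, e in zip(avg_ranks_desc(alpha), avg_ranks_desc(expense))]
n = len(alpha)
r_s = 1 - 6 * sum(d2) / (n * (n * n - 1))
t = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)

print(sum(d2))          # → 144.5
print(round(r_s, 3))    # → -0.204
print(round(t, 3))      # → -0.552 (the example shows -0.551 because it rounds r_s first)
```

Note the two funds tied at an expense ratio of 1.50 both receive rank 2.5, reproducing the table above.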
Worked Example 4
Nonparametric Spearman test, applying the full process to a two-sided test
Omar Shaikh is a quantitative analyst at a fixed income research firm in Dubai. He is studying whether there is a monotonic relationship between credit rating scores and bond liquidity measures for 12 sovereign bonds. He suspects the underlying data are not normally distributed, so he decides to use the Spearman rank correlation test. He has already computed the rank differences. The sum of squared rank differences is Σd² = 82. He tests H₀: ρ_s = 0 versus Hₐ: ρ_s ≠ 0 at the 5% significance level. The critical t-value at df = 10 is ±2.228.
🧠Thinking Flow — Nonparametric Spearman test, two-sided
The question asks
Is there a statistically significant monotonic relationship between credit rating scores and bond liquidity for these 12 sovereign bonds?
Key concept needed
Spearman rank correlation formula, then the same t-test formula with r_s substituted in.
Step 1, Compute r_s
r_s = 1 − (6 × Σd²) / (n × (n² − 1)) n = 12, so n² = 144, n² − 1 = 143, n × (n² − 1) = 12 × 143 = 1716. r_s = 1 − (6 × 82) / 1716 = 1 − 492 / 1716 = 1 − 0.28671 = 0.713
Step 2, Convert r_s to a t-statistic
Denominator: √(1 − 0.713²) = √(1 − 0.50837) = √0.49163 = 0.70116. Numerator: 0.713 × √(12 − 2) = 0.713 × √10 = 0.713 × 3.16228 = 2.25471. t = 2.25471 / 0.70116 = 3.215
Step 3, Apply the decision rule
Degrees of freedom = 12 − 2 = 10. Critical values: ±2.228. |t_calculated| = 3.215 > 2.228 = t_critical. Reject H₀.
Step 4, Sanity check
r_s = 0.713 is a strong positive correlation. With 12 observations, a strong Spearman correlation should produce a t-statistic well above the 5% critical value. 3.215 > 2.228 is consistent with that expectation. ✓ ✓ Answer: r_s = 0.713. Calculated t = 3.215, which exceeds the critical value of 2.228. Reject H₀: ρ_s = 0. There is sufficient evidence at the 5% level to conclude a statistically significant monotonic relationship exists between credit rating scores and bond liquidity for these 12 sovereign bonds.
🧮 Computing the Spearman t-statistic (r_s = 0.713, n = 12):
`2ND``0`
Clear worksheet → 0
`.713``x²`
Computes r_s² → 0.50837
`+/-``+``1``=`
Computes 1 − r_s² → 0.49163
`√x`
Square root of denominator term → 0.70116
`STO``1`
Stores denominator → 0.70116
`10``√x`
√(n − 2) = √10 → 3.16228
`×``.713``=`
Multiplies by r_s → 2.25471
`÷``RCL``1``=`
Divides by stored denominator → **3.215**
⚠️ Using n = 12 instead of n − 2 = 10 inside the square root gives √12 = 3.46410 in the numerator, producing t = 3.523 instead of 3.215. Both exceed 2.228 in this example, so the conclusion is unchanged here. In questions where the true t-statistic is close to the critical value, this error flips the conclusion. Always use n − 2.
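Omar's numbers, reproduced from the given Σd². The small difference from the example's 3.215 comes from carrying full precision in r_s rather than rounding it to 0.713 before the t step:

```python
import math

n, sum_d2, t_crit = 12, 82, 2.228   # inputs given in the example (5%, two-sided, df = 10)

r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))             # 1 − 492/1716
t = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s**2)  # same t formula, df = n − 2

print(round(r_s, 3))    # → 0.713
print(round(t, 3))      # → 3.218 (3.215 in the example, from the rounded r_s)
print(abs(t) > t_crit)  # → True: reject H0
```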

Now that you have seen all four worked examples, the pattern is clear. Both tests reduce to the same decision rule: compute the t-statistic from your correlation coefficient, compare |t_calculated| to t_critical, and reject H₀ if the former exceeds the latter. The only moving part is which correlation coefficient you feed into the formula.

⚠️
Watch out for this
The t-exceeds-critical-but-conclude-insignificant trap. A candidate correctly calculates t = 2.836 (or |t| = 2.323 in a different scenario), then reads the decision rule backwards and concludes the relationship is not statistically significant, even though the test statistic sits outside the critical boundary. The correct conclusion is to reject H₀: |t_calculated| > t_critical is the condition for rejection, not for acceptance. The cognitive error: candidates memorise "reject if the statistic is in the rejection region" as a phrase without anchoring which direction the rejection region is. The rejection region is in the tails, beyond the critical value. It is not the centre around zero. Before writing your conclusion, ask exactly one question: is |t_calculated| greater than t_critical? If yes, reject H₀. Full stop.
🧠
Memory Aid
CONTRAST ANCHOR
Parametric tests the strength of a linear relationship in the numbers. Nonparametric tests the strength of the rank order.
Practice Questions · LO1
6 Questions LO1
Q 1 of 6 — REMEMBER
When testing the null hypothesis that the population correlation coefficient equals zero using a parametric t-test, what are the correct degrees of freedom?

CORRECT: C, The parametric t-test for correlation uses degrees of freedom = n − 2. The reduction by 2 reflects that the correlation coefficient is calculated from two variables, each requiring its own mean to be estimated from the data. With df = n − 2, you look up the critical t-value in a table using your chosen significance level and this degrees-of-freedom count.

Why not A? Using df = n − 1 is correct for a one-sample t-test of a single mean, where only one mean is estimated. In correlation testing, two variables are involved and two means must be estimated, so the degrees of freedom must account for both. Using n − 1 overstates the degrees of freedom, produces a critical value that is too small, and makes it easier to reject H₀ than the data warrant.

Why not B? Using df = n treats each observation pair as contributing a full degree of freedom without any deduction for estimated parameters. Every parametric test loses degrees of freedom equal to the number of parameters estimated from the sample. Correlation testing estimates two means, so n − 2 is the correct deduction, not zero. Using n would produce critical values that are too liberal, making rejection of the null hypothesis too easy.

---

Q 2 of 6 — UNDERSTAND
An analyst wants to test for correlation between daily rainfall measurements and municipal water consumption records. She notes that rainfall measurements are right-skewed, with occasional extreme flood events pulling the distribution sharply upward. Which statement best describes the appropriate test and why?
CORRECT: A

CORRECT: A, The parametric Pearson t-test requires both variables to be normally distributed. Right-skewed data with extreme outliers violate this assumption. When the normality assumption fails, the Spearman rank correlation test is the appropriate alternative. Spearman replaces actual values with their ranks before computing correlation, which eliminates sensitivity to the shape of the underlying distribution and to extreme outliers.

Why not B? The central limit theorem applies to the sampling distribution of the sample mean, not to the distributional assumption underlying the Pearson correlation test. The Pearson t-test specifically requires the joint distribution of the two variables to be bivariate normal. A large sample does not repair a structural violation of this assumption. It simply means you have more observations from a non-normal distribution. Large n does not justify using the parametric test when normality is structurally violated.

Why not C? It is true that both tests use the same t-statistic formula and the same decision rule. However, this does not make the tests interchangeable. What differs is the input: the parametric test uses the Pearson r calculated from raw values, while the nonparametric test uses the Spearman r_s calculated from ranks. Feeding a Pearson r computed from non-normal data into the t-formula produces invalid inference, even though the formula looks identical.

---

Q 3 of 6 — APPLY
Tomás Vega, a fixed income analyst at Pacífico Asset Management in Lima, calculates a sample Pearson correlation of r = −0.312 between corporate bond spreads and quarterly GDP growth over 52 quarters. He tests H₀: ρ = 0 versus Hₐ: ρ ≠ 0 at the 5% significance level using a two-sided alternative. The critical t-value for this test is ±2.009. What is Tomás's calculated t-statistic, and what is his conclusion?

CORRECT: B, With r = −0.312 and n = 52, apply the formula: numerator = −0.312 × √(52 − 2) = −0.312 × √50 = −0.312 × 7.07107 = −2.20617; denominator = √(1 − (−0.312)²) = √(1 − 0.09734) = √0.90266 = 0.95010; t = −2.20617 / 0.95010 = −2.323. Since |−2.323| = 2.323 > 2.009, Tomás rejects H₀. There is sufficient evidence at the 5% level to conclude a statistically significant linear relationship exists between bond spreads and GDP growth.

Why not A? The value t = −1.874 does not follow from a correct application of the formula; it understates the magnitude of the test statistic. Because |−1.874| = 1.874 is below the critical value of 2.009, this incorrect value would lead to a fail-to-reject decision, reversing the correct conclusion. When your calculated t looks implausibly small for the given r and n, recheck the numerator and the denominator before concluding.

Why not C? Option C has the correct t-statistic (−2.323) but the wrong conclusion. Since the test is two-sided and |t_calculated| = 2.323 exceeds the critical value of 2.009, the correct decision is to reject H₀. Concluding "fail to reject" when |t| > t_critical is the exact decision-rule reversal described in the trap box. The rejection region is in the tails beyond the critical value, not in the centre around zero.

---

Q 4 of 6 — APPLY+
Amara Osei, a research analyst at Kumasi Capital Partners, is studying the relationship between two variables across 18 observation pairs. She suspects non-normality and uses the Spearman rank correlation test. The sum of squared rank differences from her data is Σd² = 438. She tests H₀: ρ_s = 0 versus Hₐ: ρ_s ≠ 0 at the 5% significance level. The critical t-value at df = 16 is ±2.120. What is Amara's conclusion?

CORRECT: B, First, compute r_s: n = 18, n² − 1 = 323, n(n² − 1) = 5814. r_s = 1 − (6 × 438) / 5814 = 1 − 2628 / 5814 = 1 − 0.45199 = 0.548. Now convert to a t-statistic: denominator = √(1 − 0.548²) = √(1 − 0.30030) = √0.69970 = 0.83648; numerator = 0.548 × √16 = 0.548 × 4 = 2.19200; t = 2.19200 / 0.83648 = 2.621. Since |2.621| > 2.120, reject H₀. The key skill being tested is that you must compute both r_s and the t-statistic; never stop at r_s alone.

Why not A? Option A claims r_s = 0.638 and concludes significance without computing a t-statistic. The decision rule for correlation testing is based on |t_calculated| vs t_critical, not on the magnitude of r_s alone. Even a high r_s can fail to reach significance with a small sample. Jumping from r_s to a significance conclusion bypasses the required test statistic step.

Why not C? Option C misinterprets Σd². A larger Σd² means a weaker Spearman correlation, not a stronger one. The formula subtracts the Σd² term from 1: as Σd² increases, r_s decreases toward 0 and then becomes negative. Reading a large Σd² as evidence of a strong relationship inverts the meaning of the formula entirely.

---

Q 5 of 6 — ANALYZE
A portfolio manager is evaluating two research reports. Report 1 tests the correlation between analyst forecast errors and company size using Pearson r = 0.41 across 62 large-cap firms, reporting t = 3.31 and rejecting H₀ at the 5% significance level. Report 2 tests the same variables across 9 small-cap firms with explicitly non-normal return distributions using Pearson r = 0.65, reporting t = 2.27 and rejecting H₀ at the 5% significance level (critical value ±2.306). Which report contains a methodological concern, and what is it?

CORRECT: B, Report 2 explicitly states the data have non-normal distributions. The Pearson parametric t-test requires both variables to be normally distributed. When this assumption is violated, the test statistic does not follow a t-distribution, and the critical values from a t-table are not applicable. The correct approach for non-normal data is the Spearman rank correlation test. Report 2's rejection of H₀ may be an artifact of applying the wrong test rather than evidence of a true population relationship.

Why not A? Report 1 uses n = 62 with no stated violation of normality. The t-distribution is always used for correlation tests regardless of sample size. It does not switch to a normal distribution table as n increases. Using a t-table with df = 60 is entirely appropriate. Report 1 contains no apparent methodological concern based on the information given.

Why not C? The claim that the Pearson t-test requires n > 30 is false. The test requires normally distributed data, not a minimum sample size. A small sample from a genuinely normal distribution is valid for the parametric test. The 30-observation threshold is a rule of thumb for the central limit theorem applied to mean testing, not a requirement for correlation testing. Report 2's problem is the normality violation, not the sample size.

---

Q 6 of 6 — TRAP
Daniela Ferreira, an equity analyst at a brokerage firm in São Paulo, calculates a sample Pearson correlation of r = 0.397 between trading volume and price volatility for 30 mid-cap stocks over a 12-month period. She tests H₀: ρ = 0 versus Hₐ: ρ ≠ 0 at the 5% significance level. The critical t-value is ±2.048. She correctly computes the t-statistic as 2.289. She then states: "Since my t-statistic of 2.289 is greater than zero and close to the critical value of 2.048, the relationship is not statistically significant at the 5% level." What is wrong with Daniela's conclusion?

CORRECT: C, Daniela calculated the t-statistic correctly (2.289). The decision rule for a two-sided test is: reject H₀ if |t_calculated| > t_critical. Here, 2.289 > 2.048. The correct conclusion is to reject H₀. The rejection region lies in the tails beyond the critical value, not in the interior around zero. Daniela's error is treating proximity to the critical value as a reason for caution when the magnitude has already crossed the threshold. Once |t_calculated| exceeds t_critical, there is no borderline category: you reject.

Why not A? There is no "borderline default to fail to reject" rule in hypothesis testing. The decision rule is binary: either the test statistic exceeds the critical value or it does not. 2.289 exceeds 2.048, so the null is rejected. The closeness of the margin is irrelevant to the formal statistical decision. It might affect how you interpret the economic significance of the result, but it does not change the statistical conclusion.

Why not B? Daniela used df = n − 2 = 28, which is the correct degrees of freedom for a correlation t-test. Using df = n − 1 = 29 would be appropriate for a one-sample mean test, not for correlation testing. Using df = 29 gives a slightly smaller critical value (approximately 2.045 versus 2.048), which would make it even easier to reject H₀. That correction would not fix Daniela's error; it would reinforce the correct conclusion she is already resisting.
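Daniela's decision rule can be checked mechanically. A minimal Python sketch (function and variable names are our own) computes the t-statistic from r and n using t = r√(n − 2) / √(1 − r²), then applies the binary comparison against the critical value from the question:

```python
import math

def corr_t_stat(r, n):
    """t-statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Daniela's numbers from the question
r, n = 0.397, 30
t = corr_t_stat(r, n)
t_critical = 2.048   # two-tailed, 5% level, df = 28 (given in the question)

print(round(t, 3))           # 2.289, matching her calculation
print(abs(t) > t_critical)   # True -> reject H0; the rule is binary, no "close" category
```

The comparison returns True, so the null is rejected. There is no code path for "close to the critical value", which is the whole point of the trap.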

---

Glossary
parametric test
A statistical test that assumes the data follow a specific probability distribution, usually the normal distribution. Like a recipe that only works with the right ingredients: if the data are not normally distributed, the test's conclusions may be invalid.
nonparametric test
A statistical test that makes no assumption about the underlying distribution of the data. It converts raw values into ranks, making it robust to skewness, outliers, and bounded variables.
normally distributed
Describes data that follow a bell-shaped curve, symmetric around the mean, with no hard upper or lower boundary. Most people's heights in a large population are approximately normally distributed.
population correlation coefficient
The true correlation between two variables across the entire population, denoted by the Greek letter rho (ρ). Because we rarely observe the whole population, we estimate it from a sample using the sample correlation r.
rho
The Greek letter ρ, used to denote the population correlation coefficient. In hypothesis testing, the null hypothesis typically states ρ = 0, meaning no linear relationship exists in the population.
t-statistic
A calculated number that measures how many standard errors a sample estimate falls from the hypothesised value. It is compared to a critical value to decide whether to reject the null hypothesis. A larger absolute t-statistic means the sample evidence is further from the null.
Pearson correlation coefficient
The most common measure of linear association between two continuous variables, denoted r. It ranges from −1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 indicating no linear relationship.
Spearman rank correlation coefficient
A nonparametric measure of association, denoted r_s, calculated by replacing raw values with their ranks within each variable and then applying the correlation formula. It captures monotonic relationships and is not affected by extreme values.
H₀
The null hypothesis, the default assumption in a hypothesis test, usually stating that there is no effect, no relationship, or no difference. It is retained unless the sample evidence provides sufficient grounds to reject it. For correlation tests, H₀ is typically ρ = 0.
critical value
The cutoff point from a probability distribution that separates the rejection region from the non-rejection region. If the absolute value of the test statistic exceeds the critical value, the null hypothesis is rejected. Like a speed limit: you either exceed it or you do not.
n
The number of paired observations in the sample used to calculate a correlation. In correlation testing, degrees of freedom equal n − 2.
normal distribution
A symmetric, bell-shaped probability distribution fully described by its mean and standard deviation. Values far from the mean are increasingly rare. Used widely in statistics because many natural phenomena approximately follow this pattern.
t-distribution
A bell-shaped probability distribution with heavier tails than the normal distribution, used when the population standard deviation is unknown. The tails shrink as sample size increases. Used to find critical values for correlation and mean tests.
standard deviation
A measure of how spread out a set of values is around their average. A low standard deviation means values cluster close to the mean. A high standard deviation means they are spread far from it.
alpha
In the context of fund performance, alpha is the risk-adjusted excess return, the return a fund earns above what is expected given its level of systematic risk. A fund with positive alpha outperforms its risk-adjusted benchmark. A fund with negative alpha underperforms it.

LO 1 Done ✓

Ready for the next learning objective.

🔒 PRO Feature
How analysts use this at work
Real-world applications and interview questions from top firms.
Quantitative Methods · Parametric and Non-Parametric Tests of Independence · LO 2 of 2

How do you know if two categories are actually linked, or just appeared together by chance?

Use a chi-square test on a contingency table to determine whether two categorical variables are independent or related.

Why this LO matters

Use a chi-square test on a contingency table to determine whether two categorical variables are independent or related.

INSIGHT
Correlation measures relationships between two numerical variables. That sentence is the entire reason this LO exists. Categorical data, like "dividend star" or "high-risk leverage," cannot be converted to a meaningful number. When you have two categorical classifications and want to know if they are linked, you arrange the counts in a contingency table and test whether observed cell counts match what you would expect if the two categories were completely unrelated. If the counts match the expectation, the categories are independent. If they differ significantly, the categories are related.

What is a contingency table and why does independence matter?

Think about a cinema loyalty card survey. You ask 500 customers two questions: do they visit weekly, monthly, or rarely? And do they buy popcorn always, sometimes, or never? You suspect frequent visitors probably buy more popcorn. But suspicion is not evidence. You need a test.

You arrange the counts in a grid: three rows for visit frequency, three columns for popcorn habit, one number in each cell showing how many customers fall into that combination. That grid is a contingency table. The question is whether the row classification and the column classification are independent of each other, or whether knowing someone's visit frequency genuinely tells you something about their popcorn habit.

This is exactly the structure of every exam question on this LO. The categories change. The logic does not.

Testing for Independence with Categorical Data
1
Contingency table. A table displaying how observations are distributed across two categorical classifications simultaneously. Each cell shows the count of observations in that combination. Use this format whenever you need to test whether two categorical variables are related or independent.
2
Observed frequency. The actual count of observations in each cell, taken directly from the data you have. These are the numbers you start with. They represent what you actually saw in your sample.
3
Expected frequency. The count you would expect in each cell if the two variables were completely independent. Calculate this for every cell using: (row total × column total) ÷ overall total. This is the assumption of independence made concrete.
4
Chi-square test statistic. A single number measuring how far observed frequencies deviate from expected frequencies across all cells. Calculate it by summing (Observed − Expected)² ÷ Expected across all cells. Larger values indicate the variables are more likely related.
5
Degrees of freedom. The number of cells that are free to vary before all totals are determined. For a contingency table: (number of rows − 1) × (number of columns − 1). This determines which chi-square distribution you use for the critical value.
6
Null hypothesis of independence. The claim that the two categorical variables are not related. Phrase it as: "[Variable 1] and [Variable 2] are independent." Rejection of this null means the variables are related.
FORWARD REFERENCE
The chi-square distribution, what you need for this LO only
The chi-square distribution is a probability distribution used for tests involving categorical data. Like the t-distribution, its shape depends on degrees of freedom. For this LO, you only need to know: find the critical value using your degrees of freedom and significance level (usually 5%), then compare your calculated chi-square statistic to that critical value. You will study this distribution fully in Module 1 of Quantitative Methods. For this LO, you only need to apply the comparison: if calculated chi-square exceeds the critical value, reject the null hypothesis of independence.
→ Quantitative Methods

How to apply the test: two worked examples

Worked Example 1
Identifying the right test for categorical data

Scenario: Priya Nair is a research analyst at Solaris Asset Management. She has collected data on 500 investment funds, classifying each fund along two dimensions: its stated risk appetite (conservative, moderate, or aggressive) and its actual three-year return category (below-market, market, or above-market). She wants to know whether these two classifications are related, or whether they appear together purely by chance.

🧠Thinking Flow — Identifying the right test for categorical data
The question asks
Which statistical test is appropriate when two categorical classifications are observed together in a frequency table, and how do you decide whether the two classifications are independent?
Key concept needed
Chi-square test for independence.
Step 1, Recognise the wrong approach
A common mistake is to compute a correlation coefficient between the two classifications. Correlation requires numerical data measured on a continuous scale. Risk appetite labelled "conservative / moderate / aggressive" is not a number. It is a category. Using correlation here gives a meaningless result.
Step 2, Apply the correct approach
When both variables are categorical, build a contingency table and apply the chi-square test for independence. This test works directly with counts of observations in category combinations, not with numerical measures.
Step 3, State the hypotheses correctly
Many candidates reverse the null and alternative. The null hypothesis always states independence. H₀: Risk appetite and return category are not related; these two classifications are independent. Hₐ: Risk appetite and return category are related; these two classifications are not independent. The null is the claim of no relationship. The test tries to find enough evidence to reject that claim.
Step 4, Sanity check on the rejection region
The chi-square statistic is built from squared differences, so it can never be negative. A statistic of zero means observed and expected frequencies match perfectly, which is strong evidence of independence. Larger values mean the frequencies diverge, which is evidence against independence. The entire rejection region sits in the right tail. There is no left-side rejection and no two-tailed test for this procedure. ✓
✓ Answer: The chi-square test for independence applied to a contingency table is the correct approach. The null hypothesis states independence. The alternative states dependence. The rejection region is one-sided, right tail only.
Worked Example 2
Computing expected frequencies and the chi-square statistic
Marcus Osei works in the quantitative research team at Meridian Capital. He has classified 250 firms along two dimensions: dividend reliability group (Stars, Neutral, or Laggards) and financial-leverage risk group (Low, Medium, or High). The observed cell counts appear in the table below. Marcus must compute the expected frequencies under independence and then calculate the chi-square test statistic.
                   Div Stars   Div Neutral   Div Laggards   Row Total
Low Leverage           40           40             40           120
Medium Leverage        30           10             20            60
High Leverage          10           50             10            70
Column Total           80          100             70           250
🧠Thinking Flow — Computing expected frequencies and the chi-square statistic
The question asks
How do you calculate the expected frequency for each cell, and how do you combine those into a single chi-square test statistic?
Key concept needed
Expected frequency formula and chi-square summation. A common wrong move is to use the observed frequencies directly as if they were expected frequencies. That gives a chi-square statistic of zero, which is the exact opposite of what the data actually shows.
Step 1, Identify the expected frequency formula
For any cell in row i and column j:
Expected frequency = (Row i total × Column j total) ÷ Overall total
Every cell gets its own calculation. There are 9 cells here, so 9 calculations are needed.
Step 2, Calculate all nine expected frequencies
Overall total = 250.
                   Div Stars                 Div Neutral                Div Laggards
Low Leverage       (120 × 80) ÷ 250 = 38.4   (120 × 100) ÷ 250 = 48.0   (120 × 70) ÷ 250 = 33.6
Medium Leverage    (60 × 80) ÷ 250 = 19.2    (60 × 100) ÷ 250 = 24.0    (60 × 70) ÷ 250 = 16.8
High Leverage      (70 × 80) ÷ 250 = 22.4    (70 × 100) ÷ 250 = 28.0    (70 × 70) ÷ 250 = 19.6
Quick verification: sum all nine expected frequencies. 38.4 + 48.0 + 33.6 + 19.2 + 24.0 + 16.8 + 22.4 + 28.0 + 19.6 = 250. ✓ This matches the overall total. If it does not match, recheck your row and column totals first.
Step 3, Calculate the scaled squared deviation for each cell
Formula for each cell: (Observed − Expected)² ÷ Expected
                   Div Stars                     Div Neutral                   Div Laggards
Low Leverage       (40 − 38.4)² ÷ 38.4 = 0.067   (40 − 48.0)² ÷ 48.0 = 1.333   (40 − 33.6)² ÷ 33.6 = 1.219
Medium Leverage    (30 − 19.2)² ÷ 19.2 = 6.075   (10 − 24.0)² ÷ 24.0 = 8.167   (20 − 16.8)² ÷ 16.8 = 0.610
High Leverage      (10 − 22.4)² ÷ 22.4 = 6.864   (50 − 28.0)² ÷ 28.0 = 17.286  (10 − 19.6)² ÷ 19.6 = 4.702
Step 4, Sum all nine scaled squared deviations
χ² = 0.067 + 1.333 + 1.219 + 6.075 + 8.167 + 0.610 + 6.864 + 17.286 + 4.702
χ² = 46.323
Step 5, Determine degrees of freedom and apply the decision rule
Degrees of freedom = (rows − 1) × (columns − 1) = (3 − 1) × (3 − 1) = 2 × 2 = 4. Critical value at the 5% significance level with 4 degrees of freedom = 9.4877. Decision rule: reject H₀ if χ² > 9.4877. Calculated χ² = 46.323 > 9.4877. Reject H₀.
Step 6, Sanity check
The statistic is far above the critical value, not marginally above it. The Medium Leverage / Dividend Neutral cell (observed = 10, expected = 24.0) and the High Leverage / Dividend Neutral cell (observed = 50, expected = 28.0) show dramatic divergences. Large deviations in multiple cells produce large chi-square values. The conclusion of dependence is consistent with what the raw data visually show. ✓
✓ Answer: The chi-square test statistic is 46.323. With a critical value of 9.4877 at the 5% significance level and 4 degrees of freedom, Marcus rejects the null hypothesis. There is sufficient evidence to conclude that dividend reliability group and financial-leverage risk group are related. They are not independent.
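Marcus's entire calculation can be reproduced in a few lines. Here is a Python sketch of the steps above (the critical value 9.4877 is taken from the worked example, not computed; the tiny difference from the text's 46.323 is rounding, since the text sums per-cell values already rounded to three decimals):

```python
observed = [
    [40, 40, 40],   # Low leverage:    Stars, Neutral, Laggards
    [30, 10, 20],   # Medium leverage
    [10, 50, 10],   # High leverage
]

row_totals = [sum(row) for row in observed]         # [120, 60, 70]
col_totals = [sum(col) for col in zip(*observed)]   # [80, 100, 70]
grand = sum(row_totals)                             # 250

# Expected frequency under independence: (row total * column total) / grand total
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

# Sum of (Observed - Expected)^2 / Expected over all nine cells
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(3) for j in range(3)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3-1)(3-1) = 4
critical = 9.4877   # chi-square critical value, 5% level, df = 4 (from the text)

print(round(chi_sq, 2))      # 46.32
print(chi_sq > critical)     # True -> reject H0: the classifications are related
```

The same structure handles any contingency table: only the `observed` matrix and the critical value change.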

Now that you have seen the mechanics, there is one specific error that exam questions are built to catch.

⚠️
Watch out for this
The null-as-dependence trap. A candidate who writes H₀ as "the two variables are related" has the hypotheses backwards and will reach the opposite conclusion from the correct analysis, even with a perfectly calculated chi-square statistic. The correct formulation always places independence in the null: H₀ states that the two classifications are independent, and Hₐ states that they are related. Candidates make this error because in other contexts H₀ feels like the "expected" or "interesting" claim, and a relationship between two investment variables can feel like the natural starting assumption. But the chi-square test, like all hypothesis tests, places the claim of no effect in the null. Before committing to a conclusion, confirm that the null reads "independent"; if the hypotheses are reversed, flip the conclusion entirely.
🧠
Memory Aid
ACRONYM
ECHO
E
E, Expected frequency first — Calculate (row total × column total) ÷ overall total for every cell before touching the chi-square formula.
C
C, Categorical data only — If both variables have numerical, continuous values, stop. Chi-square on a contingency table does not apply there.
H
H, H₀ claims independence — The null is always the no-relationship claim. The alternative is the dependence claim.
O
O, One-tailed right — The rejection region is always the right tail. A large statistic means observed and expected diverge strongly.
When a question shows a frequency table with two classification dimensions, run through ECHO in order: set up expected frequencies, confirm the data is categorical, write H₀ as independence, and remember that only a large chi-square rejects it. If you find yourself writing "H₀: the variables are dependent," ECHO has already told you to stop at H.
Practice Questions · LO2
3 Questions LO2
Score: — / 3
Q 1 of 3 — REMEMBER
In a chi-square test of independence applied to a contingency table, what does the null hypothesis state?
CORRECT: B

CORRECT: B, The null hypothesis in a chi-square test of independence always states the claim of no relationship: the two classifications are independent, meaning knowing the value of one variable tells you nothing about the other. The test then asks whether the data provide enough evidence to reject that claim.

Why not A? Option A describes the alternative hypothesis, not the null. The alternative is the claim that the variables are related. Placing dependence in H₀ reverses the entire logic of the test. If you then rejected that H₀, you would be concluding independence, which is the opposite of what the test is designed to detect. The null always carries the no-effect claim.

Why not C? Option C sounds plausible because perfect equality between observed and expected frequencies would produce a chi-square statistic of zero, which is consistent with independence. But H₀ is a statement about the population relationship, not a statement that two sets of numbers are identical. In practice, observed and expected will never match exactly even under true independence, because of random sampling variation. H₀ says the variables are independent in the population. It does not say the cell counts must match exactly in the sample.

---

Q 2 of 3 — UNDERSTAND
An analyst wants to examine whether a fund manager's stated investment style (value, blend, or growth) is related to the fund's star-rating category (one star, three stars, or five stars). She collects data on 400 funds. Why is the chi-square test of independence more appropriate here than a correlation coefficient?
CORRECT: B

CORRECT: B, Correlation measures the linear relationship between two variables that take numerical values on a continuous scale. Investment style (value, blend, growth) and star rating (one, three, five stars) are categories, not numbers. Even though star ratings use numbers as labels, they represent ordered groups, not measured quantities with equal intervals between them. Applying correlation to category labels produces a meaningless result. The chi-square test works directly with the count of observations in each category combination, making it the right tool for categorical data.

Why not A? Sample size does not determine whether correlation is appropriate. Correlation can be calculated on small or large samples. The issue is the nature of the data, not the amount of it. With 400 funds and two categorical variables, the chi-square test is correct simply because the variables are categorical, regardless of sample size.

Why not C? This is incorrect on both counts. Correlation can be negative when two numerical variables move in opposite directions. The chi-square statistic, by contrast, is always non-negative because it is built from squared differences. More importantly, the chi-square test does not detect the direction of a relationship at all. It only answers one question: are the two classifications independent or not? Direction of association would require a different measure entirely.

---

Q 3 of 3 — APPLY
Elena Vasquez is analyzing whether geographic region (North, South, East, or West) and product preference (Type A, Type B, or Type C) are independent. She surveys 360 customers. The row totals are 90, 80, 100, and 90 for the four regions. The column totals are 120, 150, and 90 for the three product types. What is the expected frequency for the cell representing customers from the South region who prefer Type B?
CORRECT: B

CORRECT: B, The expected frequency formula for any cell is (row total × column total) ÷ overall total. For the South region / Type B cell: row total = 80, column total = 150, overall total = 360. Expected frequency = (80 × 150) ÷ 360 = 12,000 ÷ 360 = 33.3. This is the count you would expect if geographic region and product preference were completely unrelated.

Why not A? A result of 41.7 comes from using the East region's row total (100) instead of the South region's row total (80): (100 × 150) ÷ 360 = 41.7. This is the expected frequency for the East / Type B cell, not South / Type B. When a contingency table has multiple rows, it is easy to pick up the wrong row total. Always identify the specific row and column totals for the cell you are computing before applying the formula.

Why not C? A result of 26.7 comes from using the Type A column total (120) instead of the Type B column total (150): (80 × 120) ÷ 360 = 26.7. This is the expected frequency for the South / Type A cell. The same row total is used correctly, but the wrong column total is substituted. A useful self-check: expected frequencies across a single row must sum to that row's total. For the South row (total 80), the three expected values are (80 × 120) ÷ 360 = 26.7, (80 × 150) ÷ 360 = 33.3, and (80 × 90) ÷ 360 = 20.0. Summing these: 26.7 + 33.3 + 20.0 = 80.0. That matches the South row total, confirming that 33.3 is the correct expected frequency for the South / Type B cell.
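The expected-frequency formula and the row-sum self-check can both be verified with a short Python sketch (the dictionary layout and names are illustrative; the totals are from the question):

```python
# Row and column totals from Elena's survey of 360 customers
row_totals = {"North": 90, "South": 80, "East": 100, "West": 90}
col_totals = {"Type A": 120, "Type B": 150, "Type C": 90}
grand = 360

def expected(region, product):
    """Expected count under independence: (row total * column total) / grand total."""
    return row_totals[region] * col_totals[product] / grand

print(round(expected("South", "Type B"), 1))   # 33.3 -- the correct answer

# Self-check: expected frequencies across the South row must sum to its row total
south_row = [expected("South", p) for p in col_totals]
print(round(sum(south_row), 1))                # 80.0
```

The same function also reproduces the two distractors: `expected("East", "Type B")` gives 41.7 and `expected("South", "Type A")` gives 26.7, which is exactly how wrong-row and wrong-column errors arise.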

---

Glossary
Contingency table
A grid showing how observations are sorted into groups based on two categorical characteristics at the same time. Each row represents one category of the first characteristic, each column represents one category of the second, and each cell displays the count of observations in that combination. Like a seating chart that tracks both which section of a theatre people sit in and whether they bought their ticket in advance or at the door.
Observed frequency
The actual count of observations recorded in a specific cell of a contingency table, pulled directly from the data you collected. If you surveyed 250 fund managers and found that 40 of them work in conservative funds with below-market returns, then 40 is the observed frequency for that cell.
Expected frequency
The count of observations you would expect to see in a cell if the two categorical variables were completely unrelated to each other. Calculated using (row total × column total) ÷ overall total. Think of it as the count you would predict if you had no information about any relationship between the two groups.
Chi-square test statistic
A single number summarising how much the observed cell counts differ from the expected cell counts across the entire contingency table. Computed as the sum of (Observed − Expected)² ÷ Expected for every cell. A value close to zero suggests the variables are independent. A large value suggests they are related.
Degrees of freedom
The number of cells in a table that can vary freely before all the row and column totals force the remaining cells to be fixed. For a contingency table, it equals (number of rows − 1) × (number of columns − 1). Think of it like a crossword puzzle: once you fill in enough squares, the remaining squares are forced by the intersecting words.
Null hypothesis of independence
The default assumption that the two categorical variables are not related, meaning knowing one tells you nothing about the other. Stated as "[Variable 1] and [Variable 2] are independent." The chi-square test checks whether the data provide strong enough evidence to reject this assumption. If you cannot reject it, the variables may well be independent.

LO 2 Done ✓

You have completed all learning objectives for this module.

Quantitative Methods · Parametric and Non-Parametric Tests of Independence · Job Ready

From exam to career

Statistical tests for relationships: correlation significance and categorical independence


Why this session exists

Why this session exists: The exam tests whether you can calculate a t-statistic from a correlation coefficient and make a binary decision about statistical significance. In an interview, the question is different. It is not "can you calculate this?" It is "do you know which test to run, why the choice of test matters, and what happens if you apply the wrong one?" That is the gap this section closes.

Two professional domains appear in this module. LO 9a covers quantitative research and risk management, where analysts decide whether observed correlations represent real relationships or statistical noise. LO 9b covers consulting and categorical research, where analysts test whether two classifications of data are genuinely related or appear together by chance. The jobs are investment consulting, quantitative research, risk management, and ESG or credit research.

LO 9a
Correlation significance testing: when to use Pearson versus Spearman
How analysts use this at work

Portfolio managers at firms like Vanguard and Dimensional Fund Advisors use correlation significance testing every time they evaluate a factor or style exposure. They observe a sample correlation between two variables and need to determine whether that relationship exists in the population or is an artifact of the sample drawn. A portfolio manager building a multi-factor equity model might find that a momentum factor and a quality factor have a sample correlation of 0.52 across 60 months of returns. Before treating them as separate exposures, the manager runs the t-test on that correlation. If the test fails to reject the null, the manager keeps both factors in the model. If the test rejects, the manager recognises the factors are measuring a similar economic driver and consolidates the exposure. The test is the gatekeeper between a redundant factor and a legitimate independent signal.

Credit analysts at banks and asset managers face a different version of the same problem. When assessing whether credit ratings and default rates are correlated, analysts encounter variables that are structurally bounded. Credit ratings have a fixed scale. Expense ratios and share ratios cannot go below zero. These bounded variables often violate the normality assumption that the Pearson test requires. Analysts at firms like PIMCO and BlackRock therefore default to the Spearman rank test when they see bounded data. Running a Pearson test on non-normal data produces misleading conclusions. The practical consequence is a model that tells you a relationship is significant when it is actually a statistical artifact, leading to mispriced risk or poorly constructed portfolios.

Interview questions
Vanguard Quantitative Portfolio Manager "A research report shows a correlation of 0.34 between two style factors across 25 months. The analyst concludes the relationship is economically meaningful because 0.34 is moderately strong. What is wrong with this conclusion, and what would you conclude instead?"
PIMCO Quantitative Researcher "An analyst finds that a fund's sample correlation of 0.29 is not statistically significant at the 5% level. The analyst recommends dropping the variable from the model. The correlation represents a theoretically justified risk premium. How do you evaluate this recommendation?"
Bridgewater Risk Analyst "You are testing the correlation between two hedge fund return series. You have 500 weekly observations and find r = 0.18. Your colleague says the large sample size makes the Pearson t-test valid regardless of the data distribution. Your data are highly right-skewed with several extreme outlier weeks. How do you respond?"
One-line to use in your interview
Interviewers listen for industry-specific language. It signals you understand the concept, not just the definition. Use the plain English version to adapt it in your own words.
In practice, I treat the normality assumption as a decision trigger, not a box to check. When I see bounded or skewed data, I switch to Spearman without hesitation, because applying the wrong test gives a result that looks precise but is actually unreliable.
In plain English
I do not run the same test on every dataset. If the numbers have a natural floor or ceiling, or if extreme values are pulling the average in one direction, I use a different method that does not care about those distortions. The result looks the same on paper, but only one of them actually tells me what is true.
LO 9b
Chi-square test for categorical data: testing independence in contingency tables
How analysts use this at work

Investment consultants at firms like Mercer and Aon use the chi-square test when advising institutional clients on portfolio allocation and manager selection. They often segment clients by risk tolerance or asset class preference and want to know whether those segments are genuinely different in their behaviour or just artifacts of how the consultant drew the sample. A consultant might cross-classify 300 pension plan clients by risk profile and product recommendation in a contingency table, then run the chi-square test. If the test fails to reject independence, the consultant concludes that observed patterns in recommendations are indistinguishable from random variation. This directly changes how they communicate segmentation to clients. If the test rejects, the consultant has statistical backing for claiming that different client types warrant genuinely different strategies, which justifies the advisory fee structure.

Research analysts at ESG rating agencies and sell-side credit teams use the same test when evaluating whether categorical firm characteristics are related. An analyst investigating whether ESG ratings and credit ratings are linked across 200 companies constructs a contingency table with rating categories in the rows and columns. The analyst computes expected frequencies under independence, calculates the chi-square statistic, and uses the result to decide whether to integrate ESG factors into a credit model. If the test shows dependence, ESG data adds predictive power. If the test fails to reject independence, ESG ratings may not inform credit outcomes at all. The critical professional error here is reversing the hypotheses. An analyst who states that the null hypothesis is that the variables are related, then rejects it, has actually concluded independence. In a credit context, that error leads to a model that either ignores relevant information or overweights noise.

Interview questions
Mercer Investment Consultant "You are advising a pension fund on manager selection. You cross-classify 250 funds by investment style and five-year star rating, compute the chi-square statistic, and find it exceeds the critical value at the 5% level. What does this result tell you, and what is your recommendation to the client?"
Standard Chartered Credit Analyst "An analyst runs a chi-square test on a contingency table of credit ratings versus ESG classifications. She calculates expected frequencies, then incorrectly uses those same numbers as the observed frequencies in the chi-square formula. What is the numerical result of this error, and what conclusion would she reach?"
Deloitte Risk and Capital Consultant "A colleague argues that since the chi-square test rejected the null hypothesis of independence between company size and sector classification, they should use both variables in a regression model without adjustment. How would you evaluate this reasoning?"
One-line to use in your interview
Interviewers listen for industry-specific language. It signals you understand the concept, not just the definition. Use the plain English version to adapt it in your own words.
In categorical analysis, I treat the hypothesis structure as a one-way valve. Independence always sits in the null. The test only gives you permission to claim dependence if the evidence is strong enough to reject it.
In plain English
I never start by assuming two groups are different. I assume they are the same unless the data gives me strong reason to think otherwise. The chi-square test is my tool for deciding whether that reason exists. Without it, I am just looking at patterns in noise.