Quantitative Methods · Statistical Measures of Asset Returns · LO 1 of 4

Ten analysts earn decent salaries. One billionaire walks into the room. Does the average still tell you anything useful?

Choose the right measure of central tendency and location for the data in front of you, especially when outliers distort the arithmetic mean.

⏱ 8min-15min

6 questions

HIGH PRIORITY · APPLY · 🧮 Calculator

Why this LO matters

Choose the right measure of central tendency and location for the data in front of you, especially when outliers distort the arithmetic mean.

INSIGHT

A statistic is only useful if it describes what actually happens most of the time. The arithmetic mean describes the balance point of the data, the fulcrum. One billionaire in a room of average earners shifts that fulcrum to the right dramatically. Nobody in the room is earning the "average." The median tells you what the person in the middle of the room earns. Which statistic you need depends entirely on what question you are answering.

What are you actually measuring?

All measures of central tendency answer one question: where is the data centered?

They disagree on what "centered" means, and that disagreement matters enormously in investments.

The five measures of central tendency

Arithmetic mean. Sum all values, divide by n. The balance point of the data. Sensitive to outliers. Best for describing a single-period average return where each period is equally weighted.

Median. The value in the middle of a sorted dataset. Not affected by outliers. Best for skewed distributions or when extreme values exist.

Mode. The most frequently occurring value. Can be used with categorical data. A distribution can have no mode, one mode (unimodal), or several (bimodal, trimodal).

Geometric mean. The nth root of the product of n values (for returns: chain the growth factors, take the nth root, subtract 1). Used for multi-period compound returns. Always less than or equal to the arithmetic mean for the same data; the gap widens with volatility.

Harmonic mean. n divided by the sum of the reciprocals of the observations. Used for averaging rates or ratios, particularly for cost-averaging strategies. Always the lowest of the three means when values are unequal.
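To make the five definitions concrete, here is a minimal sketch using Python's standard `statistics` module. The four observations are hypothetical and chosen only so the results are easy to check by hand; they are not from the curriculum.

```python
import statistics

# Hypothetical positive observations (assumption, for illustration only).
values = [2.0, 4.0, 4.0, 8.0]

arithmetic = statistics.mean(values)            # sum of values / n
median = statistics.median(values)              # middle of the sorted data
modes = statistics.multimode(values)            # list of the most frequent value(s)
geometric = statistics.geometric_mean(values)   # nth root of the product
harmonic = statistics.harmonic_mean(values)     # n / sum of reciprocals

print(arithmetic, median, modes, geometric, harmonic)
```

With unequal positive values the ordering harmonic < geometric < arithmetic should hold, which matches the statements above.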

The salary survey that misled everyone

Imagine a consulting firm publishes the "average analyst salary" at a mid-size investment bank. The sample includes 15 junior analysts earning between €60,000 and €90,000 and one founding partner who still counts in the sample at €4,200,000. The arithmetic mean comes out at roughly €320,000. Every junior analyst reading this feels severely underpaid. But the median salary, the midpoint between the 8th and 9th salaries when all 16 are ranked, is €74,000. The median tells the real story. The arithmetic mean is distorted by a single observation that is not representative of anyone else in the sample.

The wrong answer candidates give: "The arithmetic mean is always the most informative measure." The right framework: when outliers are present, or when the distribution is skewed, the median is the more representative measure of central tendency. The arithmetic mean is the balance point, not the typical value.

When returns compound over time: the geometric mean

Geometric mean return

R_G = [(1 + R_1)(1 + R_2)...(1 + R_n)]^(1/n) - 1

R_G = geometric mean return
R_1...R_n = each period's return (as a decimal)
n = number of periods

Use when: calculating the compound growth rate of a portfolio over multiple periods.
Do not use: for averaging cross-sectional returns across different assets in one period.
// The geometric mean tells you what constant annual return would have produced the same ending wealth as the actual series of annual returns.
// Arithmetic mean and geometric mean agree only when all returns are identical. The more volatile the returns, the more the arithmetic mean overstates the compound growth rate.

🧠Thinking Flow — Which mean does this problem need?

The question asks

A fund returned 50% in year 1 and −33% in year 2. What is the compound annual return?

Key concept needed

The geometric mean, because returns compound across time periods: each year's return applies to a changing base.

Step 1, Identify the structure

Returns compound. The Year 2 loss applies to a portfolio that already grew by 50%. Simple arithmetic averaging, (50% − 33%) / 2 = 8.5%, ignores this dependency and overstates performance.

Step 2, Apply the geometric mean formula

(1.50 × 0.67)^(1/2) − 1 = (1.005)^(0.5) − 1 = 0.25% compound annual return.

Step 3, Sanity check

Start with €100. After year 1: €150. After year 2 (−33%): €150 × 0.67 = €100.50. Two years of compounding produced €0.50 gain on €100, or roughly 0.25% per year. The geometric mean matches. The arithmetic mean of 8.5% would imply a vastly different ending value.

Answer

Geometric mean = approximately 0.25% per year. Arithmetic mean of 8.5% is misleading in a multi-period context.
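The three steps above can be sketched in a few lines of Python. The function name `geometric_mean_return` is a label chosen for this illustration, not a library call.

```python
def geometric_mean_return(returns):
    """Compound per-period return from a list of decimal returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r                      # chain the growth factors
    return growth ** (1.0 / len(returns)) - 1.0  # nth root, then subtract 1

fund = [0.50, -0.33]                           # +50% in year 1, -33% in year 2
arithmetic = sum(fund) / len(fund)
geometric = geometric_mean_return(fund)
ending_wealth = 100 * (1 + geometric) ** len(fund)

print(f"{arithmetic:.4f} {geometric:.4f} {ending_wealth:.2f}")
# prints: 0.0850 0.0025 100.50
```

Compounding €100 at the geometric mean for two years reproduces the €100.50 from the sanity check, which the 8.5% arithmetic mean cannot do.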

Dealing with outliers: trimmed and winsorized means

When extreme values are legitimate but distorting, two approaches let you keep most of the data while reducing the influence of outliers.

Trimmed mean: Remove a stated small percentage of the highest and lowest values, then calculate the arithmetic mean of what remains. A 5% trimmed mean discards the bottom 2.5% and top 2.5% of observations. Used in sports judging, where the highest and lowest scores are dropped.

Winsorized mean: Instead of dropping extreme values, replace them with the value at the boundary percentile. Observations below the 2.5th percentile are all replaced with the 2.5th percentile value. Observations above the 97.5th percentile are all replaced with the 97.5th percentile value. Sample size stays the same. The mean of the replaced dataset is the winsorized mean.
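A small sketch may help separate the two ideas. It assumes a simplified, count-based convention (drop or replace the k lowest and k highest observations) rather than an interpolated percentile cut-off. The data and the function names are hypothetical.

```python
def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k lowest and k highest values."""
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    """Arithmetic mean after replacing the k lowest and k highest values
    with the nearest retained value, so the sample size is unchanged."""
    s = sorted(values)
    low, high = s[k], s[len(s) - k - 1]
    clipped = [min(max(v, low), high) for v in s]
    return sum(clipped) / len(clipped)

# Hypothetical returns in percent with one extreme value in each tail.
returns = [-20, 1, 2, 3, 4, 5, 6, 7, 9, 40]

plain = sum(returns) / len(returns)     # 5.7, pulled around by -20 and 40
trimmed = trimmed_mean(returns, 1)      # mean of the middle 8 values
winsorized = winsorized_mean(returns, 1)  # -20 becomes 1, 40 becomes 9

print(plain, trimmed, winsorized)
```

With 10 observations, k = 1 corresponds to removing or replacing 10% in each tail. The trimmed version averages 8 values; the winsorized version still averages 10.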

Quantiles: locating a value in a ranked distribution

A quantile is any value at or below which a stated fraction of the data lies. The naming depends on how finely the distribution is divided.

Quantile types

Quartiles. Divide into four equal parts. Q1 = 25th percentile. Q2 = median = 50th percentile. Q3 = 75th percentile.

Quintiles. Divide into five equal parts: 20th, 40th, 60th, 80th percentile.

Deciles. Divide into ten equal parts: 10th, 20th, 30th...90th percentile.

Percentiles. Divide into 100 equal parts. The Lth percentile is the value at or below which L% of the data lies. Position formula for the Lth percentile in a dataset of n values sorted ascending:

Position = (n + 1) × (L / 100)

If the position is a whole number: the observation at that position is the Lth percentile. If the position is not a whole number: interpolate between the two surrounding observations.

Interquartile range (IQR): Q3 − Q1. Measures the spread of the middle half of the data. Used in box and whisker plots.

Worked Example 1

Finding the median and quartiles in a ranked dataset

An analyst at Meridian Asset Management reviews the 1-year returns of 9 equity funds in her peer group: −3%, 1%, 2%, 4%, 5%, 7%, 9%, 11%, 14%.

🧠Thinking Flow — Median and quartile position

The question asks

Find the median, Q1, and Q3.

Key concept needed

Position formula for odd-numbered sample (n = 9).

Step 1, Confirm the data is sorted

−3, 1, 2, 4, 5, 7, 9, 11, 14. Yes, ascending order confirmed.

Step 2, Find the median (50th percentile)

Position = (9 + 1) × 50/100 = 5. The 5th value is 5%. Median = 5%.

Step 3, Find Q1 (25th percentile)

Position = (9 + 1) × 25/100 = 2.5. Interpolate between the 2nd value (1%) and the 3rd value (2%): 1 + 0.5 × (2 − 1) = 1.5%.

Step 4, Find Q3 (75th percentile)

Position = (9 + 1) × 75/100 = 7.5. Interpolate between the 7th value (9%) and the 8th value (11%): 9 + 0.5 × (11 − 9) = 10%.

Step 5, Sanity check

IQR = Q3 − Q1 = 10% − 1.5% = 8.5%. The middle 50% of the distribution spans an 8.5 percentage point range, which seems reasonable for an equity peer group.

Answer

Median = 5%, Q1 = 1.5%, Q3 = 10%, IQR = 8.5%.
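The position rule used in this example can be sketched as a short Python function. The name `percentile` and the clamping at the ends of the list are choices made for this illustration. The final line cross-checks against `statistics.quantiles`, whose default "exclusive" method should follow the same (n + 1) rule.

```python
import statistics

def percentile(sorted_values, level):
    """Value at a percentile level (0-100) using the (n + 1) * L / 100
    position rule with linear interpolation between neighbours."""
    n = len(sorted_values)
    position = (n + 1) * level / 100   # 1-based position in the ranking
    lower = int(position)              # whole part of the position
    fraction = position - lower        # how far to move toward the next value
    if lower < 1:                      # assumption: clamp below the first value
        return sorted_values[0]
    if lower >= n:                     # assumption: clamp above the last value
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

funds = [-3, 1, 2, 4, 5, 7, 9, 11, 14]  # the nine peer-group returns, in percent

q1 = percentile(funds, 25)
median = percentile(funds, 50)
q3 = percentile(funds, 75)
iqr = q3 - q1

print(q1, median, q3, iqr)               # prints: 1.5 5.0 10.0 8.5
print(statistics.quantiles(funds, n=4))  # cross-check of Q1, median, Q3
```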

🧮 Warning: The stat worksheet gives you the arithmetic mean (X̄) and sample standard deviation (Sx). It does not compute the geometric mean, median, or quartiles. You must compute those by hand using the sorted data and position formulas. Do not mistake X̄ for the geometric or harmonic mean.

2ND,7(DATA)

Enter the data worksheet → X01

−3,ENTER,↓,↓

Input first return, skip Y → X02

1,ENTER,↓,↓

Input second return → X03

(repeat for all 9 values)

2ND,8(STAT)

Enter stat worksheet → LIN

↓

Skip regression mode → n = 9

↓

→ X̄ = 5.56 (arithmetic mean)

↓

→ Sx = 5.29 (sample std dev)

Worked Example 2

Choosing between arithmetic and geometric mean

Priya Nair, a research analyst at Orion Capital, reviews a four-year return history for the Orion Growth Fund: Year 1: +30%, Year 2: +20%, Year 3: −25%, Year 4: +15%. Her supervisor asks for both the arithmetic and geometric mean annual returns.

🧠Thinking Flow — Arithmetic vs geometric mean

The question asks

Compute both means and explain when each is appropriate.

Key concept needed

Geometric mean captures actual compounding. Arithmetic mean is the simple average.

Step 1, Arithmetic mean

(30 + 20 − 25 + 15) / 4 = 40 / 4 = 10% per year.

Step 2, Geometric mean

Convert to growth factors: 1.30 × 1.20 × 0.75 × 1.15. 1.30 × 1.20 = 1.56. 1.56 × 0.75 = 1.17. 1.17 × 1.15 = 1.3455. Take the 4th root: 1.3455^(0.25) = 1.0770. Subtract 1: geometric mean = 7.70% per year.

Step 3, Sanity check

Start with €1,000. Compounding at 7.70% for four years: €1,000 × 1.0770^4 ≈ €1,345. Applying the actual returns: €1,000 × 1.30 × 1.20 × 0.75 × 1.15 = €1,345.50. Match confirmed.

Step 4, Interpretation

The arithmetic mean (10%) describes the average return in a single year. The geometric mean (7.70%) describes what actually happened to €1,000 invested for all four years. For evaluating compound performance over time, the geometric mean is the right measure.

Answer

Arithmetic mean = 10%, geometric mean ≈ 7.70%. The gap (2.30 percentage points) reflects return volatility: the greater the volatility, the larger the gap.

🧮 Reminder: The y^x key is above the ÷ key. Enter the base first (1.3455), then press y^x, then enter the exponent (0.25 for the 4th root), then press =. Forgetting to subtract 1 at the end is the most common error here: the result before subtracting 1 is a growth factor, not a return.

1.30×1.20×0.75×1.15=

Compute the product of growth factors → 1.3455

y^x

Raise to a power

0.25=

Enter the exponent (1/n = 1/4) → 1.0770

−1=

Subtract 1 → 0.0770

⚠️

Watch out for this

The arithmetic mean property trap. The deviations from the arithmetic mean always sum to zero. This is a mathematical property, not a coincidence. Candidates who do not know it spend time computing and adding deviations to see what they get, rather than knowing the answer immediately. The related exam question: "Andrea computes the deviations from the mean for a dataset and gets a total of 12%. What can you conclude?" Answer: she made an arithmetic error, because the deviations must sum to zero. Candidates who mark "this indicates positive skewness" or similar statements are confusing deviations with squared or cubed deviations, which do not share this zero-sum property.

🧠

Memory Aid

FORMULA HOOK

When the data has extreme values, the median doesn't flinch. When returns compound across time, the geometric mean is the one that tells the truth. When you see "average rate" or "average price paid over equal instalments," think harmonic. Deviations from the arithmetic mean always cancel to zero; that zero confirms correct arithmetic and says nothing about the shape of the data. Use this when you encounter a multi-period return question: ask "is each period's return applied to a changing base?" If yes, use the geometric mean. If you are averaging returns across different funds in one period (cross-sectional), the arithmetic mean is appropriate.

Practice Questions · LO1

Q 1 of 6 — REMEMBER

A property of the arithmetic mean is that the sum of the deviations of all observations from the arithmetic mean is:

CORRECT: B

B is correct. The arithmetic mean is defined as the balance point of the data, the point at which positive deviations exactly offset negative deviations. This is a mathematical identity. The deviations always sum to zero regardless of the shape of the distribution.

Why not A? The arithmetic mean is not defined in terms of return drift. The zero-sum property holds even for datasets with only negative returns.

Why not C? Variance is the average of the squared deviations from the mean, which is a positive number. The unsquared deviations always sum to zero, and variance is computed from their squared values, not their sum.

---

Q 2 of 6 — UNDERSTAND

An equity portfolio returned −40% in Year 1 and +67% in Year 2. Which of the following best describes the relationship between the arithmetic mean and the geometric mean annual return?

CORRECT: B

B is correct. The arithmetic mean is (−40 + 67) / 2 = 13.5%. The geometric mean is (0.60 × 1.67)^0.5 − 1 = (1.002)^0.5 − 1 ≈ 0.1%. The geometric mean correctly reflects that €100 invested became €100.20, barely any gain. The arithmetic mean of 13.5% is misleading.

Why not A? A −40% return and a +67% return are not symmetrical. +67% does not undo −40% on the same base. €100 becomes €60, then €60 × 1.67 = €100.20. If they were symmetrical in the sense that matters, ending wealth would be €100 exactly (geometric mean = 0%), not approximately that.

Why not C? The geometric mean is always less than or equal to the arithmetic mean for the same dataset with any variation in returns. The geometric mean is never higher. The inequality is: geometric mean ≤ arithmetic mean, with equality only when all values are identical.

---

Q 3 of 6 — APPLY

A dataset of 11 annual returns, sorted in ascending order, has the following values: −8, −3, 0, 2, 4, 5, 7, 9, 12, 15, 22 (all in percent). The third quartile (Q3) is closest to:

CORRECT: C

C is correct. Position of Q3 = (n + 1) × 75/100 = (11 + 1) × 0.75 = 9. The 9th value in the sorted list is 12%. Q3 = 12%.

Why not A? 9% is the 8th value (position 8), not Q3. This error comes from miscounting positions in the sorted list or using the wrong percentile level.

Why not B? 11% is not an observation in the dataset. This suggests an interpolation error, interpolating between the 8th (9%) and 9th (12%) values using the wrong position calculation. Position 9 is a whole number here, so no interpolation is needed.

---

Q 4 of 6 — APPLY

A fund of funds manager reports that a portfolio's monthly returns over five years are best described using the trimmed mean rather than the arithmetic mean. The most likely reason for this choice is that the monthly return distribution:

CORRECT: A

A is correct. The trimmed mean is selected when extreme values (outliers) distort the arithmetic mean, making it unrepresentative of the typical monthly return. By discarding a stated percentage of the top and bottom values before computing the mean, the trimmed mean reduces the influence of those outliers.

Why not B? Compounding across periods is the rationale for using the geometric mean, not the trimmed mean. The trimmed mean is still an arithmetic mean, it is simply computed on a subset of the data with extreme values removed.

Why not C? A negative arithmetic mean is a valid statistic. There is no rule preventing the use of an arithmetic mean when returns are negative. A negative mean is informative: it correctly tells you the portfolio lost money on average. That is not a reason to switch to a trimmed mean.

---

Q 5 of 6 — ANALYZE

An analyst computes both the arithmetic mean and geometric mean of an equity fund's 10-year annual returns. The arithmetic mean is 11.2% and the geometric mean is 9.8%. A colleague argues that the fund's actual performance over the decade is better described by 11.2%. The analyst is correct to disagree because:

CORRECT: B

B is correct. If you invested €10,000 in this fund for 10 years and it compounded at 9.8% annually, your ending wealth would be €10,000 × (1.098)^10 ≈ €25,500. The arithmetic mean of 11.2% would imply €10,000 × (1.112)^10 ≈ €28,900. The actual ending wealth matches the geometric mean's figure, because the geometric mean is by definition the constant rate that reproduces it. The geometric mean is the correct summary of compound multi-period performance.

Why not A? "More conservative" is not the reason to prefer the geometric mean. Conservatism is not a statistical principle. The geometric mean is preferred because it is the mathematically correct measure of compound growth, not because it is lower.

Why not C? The arithmetic mean gives equal weight to each year's return, which is correct. It does not double-count. The issue is that averaging returns arithmetically ignores the compounding relationship between years, not that positive returns are counted multiple times.

---

Q 6 of 6 — TRAP

A distribution of hedge fund monthly returns has an arithmetic mean of 1.2%, a median of 0.4%, and a mode of 0.1%. An analyst concludes that the distribution is negatively skewed because the mean exceeds the median. The analyst's conclusion is:

CORRECT: B

B is correct. For a positively skewed distribution, a small number of extremely large values pull the mean above the median, and the median above the mode. The ordering mean > median > mode is the signature of positive skewness. For a negatively skewed distribution, the ordering is reversed: mean < median < mode. The analyst confused the direction of skewness.

Why not A? Mean > median > mode specifically and definitively indicates positive skewness. A few extremely large outliers pull the arithmetic mean up and away from the median and mode. The analyst's direction was backwards.

Why not C? The comparison of mean versus median is the primary diagnostic for skewness direction, and it is entirely valid. Mode adds a third confirmation data point. The analyst's mistake was not the choice of comparison, it was reading the direction of the inequality incorrectly and drawing the opposite conclusion.

---

Glossary

median

The middle value of a sorted dataset. For odd n, it is the observation at position (n+1)/2. For even n, it is the average of the two middle observations. Unaffected by extreme values. Example: in the sorted list [1, 3, 5, 7, 9], the median is 5.

trimmed mean

The arithmetic mean computed after removing a stated percentage of the most extreme high and low values. Example: a 10% trimmed mean drops the bottom 5% and top 5% of observations before averaging. Used in Olympic scoring.

winsorized mean

The arithmetic mean computed after replacing, not removing, extreme values with the value at the boundary percentile. Observations below the 2.5th percentile are set equal to the 2.5th percentile value. Sample size is preserved.

quantile

Any value at or below which a stated fraction of the data lies. Quartiles divide data into four equal parts; deciles into ten; percentiles into one hundred.

interquartile range

Q3 minus Q1. The range containing the middle 50% of the data. Displayed as the height of the box in a box and whisker chart. A dispersion measure that is resistant to outliers.

Quantitative Methods · Statistical Measures of Asset Returns · LO 2 of 4

Two funds both returned 8% on average, so why does one keep you awake at night?

Evaluate measures of dispersion to distinguish between investments with identical average returns but vastly different risk profiles.

⏱ 8min-15min

6 questions

HIGH PRIORITY · ANALYZE · 🧮 Calculator

Why this LO matters

Evaluate measures of dispersion to distinguish between investments with identical average returns but vastly different risk profiles.

INSIGHT

Two flights can cover the same distance and land at the same time. One flew through clear skies. The other went through three storms, lost altitude twice, and shook passengers out of their seats. Both "averaged" the same speed. One was a dramatically worse experience. Two investments with identical arithmetic mean returns are not interchangeable. The investor who chooses between them using only the mean is like the passenger who booked the turbulent flight because the arrival time was the same. Dispersion measures describe the variability around the mean. They are the second half of the story the mean cannot tell.

Why deviations from the mean always sum to zero, and why that forces a workaround

If you compute each observation's deviation from the arithmetic mean and add them up, the result is always zero. Always. This is a mathematical certainty, not a coincidence.

This creates a problem. If you want to measure how spread out the data is, you cannot simply average the deviations, because they cancel out. Two solutions exist.

Solution 1, Take the absolute value. Ignore the sign, treat negative deviations as positive distances. This gives you the mean absolute deviation (MAD).

Solution 2, Square the deviations. Squaring also removes signs (a negative squared is positive). This gives you variance and its square root, standard deviation.
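A quick numerical sketch shows both the problem and the two workarounds. The four returns are hypothetical, chosen so every step can be verified by hand.

```python
# Hypothetical returns in percent (assumption, for illustration only).
returns = [1.0, 4.0, -2.0, 5.0]

mean = sum(returns) / len(returns)              # 2.0
deviations = [r - mean for r in returns]        # [-1.0, 2.0, -4.0, 3.0]

raw_sum = sum(deviations)                       # signed deviations cancel out
mad = sum(abs(d) for d in deviations) / len(returns)           # Solution 1
variance = sum(d * d for d in deviations) / (len(returns) - 1)  # Solution 2
std_dev = variance ** 0.5

print(raw_sum, mad, variance, round(std_dev, 4))
# prints: 0.0 2.5 10.0 3.1623
```

The raw sum is zero regardless of how spread out the data is, so it carries no information about dispersion. Only after removing the signs does a usable measure appear.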

Five dispersion measures

Range. Maximum minus minimum. Simple. Uses only two data points. Sensitive to outliers. Useful as a quick first look, not as a basis for decisions.

Mean absolute deviation (MAD). Average of the absolute deviations from the mean. Uses all observations. Easier to interpret than variance but harder to work with mathematically.

Sample variance. Average of the squared deviations, dividing by n − 1. Unit: the square of whatever the data is measured in (e.g., percent squared). Harder to interpret directly.

Sample standard deviation. The positive square root of the sample variance. Back in the original units (e.g., percent). The most widely used measure in finance.

Coefficient of variation (CV). Standard deviation divided by the arithmetic mean. A unit-free measure that allows comparison across datasets with different means or units.

Computing MAD, variance, and standard deviation

Worked Example 1

Computing MAD and sample standard deviation

Ibrahim Hassan manages the Apex Balanced Fund. Monthly returns for five months are: 2%, 3.5%, −0.5%, 0.3%, −2.6%. Compute the range, MAD, and sample standard deviation.

🧠Thinking Flow — Step-by-step dispersion calculation

The question asks

Three separate dispersion measures from the same five returns.

Key concept needed

Each measure handles negative deviations differently. Range ignores all but two values. MAD takes absolute values. Standard deviation squares the deviations.

Step 1, Range

Maximum (3.5%) minus minimum (−2.6%) = 3.5 + 2.6 = 6.1%.

Step 2, Arithmetic mean for remaining calculations

(2 + 3.5 − 0.5 + 0.3 − 2.6) / 5 = 2.7 / 5 = 0.54%.

Step 3, MAD

Absolute deviations from 0.54%: |2 − 0.54| = 1.46, |3.5 − 0.54| = 2.96, |−0.5 − 0.54| = 1.04, |0.3 − 0.54| = 0.24, |−2.6 − 0.54| = 3.14. Sum = 1.46 + 2.96 + 1.04 + 0.24 + 3.14 = 8.84. Divide by 5: MAD = 1.768%.

Step 4, Sample variance

Squared deviations from 0.54%: (1.46)² = 2.1316, (2.96)² = 8.7616, (1.04)² = 1.0816, (0.24)² = 0.0576, (3.14)² = 9.8596. Sum = 21.892. Divide by n − 1 = 4: variance = 5.473% squared.

Step 5, Sample standard deviation

√5.473 = 2.339%.

Step 6, Sanity check

Standard deviation should be between the MAD (1.768%) and the range (6.1%). At 2.339%, it falls within that range. Confirmed.

Answer

Range = 6.1%, MAD = 1.768%, sample standard deviation = 2.339%.
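The same calculation can be checked in Python. This sketch uses the standard `statistics` module, where `variance` and `stdev` divide by n − 1. MAD is computed by hand because the module has no built-in for it.

```python
import statistics

returns = [2.0, 3.5, -0.5, 0.3, -2.6]   # the five monthly returns, in percent

value_range = max(returns) - min(returns)
mean = statistics.mean(returns)
mad = sum(abs(r - mean) for r in returns) / len(returns)
sample_var = statistics.variance(returns)   # squared deviations / (n - 1)
sample_sd = statistics.stdev(returns)       # square root of the sample variance

print(round(value_range, 3), round(mean, 3), round(mad, 3),
      round(sample_var, 3), round(sample_sd, 3))
# prints: 6.1 0.54 1.768 5.473 2.339
```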

🧮 Warning: The stat worksheet shows Sx (sample standard deviation using n − 1) and below it σx (population standard deviation using n). Use Sx by default for CFA Level 1. If the problem says "population," use σx, but this is rare. To get variance from the calculator: read Sx, then press x² to square it: 2.3394² ≈ 5.473.

2ND,7(DATA)

Enter data worksheet → X01

2,ENTER,↓,↓

Input first return → X02

3.5,ENTER,↓,↓

Input second return → X03

0.5,+/−,ENTER,↓,↓

Input third return (negative) → X04

0.3,ENTER,↓,↓

Input fourth return → X05

2.6,+/−,ENTER,↓,↓

Input fifth return (negative)

2ND,8(STAT)

Enter stat worksheet → LIN

↓

Skip regression mode → n = 5

↓

→ X̄ = 0.54

↓

→ Sx = 2.3394

Downside risk: the target semideviation

Standard deviation treats positive and negative deviations symmetrically, a month that returns +10% contributes as much to standard deviation as a month that returns −10%. Most investors do not feel symmetric about these outcomes.

The target semideviation (also called target downside deviation) measures only the risk of falling below a specified target return. It focuses exclusively on the downside.

Target semideviation

S_Target = √[Σ(X_i − B)² / (n − 1)]

The sum includes ONLY observations where X_i < B.
B = the target return (e.g., 0%, 3%, the risk-free rate).
n = total number of observations in the sample (NOT just those below the target).

Key: n in the denominator is the full sample size. Only the numerator is restricted to below-target observations.
// A higher target return B means more observations fall below it, and those that do have larger negative deviations, so target semideviation increases as the target increases.

Worked Example 2

Comparing standard deviation with target semideviation

Fatima Al-Rashid, a risk manager at Crescent Capital, monitors a portfolio with these 12 monthly returns (in %): 5, 3, −1, −4, 4, 2, 0, 4, 3, 0, 6, 5. The investment policy statement requires a minimum monthly return of 3%. She needs to compute the target semideviation relative to that 3% minimum.

🧠Thinking Flow — Computing target semideviation

The question asks

Identify which months fall below the 3% target and use only those in the numerator.

Key concept needed

Only deviations below the target enter the numerator. The denominator uses the full sample size (12 − 1 = 11).

Step 1, Identify below-target months

Target = 3%. Returns below 3%: −1% (month 3), −4% (month 4), 2% (month 6), 0% (month 7), 0% (month 10). Five observations fall below target.

Step 2, Compute squared deviations below target

(−1 − 3)² = 16, (−4 − 3)² = 49, (2 − 3)² = 1, (0 − 3)² = 9, (0 − 3)² = 9. Sum = 84.

Step 3, Divide by n − 1 = 11

84 / 11 = 7.636. Take square root: target semideviation ≈ 2.763%.

Step 4, Compare to standard deviation

Sample mean = 27/12 = 2.25%. Computing sample variance uses all 12 observations: sum of squared deviations from 2.25% divided by 11 = 8.75. Standard deviation ≈ 2.958%.

Step 5, Interpretation

Target semideviation (2.763%) is lower than standard deviation (2.958%) because it only captures downside risk. If the policy target were lower (say 0%), fewer months would fall below and semideviation would be smaller. As the target rises, semideviation rises.

Answer

Target semideviation = 2.763%. This measures the risk specifically relevant to the investment policy constraint, not overall variability in both directions.
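The calculation above can be sketched as a small function. The name `target_semideviation` is chosen for this illustration. Note that only the numerator is filtered, while the denominator keeps the full sample size.

```python
import math

def target_semideviation(returns, target):
    """Square root of the sum of squared shortfalls below the target,
    divided by n - 1 where n is the FULL sample size."""
    shortfalls = [(r - target) ** 2 for r in returns if r < target]
    return math.sqrt(sum(shortfalls) / (len(returns) - 1))

monthly = [5, 3, -1, -4, 4, 2, 0, 4, 3, 0, 6, 5]   # returns in percent

at_3 = target_semideviation(monthly, 3)   # policy minimum of 3%
at_0 = target_semideviation(monthly, 0)   # lower target, for comparison

print(round(at_3, 3), round(at_0, 3))
# prints: 2.763 1.243
```

Lowering the target from 3% to 0% leaves only two months below target, and the measure falls accordingly.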

Coefficient of variation: comparing dispersion across datasets

When two datasets have different means or different units of measurement, comparing their standard deviations directly is misleading. A standard deviation of 5% for a fund averaging 50% is proportionally tiny. A standard deviation of 5% for a fund averaging 6% is enormous.

Coefficient of variation

CV = s / X̄

s = sample standard deviation
X̄ = arithmetic mean (must be positive for CV to be meaningful)

CV measures risk per unit of mean return. A lower CV means more return per unit of risk.

Worked Example 3

Comparing two funds using the coefficient of variation

Larsen Analytics covers two equity funds. Fund Polaris: mean return 4%, standard deviation 5.60%. Fund Vega: mean return 4%, standard deviation 12.12%. A junior analyst argues both funds carry the same risk because they have the same average return.

🧠Thinking Flow — Using CV to compare risk-adjusted efficiency

The question asks

Is the junior analyst right? Which fund carries more risk per unit of return?

Key concept needed

Standard deviation in isolation is not comparable across funds with different scales. CV normalises by the mean.

Step 1, Note that both funds have the same mean (4%)

With identical means, standard deviation alone is actually a valid comparison here. But the CV calculation confirms and quantifies it.

Step 2, Compute CV for each fund

Fund Polaris: CV = 5.60 / 4 = 1.40. Fund Vega: CV = 12.12 / 4 = 3.03.

Step 3, Interpretation

For every one percentage point of average return, Polaris carries 1.40 percentage points of standard deviation as risk. Vega carries 3.03. Vega has more than twice the risk per unit of return.

Step 4, Sanity check

If both funds had the same mean AND the same standard deviation, they would have the same CV. The junior analyst's logic would hold. But the standard deviations differ substantially (5.60% vs 12.12%), so the conclusion, "same risk", is wrong.

Answer

Fund Polaris (CV = 1.40) is more risk-efficient than Fund Vega (CV = 3.03). The junior analyst confused "same average return" with "same risk," ignoring the dispersion entirely.
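As a minimal sketch, the comparison reduces to one division per fund. The function name and the guard against a non-positive mean are choices made for this illustration, reflecting the note that the mean must be positive for CV to be meaningful.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation per unit of mean return."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return std_dev / mean

polaris = coefficient_of_variation(5.60, 4.0)
vega = coefficient_of_variation(12.12, 4.0)

print(round(polaris, 2), round(vega, 2), round(vega / polaris, 2))
# prints: 1.4 3.03 2.16
```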

⚠️

Watch out for this

The n vs n − 1 trap. Sample variance uses n − 1 in the denominator. Population variance uses n. The exam will give you data from a sample (almost always the case in investment analysis, since you never observe every return a strategy will ever produce) and ask for the sample variance or standard deviation. In the five-observation dataset from Worked Example 1, using n gives a denominator of 5 instead of 4 and a lower variance of 4.378% squared instead of 5.473% squared, which is exactly the number that appears as a wrong answer choice. The reason for n − 1: once you compute the sample mean, only n − 1 deviations can vary freely, because the deviations must sum to zero. Using n − 1 corrects for this and produces an unbiased estimate of the population variance.
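The two denominators can be compared directly. In Python's `statistics` module, `pvariance` divides by n and `variance` divides by n − 1, so a short sketch reproduces both numbers from the trap.

```python
import statistics

returns = [2.0, 3.5, -0.5, 0.3, -2.6]   # Worked Example 1 returns, in percent

population_var = statistics.pvariance(returns)   # divides by n = 5
sample_var = statistics.variance(returns)        # divides by n - 1 = 4

print(round(population_var, 3), round(sample_var, 3))
# prints: 4.378 5.473
```

The ratio of the two is n / (n − 1) = 1.25 here, so the gap shrinks as the sample grows.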

🧠

Memory Aid

CONTRAST ANCHOR

Standard deviation measures total variability around the mean, both good months and bad months count equally. Target semideviation measures only the risk of falling below a threshold, only bad months count, and "bad" is defined by the investor's objective. Standard deviation judges the manager; target semideviation judges whether the manager is protecting your floor. Use this when a question describes an investor with a minimum acceptable return or a liability floor: that investor cares about target semideviation, not standard deviation. An investor without a specific floor cares about standard deviation.

Practice Questions · LO2

Q 1 of 6 — REMEMBER

The main reason for using n − 1 rather than n in the denominator when calculating sample variance is that:

CORRECT: C

C is correct. Once the sample mean is computed from n observations, only n − 1 of those observations are free to vary: the last observation is determined by the mean and the other n − 1 values. Dividing by n − 1 instead of n corrects for this constraint and ensures the sample variance is an unbiased estimator of the true population variance.

Why not A? Using n − 1 does not make the sample variance equal to the population variance. Population variance divides by the full population size and measures deviations from the true population mean. The correction makes the sample variance an unbiased estimator in expectation, not an exact match.

Why not B? Outlier handling is the function of the trimmed or winsorized mean, or the range. Dividing by n − 1 versus n has no effect on which observations enter the numerator, all squared deviations are still included. The change affects only the denominator.

---

Q 2 of 6 — UNDERSTAND

An analyst needs to compare the dispersion of annual returns across two funds: Fund A has a mean return of 2% and a standard deviation of 5%, while Fund B has a mean return of 15% and a standard deviation of 9%. Without computing the CV, which fund has higher risk per unit of mean return?

CORRECT: A

A is correct. Even without computing formally: Fund A's standard deviation (5%) is 2.5× its mean return (2%). Fund B's standard deviation (9%) is 0.6× its mean return (15%). Fund A has far more risk relative to its average return. Formally: CV(A) = 5/2 = 2.5; CV(B) = 9/15 = 0.6. Fund A has higher risk per unit of return.

Why not B? Comparing standard deviations in absolute terms is valid only when both funds have the same mean. When means differ substantially (2% versus 15%), the same absolute standard deviation represents very different proportional risk. Choosing the higher absolute standard deviation ignores the return context.

Why not C? The logic of CV is transparent enough to reason through without the formula. If both funds had the same standard deviation, the one with the lower mean would have higher risk per unit of return. Fund A has both the lower mean and the lower absolute standard deviation, but the mean is so small that even a modest standard deviation represents enormous relative risk.
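
The comparison in this question reduces to one division per fund. A minimal sketch follows; the function name and the decision to reject a non-positive mean are assumptions made for illustration.

```python
def coefficient_of_variation(std_dev, mean_return):
    """Risk per unit of mean return; both inputs must be in the same units."""
    if mean_return <= 0:
        # Illustrative guard: the ratio is hard to interpret for a zero or negative mean.
        raise ValueError("CV needs a positive mean return")
    return std_dev / mean_return


cv_a = coefficient_of_variation(5, 2)    # Fund A: 5% standard deviation, 2% mean
cv_b = coefficient_of_variation(9, 15)   # Fund B: 9% standard deviation, 15% mean

print(cv_a, cv_b)
# prints: 2.5 0.6
```

Fund A carries about four times as much risk per unit of return as Fund B, even though its absolute standard deviation is lower.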

---

Q 3 of 6 — APPLY

A portfolio has monthly returns (in %) of: 4, −2, 7, 1, −5, 3. The sample standard deviation is closest to:

CORRECT: B

B is correct. Mean = (4 − 2 + 7 + 1 − 5 + 3)/6 = 8/6 = 1.33%. Squared deviations from 1.33%: (4−1.33)² = 7.13, (−2−1.33)² = 11.09, (7−1.33)² = 32.15, (1−1.33)² = 0.11, (−5−1.33)² = 40.07, (3−1.33)² = 2.79. Sum = 93.34. Sample variance = 93.34/5 = 18.67. Standard deviation = √18.67 ≈ 4.32%. The key discriminator is the calculation path: divide by n − 1 = 5, not by n = 6.

Why not A? 3.11% is lower than even the population standard deviation. Dividing by n = 6 gives variance = 93.34/6 = 15.56 and standard deviation ≈ 3.94%. Using n instead of n − 1 understates the sample standard deviation, and 3.11% understates it further.

Why not C? 4.20% would result from a calculation error in either the mean or the squared deviations, likely an error in handling the negative returns (forgetting the sign or squaring incorrectly).
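
The arithmetic for this question can be reproduced with Python's standard `statistics` module, which provides both the n − 1 and the n versions. This is a checking sketch, not part of the exam workflow; the variable names are illustrative.

```python
import statistics

returns = [4, -2, 7, 1, -5, 3]  # monthly returns in %

mean_return = statistics.mean(returns)
sample_sd = statistics.stdev(returns)        # divides by n - 1 = 5
population_sd = statistics.pstdev(returns)   # divides by n = 6

print(round(mean_return, 2), round(sample_sd, 2), round(population_sd, 2))
# prints: 1.33 4.32 3.94
```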

---

Q 4 of 6 — APPLY

A pension fund's investment policy requires a minimum annual return of 5%. Last year's monthly returns (%) were: 8, 3, −2, 7, 4, 5, 1, 6, −1, 9, 2, 4. The target semideviation relative to the 5% target is computed using:

CORRECT: B

B is correct. Target semideviation includes in the numerator only the squared deviations for observations below the target. Here that is seven observations: 3%, −2%, 4%, 1%, −1%, 2%, 4%. The denominator is n − 1 = 11, using the full sample size of 12, not just the count of below-target observations.

Why not A? Using all 12 months in the numerator produces a measure closer to the standard deviation. Target semideviation explicitly excludes above-target months from the numerator; that is its entire purpose. Above-target returns are not a risk to an investor worried about falling below 5%.

Why not C? The denominator is the full sample size minus 1 (11), not the number of below-target observations minus 1. This distinction is frequently tested. Using the count of below-target observations in the denominator produces a different statistic that is not the target semideviation as defined.
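
The two rules in this question (numerator from below-target observations only, denominator from the full sample) can be sketched as a short function. The function name is an assumption for illustration; the data and the 5% target are those given in the question.

```python
import math


def target_semideviation(returns, target):
    """Square root of (sum of squared shortfalls below target) / (n - 1)."""
    n = len(returns)  # full sample size, not the count of shortfalls
    # Observations exactly at the target add zero, so <= and < give the same sum.
    downside = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(downside / (n - 1))


monthly = [8, 3, -2, 7, 4, 5, 1, 6, -1, 9, 2, 4]  # returns in %
below = [r for r in monthly if r < 5]
tsd = target_semideviation(monthly, 5)

print(below, round(tsd, 2))
# prints: [3, -2, 4, 1, -1, 2, 4] 3.25
```

The squared shortfalls sum to 116; dividing by 11 and taking the square root gives roughly 3.25%.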

---

Q 5 of 6 — ANALYZE

Fund Aurum has a mean annual return of 6% and a standard deviation of 9%. Fund Boreas has a mean annual return of 12% and a standard deviation of 10%. A risk-averse investor who evaluates funds on risk per unit of return should prefer:

CORRECT: B

B is correct. CV(Aurum) = 9/6 = 1.50. CV(Boreas) = 10/12 = 0.83. Fund Boreas earns more return per unit of risk. A risk-averse investor maximising return per unit of standard deviation prefers the lower CV.

Why not A? A lower absolute standard deviation is not sufficient grounds for preference. Fund Aurum's standard deviation (9%) is actually close to Fund Boreas's (10%) in absolute terms, but Fund Aurum only earns 6% average return versus 12%. The investor is giving up 6 percentage points of return for a trivial 1 percentage point reduction in standard deviation.

Why not C? The CV already incorporates the investor's fundamental risk-return preference (risk per unit of return) without requiring a specific target. If the investor evaluates on CV, they have enough information. A target return would be needed for target semideviation, but not for CV comparison.

---

Q 6 of 6 — TRAP

A fund has 8 annual returns: −5%, 2%, 15%, 8%, −3%, 22%, 7%, 4%. Two analysts compute the standard deviation. Analyst K uses n in the denominator; Analyst L uses n − 1. The standard deviation reported by Analyst K will be:

CORRECT: C

C is correct. Analyst K divides the sum of squared deviations by 8; Analyst L divides by 7. A larger denominator produces a smaller variance. Taking the square root of a smaller number produces a smaller standard deviation. Analyst K's result (population standard deviation) will always be less than or equal to Analyst L's (sample standard deviation) for the same data.

Why not A? The choice of n versus n − 1 changes the denominator of the variance calculation and therefore changes the result. They are not the same.

Why not B? Dividing by a larger number (8 instead of 7) reduces, not increases, the quotient. The variance is that quotient, so the chain runs: larger denominator → smaller variance → smaller standard deviation.
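
The relationship between the two analysts' figures can be confirmed without hand arithmetic. In the sketch below (variable names are illustrative), the two results differ only by the factor √((n − 1)/n), because both share the same sum of squared deviations.

```python
import math
import statistics

returns = [-5, 2, 15, 8, -3, 22, 7, 4]  # annual returns in %
n = len(returns)

sd_k = statistics.pstdev(returns)  # Analyst K: divides by n = 8
sd_l = statistics.stdev(returns)   # Analyst L: divides by n - 1 = 7
ratio = sd_k / sd_l

print(sd_k < sd_l, round(ratio, 4), round(math.sqrt((n - 1) / n), 4))
# prints: True 0.9354 0.9354
```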

---

Glossary

dispersion

The variability of a dataset around its central tendency. High dispersion means outcomes are spread widely around the mean; low dispersion means outcomes cluster tightly around the mean. In finance, dispersion is the primary measure of risk.

mean absolute deviation

The average of the absolute values of the deviations from the arithmetic mean. All observations are used. Avoids the cancellation problem of summing raw deviations. Denoted MAD.

variance

The average of the squared deviations from the mean. For a sample, the denominator is n − 1. Units are the square of the data's units (e.g., percent squared). Always non-negative.

standard deviation

The positive square root of the variance. Expressed in the same units as the original data. The most widely used single measure of investment risk.

target semideviation

A measure of downside risk. Computed as the square root of the average squared deviation below a specified target, using the full sample size minus 1 in the denominator. Relevant for investors with a minimum acceptable return.

Quantitative Methods · Statistical Measures of Asset Returns · LO 3 of 4

A fund that wins 99 times and loses once, is that a good fund or a dangerous one?

Read skewness to know whether returns are stacked in your favor, and read kurtosis to know which risks the average conceals.

⏱ 8min-15min

6 questions

HIGH PRIORITY · ANALYZE


INSIGHT

Skewness answers: are the gains and losses stacked symmetrically, or is one tail longer than the other? Kurtosis answers: are extreme outcomes more or less likely than a normal distribution would predict? Positive skewness means most returns cluster on the left (small, frequent losses) while the right tail reaches far and thin, representing rare but large gains. Positive excess kurtosis means fat tails: both extremes occur more often than normal predicts, not just the upside you want. These are two independent warnings about reality not matching the simple normal model.

When does the average lie about what is actually happening?

The arithmetic mean tells you the balance point of the data. It does not tell you what kind of outcomes cluster around that balance point.

Imagine plotting monthly returns on a chart. The vertical axis shows frequency. The horizontal axis shows return value. The shape between those axes is the entire investment story.

A return distribution with positive skewness has most outcomes below the mean (small, frequent losses) and a thin tail of extremely large gains pulling the mean upward. The average looks acceptable. The shape is not.

A return distribution with positive excess kurtosis has more extreme outcomes, both very good and very bad, than a normal distribution predicts. The middle is crowded. The tails carry more weight.

Knowing the average is like knowing only the height of a mountain. Skewness and kurtosis tell you what the terrain looks like on both sides.

The three things that shape a return distribution

Central tendency. Where the data clusters. The mean, median, and mode, covered in LO 1. Skewness and kurtosis tell you about the shape around those measures.

Symmetry. Whether the left and right sides of the distribution are mirror images. This is what skewness measures. When a distribution is perfectly symmetrical, skewness equals zero.

Tail weight. Whether extreme outcomes happen more or less often than a normal distribution predicts. This is what kurtosis measures. When the tails are heavier than normal, the distribution generates more surprises, both good and bad.

Why does symmetry matter? The problem the mean cannot see

Variance and standard deviation square every deviation from the mean. Squaring loses the sign: a large loss and a large gain both become large positive numbers. You cannot tell from the mean and standard deviation alone whether the large deviations are gains or losses.

Skewness solves this. By cubing deviations instead of squaring them, skewness preserves the sign and gives the largest deviations the most weight. A distribution whose biggest deviations are losses produces a negative cube-sum; one whose biggest deviations are gains produces a positive cube-sum.

The crucial principle: a distribution that is not symmetrical is called skewed. A skewed distribution tells you the risk profile is not balanced.

The mountaineering fund that looks good on average

Imagine a venture capital fund that has posted returns of +2%, +3%, +1%, +2%, +4% for five consecutive quarters. The arithmetic mean is 2.4%. Every investor looking at that average feels comfortable. But in quarter six, a portfolio company fails spectacularly and the fund loses 40%. The arithmetic mean has been quietly dominated by small steady gains while a catastrophic loss was building. The mean never showed the tail risk. Skewness would show it: the negative skewness of the actual six-quarter distribution would reveal that the upside is bounded and the downside is extended.

The wrong answer candidates give: "If the arithmetic mean is positive, the investment is generating positive returns on average and is therefore safe."

The right framework: skewness tells you whether the distribution of outcomes is balanced. Positive skewness means the mean is dragged above the median by occasional large gains. Negative skewness means the mean is dragged below the median by occasional large losses.
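
The fund's six quarters can be run through a few lines of Python to see what the five-quarter average hid. This is a sketch using the returns stated above; the variable names are illustrative, and six observations are far too few for a reliable skewness estimate, so only the signs are read here.

```python
import statistics

first_five = [2, 3, 1, 2, 4]       # quarterly returns in %
all_six = first_five + [-40]       # add the quarter-six loss

mean_five = statistics.mean(first_five)
mean_six = statistics.mean(all_six)
median_six = statistics.median(all_six)

# Cubing keeps the sign, so the single large loss dominates the sum.
cubed_sum = sum((r - mean_six) ** 3 for r in all_six)

print(mean_five, round(mean_six, 2), median_six, cubed_sum < 0)
# prints: 2.4 -4.67 2.0 True
```

The six-quarter mean sits well below the median and the sum of cubed deviations is negative, which are the two signatures of negative skew.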

What does a perfectly symmetrical distribution look like?

A distribution that is perfectly symmetrical has a skewness of zero.

In a symmetrical distribution, the left side mirrors the right side exactly. The mean, the median, and the mode are all in the same position, at the centre.

The most important symmetrical distribution in finance is the normal distribution. It is the foundation of modern portfolio theory and risk management. Its shape is the familiar bell curve.

The normal distribution's three defining properties

Symmetrical. Skewness equals zero. The left and right tails are identical. Mean equals median equals mode.

Two parameters describe it completely. Knowing only the mean and the standard deviation, you can describe the entire distribution. There are no other shape parameters needed.

Kurtosis of exactly 3.0. The normal distribution serves as the benchmark for measuring tail weight. Its excess kurtosis equals zero. This is the reference point for everything that follows.

[GRAPH: Exhibit 16 · The Normal Distribution] What this shows: A perfectly symmetrical bell-shaped distribution centred on its mean. The most frequently occurring value (mode), the midpoint (median), and the balance point (mean) are all at the same location.

Axes: x-axis = return value, y-axis = frequency of occurrence.

Key curves/lines: Curve: symmetric bell shape, peaks at the centre, tails extend infinitely in both directions. Centre line: marks the mean = median = mode.

Critical intersection/point: The centre point where mean = median = mode. This is the reference point for comparing all other distributions.

Exam read: From a graph like this, note the symmetry. Any deviation from this shape indicates skewness or kurtosis. The exam will show this as the baseline for comparing skewed or fat-tailed distributions.

Which direction does the distribution lean when the right tail stretches?

When a distribution has a long tail on the right, extended toward large positive values, it is positively skewed.

Think of a trading strategy that generates small losses most days, the left side, clustered near zero, but occasionally produces a large gain that pulls the mean far to the right.

Positively skewed distributions

Shape. Long tail extending to the right. The bulk of observations cluster on the left. The right tail is thin but long.

Order of the three measures of central tendency. Mode is smallest. Median is larger than mode. Mean is the largest of all. The order is: Mean > Median > Mode.

Why the mean is pulled right. The mean is sensitive to extreme values. A few enormous gains pull the mean far to the right, beyond where most observations sit. The median does not move as much.

Investment interpretation. Most outcomes are below the mean. Small losses are frequent. The upside is large but infrequent. The mean overstates what a typical outcome looks like.

[GRAPH: Exhibit 17A · Positively Skewed Distribution] What this shows: A distribution with a peak on the left and a long tail extending to the right. Most observations cluster below the mean. Few observations extend far into large positive territory.

Axes: x-axis = return value, y-axis = probability density.

Key curves/lines: Mode: located at the leftmost peak, the most frequent outcome. Median: to the right of the mode, divides the distribution into two equal-probability halves. Mean: dragged further right by the long tail of extreme gains. Tail: extended to the right, thin, representing rare but large gains.

Critical intersection/point: The order Mean > Median > Mode. This is the defining feature the exam will test. Locate which central tendency measure is furthest right.

Exam read: The question will show a distribution with a right tail and ask you to state which central tendency measure is largest. The answer is always the mean.

What does it mean when the left tail stretches instead?

When a distribution has a long tail on the left, extended toward large negative values, it is negatively skewed.

This is the shape that makes risk managers uncomfortable. Most outcomes are positive, but a few catastrophic losses pull the mean left of where most observations sit.

Negatively skewed distributions

Shape. Long tail extending to the left. The bulk of observations cluster on the right. The left tail is thin but long, representing rare but severe losses.

Order of the three measures of central tendency. Mode is largest. Median is smaller than mode. Mean is the smallest of all. The order is: Mode > Median > Mean.

Why the mean is pulled left. Extreme negative returns, outliers on the left, drag the mean downward. The median, being the midpoint of ranked observations, is less sensitive to a few extreme values.

Investment interpretation. Most outcomes are above the mean. Small gains are frequent. The downside is large but infrequent. The mean understates the typical outcome because occasional large losses dominate it.

[GRAPH: Exhibit 17B · Negatively Skewed Distribution] What this shows: A distribution with a peak on the right and a long tail extending to the left. Most observations cluster above the mean. Few observations extend far into large negative territory.

Axes: x-axis = return value, y-axis = probability density.

Key curves/lines: Mode: located at the rightmost peak, the most frequent outcome. Median: to the left of the mode, divides the distribution into two equal-probability halves. Mean: dragged further left by the long tail of extreme losses. Tail: extended to the left, thin, representing rare but large losses.

Critical intersection/point: The order Mode > Median > Mean. The mean is the smallest of the three. This is the defining feature the exam will test.

Exam read: The question will show a left-tailed distribution and ask which central tendency measure is smallest. The answer is always the mean.
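
The two orderings described for right-tailed and left-tailed distributions can be seen on a small sample. The data below are made up to show the pattern (the left-skewed sample is simply the mirror image of the right-skewed one); they are not curriculum data, and the ordering is the typical pattern for single-peaked distributions like the ones drawn in the exhibits.

```python
import statistics


def central_measures(data):
    """Return (mode, median, mean) for a sample."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)


right_skewed = [1, 1, 1, 2, 2, 3, 4, 10]     # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]     # mirror image: long left tail

mode_r, median_r, mean_r = central_measures(right_skewed)
mode_l, median_l, mean_l = central_measures(left_skewed)

print(mode_r, median_r, mean_r)   # prints: 1 2.0 3
print(mode_l, median_l, mean_l)   # prints: -1 -2.0 -3
```

In the first sample the single value of 10 pulls the mean above the median and mode; mirroring the data reverses the order.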

How do you measure skewness formally?

The curriculum gives the approximation for sample skewness when n is large, 100 or more:

Sample skewness

Skewness ≈ (1/n) × Σ(Xi − X̄)³ / s³

Xi = each individual observation
X̄ = the arithmetic mean
s = the sample standard deviation

Use when: n ≥ 100, and you need to interpret the direction and magnitude of skew.
Do not use: when the sample is very small, the measure becomes unreliable.
Plain English: Cubed deviations preserve the sign. A positive sum means the positive deviations (gains) are larger in magnitude than the negative ones. A negative sum means the negative deviations (losses) are larger.
The exam will give you the computed skewness value and ask you to interpret it. Focus on reading the sign and the ordering of the three measures.
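
The approximation above translates directly into code. The sketch below is for reading the sign only: the function name and the three tiny samples are illustrative assumptions, and samples this small fall far short of the n ≥ 100 the approximation is meant for.

```python
import statistics


def sample_skewness(data):
    """Large-sample approximation: (1/n) * sum((x - mean)^3) / s^3."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    return sum((x - mean) ** 3 for x in data) / n / s ** 3


symmetric = [-2, -1, 0, 1, 2]              # mirror image around zero
right_tail = [1, 1, 2, 2, 3, 12]           # one large positive outlier
left_tail = [-x for x in right_tail]       # same shape, flipped

skew_sym = sample_skewness(symmetric)
skew_right = sample_skewness(right_tail)
skew_left = sample_skewness(left_tail)

print(skew_sym == 0, skew_right > 0, skew_left < 0)
# prints: True True True
```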

Kurtosis: what is happening in the tails?

Skewness tells you whether the left and right tails are balanced. Kurtosis tells you whether the tails are heavier or lighter than the normal distribution, and whether the peak is taller or flatter.

Kurtosis measures the combined weight of the tails relative to the rest of the distribution. It is a measure of peakedness, how tall and narrow the centre is, combined with how much probability sits in the extreme regions.

The three kurtosis categories

Mesokurtic. A distribution with the same tail weight as the normal distribution. Kurtosis equals exactly 3. Excess kurtosis equals 0. The peak and tails match the normal distribution.

Leptokurtic. A distribution with heavier tails and a taller peak than the normal distribution. Kurtosis is greater than 3. Excess kurtosis is greater than 0. The middle is more crowded. More observations occur very close to the mean. More observations occur far from the mean. Fewer observations fall in the middle region between the centre and the tails. Fat tails. More surprise potential.

Platykurtic. A distribution with lighter tails and a flatter, shorter peak than the normal distribution. Kurtosis is less than 3. Excess kurtosis is less than 0. The distribution is more spread out. Fewer observations occur extremely close to the mean. Fewer observations occur far in the tails. More observations fall in the middle zone between centre and tails. Thin tails. Fewer surprises.

Kurtosis vs skewness: the trap students fall into

Here is the most important distinction in this entire LO.

Skewness and kurtosis measure different things. They have different sign conventions. Mixing them up will destroy your score on any LO 3 question.

The sign that tricks you when you are most confident

Imagine you see a return distribution with excess kurtosis of +3.79. You remember from the summary table that positive excess kurtosis means fat tails. You choose the answer "fat-tailed distribution." You feel confident. You are correct.

Now imagine the same question but the excess kurtosis is −0.75. You remember the summary table says negative excess kurtosis means thinner tails. But the moment you read "negative," your brain connects it to "left tail" from skewness: negative skewness means a long left tail. You choose "thin-tailed distribution." You hesitate. You second-guess yourself. You think: "Wait, is negative bad? Is this actually fat tails?" You change your answer to "fat-tailed." You get it wrong.

The trap is not the formula. The trap is that your brain processes "negative" differently depending on which concept you are thinking about. In skewness, negative means left tail. In kurtosis, negative means thin tails. Both are true. Both are different. Your brain will try to combine them into one rule. Do not let it.

The wrong answer candidates give: "Positive excess kurtosis means a long right tail." This confuses skewness with kurtosis.

The right framework: Skewness measures symmetry. Kurtosis measures tail weight relative to the normal distribution. They are independent dimensions of shape. Keep them in separate mental boxes.

What does a fat-tailed distribution actually look like compared to normal?

[GRAPH: Exhibit 18 · Leptokurtic (Fat-Tailed) Distribution Compared to Normal] What this shows: A leptokurtic distribution overlaid on a normal distribution with the same mean and standard deviation. The two distributions share the same centre, but the leptokurtic curve rises higher in the centre and extends further into both tails.

Axes: x-axis = standard deviations from the mean (labelled −3σ, −2.5σ, −2σ, −1σ, 0, +1σ, +2σ, +2.5σ, +3σ), y-axis = probability density.

Key curves/lines: Normal distribution (reference), moderate peak height, moderate tail spread. Leptokurtic distribution, taller peak at centre, fatter tails extending further from the mean.

Critical intersection/point: At approximately ±2.5 standard deviations, this is where the fat-tailed distribution assigns more probability than the normal. The region just outside ±1 standard deviation also shows fewer observations in the leptokurtic distribution than the normal.

Exam read: You must be able to read this graph and state that the leptokurtic distribution has more extreme observations (far from the mean) and more central observations (very close to the mean), with fewer observations in the moderate deviation zone. This explains why fat tails are dangerous: more probability falls in the worst outcomes than the normal distribution predicts.
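
The three regions described for Exhibit 18 can be put into numbers by comparing the normal distribution with one specific fat-tailed distribution. The Laplace distribution used below is a stand-in chosen for this sketch, not the curve in the exhibit: it is symmetric, has excess kurtosis of 3, and is scaled here to the same mean (0) and standard deviation (1) as the standard normal.

```python
import math


def normal_tail(k):
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))


def laplace_tail(k):
    """P(|X| > k) for a Laplace variable with mean 0 and standard deviation 1."""
    scale = 1 / math.sqrt(2)  # variance of a Laplace variable is 2 * scale**2
    return math.exp(-k / scale)


# Each pair is (normal, Laplace).
beyond_2_5 = (normal_tail(2.5), laplace_tail(2.5))                  # far tails
near_centre = (1 - normal_tail(0.5), 1 - laplace_tail(0.5))         # within 0.5 sd
middle_zone = (normal_tail(1) - normal_tail(2),
               laplace_tail(1) - laplace_tail(2))                   # between 1 and 2 sd

for name, (normal_p, laplace_p) in [("beyond 2.5 sd", beyond_2_5),
                                    ("within 0.5 sd", near_centre),
                                    ("1 to 2 sd", middle_zone)]:
    print(f"{name}: normal {normal_p:.4f}, fat-tailed {laplace_p:.4f}")
```

Under these assumptions the fat-tailed distribution puts more than twice as much probability beyond ±2.5 standard deviations, more probability very close to the mean, and less in the zone between 1 and 2 standard deviations.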

The kurtosis summary table: how to read it under pressure

Reading the kurtosis summary table

Kurtosis above 3.0 → excess kurtosis above 0 → fat-tailed (leptokurtic). More probability in the tails. More surprises at both extremes.

Kurtosis equal to 3.0 → excess kurtosis equal to 0 → mesokurtic (same as normal). The benchmark. The reference point.

Kurtosis below 3.0 → excess kurtosis below 0 → thin-tailed (platykurtic). Fewer extreme outcomes than normal predicts. Fewer surprises.
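
The three rows of the table amount to one subtraction and a sign check. A minimal sketch follows; the function name and the small tolerance used to decide "equal to 3.0" are assumptions for illustration.

```python
def classify_kurtosis(kurtosis, tolerance=1e-9):
    """Convert kurtosis to excess kurtosis and label the tail weight."""
    excess = kurtosis - 3.0  # the normal distribution is the benchmark
    if excess > tolerance:
        label = "leptokurtic (fat-tailed)"
    elif excess < -tolerance:
        label = "platykurtic (thin-tailed)"
    else:
        label = "mesokurtic (same as normal)"
    return excess, label


for k in (6.45, 3.0, 2.40):  # illustrative kurtosis values
    excess, label = classify_kurtosis(k)
    print(f"kurtosis {k:.2f} -> excess {excess:+.2f} -> {label}")
```

Note that the sign being read is the sign of excess kurtosis, not of kurtosis itself: a kurtosis of 2.40 is a positive number but still signals thin tails.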

Real data: what the EAA Equity Index distribution looks like

The curriculum provides real data for daily returns of the EAA Equity Index.

What the EAA Equity Index data tells you

Negative skewness of −0.4260. The distribution has a long left tail. More extreme negative returns occurred than extreme positive returns of the same magnitude. The mean is below the median.

Positive excess kurtosis of 3.7962. The distribution has fat tails. More extreme outcomes at both ends than the normal distribution predicts. The peak is higher and sharper than the normal distribution.

Combined interpretation. This distribution is both negatively skewed and fat-tailed simultaneously. The shape is concentrated around the centre (high kurtosis) but asymmetric (skewed left). Most returns cluster near the mean. When extremes occur, they tend to be large losses, and they occur more often than the normal distribution predicts. A risk model that assumes a normal distribution will dramatically underestimate the probability of a catastrophic loss day.

How do skewness and kurtosis work together in one investment problem?

When you are given both a skewness statistic and an excess kurtosis statistic for a return distribution, you must interpret them independently and then combine them into one investment story.

Interpreting both statistics together

Skewness tells you about asymmetry. Positive skewness means the upside tail is longer. Negative skewness means the downside tail is longer. Zero skewness means both sides are mirror images.

Excess kurtosis tells you about surprise frequency. Positive excess kurtosis means extreme outcomes occur more often than normal predicts. Negative excess kurtosis means extreme outcomes occur less often. Zero means tail behaviour matches the normal.

Combined story. A negatively skewed, fat-tailed distribution tells you: most returns cluster near the mean, but when extremes occur, they tend to be large losses, and they happen more often than a normal model predicts. This is the most dangerous combination for risk management. It looks calm in the middle and blows up more frequently than expected.
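
The "interpret independently, then combine" routine above can be sketched as a function that reads each sign on its own and returns both labels. The function name and the `near_zero` cut-off of 0.1 are assumptions made for illustration; the curriculum does not prescribe a threshold for "approximately zero."

```python
def describe_shape(skewness, excess_kurtosis, near_zero=0.1):
    """Read skewness and excess kurtosis separately and return both labels."""
    # Dimension 1: asymmetry (sign of skewness).
    if skewness > near_zero:
        asymmetry = "longer right tail"
    elif skewness < -near_zero:
        asymmetry = "longer left tail"
    else:
        asymmetry = "approximately symmetrical"

    # Dimension 2: tail weight (sign of excess kurtosis).
    if excess_kurtosis > near_zero:
        tails = "fat tails"
    elif excess_kurtosis < -near_zero:
        tails = "thin tails"
    else:
        tails = "normal-like tails"
    return asymmetry, tails


eaa = describe_shape(-0.4260, 3.7962)   # EAA Equity Index statistics quoted earlier
mirror = describe_shape(0.50, -0.40)    # hypothetical opposite case

print(eaa)      # prints: ('longer left tail', 'fat tails')
print(mirror)   # prints: ('longer right tail', 'thin tails')
```

Keeping the two checks in separate branches mirrors the "separate mental boxes" advice: neither label is ever inferred from the other statistic.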

Worked examples

Worked Example 1

Positive skewness and the order of central tendency

Kai Nakamura manages a small-cap growth portfolio at Summit Asset Management in Tokyo. Over the past 36 months, the fund has recorded a skewness of +0.58 in its monthly returns. He is preparing a report for the investment committee and needs to state the correct order of the three measures of central tendency for this distribution.

🧠Thinking Flow — Reading the sign of skewness to find the mean, median, mode order

The question asks

Which statement correctly describes the relative positions of the mean, median, and mode in a positively skewed distribution?

Key concept needed

Positive skewness, long right tail, mean pulled above the median and mode.

Step 1, Identify the skewness direction

The skewness is +0.58. Positive means the right tail is longer. Extreme gains are more extreme than extreme losses of the same probability. The mean is being dragged upward by those rare large gains.

Step 2, Recall the ordering rule

For a positively skewed distribution: Mode < Median < Mean. The most frequent outcome (mode) sits in the crowded left side. The median splits the observations in half. The mean is farthest right, pulled by the extended upside tail.

Step 3, Evaluate each statement against the rule

Statement A: "Mean is less than median." Wrong: in positive skew, the mean is greater than the median.
Statement B: "Median is greater than the mode." Correct: the median is always to the right of the mode in a right-skewed distribution.
Statement C: "Mode is greater than the mean." Wrong: the mode is the smallest of the three.

Step 4, Sanity check

If most observations cluster to the left (small losses, frequent) and a thin tail of extreme gains drags the mean upward, the mean must be the largest number. The ordering Mode < Median < Mean is the only arrangement consistent with a long right tail.

✓ Answer: The median is greater than the mode. For a positively skewed distribution: Mode < Median < Mean. Exam answer: B.

Worked Example 2

Negative skewness, identifying the longer tail

Amara Osei is a risk analyst at Accra Capital Partners in Ghana. She calculates that an emerging market bond fund has monthly returns with a skewness of −0.71. She must explain to the board why the distribution is not symmetrical and which direction the longer tail points.

🧠Thinking Flow — Skewness sign → tail direction

The question asks

What does a skewness of −0.71 tell you about the shape of the fund's return distribution?

Key concept needed

Negative skewness, left tail longer, mean below median, extreme losses more severe than extreme gains.

Step 1, Read the sign of skewness

Skewness is negative. Negative means the tail extending in the negative direction, the left, is longer. The distribution is not symmetrical.

Step 2, State what the left tail means

A long left tail means extreme losses (large negative returns) are more extreme than extreme gains of the same probability. The distribution has frequent small gains on the right and infrequent but severe losses on the left.

Step 3, Confirm the central tendency order

For negatively skewed: Mean < Median < Mode. The mean sits furthest left, dragged down by occasional catastrophic losses. The mode sits at the peak on the right, where most outcomes cluster.

Step 4, Sanity check

If extreme negative returns dominate the tail while most outcomes are above the mean, the arithmetic mean is pulled below the median. This matches the pattern of −0.71: a distribution dominated by occasional large drawdowns, not by occasional large gains.

✓ Answer: A skewness of −0.71 means the distribution has a longer left tail. Extreme losses are more severe than extreme gains. The mean is less than the median, which is less than the mode.

Worked Example 3

Positive excess kurtosis, fat tails, not skewness

Irina Volkov works as a quantitative analyst at Meridian Risk Consulting in Amsterdam. She computes the daily returns of a cryptocurrency exchange token and finds an excess kurtosis of 3.45. She is writing a risk advisory note and must describe what fat tails mean for investors who assume a normal distribution.

🧠Thinking Flow — Interpreting excess kurtosis without reference to skewness

The question asks

A return distribution has excess kurtosis of 3.45. What does this tell you about the probability of extreme outcomes compared to a normal distribution?

Key concept needed

Positive excess kurtosis means fat tails, leptokurtic distribution. More extreme outcomes at both ends than the normal predicts.

Step 1, Locate the value on the kurtosis scale

Kurtosis = 3 + 3.45 = 6.45. The normal distribution has kurtosis of exactly 3. This distribution's kurtosis of 6.45 is far above that baseline.

Step 2, Apply the kurtosis summary table

Kurtosis above 3 → excess kurtosis above 0 → leptokurtic → fat-tailed distribution. Fat tails mean more probability falls in the extreme regions (beyond ±2.5 standard deviations from the mean) than a normal distribution predicts.

Step 3, Describe the full shape

A leptokurtic distribution also has a taller, sharper peak near the mean, more observations cluster very close to the mean. Fewer observations fall in the moderate deviation zone between the centre and the tails. The total probability is conserved: more in the extremes, more near the centre, less in between.

Step 4, Sanity check

If an investor assumes this distribution is normal and uses a Value-at-Risk model based on the normal distribution, they will systematically underestimate the probability of both extreme gains and extreme losses. The excess kurtosis of 3.45 is large and signals a distribution that produces more surprises than the normal model assumes.

✓ Answer: Excess kurtosis of 3.45 means the distribution is leptokurtic (fat-tailed): extreme outcomes at both ends occur more frequently than the normal distribution predicts. The peak is also taller and sharper near the mean.

Worked Example 4

Reading skewness and kurtosis together

Jonas Bergqvist is a portfolio analyst at Nordic Quant Fund in Stockholm. He collects five years of monthly returns for a structured credit fund and calculates a skewness of −0.43 and an excess kurtosis of 3.80. The fund's marketing materials highlight an average monthly return of 1.2% and a standard deviation of 2.8%. He must write a risk disclosure explaining what the shape statistics reveal.

🧠Thinking Flow — Interpreting two statistics simultaneously

The question asks

A fund's returns have skewness of −0.43 and excess kurtosis of 3.80. Which statement correctly characterises this distribution?

Key concept needed

Skewness and kurtosis are independent dimensions of shape. Negative skewness tells you about asymmetry. Positive excess kurtosis tells you about tail weight. You must read each sign independently and then combine them into one risk story.

Step 1, Interpret the skewness independently

Skewness = −0.43 (negative). Negative skewness means the left tail is longer. The mean is below the median. Extreme losses are more extreme than extreme gains of equal probability. Most outcomes are above the mean.

Step 2, Interpret the excess kurtosis independently

Excess kurtosis = 3.80 (positive). Positive excess kurtosis means fat tails. More observations occur extremely close to the mean. More observations occur extremely far from the mean (both tails). Fewer observations fall in the moderate deviation zone between centre and tails.

Step 3, Combine into the investment story

Most monthly returns cluster near the mean, the distribution looks stable and predictable. Occasionally, a large loss occurs. Because the left tail is both long (negative skew) and fat (positive excess kurtosis), those large losses are both more extreme than the gains and more frequent than a normal model would predict. A risk model assuming a normal distribution will dramatically underestimate how often severe drawdown months happen.

Step 4, Sanity check

This combination, negative skew plus fat tails, is the most dangerous profile for a risk manager. The distribution looks calm in the middle and blows up more often than expected. Structured credit products often show this exact pattern of occasional large losses layered on top of a concentrated, near-mean return profile.

✓ Answer: The fund's returns are negatively skewed and fat-tailed (leptokurtic). Most observations cluster near the mean, but large losses are both more extreme than gains and occur more frequently than a normal distribution predicts.
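The two-step reading in this example can be sketched in a few lines of Python. This is an illustrative helper only: the function name `describe_shape` and the cutoff `tol` for treating a value as "approximately zero" are assumptions made for the sketch, not part of the curriculum and not a test of statistical significance.

```python
def describe_shape(skewness: float, excess_kurtosis: float, tol: float = 0.1) -> tuple[str, str]:
    """Read skewness and excess kurtosis independently and return two labels.

    tol is a hypothetical cutoff for "approximately zero".
    """
    # Step 1: the sign of skewness says which tail is longer.
    if skewness > tol:
        skew_label = "positively skewed (longer right tail)"
    elif skewness < -tol:
        skew_label = "negatively skewed (longer left tail)"
    else:
        skew_label = "approximately symmetric"

    # Step 2: the sign of excess kurtosis says whether tails are fat or thin.
    if excess_kurtosis > tol:
        tail_label = "leptokurtic (fatter tails than normal)"
    elif excess_kurtosis < -tol:
        tail_label = "platykurtic (thinner tails than normal)"
    else:
        tail_label = "approximately mesokurtic"

    return skew_label, tail_label


# Jonas's structured credit fund: skewness -0.43, excess kurtosis 3.80.
print(describe_shape(-0.43, 3.80))
```

Keeping the two checks in separate branches mirrors the point of the example: neither statistic is allowed to influence how the other is read.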

Worked Example 5

The trap, zero skewness does not mean normal tails

Priya Sharma is a senior risk officer at Meridian Investment Bank in Singapore. A junior analyst presents a strategy with skewness of +0.04 and excess kurtosis of −0.60. The analyst concludes the distribution is "basically normal" because the skewness is essentially zero. Priya must review this conclusion and identify what the analyst missed.

🧠Thinking Flow — Identifying when the analyst is wrong by checking both statistics independently

The question asks

An analyst sees skewness ≈ 0 and excess kurtosis = −0.60. He says the distribution is normal. Is this correct?

Key concept needed

Zero skewness means the distribution is symmetrical, mean equals median equals mode. But excess kurtosis measures a separate dimension of shape. Excess kurtosis of −0.60 means the tails are thinner than normal and the peak is flatter. The distribution is symmetrical but platykurtic, not the same as a normal distribution.

Step 1, Check the skewness conclusion

Skewness of +0.04 is essentially zero. The distribution is approximately symmetrical. Mean ≈ Median ≈ Mode. This part of the analyst's statement is defensible.

Step 2, Check the kurtosis conclusion

Excess kurtosis of −0.60 is not zero. Excess kurtosis = −0.60 means the kurtosis is 3 − 0.60 = 2.40, which is below the normal distribution's baseline of 3. The distribution is platykurtic, thinner tails, flatter peak. It generates fewer extreme outcomes than a normal distribution predicts.

Step 3, Name the error

The analyst has treated symmetry as proof of normality. He correctly identified that the distribution is symmetrical, but symmetry alone does not make a distribution normal. A normal distribution has both skewness = 0 and excess kurtosis = 0. This distribution has near-zero skewness but negative excess kurtosis. It is symmetric but not mesokurtic. These are two independent properties.

Step 4, Describe what this distribution actually looks like

It is symmetric (bell-shaped left-to-right) but flatter at the top and with thinner tails than the normal curve. More observations fall in the moderate deviation zone. Fewer observations cluster extremely close to the mean. Fewer observations appear in the far tails. The probability is redistributed away from the extremes and away from the peak into the middle zone.

Step 5, Sanity check

If an analyst uses a VaR model calibrated to the normal distribution for this strategy, they will overestimate the probability of extreme outcomes (because this distribution actually has fewer of them) and overestimate how often very-close-to-mean outcomes occur (because the peak is flatter). The model is too conservative, not too aggressive.

✓ Answer: The analyst is incorrect. The distribution is approximately symmetrical (skewness ≈ 0) but thin-tailed (excess kurtosis = −0.60). A normal distribution requires both skewness = 0 and excess kurtosis = 0. The negative excess kurtosis means the distribution has thinner tails and a flatter peak than the normal, so it produces fewer extreme outcomes than the normal predicts.
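To see "symmetric but not normal" in numbers, the sketch below computes skewness and excess kurtosis directly from data. It assumes the simple moment-ratio definitions (central moments divided by n, with no small-sample adjustment), which may differ slightly from the sample formulas used elsewhere. The function name `shape_stats` and the die-roll data are hypothetical, chosen only for illustration.

```python
def shape_stats(values):
    """Return (skewness, excess kurtosis) using simple moment ratios.

    Assumption: central moments are divided by n (no small-sample adjustment).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n  # variance
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3        # subtract the normal baseline of 3
    return skewness, excess_kurtosis


# Hypothetical data: every face of a fair die observed equally often.
faces = [1, 2, 3, 4, 5, 6] * 100
skew, excess = shape_stats(faces)
print(f"skewness = {skew:.2f}, excess kurtosis = {excess:.2f}")
```

The skewness comes out at zero while the excess kurtosis is clearly negative (about −1.27), so a check on skewness alone would wrongly pass this data as "basically normal."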

⚠️

Watch out for this

The sign inversion trap: skewness vs kurtosis. A candidate sees excess kurtosis of 4.2 and concludes the distribution is fat-tailed. They are right. But a candidate who sees excess kurtosis of −1.8 often concludes the same thing: fat-tailed. They are wrong. Negative excess kurtosis means the tails are thinner, not fatter. Positive excess kurtosis (above 0) means a leptokurtic distribution: kurtosis of 3 + 4.2 = 7.2, above the normal baseline of 3, with more probability in the tails than the normal predicts. Negative excess kurtosis (below 0) means a platykurtic distribution: kurtosis of 3 − 1.8 = 1.2, below the normal baseline, with less probability in the tails. Candidates make this error because they have learned the sign conventions for skewness (positive = right tail) and then assume the same sign logic applies to kurtosis. Skewness and kurtosis measure different shape properties. Skewness tells you which tail is longer. Kurtosis tells you whether the tails are fat or thin relative to a normal distribution. Before submitting any kurtosis answer, check the sign of excess kurtosis: excess above 0 means fat tails; excess below 0 means thin tails. Write the sign and its meaning in words before choosing your answer.

🧠

Memory Aid

CONTRAST ANCHOR

Skewness is the mirror: does it lean left or right? Kurtosis is the weight of the tails: are the extremes fatter or thinner than normal?

Practice Questions · LO3

6 Questions LO3

Q 1 of 6 — REMEMBER

A return distribution has a skewness of −0.55. Which statement about this distribution is correct?

CORRECT: B

CORRECT: B, Negative skewness means the tail extending toward large negative values, the left side, is longer than the right. Large losses are more extreme than large gains of equal probability. The mean is pulled left of the median: Mode > Median > Mean. This is the defining characteristic of a left-skewed distribution.

Why not A? In a negatively skewed distribution, the mean is less than the median, not greater. The wrong answer flips the ordering: it describes what happens with positive skewness, where the mean is greater than the median. A candidate who chooses A has reversed the direction. Know the ordering cold: right tail (positive skew) drags the mean up: Mean > Median > Mode. Left tail (negative skew) drags the mean down: Mode > Median > Mean.

Why not C? Skewness and kurtosis are independent dimensions of shape. Negative skewness tells you which tail is longer, it says nothing about whether the tails are fat or thin. The distribution could be leptokurtic, mesokurtic, or platykurtic. You need the excess kurtosis value to know. Candidates who confuse skewness with kurtosis choose this option because they associate "negative" with "something wrong" and default to fat tails.

---

Q 2 of 6 — UNDERSTAND

An analyst computes the excess kurtosis of a fund's monthly returns as 4.2. Compared to the normal distribution, which of the following best describes this fund's return distribution?

CORRECT: A

CORRECT: A, Excess kurtosis of 4.2 means the kurtosis is 4.2 + 3 = 7.2, well above the normal distribution's baseline of 3. A distribution with excess kurtosis above 0 is leptokurtic. It has a taller, sharper peak near the mean: more observations cluster extremely close to the centre. It has fatter tails: more observations fall in the extreme regions beyond ±2 standard deviations. Fewer observations fall in the moderate zone between the centre and the tails. Total probability is conserved: mass moves out of the moderate zone toward both the peak and the extremes.

Why not B? Excess kurtosis of 4.2 is positive. Positive excess kurtosis always means fatter tails and a taller peak, never thinner tails or a flatter peak. B describes a platykurtic distribution, which requires excess kurtosis below 0. The candidate who chooses B has committed the sign inversion error: they read the magnitude "4.2" and associate it with "different from normal," then guess the wrong direction. The scale is calibrated to the normal distribution: kurtosis above 3 (excess kurtosis above 0) = fat tails; kurtosis below 3 (excess kurtosis below 0) = thin tails. Always check whether you are reading kurtosis against the baseline of 3 or excess kurtosis against the baseline of 0 before deciding the direction.

Why not C? Excess kurtosis of 4.2 is far above 0. A mesokurtic (normal) distribution has excess kurtosis equal to exactly 0, no more and no less. Any non-zero excess kurtosis means the tail weight differs from normal. C is only correct if the question had stated excess kurtosis = 0. Candidates who pick C either do not know that excess kurtosis = 0 is the definition of mesokurtic, or they assume all distributions are approximately normal unless stated otherwise.

---

Q 3 of 6 — APPLY

A hedge fund returns series shows a skewness of 0.70 and an excess kurtosis of 0.50. Which of the following best describes this return distribution?

CORRECT: C

CORRECT: C, Skewness of +0.70 is positive. Positive skewness means the right tail is longer, large gains are more extreme than large losses of equal probability. Mean > Median > Mode. Excess kurtosis of +0.50 is positive. Positive excess kurtosis means the kurtosis is 3 + 0.50 = 3.50, above the normal baseline of 3. The distribution is leptokurtic, fatter tails than normal, more extreme outcomes at both ends. The combined description is positively skewed and fat-tailed.

Why not A? Excess kurtosis of +0.50 is positive, not negative. Positive excess kurtosis always means fat tails, never thin tails. Thin tails require excess kurtosis below 0 (platykurtic). A pairs the correct skewness direction with the wrong kurtosis direction. Candidates who choose A handle each concept in isolation but fail to check whether the two answers are consistent with each other.

Why not B? Skewness of +0.70 is positive, not negative. B gets the kurtosis direction right (fat tails = positive excess kurtosis) but reverses the sign of skewness. A candidate has seen the number "0.70" and confused it with a negative value, possibly because they associate losses with negativity, even though the figure is unambiguously positive. Read the sign, not the magnitude, when determining direction.

---

Q 4 of 6 — APPLY+

A commodity-focused CTA fund has monthly returns with a skewness of 0.80 and excess kurtosis of −0.75. Which of the following statements about this fund is most accurate?

CORRECT: B

CORRECT: B, Positive skewness of 0.80 means the right tail is longer. Rare, large gains pull the mean above the median. Excess kurtosis of −0.75 means the kurtosis is 3 − 0.75 = 2.25, below the normal baseline of 3. The distribution is platykurtic, thin tails, fewer extreme outcomes at both ends than the normal predicts. The two statistics describe a distribution where large gains occasionally occur (positive skew) but are genuinely rare (negative excess kurtosis). The CTA generates occasional outsized gains in a relatively calm distribution.

Why not A? Most observations are never in the tail, regardless of skewness. Most observations cluster near the mode and median. A long tail means extreme outcomes extend further, not that more outcomes occur there. In this distribution, most months produce moderate returns near the mean, and a few months produce large gains that stretch the right tail. A candidate who confuses "which tail is longer" with "where most observations are" will always get skewness questions wrong.

Why not C? Excess kurtosis of −0.75 is negative. Negative excess kurtosis means the distribution is platykurtic, thin tails, fewer extreme outcomes than a normal distribution predicts. C describes the opposite: fat tails and more extreme outcomes. C is the correct description for a distribution with excess kurtosis above 0. The candidate who picks C has read the negative sign as "less than normal" in one context (tail weight) but misinterpreted it as "more extreme outcomes." Check: below 3 → thin → fewer extremes. This direction is consistent once you commit to it.

---

Q 5 of 6 — ANALYZE

Two funds have the same arithmetic mean return of 8% and the same standard deviation of 15%. Fund A has skewness of −0.80 and excess kurtosis of 1.2. Fund B has skewness of 0.80 and excess kurtosis of −1.2. Which of the following statements about these funds is most accurate?

CORRECT: C

CORRECT: C, Fund A has negative skewness, the left tail is longer. Large losses are more extreme than large gains of equal probability. Combined with fat tails (excess kurtosis of 1.2), extreme losses are both larger and more frequent than a normal model predicts. Fund B has positive skewness, the right tail is longer. Large gains are more extreme than large losses. Combined with thin tails (excess kurtosis of −1.2), extreme gains are larger but less frequent than normal predicts. For a risk-averse investor, Fund A is the more dangerous profile: it looks stable in the middle and blows up on the downside more often than expected.

Why not A? A reverses the roles of the two funds. Fund A's negative skewness means the downside tail is longer, not the upside. A common error is to read "negative skewness" as "bad" and immediately conclude the investor experiences more upside surprise, confusing the sign of the number with the direction of the tail. In a negatively skewed distribution, the rare extreme outcomes are losses, not gains. The sign of skewness tells you which tail is longer, not which tail is "good."

Why not B? B assumes that identical mean and standard deviation imply identical risk profiles. This is exactly what skewness and kurtosis are designed to disprove. Two distributions can share the same first two moments and have completely different shapes. Fund A's negative skewness means large losses dominate the tail. Fund B's positive skewness means large gains dominate. Their tail risks are asymmetric and opposite in direction. A risk manager who evaluates these funds solely on mean and standard deviation will miss the fact that Fund A will occasionally produce catastrophic losses while Fund B will occasionally produce spectacular gains.

---

Q 6 of 6 — TRAP

An analyst examining a global macro fund calculates an excess kurtosis of −1.8. She concludes that the fund's return distribution has fat tails. Which statement is most accurate?

CORRECT: C

CORRECT: C, Excess kurtosis of −1.8 means kurtosis = 3 − 1.8 = 1.2, which is below the normal distribution's baseline of 3. A kurtosis below 3 describes a platykurtic distribution: thinner tails, fewer extreme outcomes, and a flatter peak than the normal curve. The analyst has committed the sign inversion trap. She saw a negative number and associated it with a problematic distribution characteristic, fat tails, without checking the sign convention for kurtosis.

Why not A? A makes two errors. First, "large kurtosis value" mischaracterises the figure: −1.8 is a negative excess kurtosis, not a large positive one. Second, excess kurtosis below 0 always indicates thin tails, never fat tails. The sign of excess kurtosis, not its magnitude, determines fat vs thin. A candidate who chooses A has not learned the kurtosis summary table: excess kurtosis > 0 → fat tails; excess kurtosis < 0 → thin tails. The normal distribution has excess kurtosis = 0. Everything else is relative to that.

Why not B? B shows exactly how the cognitive error happens. The candidate correctly recalls that kurtosis = excess kurtosis + 3. They then compute 3 + (−1.8) = 1.2. But they then interpret 1.2 as confirming fat tails, when 1.2 is below the normal baseline of 3, indicating thin tails. The formula step is correct; the interpretation after the calculation is wrong. The candidate knows the formula but has not internalised that the benchmark is 3. Kurtosis of 1.2 means the distribution is flatter and thinner-tailed than normal. The number 1.2 should immediately trigger: "below 3 = below normal = thinner tails." If the candidate had checked this against the benchmark before choosing, the error would have been caught.

---

Glossary

skewness

A measure of asymmetry in a distribution. Positive skewness means the right tail is longer, so large gains are more extreme than large losses. Negative skewness means the left tail is longer, so large losses are more extreme than large gains. Zero skewness means the distribution is perfectly symmetrical. Think of a running back's rushing yards: most plays gain 2 to 5 yards, but one long run stretches the right tail.

kurtosis

A measure of the combined weight of a distribution's tails relative to its peak, compared to a normal distribution. Kurtosis above 3 means heavier tails and a sharper peak. Kurtosis below 3 means lighter tails and a flatter peak. Think of a busy airport check-in line: most people wait about the same time, but a few wait much longer or much shorter.

excess kurtosis

Kurtosis minus 3. It tells you whether extreme values appear more or less often than a normal distribution predicts. Excess kurtosis above 0 means fat tails, both very high and very low numbers occur more often than expected. Excess kurtosis below 0 means thin tails, extremes happen less often than normal. A marathon with excess kurtosis of 3.0 has many runners clustered near the median finishing time with a few very fast and very slow outliers.

normal distribution

A symmetric, bell-shaped pattern that appears naturally in large datasets. Everything about it is determined by just two numbers: the average and how spread out the values are. Height in a large population, shoe sizes, and measurement errors all approximately follow this pattern. It serves as the reference curve for all kurtosis comparisons.

leptokurtic

A distribution with heavier tails and a sharper peak than a normal distribution. More values cluster very close to the average, and more values appear far away from it, with fewer in the middle ground between the two. Rush hour traffic is leptokurtic: most days have nearly identical commute times near the average, but occasional extreme congestion or clear roads occur more often than a bell curve would predict.

platykurtic

A distribution with lighter tails and a flatter peak than a normal distribution. Values are more evenly spread out across the range without a strong concentration at the centre or a significant number of extreme outliers. Rolling one die many times is platykurtic: all outcomes appear with roughly equal frequency and no extreme clustering at the centre.

mesokurtic

A distribution with exactly the same tail weight and peakedness as a normal distribution. Kurtosis equals exactly 3. The normal distribution is the reference case for all kurtosis comparisons. Adult IQ scores in a large, unselected population approximate a mesokurtic distribution.

Quantitative Methods · Statistical Measures of Asset Returns · LO 4 of 4

Carbon dioxide levels and beer sales moved together for 18 straight years, does that mean one caused the other?

Interpret a correlation coefficient correctly, including understanding what it does not tell you about causation, nonlinearity, and data quality.

⏱ 8min-15min

3 questions

LOW PRIORITY · UNDERSTAND

Why this LO matters

INSIGHT

You find that two variables move together almost perfectly over a decade. You might conclude there is a meaningful relationship. You might be right. Or both variables might be driven by a third variable you haven't measured. Or you might have found a coincidence that held true in this sample but means nothing. The correlation coefficient tells you the strength of the linear association in your data. It tells you nothing about why that association exists, whether it will persist, or whether it holds outside the observed data range. That is not a weakness of correlation. It is a boundary. Understanding the boundary is the LO.

From scatter plots to covariance to correlation

Before we can measure correlation precisely, we need to see it.

A scatter plot displays one variable on the horizontal axis and a second variable on the vertical axis. Each observation becomes one point. The pattern of points reveals the relationship.

Tight clustering around an upward-sloping line → strong positive association.
Tight clustering around a downward-sloping line → strong negative association.
No discernible pattern → weak or no linear association.
A curved pattern → a relationship exists, but it is not linear, so correlation will not capture it well.

IT sector versus utilities versus the market

An analyst at Blackwater Research is evaluating sector exposures. She plots 60 months of returns for the information technology sector index against the S&P 500. The points cluster tightly along an upward-sloping line. Next, she plots the utilities sector index against the S&P 500. The points scatter randomly, with no pattern. She immediately understands two things. When the market is up, technology tends to be up too (strong positive association). When the market moves in either direction, utilities go their own way (near-zero association). She has not yet computed a single number, but the scatter plots have already told her what she needs to know about relative co-movement. This is the first step: visual inspection always precedes formal correlation analysis. An unexpected pattern in the scatter plot is a warning to investigate before trusting the number.

Covariance: the building block

Covariance measures whether two variables tend to move in the same direction (positive covariance) or opposite directions (negative covariance) relative to their respective means.

Sample covariance

s_XY = Σ[(X_i − X̄)(Y_i − Ȳ)] / (n − 1)

s_XY = sample covariance between variables X and Y
X̄, Ȳ = sample means of X and Y
n − 1 = degrees of freedom adjustment (same as sample variance)
The problem with covariance as a standalone measure: its magnitude depends on the units of X and Y. The covariance between monthly returns (small numbers) and annual revenues (large numbers) will be a very different number than the covariance between two return series, even if the underlying relationship is identical. You cannot compare covariances across different pairs of variables.

The solution is to standardise the covariance by dividing by the product of the two standard deviations. This produces the correlation coefficient, which is always between −1 and +1 regardless of units.

Sample correlation coefficient

r_XY = s_XY / (s_X × s_Y)

r_XY = sample correlation coefficient (also written as ρ for population)
s_XY = sample covariance between X and Y
s_X = sample standard deviation of X
s_Y = sample standard deviation of Y

Range: −1 ≤ r_XY ≤ +1
The sign of the correlation coefficient equals the sign of the covariance. Dividing by positive standard deviations does not change the sign.
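The two formulas can be sketched in plain Python to show why covariance depends on units while correlation does not. The return figures below are hypothetical, and the function names are illustrative rather than taken from any library.

```python
from math import sqrt


def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))  # covariance of a series with itself is its variance
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical monthly returns, in percent.
fund = [1.0, 2.0, -1.0, 3.0, 0.0]
index = [0.5, 1.5, -0.5, 2.0, 0.5]

# The same fund returns expressed as decimals instead of percent.
fund_decimal = [r / 100 for r in fund]

cov_pct = sample_covariance(fund, index)
cov_dec = sample_covariance(fund_decimal, index)
r_pct = sample_correlation(fund, index)
r_dec = sample_correlation(fund_decimal, index)

print(f"covariance:  {cov_pct:.4f} (percent) vs {cov_dec:.4f} (decimal)")
print(f"correlation: {r_pct:.4f} (percent) vs {r_dec:.4f} (decimal)")
```

Changing the units of one series rescales the covariance by the same factor (here 100) but leaves the correlation unchanged, and the sign is the same in both.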

Interpreting the correlation coefficient

What different correlation values mean

r = +1. Perfect positive linear relationship. All data points fall exactly on an upward-sloping straight line. Every unit increase in X is associated with the same fixed increase in Y.

r = −1. Perfect negative (inverse) linear relationship. All data points fall on a downward-sloping straight line. Every increase in X is associated with an exact proportional decrease in Y.

r close to +1 (e.g., +0.85). Strong positive association. Points cluster tightly around an upward-sloping line. Higher X values tend to accompany higher Y values.

r close to −1 (e.g., −0.85). Strong negative association. Points cluster tightly around a downward-sloping line.

r = 0. No linear relationship. Knowing X tells you nothing about Y in a linear sense. Note: the two variables could still have a perfect nonlinear relationship.

r close to 0 (e.g., +0.05). Weak association. Points scattered with no clear linear pattern.
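The r = 0 case is easy to misread, so here is a small sketch with made-up numbers: one series that is an exact straight line in X, and one that is an exact U-shape in X. The helper `correlation` is written out by hand for illustration.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation computed from raw observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]  # exact straight line
y_curved = [v ** 2 for v in x]     # exact U-shape, fully determined by x

r_linear = correlation(x, y_linear)
r_curved = correlation(x, y_curved)
print(f"linear: r = {r_linear:.2f}, U-shaped: r = {r_curved:.2f}")
```

Y is perfectly predictable from X in both cases, yet only the straight line produces r = 1. The U-shape produces r = 0 because the positive and negative co-movements on either side of the centre cancel.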

Three limits of correlation analysis

🧠Thinking Flow — Evaluating whether correlation is meaningful

The question asks

Two variables have a correlation of 0.82. What conclusions can and cannot be drawn?

Key concept needed

The three limitations of correlation: outliers, spurious correlation, and the causation fallacy.

Step 1, Check for outliers in the scatter plot

A single outlier data point can dramatically inflate or deflate a correlation coefficient. If the scatter plot shows one point far from the cluster and you remove it, the correlation might change from 0.82 to 0.20. Always inspect the plot before trusting r.

Step 2, Check for spurious correlation

Does a logical mechanism explain why these two variables would be associated? If the correlation between atmospheric CO₂ levels and beer sales over 18 years is 0.82, there is no credible mechanism. Both variables increased steadily over 18 years simply because of a shared time trend. They would be correlated with almost anything else that trended over the same period.

Step 3, Check the linearity assumption

r measures linear association only. A correlation of 0.82 means the linear component of the relationship is strong. But the variables could have a nonlinear relationship (such as quadratic) that is much stronger than the linear correlation suggests, and r alone would not reveal it.

Step 4, Never conclude causation

A correlation of 0.82 between Fund A's returns and market returns does not mean the market causes Fund A's returns. It means they tend to move together. Causation requires a theory and evidence beyond correlation alone.

Answer

You can conclude the two variables have a strong positive linear association in this dataset. You cannot conclude that one causes the other, that the relationship is purely linear, or that the relationship will persist outside this sample.
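The time-trend point in Step 2 can be illustrated with synthetic data. In the sketch below, both series are invented: each is a steady upward trend plus a small repeating wiggle that is unrelated to the other's. Comparing year-over-year changes instead of levels is one simple way to strip out a shared linear trend. It is used here as an illustration, not as the only or the prescribed method.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation computed from raw observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def changes(series):
    """Year-over-year changes, which remove a constant linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


years = range(18)
wiggle_a = [0, 1, 0, -1]  # hypothetical short-run pattern in series A
wiggle_b = [1, 0, -1, 0]  # a different, out-of-step pattern in series B
series_a = [2 * t + wiggle_a[t % 4] for t in years]
series_b = [3 * t + wiggle_b[t % 4] for t in years]

r_levels = correlation(series_a, series_b)
r_changes = correlation(changes(series_a), changes(series_b))
print(f"levels: r = {r_levels:.2f}, changes: r = {r_changes:.2f}")
```

The levels are almost perfectly correlated because both series rise with time, while the changes are close to uncorrelated. The high correlation in levels says nothing about whether the two series are actually connected.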

Worked Example 1

Interpreting correlation from covariance

Mei Chen, a fixed income analyst at Orion Capital, evaluates the relationship between her portfolio's monthly returns and a bond index. Her dataset shows: portfolio monthly return standard deviation = 8.2%, bond index standard deviation = 3.4%, covariance between the two = 18.9 (in % squared).

🧠Thinking Flow — Computing and interpreting correlation from covariance

The question asks

Compute the correlation and interpret what it means.

Key concept needed

Correlation = covariance divided by the product of the two standard deviations.

Step 1, Plug in

r = 18.9 / (8.2 × 3.4) = 18.9 / 27.88 = 0.678.

Step 2, Interpret the sign

Positive. When the bond index is above its average, Mei's portfolio tends to be above its average too. They move in the same direction.

Step 3, Interpret the magnitude

0.678 indicates a moderately strong positive association: not perfect (+1), but clearly not random (near 0). The scatter plot would show an upward-sloping cluster with meaningful dispersion around the line.

Step 4, What if the covariance were negative?

If the covariance were −55.9 and the second standard deviation were 10.3% (as in the real estate index case), r = −55.9 / (8.2 × 10.3) = −55.9 / 84.46 = −0.662. Negative correlation: when the portfolio's return is above its average, the real estate index tends to be below its average.

Step 5, Sanity check

r must be between −1 and +1. Our result (0.678) passes this test. If your arithmetic produces r > 1 or r < −1, you made an error, so re-check the denominator.

Answer

Correlation = 0.678. Moderately strong positive linear relationship between the portfolio and the bond index in this sample.
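The calculation and the sanity check can be combined in one small function. This is a sketch: the name `correlation_from_covariance` and the decision to raise an error on an out-of-range result are choices made for illustration.

```python
def correlation_from_covariance(cov_xy: float, sd_x: float, sd_y: float) -> float:
    """r = covariance / (sd_x * sd_y), followed by the range check from Step 5."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    r = cov_xy / (sd_x * sd_y)
    if not -1.0 <= r <= 1.0:
        # An out-of-range r signals inconsistent inputs or mismatched units.
        raise ValueError(f"r = {r:.3f} is outside [-1, 1]; re-check the inputs")
    return r


# Mei Chen's inputs: covariance in %-squared, standard deviations in %.
r = correlation_from_covariance(18.9, 8.2, 3.4)
print(round(r, 3))  # 0.678
```

Keeping the covariance in %-squared and both standard deviations in % matters: mixing percent and decimal inputs is a common way to end up with an r outside the valid range.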

Worked Example 2

Anscombe's Quartet, when identical statistics describe completely different data

A senior analyst at Meridian tells a junior analyst to "just check the correlation" between two variables before deciding whether to include one as a predictor in a model. The junior analyst finds correlation = 0.82. Is this sufficient?

🧠Thinking Flow — Why summary statistics are not the whole picture

The question asks

What can go wrong if you rely only on the correlation coefficient?

Key concept needed

Anscombe's Quartet: four datasets with nearly identical means, standard deviations, and correlations (about 0.82), but completely different underlying relationships.

Step 1, Dataset I

X and Y have an approximately linear relationship. Correlation of 0.82 is an accurate summary. The linear model fits well.

Step 2, Dataset II

X and Y have a curvilinear (quadratic) relationship. The linear correlation of 0.82 captures only the linear component. A linear model would be wrong, the true relationship is curved.

Step 3, Dataset III

X and Y lie almost exactly on a straight line except for one outlier. The outlier pulls the correlation down to 0.82 and tilts the fitted line. Without the outlier, the relationship is near-perfect with a different slope. The correlation of 0.82 is misleading about the true relationship.

Step 4, Dataset IV

X is constant except for one extreme observation. All the correlation is driven by that single point. The "relationship" is almost entirely an artefact of one observation.

Step 5, Conclusion

The same correlation of 0.82 describes four completely different situations. Without the scatter plot, you cannot know which one you have. "Just check the correlation" is not sufficient.

Answer

No, the correlation alone is not sufficient. Always plot the data before interpreting the correlation.
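The quartet is small enough to check by hand. The sketch below uses the data values as they are commonly reproduced from Anscombe's 1973 paper. Treat them as illustrative and verify them against a trusted copy before relying on them. The helper `correlation` is written out for the example.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation computed from raw observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I, II and III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]        # constant except one observation

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: correlation(x, y) for name, (x, y) in quartet.items()}
for name, r in results.items():
    print(f"Dataset {name}: r = {r:.3f}")

# Dataset III again, dropping the single point that sits off the line (x = 13).
x3, y3 = quartet["III"]
kept = [(a, b) for a, b in zip(x3, y3) if a != 13]
r3_without_outlier = correlation([a for a, _ in kept], [b for _, b in kept])
print(f"Dataset III without the outlier: r = {r3_without_outlier:.3f}")
```

All four correlations round to 0.82, and Dataset III becomes near-perfect once its single off-line point is removed. The numbers alone cannot tell the four cases apart, and only a plot would.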

⚠️

Watch out for this

The causation trap. A high correlation between two investment variables, say, a stock's return and trading volume, tells you that historically, when one was high, the other tended to be high too. It does not tell you why. It does not tell you which came first. It does not tell you whether the relationship will hold next month. Exam questions will describe a correlation and ask what can be concluded. The correct answer will include a statement about association, not causation. Any answer option that says "changes in X cause changes in Y" is wrong, regardless of how strong the correlation is.

🧠

Memory Aid

ACRONYM

Use this whenever a question gives you a correlation and asks what you can conclude. Run SOC (Spurious correlation, Outliers, Causation) mentally before choosing the answer. Any answer that implies causation or ignores outliers fails the SOC check.

Practice Questions · LO4

3 Questions LO4

Q 1 of 3 — REMEMBER

A correlation coefficient of −0.72 between two variables best indicates:

CORRECT: B

B is correct. A correlation of −0.72 indicates a moderately strong negative linear association. When one variable is above its historical average, the other tends to be below its average. The closer to −1, the stronger and more linear the inverse relationship.

Why not A? −0.72 is not "less than −1." Correlation is bounded between −1 and +1. Any correlation between −1 and 0 is negative; the magnitude (0.72) indicates moderate-to-strong association. A "weak" negative relationship would be closer to 0 (e.g., −0.10).

Why not C? Correlation never implies causation, regardless of its magnitude or sign. A correlation of −0.72 means the two variables have tended to move in opposite directions historically. It says nothing about whether one variable drives the other.

---

Q 2 of 3 — UNDERSTAND

Two variables have a correlation of 0.05. An analyst concludes there is no relationship between them. This conclusion is:

CORRECT: B

B is correct. A near-zero correlation means there is little linear association. Two variables can be perfectly related through a nonlinear relationship (such as a symmetric U-shaped curve) and still show correlation close to 0. Anscombe's Quartet makes the same point from the other direction: Dataset IV shows how one outlier can dominate a correlation. A scatter plot might reveal a strong pattern that the correlation coefficient hides.

Why not A? Near-zero correlation does not prove independence. Independence is a stronger statistical concept. Two variables are independent if knowing the value of one gives no information about the distribution of the other, not just no linear information. A near-zero correlation is consistent with independence, but does not prove it.

Why not C? Statistical significance is a concept applied after correlation is computed, relating to sample size and whether the true population correlation might be zero. It does not change the interpretation that a near-zero correlation captures only linear association. A statistically significant near-zero correlation still only tells you there is little linear association.

---

Q 3 of 3 — APPLY

A researcher finds a correlation of 0.88 between per-capita cheese consumption and annual deaths by suffocation in bed in the United States over a 15-year period. An investment professional reading this result should most likely conclude:

CORRECT: B

B is correct. This is a classic example of spurious correlation: two variables that trended upward over the same 15-year period for completely independent reasons. There is no credible economic or biological mechanism connecting cheese consumption to bed suffocation. Both likely reflect a general time trend (population growth, changing dietary habits, ageing demographics, reporting changes). A high correlation in this case is noise, not signal.

Why not A? Causal inference requires a plausible mechanism, not just a high correlation, and no credible mechanism exists here. Trading on this relationship would mean acting on a coincidence.

Why not C? Constructing a post-hoc rationale for a spurious correlation ("it measures risk appetite") is a well-documented cognitive error in quantitative investing. A correlation built on two unrelated trending variables carries zero predictive validity once the time trend is controlled for.
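
A shared time trend is enough to produce a high correlation between unrelated series. The sketch below uses made-up numbers, not the actual cheese or mortality data: series_a and series_b are hypothetical series that each rise steadily over 15 years with unrelated wiggles around the trend. Taking year-over-year changes is used here as one simple way to control for the trend; it is an assumption of this example, not the only possible method.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation: covariance over the product of std devs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

years = range(15)

# Unrelated deterministic "wiggles" (hypothetical): one alternates every
# year, the other follows a four-year pattern.
wiggle_a = [1 if t % 2 == 0 else -1 for t in years]
wiggle_b = [1 if t % 4 in (0, 1) else -1 for t in years]

# Two hypothetical series that share nothing except an upward trend.
series_a = [30 + 0.3 * t + 0.2 * wiggle_a[t] for t in years]
series_b = [300 + 35 * t + 20 * wiggle_b[t] for t in years]

# Correlation of the levels is dominated by the common trend.
r_levels = pearson(series_a, series_b)

# Year-over-year changes strip out the linear trend, leaving the wiggles.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_changes = pearson(changes_a, changes_b)

print(f"Correlation of levels:  {r_levels:.2f}")
print(f"Correlation of changes: {r_changes:.2f}")
```

The levels are highly correlated while the changes are not, which is the pattern to expect when the only link between two variables is that both drift upward over time.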

---

Glossary

correlation coefficient

A standardised measure of the linear association between two variables, ranging from −1 (perfect inverse relationship) to +1 (perfect positive relationship). Computed as the covariance divided by the product of the two standard deviations. Zero indicates no linear association.

covariance

A measure of the joint variability of two random variables. Positive covariance means the variables tend to be above (or below) their respective means at the same time; negative covariance means one tends to be above its mean when the other is below. The magnitude is unit-dependent; correlation standardises it.

scatter plot

A graph that plots pairs of observations for two variables, one on the horizontal axis, one on the vertical axis. Each pair becomes one point. The pattern of points reveals the nature and strength of the relationship. Should always be inspected before interpreting a correlation coefficient.

spurious correlation

A correlation between two variables that does not reflect a genuine relationship. Caused by chance in the sample, by a shared time trend, or by the influence of a third variable that drives both. High spurious correlations can masquerade as meaningful relationships and must be challenged with economic reasoning.
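
The glossary definitions of covariance and correlation coefficient can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the return figures are hypothetical, the function names are invented for this example, and sample statistics (dividing by n − 1) are used throughout.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance, dividing by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: square root of a series' own covariance."""
    return math.sqrt(sample_cov(xs, xs))

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))

# Hypothetical annual returns, expressed as decimals.
stock = [0.12, -0.05, 0.08, 0.20, -0.10]
bond = [0.03, 0.06, 0.02, -0.01, 0.07]

# The same returns expressed in percent.
stock_pct = [100 * r for r in stock]
bond_pct = [100 * r for r in bond]

cov_decimal = sample_cov(stock, bond)
cov_percent = sample_cov(stock_pct, bond_pct)
corr_decimal = correlation(stock, bond)
corr_percent = correlation(stock_pct, bond_pct)

# Covariance changes with the units (here by a factor of 100 * 100),
# while correlation is unchanged.
print(f"Covariance (decimal): {cov_decimal:.6f}")
print(f"Covariance (percent): {cov_percent:.4f}")
print(f"Correlation (decimal): {corr_decimal:.4f}")
print(f"Correlation (percent): {corr_percent:.4f}")
```

Rescaling both series changes the covariance but leaves the correlation untouched, which is why correlation is the measure used to compare strength of association across pairs of variables.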

LO 4 Done ✓

You have completed all learning objectives for this module.

Quantitative Methods · Statistical Measures of Asset Returns · Job Ready

Job Ready: Statistical Measures of Asset Returns

Real-world applications and interview preparation for this module.