Quantitative Methods · Estimation and Inference · LO 1 of 3

An auditor and a fund manager walk into the same trap, but which one knows they fell for it?


Why this LO matters

You will know exactly which sampling method to use, why it produces or prevents bias, and why the most dangerous error has nothing to do with randomness.

INSIGHT
You cannot eliminate sampling error. That is not the point. The point is knowing how large it is. Probability methods let you estimate it. Non-probability methods hide it entirely. And the most dangerous mistake is not choosing the wrong sampling method, it is combining data from two different populations and treating the result as one.

When you cannot study everyone: what is sampling and why does it matter?

An analyst studying all 2,400 mid-cap stocks on European exchanges cannot compute statistics for every single one. The same is true for an auditor inspecting 40,000 transactions or a portfolio manager replicating a bond index with 8,000 issues. In each case, the analyst selects a subset and uses what they find to draw conclusions about the whole.

That subset is a sample. The full collection of items the analyst wants to study is the population. A number that describes the population (the true mean return, the true default rate) is a population parameter. You almost never know these. You compute the equivalent from your sample, a sample statistic (the sample mean, the sample standard deviation), and use it as your best estimate of the population parameter.

Here is the part the curriculum does not foreground enough: your sample statistic is itself a random variable. Draw a different subset of the same size from the same population and you get a different number. That variability is the engine of everything that follows in the curriculum: sampling distributions, confidence intervals, and hypothesis testing all depend on it.
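A quick way to see this is to draw repeated samples from one simulated population and watch the sample mean move. The population below is hypothetical: 2,400 normally distributed returns, invented purely for illustration.

```python
import random
import statistics

# Hypothetical population: 2,400 simulated returns (illustrative only).
random.seed(7)
population = [random.gauss(0.08, 0.20) for _ in range(2400)]

# Draw three different samples of the same size: each yields a different mean.
means = []
for _ in range(3):
    sample = random.sample(population, 80)
    means.append(statistics.mean(sample))

print(means)  # three different numbers, all estimating the same population mean
```

Run it again with a different seed and all three means shift, while the population mean never changes. That gap between sample statistic and population parameter is the sampling error that everything downstream quantifies.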

The single dividing line: does every member have an equal chance?

Every sampling method lives on one side or the other of one question: does every population member have an equal probability of selection?

Probability sampling answers yes. Every member of the population has an equal chance. This does not guarantee a perfect sample; it guarantees that your sample statistics are valid estimates of the population parameters you are after. If you can use probability sampling, you can estimate how much sampling error your result contains.

Non-probability sampling answers no by design. Selection is based on something other than equal chance: ease of access, the researcher's expertise, or deliberate targeting. The risk is a non-representative sample that systematically biases your conclusions. Non-probability methods are not inherently wrong. They are appropriate when speed, cost, or expert judgment matters more than statistical representativeness.

The question is never "which is better." The question is "which is appropriate for this situation."

The three probability methods: how you divide before you draw

When equal chance is your goal, you have three structural choices about how to organise the population before you sample.

The three probability sampling methods
1. Simple random sampling. Every population member has an equal probability of selection. Selection is entirely random, no human judgment involved. Works best when the population is homogeneous, meaning all members share broadly similar characteristics. If the population is heterogeneous, a simple random sample may by chance miss key subgroups entirely.

2. Stratified random sampling. Divide the population into non-overlapping subgroups called strata based on one or more classification criteria. Draw a simple random sample from each stratum in proportion to that stratum's share of the total population. Every stratum appears in the final sample. This method guarantees representation of key subdivisions and produces more precise parameter estimates than simple random sampling when the population is heterogeneous.

3. Cluster sampling. Divide the population into clusters, each intended to be a miniature version of the whole. Select clusters randomly. Unselected clusters are entirely excluded: no members from them appear in the sample. Two variants: one-stage draws all members from selected clusters; two-stage draws a random subsample from each selected cluster. Clusters are most often geographic. This is the most time-efficient and cost-efficient probability method for large populations, but given equal sample size, it usually produces lower accuracy than stratified or simple random sampling.
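The structural difference between the three methods can be sketched in a few lines. The population, sector labels, and sample sizes below are all invented for illustration.

```python
import random

random.seed(42)

# Hypothetical population of 100 labelled members across 4 sectors
# (names and counts are illustrative, not from the curriculum).
population = [(i, f"sector_{i % 4}") for i in range(100)]

# 1. Simple random sampling: every member has an equal chance.
srs = random.sample(population, 20)

# 2. Stratified random sampling: draw proportionally from every sector,
#    so all four sectors appear in the final sample.
strata = {}
for member in population:
    strata.setdefault(member[1], []).append(member)
stratified = []
for sector, members in strata.items():
    n = round(20 * len(members) / len(population))  # proportional allocation
    stratified.extend(random.sample(members, n))

# 3. Cluster sampling (one-stage): pick whole sectors, exclude the rest.
chosen_sectors = random.sample(sorted(strata), 2)
cluster = [m for m in population if m[1] in chosen_sectors]

print(len(srs), len(stratified), len(cluster))  # 20 20 50
```

Note the contrast in the output: the stratified sample covers all four sectors, while the cluster sample contains members from only the two chosen sectors and nothing from the others.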
Priya is writing her thesis on starting salaries for new finance graduates in Singapore. She sends her survey to 200 graduates she can reach easily: LinkedIn connections, alumni group members, people who responded quickly to her first post. She gets a mean estimate of SGD 72,000 and submits this as her estimate for all new finance graduates in Singapore.

The problem: her 200 respondents are not a random cross-section. They are whoever she could reach fastest. Graduates who are harder to contact, those in smaller firms, in less prominent roles, or outside her professional network, are absent. Her sample systematically excludes them.

The wrong answer candidates give: "Priya used simple random sampling because she selected 200 graduates." The right framework: convenience sampling selects because data is accessible, not because every member of the population had an equal chance. Her estimate of SGD 72,000 may be too high or too low; you have no way to know, because the sample is not representative.

Systematic sampling: the practical shortcut

Systematic sampling is a practical shortcut within simple random sampling. You cannot always identify every member of a population, assign each a number, and randomly select from a hat. In those situations, you instead pick every kth member, for example, every tenth transaction, until you reach your target sample size.

This procedure does not give equal probability of selection in the strict mathematical sense, but if the population order is random with respect to the characteristic you are measuring, the resulting sample behaves like a simple random sample.

The key condition: the population must not have a hidden periodic pattern that aligns with your interval k. If customers in positions 1, 11, 21, 31 are systematically different from the rest, a systematic sample with k = 10 will systematically misrepresent the population.
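As a minimal sketch (the transaction IDs are hypothetical), picking every kth record after a random starting offset looks like this:

```python
import random

def systematic_sample(records, k):
    """Systematic sampling: random start in [0, k), then every k-th record."""
    start = random.randrange(k)
    return records[start::k]

random.seed(1)
transactions = list(range(1, 1001))   # 1,000 hypothetical transaction IDs
sample = systematic_sample(transactions, 10)
print(len(sample))  # 100 records, one from each block of 10
```

If the ledger happened to repeat a pattern every 10 records, this sample would inherit that pattern, which is exactly the hidden-periodicity risk described above.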

Bond indexing: stratified sampling in action

Bond indexing constructs a portfolio to replicate a specified bond index. The full-replication approach owns every bond in the index in proportion to its market-value weight. Many bond indexes contain thousands of issues. Owning all of them is costly and impractical: many bonds trade infrequently, and the bid-ask spread on illiquid bonds erodes returns.

Stratified sampling solves this. The manager identifies the major risk factors of a bond portfolio (duration, sector, credit quality, coupon structure) and divides the index bonds into cells (strata) based on these factors. A simple random sample is then drawn from each cell in proportion to that cell's weight in the index. The resulting portfolio mimics the index's risk characteristics without owning every bond.

This is not perfectly random; selection within cells often uses additional criteria such as bond liquidity. For indexing purposes, this does not matter. For statistical inference about a population parameter, it would matter enormously.
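The proportional-allocation step can be sketched with invented cell weights (these are illustrative numbers, not real index data):

```python
# Hypothetical market-value weights for four cells of a bond index.
cell_weights = {"govt_short": 0.30, "govt_long": 0.20,
                "corp_ig": 0.35, "corp_hy": 0.15}
portfolio_size = 120

# Each cell contributes bonds in proportion to its index weight.
allocation = {cell: round(portfolio_size * w) for cell, w in cell_weights.items()}
print(allocation)  # {'govt_short': 36, 'govt_long': 24, 'corp_ig': 42, 'corp_hy': 18}
```

Every cell receives a non-zero allocation, which is the defining stratified-sampling property: no subdivision of the index is left out of the replicating portfolio.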

The two non-probability methods: when representativeness is not the goal

Non-probability methods abandon equal chance. They are appropriate when speed, cost, or expert judgment matters more than statistical representativeness.

The two non-probability sampling methods
1. Convenience sampling. Select elements because they are easy to access. The researcher chooses based on accessibility, not on probability. The sample is not necessarily representative of the population. The method's value is speed and low cost. It is commonly used in pilot studies, preliminary research, and situations where budget or time constraints make probability sampling impractical. The risk is systematic bias: easy-to-reach elements may share characteristics that differ from the rest of the population.

2. Judgmental sampling. The researcher deliberately selects elements based on their professional knowledge, experience, and judgment. The sample is not random; it reflects the researcher's beliefs about which elements are most informative. The method is appropriate under time constraints or when specialist expertise can produce a more targeted sample than a random draw would. The risk is researcher bias: the sampler may unconsciously select elements that confirm their prior hypothesis.
FORWARD REFERENCE
Once you have a sample statistic, confidence intervals tell you the range within which the true population parameter is likely to fall.
For this LO, you only need the concept that sampling error is unavoidable when sampling, and that probability methods let you quantify how large it is likely to be. You will study confidence interval construction fully in LO 7b.

Worked Examples

Identifying simple random sampling from a scenario with equal probability of selection

The question you must answer: does this scenario give every population member an equal chance, or does it use some other selection rule?

Worked Example 1
Simple random sampling in a mid-cap equity fund
Fatima Al-Rashid manages a fund focused on mid-cap European equities. She wants to estimate the average price-to-book ratio of all mid-cap companies listed on European exchanges. There are 2,400 such companies, too many to analyse individually. Her analyst assigns each company a unique ID number from 1 to 2,400, enters them into a random number generator, and selects 80 IDs. The resulting 80 companies form her sample.
🧠Thinking Flow — Identifying simple random sampling
The question asks
Which sampling method does Fatima's analyst use?
Key concept needed
simple random sampling, every population member has an equal probability of selection, achieved through a random draw.
Step 1, Identify the selection rule
The analyst assigned every one of the 2,400 companies a number. Then a random draw selected 80 numbers. Every company had the same probability of being selected. There was no grouping, no geographic constraint, no researcher judgment involved. The selection went directly from population to random draw. That is the defining feature of simple random sampling.
Step 2, Check against the other probability methods
Is this stratified random sampling? No, there was no division into subgroups first. The analyst did not split companies by sector or country and then draw from within each split. Is this cluster sampling? No, the analyst did not select whole groups and exclude unselected groups entirely. Every company was individually in the running. Is this systematic sampling? No, there was no fixed interval like "every tenth company." The random number generator replaced that.
Step 3, Sanity check
If the population of 2,400 companies is broadly similar across sectors and sizes (mid-cap by definition limits the size range), then equal probability of selection is appropriate. Simple random sampling is well-suited to a relatively homogeneous population. ✓ Answer: Simple random sampling. Every member had an equal probability of selection via random number draw. Exam answer: A.

Identifying stratified random sampling from a scenario with pre-divided population and proportional draws

The question you must answer: does the scenario divide the population first, then draw from within each division, with every division represented?

Worked Example 2
Stratified random sampling in bond indexing
Dmitri Volkov manages a bond portfolio indexed to the Panorama Global Corporate Index. The full index contains 8,000 bonds, far too many to hold at practical cost, especially because many are illiquid. Dmitri's team divides the index bonds into 60 cells based on three classification criteria: issuer type (government, agency, corporate) × maturity band (10 bands) × coupon level (above or below 6%). They then draw a simple random sample from each cell in proportion to that cell's market-value weight in the index. The resulting 120-bond portfolio mimics the index's duration and credit profile.
🧠Thinking Flow — Identifying stratified random sampling
The question asks
Which sampling method is Dmitri using?
Key concept needed
stratified random sampling, divide into strata first, then draw a simple random sample from within each stratum so every stratum appears in the final sample.
Step 1, Locate the division step
Dmitri's team does not draw 120 bond IDs randomly from the full list of 8,000. They first partition the 8,000 bonds into 60 cells based on three risk characteristics. That division is the defining first step of stratified random sampling. It is not an optional extra, it is the method's structural foundation.
Step 2, Check the proportional draw from each cell
Within each cell, a simple random sample is drawn. The size drawn from each cell is proportional to that cell's market-value weight in the index. Dmitri's team does not draw more from the biggest cell just because it is liquid, nor does it ignore a small cell entirely. Every one of the 60 cells appears in the final 120-bond portfolio.
Step 3, Confirm this is not cluster sampling
The most common mistake at this point is to say "Dmitri divided the bonds into groups and sampled from some groups, that is cluster sampling." It is not. In cluster sampling, unselected clusters are entirely excluded, no member from them appears in the sample. Dmitri's approach includes every one of the 60 cells. The distinction: stratified keeps all subgroups in the sample (at reduced element count); cluster discards entire subgroups.
Step 4, Note on randomness within cells
The curriculum states explicitly that stratified sampling used in bond indexing is not strictly random, additional criteria like liquidity are often applied within cells. For the purpose of index replication, this does not matter. For statistical inference about a population parameter, it would matter enormously because equal probability of selection is violated. ✓ Answer: Stratified random sampling. Every stratum appears in the final sample; elements are drawn from within each stratum in proportion to the stratum's weight in the population. Exam answer: B.

Identifying cluster sampling from a geographic selection scenario

The question you must answer: does the scenario select whole groups as units, excluding unselected groups entirely?

Worked Example 3
Cluster sampling in a national investor survey
Naomi Osei works for a research institute studying whether individual investors in Kenya are bullish, bearish, or neutral on the Nairobi Stock Exchange. She cannot survey all Kenyan investors, there are millions. She divides Kenya into its 47 counties. Each county's investor population is treated as a mini-representation of all Kenyan investors. She randomly selects 8 of the 47 counties. She then surveys every investor she can contact within those 8 counties.
🧠Thinking Flow — Identifying cluster sampling
The question asks
Which sampling method is Naomi using?
Key concept needed
cluster sampling, divide into clusters, select clusters randomly as whole units, exclude unselected clusters entirely.
Step 1, Identify the cluster structure
Naomi divides the population (all Kenyan investors) into 47 geographic clusters (counties). Each cluster is treated as a self-contained microcosm of the whole population. That is the structural signature of cluster sampling: the cluster, not the individual element, is the unit of selection.
Step 2, Confirm the random selection of clusters
Naomi randomly selects 8 of the 47 counties. Those 8 counties are in the sample. The remaining 39 counties are entirely excluded, not a single investor from Nakuru, Kiambu, or Kisii appears in the sample. This is the critical distinction from stratified random sampling. In stratified sampling, all strata appear. In cluster sampling, unselected clusters are completely absent.
Step 3, Identify the variant
Naomi surveys every investor she can contact within the 8 selected counties. She does not then draw a further random sample within those counties. That makes this one-stage cluster sampling. If she had randomly sampled investors within the 8 counties, it would be two-stage cluster sampling.
Step 4, Sanity check on accuracy
Cluster sampling is the most time-efficient and cost-efficient probability method for a vast population spread across a geographic area. The trade-off: lower accuracy compared with stratified or simple random sampling at the same sample size, because any single county may not perfectly represent all Kenyan investors. ✓ Answer: Cluster sampling. Geographic clusters are selected as whole units; unselected counties are entirely excluded from the sample. Exam answer: C.

Choosing between judgmental and convenience sampling under time pressure

The question you must answer: is the selection driven by easy access, or by the researcher's deliberate professional judgment?

Worked Example 4
Choosing judgmental sampling for an auditor under time pressure
Kinzua is a senior auditor at Whitford & Associates. She is auditing the transaction ledger of Conemaugh Corporation for the past fiscal year. Conemaugh processed over 40,000 transactions, inspecting all of them would be impossible within the audit deadline. Kinzua has 18 years of experience auditing manufacturing companies. She uses her professional knowledge to select transactions that historically carry the highest risk of misstatement: large round-number entries, entries posted on public holidays, and reversals posted after month-end. She constructs a targeted sample of 200 transactions.
🧠Thinking Flow — Choosing the right non-probability method
The question asks
Which sampling method is most appropriate for Kinzua's situation?
Key concept needed
judgmental sampling, selection based on the researcher's professional knowledge and expertise, not on ease of access.
Step 1, Rule out convenience sampling first
Many candidates answer "convenience sampling" because Kinzua is under time pressure. Time pressure is indeed a condition for non-probability sampling. But convenience sampling selects because data is accessible: whoever is easiest to reach, closest at hand, fastest to retrieve. Kinzua is not picking the 200 transactions sitting in the most accessible files. She is picking them based on her expert knowledge of where error risk concentrates. That is the distinction.
Step 2, Apply the definition of judgmental sampling
Judgmental sampling selects elements deliberately based on the researcher's professional judgment. Kinzua has 18 years of domain expertise. She identifies the specific transaction categories that carry elevated audit risk: large round numbers (possible fictitious entries), holiday postings (possible manual overrides), and reversals (possible corrections of errors that should not have been made). Her expertise is the selection tool, not convenience.
Step 3, Note the risk
Judgmental sampling carries the risk of researcher bias, Kinzua may unconsciously select transactions that confirm her prior suspicions, missing other types of errors. This is the acknowledged trade-off of the method. It is not a reason to reject it in this context; it is a reason to be aware of it. ✓ Answer: Judgmental sampling. Kinzua's 18 years of professional expertise drive the selection of the highest-risk transactions, not ease of access. Exam answer: A.

Distinguishing stratified random from cluster sampling in a multi-method exam question

The question you must answer: does every subgroup appear in the final sample, or are whole groups selected as units and excluded?

Worked Example 5
Comparing stratified random and cluster sampling
A quantitative research team at Meridian Capital is testing factor exposure across a universe of 3,000 stocks. They consider two approaches:

Approach Alpha: Divide the 3,000 stocks into ten deciles by market capitalisation. Draw a simple random sample of 30 stocks from each decile. Combine all 300 stocks into the research dataset.

Approach Beta: Divide the 3,000 stocks into 20 industry groups. Randomly select 5 industry groups. Include all stocks from those 5 groups in the research dataset. Exclude the other 15 groups entirely.

🧠Thinking Flow — Comparing stratified and cluster sampling
The question asks
What type of sampling does each approach use?
Key concept needed
The core distinction between stratified random sampling and cluster sampling.
Step 1, Analyse Approach Alpha
The team divides the population into ten deciles (strata). From each decile, a simple random sample of 30 is drawn. All ten deciles appear in the final dataset. No decile is excluded. The sampling unit is the individual stock within each stratum. This is stratified random sampling by definition.
Step 2, Analyse Approach Beta
The team divides the population into 20 industry groups (clusters). Five groups are randomly selected. The remaining 15 groups are completely absent, none of their stocks appear in the dataset. The sampling unit is the cluster (industry group), not the individual stock. Non-selected clusters are fully excluded. This is cluster sampling.
Step 3, The distinguishing feature
The single question that separates these two methods: does every subgroup appear in the final sample? For stratified sampling, the answer is yes, every decile contributes elements. For cluster sampling, the answer is no, non-selected clusters are absent. Approach Beta intentionally excludes 15 of 20 industry groups. Approach Alpha includes all 10 deciles.
Step 4, Accuracy comparison
Approach Alpha (stratified) produces greater precision in factor estimates because every size category is represented. Approach Beta (cluster) is faster and cheaper but sacrifices representation of excluded industries, a technology-heavy cluster may look very different from a consumer staples cluster, so the resulting dataset may not reflect the full factor distribution. ✓ Answer: Approach Alpha = stratified random sampling; Approach Beta = cluster sampling.
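The two approaches can be mimicked in code. The decile and industry tags below are synthetic stand-ins for the scenario's classifications, assigned cyclically for illustration.

```python
import random

random.seed(3)

# 3,000 hypothetical stocks tagged with a decile (0-9) and industry (0-19).
stocks = [(i, i % 10, i % 20) for i in range(3000)]

# Approach Alpha (stratified): 30 stocks drawn from every decile.
alpha = []
for d in range(10):
    decile = [s for s in stocks if s[1] == d]
    alpha.extend(random.sample(decile, 30))

# Approach Beta (cluster): 5 whole industry groups, the other 15 excluded.
chosen = random.sample(range(20), 5)
beta = [s for s in stocks if s[2] in chosen]

print(len({s[1] for s in alpha}))  # 10: every decile represented
print(len({s[2] for s in beta}))   # 5: the other 15 industries entirely absent
```

The output makes the exam distinction concrete: Alpha's dataset contains all ten subdivisions, Beta's contains only the five selected clusters.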

Recognising the same-population trap when pooling data from different distributions

The question you must answer: does the pooled dataset describe one real population, or does it describe nothing?

Worked Example 6
The same-population trap in a Sharpe ratio calculation
Hiro Yamamoto manages the Sakura Growth Fund. The fund's annual report presents a single Sharpe ratio based on eight quarterly excess returns spanning two calendar years. In Year 1, the fund followed a conservative strategy with low volatility: quarterly excess returns of −3%, +5%, −3%, +5%, averaging 1.0% with a quarterly standard deviation of 4.62%. In Year 2, the fund switched to an aggressive growth strategy: quarterly excess returns of −12%, +20%, −12%, +20%, averaging 4.0% with a quarterly standard deviation of 18.48%. The annual report pools all eight quarters and reports a combined Sharpe ratio of 0.199. The benchmark Sharpe ratio is 0.21. The report concludes the fund underperformed the benchmark.
Period    Q1      Q2      Q3      Q4      Mean    Std Dev    Sharpe
Year 1    −3%     +5%     −3%     +5%     1.0%    4.62%      0.22
Year 2    −12%    +20%    −12%    +20%    4.0%    18.48%     0.22
Pooled    (all eight quarters)            2.5%    12.57%     0.199
🧠Thinking Flow — Detecting the same-population violation
The question asks
What is wrong with the pooled Sharpe ratio, and what does it misrepresent?
Key concept needed
The same-population assumption, all observations in a sample must come from one population. Pooling observations from two different populations violates this.
Step 1, Identify the population change
The fund manager changed strategy between Year 1 and Year 2. The two sets of quarterly returns were generated by fundamentally different investment processes. Year 1 returns come from a low-volatility, conservative strategy. Year 2 returns come from a high-volatility, aggressive strategy. These are two distinct populations, not one.
Step 2, What the wrong approach produces
The analyst pools all eight quarters as if they came from a single population. The pooled mean is (1.0% + 4.0%) ÷ 2 = 2.5%. The pooled standard deviation is 12.57%. The pooled Sharpe ratio is 2.5 ÷ 12.57 = 0.199. This number misrepresents both years: it is below the true Sharpe ratio of 0.22 earned in each year, because mixing two distributions inflates the measured standard deviation. It describes a fund that does not exist.
Step 3, What the correct approach yields
Keep the two populations separate. Each year, the Sharpe ratio is 0.22 against a benchmark of 0.21. The manager outperformed the benchmark in both years. The pooled figure reverses this conclusion.
Step 4, The cognitive error
The analyst believed that a larger sample (n = 8) is statistically superior to a smaller sample (n = 4). This is generally true, but only when all observations come from the same population. Larger samples from a mixed population are not more accurate; they are more misleading. The pooled Sharpe ratio of 0.199 is not a cautious estimate. It is a number that corresponds to no real fund at any point in time. ✓ Answer: The pooled Sharpe ratio violates the same-population assumption. The correct conclusion: the manager outperformed the benchmark in both years (Sharpe 0.22 > 0.21). The pooled ratio of 0.199 misrepresents the fund because it averages two different distributions.
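The worked example's arithmetic can be checked directly using sample standard deviations:

```python
import statistics

# Quarterly excess returns from the worked example (in percent).
year1 = [-3, 5, -3, 5]
year2 = [-12, 20, -12, 20]
pooled = year1 + year2

def sharpe(returns):
    """Sharpe ratio on excess returns: mean over sample standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)

print(round(sharpe(year1), 2))   # 0.22
print(round(sharpe(year2), 2))   # 0.22
print(round(sharpe(pooled), 3))  # 0.199
```

Pooling drags the ratio below either year's true 0.22 because the mixed sample's standard deviation (12.57%) is inflated by combining a low-volatility and a high-volatility regime.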
⚠️
Watch out for this
The "divided-into-groups" trap A candidate reads "the population is divided into groups and a random sample is drawn from each group" and selects cluster sampling. The correct answer is stratified random sampling: all groups are present in the final sample and elements are drawn from within each. Candidates make this error because they focus on the "dividing into groups" action and assume cluster sampling, without checking whether all groups appear in the final sample or only selected ones do. The defining feature of stratified random sampling is that every stratum is represented in the final sample. In cluster sampling, unselected clusters are entirely absent, no member from them appears anywhere. Before selecting your answer, ask one question: does every division appear in the final sample, or only the selected clusters?
🧠
Memory Aid
CONTRAST ANCHOR
S.C.R.E.A.M: Stratified, Complete representation; Cluster, Exclude the rest
Stratified random sampling keeps every group in the sample. Cluster sampling keeps only the selected clusters. When a question describes "dividing into groups and sampling from within each," ask: are all groups present, or only the chosen ones? All groups present = stratified. Only selected groups = cluster. If the scenario does not specify whether non-selected groups are completely absent, look for whether a random draw selected clusters as units (cluster) or whether a proportional draw was made from every subdivision (stratified). When in doubt, check the output: if the final sample includes every subdivision, it is stratified, not cluster.
Practice Questions · LO1
Q 1 of 6 — REMEMBER
An equity analyst wants to estimate the average price-to-earnings ratio of all companies in the FTSE 100. She assigns every company a unique ID number from 1 to 100 and uses a random number generator to select a subset of those IDs. Every company had an equal chance of selection. Which sampling method is this?
CORRECT: B

CORRECT: B, Simple random sampling gives every population element an equal probability of selection through a random draw. Assigning IDs and using a random number generator to select from them is the textbook mechanism for this method. There is no division into subgroups, no geographic grouping, and no researcher judgment involved.

Why not A? Stratified random sampling requires dividing the population into non-overlapping subgroups (strata) first, then drawing a simple random sample from within each stratum. The scenario contains no mention of any grouping step. A would be correct only if the analyst had first split companies by sector or market cap and then sampled from each group.

Why not C? Cluster sampling divides the population into clusters (typically geographic) and selects entire clusters as units, completely excluding non-selected clusters. Here, every company was individually in the running, no geographic grouping or cluster-level exclusion occurred. C would be correct if the analyst had randomly selected five UK cities and surveyed every FTSE 100 company in those cities, ignoring companies in the other cities.

---

Q 2 of 6 — UNDERSTAND
Why does stratified random sampling generally produce more precise estimates of population parameters than simple random sampling when the population is heterogeneous?
CORRECT: A

CORRECT: A, When a population is heterogeneous, its members differ significantly from one another. A simple random sample of a given size might by chance miss a key subgroup entirely or over-represent it. Stratified random sampling prevents this by ensuring every subgroup (stratum) appears in the final sample. This structural guarantee narrows the range of plausible values for any parameter estimate; the estimates are more precise because the sampling variance is reduced.

Why not B? Stratified random sampling does not increase any individual element's probability of selection above what a simple random sample provides. The total number of elements sampled is the same; what changes is how those elements are distributed across subgroups. This statement describes something that does not occur in stratified sampling.

Why not C? Cluster sampling is a separate method, not a component of stratified sampling. The precision gain in stratified sampling comes from proportional representation of all strata, not from excluding any method. C confuses two distinct sampling techniques.

---

Q 3 of 6 — APPLY
A bond portfolio manager indexed to a government bond benchmark divides the full bond index into cells by maturity band and credit rating, then draws a simple random sample from each cell in proportion to that cell's market-value weight in the index. Every cell is represented in the final portfolio. Which sampling method is she using?
CORRECT: B

CORRECT: B, The manager divides the population into cells (strata) based on two classification criteria, maturity and credit rating. She then draws a simple random sample from within each cell, and every cell appears in the final portfolio. That is the structural definition of stratified random sampling: divide into strata, then sample within each, guaranteeing representation of all subgroups.

Why not A? Simple random sampling draws directly from the whole population without any preliminary division into subgroups. The scenario explicitly describes a division into cells before sampling, that division step is the distinguishing feature of stratified sampling, not simple random sampling.

Why not C? In cluster sampling, entire clusters are selected as units and non-selected clusters are entirely excluded. Here, every maturity/rating cell contributes to the final portfolio, none are excluded. C would be correct if the manager had randomly selected a few credit rating categories and included all maturities within those categories, completely ignoring the others.

---

Q 4 of 6 — APPLY+
A researcher studying retail investor sentiment in Australia divides the country into its six states and two territories, randomly selects three states, and surveys every investor she can contact within those three states. She excludes the other five states and territories entirely. Which sampling method is this, and what is its primary limitation compared with stratified random sampling?
CORRECT: A

CORRECT: A, The researcher divides Australia into geographic clusters (states and territories), randomly selects three, and surveys all investors within them. The other five regions are entirely absent. This is cluster sampling. The primary limitation is lower accuracy: any given state may not perfectly represent the national population of investors. Three states cannot capture all regional economic and cultural diversity across a country as large as Australia.

Why not B? This is not stratified random sampling. In stratified random sampling, every stratum (in this case, every state and territory) appears in the final sample; the sampling happens within each stratum, not in the choice of which strata to include. Here, five of eight regions are completely excluded. That is cluster sampling, not stratified.

Why not C? This is not simple random sampling. Simple random sampling gives every individual investor an equal chance of selection with no geographic grouping. Here, geography is the structural basis for selection, and only three geographic units are included, which is the defining feature of cluster sampling.

---

Q 5 of 6 — ANALYZE
A junior analyst needs a preliminary estimate of average price-to-book ratios for a new research report on publicly listed companies. She uses her firm's existing internal database of 120 companies, firms her team already covers and has data readily available for. A senior manager tells her this estimate is unreliable for drawing inferences about all publicly listed companies. Which sampling method did she use, and why is it unreliable in this context?
CORRECT: C

CORRECT: C, The analyst selected companies because they were already accessible in her firm's internal database, not because every publicly listed company had a probability of selection. She chose based on the convenience of existing access. That is the defining feature of convenience sampling. It is unreliable for estimating population parameters because easy-to-reach companies may systematically differ from the broader population: her firm may disproportionately cover large-cap or profitable companies, excluding smaller firms and less-covered sectors.

Why not A? Judgmental sampling uses the researcher's professional knowledge to deliberately target the most informative elements for a specific purpose. The analyst here did not apply expertise to identify which companies are most representative; she used whatever happened to be in her existing database. That is convenience, not expertise-based selection.

Why not B? Simple random sampling requires that every population member have an equal probability of selection via a random draw. She did not use a random draw from the full universe of publicly listed companies. She used an existing non-random database built for internal purposes. This is not simple random sampling.

---

Q 6 of 6 — TRAP
An analyst computes a fund's Sharpe ratio by pooling quarterly excess returns from two consecutive years. During Year 1 the fund pursued a low-volatility strategy. During Year 2 the fund switched to a high-volatility strategy. The analyst reports a pooled Sharpe ratio of 0.199. The benchmark Sharpe ratio is 0.21. The analyst concludes the fund underperformed its benchmark. What is the specific error, and what is the correct conclusion?
CORRECT: A

CORRECT: A, The analyst violated the same-population assumption. The Year 1 and Year 2 returns were generated by two fundamentally different investment processes (low-volatility versus high-volatility). Pooling them treats two distinct populations as one, producing a pooled Sharpe ratio of 0.199 that misrepresents both periods. Year 1 Sharpe: 1.0% ÷ 4.62% = 0.22. Year 2 Sharpe: 4.0% ÷ 18.48% = 0.22. Both exceed the benchmark of 0.21. The manager outperformed in both years. The pooled figure is not a cautious estimate; it describes a fund that does not exist at any point in time.

Why not B? Sample size is not the issue here. A sample of 8 is larger than a sample of 4, but a larger sample from a mixed population is more misleading, not more accurate. The issue is not how many observations were used but whether all observations come from the same underlying population. Eight quarters from two different strategies is still two populations.

Why not C? The Sharpe ratio formula was applied correctly within each year. The error is not computational; it is conceptual. The formula was applied to data that should never have been combined. The Sharpe ratio can use any return frequency; quarterly excess returns are perfectly valid inputs when all returns belong to the same underlying strategy.
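A quick Python sketch, using the summary statistics from the answer above, makes the trap concrete: each year clears the benchmark on its own.

```python
def sharpe(mean_excess, stdev):
    """Sharpe ratio: mean excess return divided by its standard deviation."""
    return mean_excess / stdev

# Per-year summary statistics from the explanation above.
year1 = sharpe(0.010, 0.0462)   # low-volatility strategy
year2 = sharpe(0.040, 0.1848)   # high-volatility strategy
benchmark = 0.21

# Each year beats the benchmark on its own; only the pooled figure
# (0.199), computed across two distinct populations, falls short.
print(round(year1, 2), round(year2, 2))  # 0.22 0.22
```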

---

Glossary
population
Every member of the group you want to study. Example: all listed companies on the JSE. You want to know something about them but cannot examine every single one.
sample
The subset of the population you actually observe and compute statistics from. Example: 80 companies selected from the JSE for a debt ratio survey.
population parameter
A number that describes the entire population, such as the mean return or variance of all members. Almost never known in practice, that is why we sample.
sample statistic
A number computed from sample data, such as the sample mean or sample standard deviation. Used as the best estimate of the corresponding population parameter.
probability sampling
Any sampling method in which every population member has a known, nonzero probability of selection; in simple random sampling those probabilities are all equal. Example: writing every company name on a card and drawing 50 cards blindly from a hat.
non-probability sampling
Any method where selection is based on factors other than equal probability, such as easy access or a researcher's judgment. Example: surveying the ten companies already in your firm's database because you already have their data.
simple random sampling
A probability method where each population element has an equal chance of selection, achieved through a random draw. Best when the population is broadly similar throughout, no major subgroups to represent.
strata
Subgroups of a population created by dividing based on one or more classification criteria. Example: dividing a bond index by duration bucket (1 to 3 years, 3 to 5 years, 5 to 10 years).
stratified random sampling
Divide the population into strata first, then draw a simple random sample from within each stratum in proportion to that stratum's size. Every stratum appears in the final sample.
cluster sampling
Divide the population into clusters (each intended to represent the whole), select clusters randomly as units, and exclude unselected clusters entirely. Example: dividing a country into provinces, randomly selecting three, and surveying only residents within those three provinces.
convenience sampling
Select elements because they are easy to access, not because they were randomly drawn. Example: surveying the colleagues in your own office because they are closest and most likely to respond quickly.
judgmental sampling
Select elements based on the researcher's professional knowledge and expertise rather than random selection. Example: an experienced auditor selecting transactions that historically carry the highest risk of misstatement.
systematic sampling
An approximation to simple random sampling used when you cannot number every member for a random draw: select every kth element, every tenth transaction, for instance, until you reach your target sample size.
bond indexing
An investment strategy that builds a portfolio to replicate a specified bond index. Managers often use stratified sampling to hold a representative subset of bonds in the right proportions rather than owning every issue.
sampling error
The gap between your sample statistic and the true population parameter, caused by the fact that a sample is always a subset. It is unavoidable, the price of not inspecting every member.
bias
A systematic error where the sampling method consistently produces estimates that are too high or too low, rather than varying randomly around the true value. Example: a survey of office workers about commuting times will miss everyone who cycles to work before office hours.

LO 1 Done ✓

Ready for the next learning objective.

🔒 PRO Feature
How analysts use this at work
Real-world applications and interview questions from top firms.
Quantitative Methods · Estimation and Inference · LO 2 of 3

Why does taking a bigger sample make your estimate more trustworthy, even if the population is chaotic?

The central limit theorem shows that sample means cluster around the true population mean in a predictable, normal distribution, regardless of the population's actual shape, and standard error measures exactly how tight that clustering is.

Why this LO matters

The central limit theorem shows that sample means cluster around the true population mean in a predictable, normal distribution, regardless of the population's actual shape, and standard error measures exactly how tight that clustering is.

INSIGHT
You never actually know the true population mean. What you do know is this: if you took many random samples of the same size from a population, calculated the mean of each sample, and plotted those means, they would form their own distribution. That distribution is normal, predictable, and gets tighter as your sample size grows, no matter what the population itself looks like. This is what the central limit theorem tells you. It is the foundation of everything else in statistical inference.

What the central limit theorem actually says

Here is the wrong way to think about this.

Most people assume: "If my population data is messy and skewed, then any estimate I make from it must also be messy and skewed." That feels logical. It is wrong.

Think about what happens when you average a group of numbers. One extreme value has less and less power to distort the average as the group gets larger. If you are averaging 5 numbers, one outlier pulls the mean hard. If you are averaging 500 numbers, that same outlier barely moves it. Averaging is a smoothing process.

Now imagine doing that averaging not once, but thousands of times. Each time, you draw a fresh sample of the same size, compute the mean, and write it down. You end up with thousands of sample means. Plot them. What shape do you get?

Normal. Every time. For any population shape. That is the central limit theorem.
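You do not have to take this on faith; it is a five-minute simulation. The Python sketch below uses an exponential population (an arbitrary choice of a strongly skewed distribution) and only the standard library:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

# Population: exponential with mean 1.0 -- strongly right-skewed.
n, trials = 50, 2000
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

# The sample means cluster symmetrically around the true mean (1.0),
# with spread close to sigma / sqrt(n) = 1 / sqrt(50), roughly 0.14,
# even though the population itself is heavily skewed.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Swap in any other population shape you like; the picture of the means barely changes.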

The Central Limit Theorem and Its Components
1
Sampling distribution of the mean. The probability distribution of all possible sample means that could be computed from repeated samples of equal size drawn from the same population. This is not the distribution of the raw data. Use it to understand how close your single sample mean is likely to be to the true population mean.
2
The central limit theorem (CLT). States that the sampling distribution of the sample mean becomes approximately normal when the sample size is large (n ≥ 30), regardless of the shape of the underlying population distribution. Use it to apply normal distribution logic to any population, even when you do not know the population's true shape.
3
Standard error of the sample mean. The standard deviation of the sampling distribution. It measures how much variation exists among sample means. Computed as σ/√n (when you know the population standard deviation) or s/√n (when estimating from the sample). Use it to quantify how much inaccuracy to expect in your sample mean as an estimate of the true population mean.
4
The role of sample size (n). As n increases, the standard error decreases by a factor of √n, not by n. Larger samples produce sampling distributions more tightly clustered around the true population mean. Use this to understand why larger samples are more precise, and by exactly how much.
5
Standard error vs. standard deviation. Standard deviation measures dispersion of the raw data around its mean: data description. Standard error measures how much uncertainty surrounds your estimate of a population parameter: inference precision. They have different purposes and are never interchangeable, even though both measure spread.
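The distinction in point 5 is easy to verify numerically. In the Python sketch below the returns are made-up illustrative numbers; the point is that the two quantities answer different questions:

```python
import statistics

# Made-up monthly returns (%), for illustration only.
returns = [1.2, -0.8, 2.5, 0.4, -1.1, 3.0, 0.9, -0.3, 1.7, 0.6]

sd = statistics.stdev(returns)   # dispersion of the raw data
se = sd / len(returns) ** 0.5    # precision of the sample mean

# sd answers: how scattered are the individual observations?
# se answers: how far is my sample mean likely to sit from the truth?
print(round(sd, 2), round(se, 2))  # 1.35 0.43
```

The standard error is always the smaller number, and it shrinks as n grows; the standard deviation of the data does not.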
FORWARD REFERENCE
Hypothesis testing and confidence intervals, what you need for this LO only
These are statistical inference methods that build directly on the sampling distribution and standard error. They answer questions like "Is this result statistically significant?" and "What range probably contains the true population mean?" For this LO, you only need to understand that the standard error is the tool that makes these methods possible: it quantifies the precision of your sample estimate. You will study both methods fully in Quantitative Methods Module 4.
→ Quantitative Methods

How to apply the CLT: two worked examples

Both worked examples below show the CLT in practice. The first is conceptual. The second is numerical. Together they cover the two question types you will see on the exam for this LO.

Worked Example 1
Why the population's shape doesn't matter for the sample mean
Priya Mehta is a junior analyst at Thornfield Asset Management in Singapore. She is studying daily trading volume data for a small-cap equity index. The distribution of daily volume is highly skewed: most days see modest activity, but occasional spikes push the data far to the right. Priya's supervisor asks her to explain why, despite this skewed population, she can still treat the distribution of sample means as approximately normal when drawing large samples.
🧠Thinking Flow — CLT and population shape independence
The question asks
Does the sampling distribution of the sample mean depend on the shape of the population distribution?
Key concept needed
The central limit theorem. Many candidates assume the sample mean distribution must mirror the population shape. This is the wrong approach. The CLT says the opposite.
Step 1, Identify the wrong approach
Many candidates reason: "The population is skewed, so the sample mean must also follow a skewed distribution." This feels logical but is exactly what the CLT overturns. The sample mean is not a raw data point. It is an average of many data points, and averages behave very differently from individual observations.
Step 2, Apply the CLT
The CLT states that when sample size n is at least 30, the distribution of sample means is approximately normal, regardless of the population's shape. Priya's population is skewed. But as long as her samples contain at least 30 observations each, the distribution of those sample means will be approximately normal, centred on the true population mean μ.
Step 3, Sanity check
If this result were not true, statistical inference would be impossible for any non-normal population, which is most real-world data. The fact that hypothesis testing and confidence intervals work in practice for skewed data confirms the CLT is doing exactly this job. ✓ Answer: Priya can treat the distribution of sample means as approximately normal, because the CLT guarantees this for n ≥ 30 regardless of population shape. The population's skewness is irrelevant to the shape of the sampling distribution.
Worked Example 2
Calculating standard error and the effect of sample size
Rafael Ortega is a research analyst at Castillo Capital in Mexico City. He is estimating the mean annual return of growth fund managers across a universe of funds. He assumes the population standard deviation of returns is 6%. Rafael wants to understand how precise his estimate will be for two different sample sizes: 36 managers, and 576 managers.
🧠Thinking Flow — Standard error formula and the √n relationship
The question asks
What is the standard error of the sample mean for each sample size, and what does the difference tell us?
Key concept needed
Standard error of the sample mean (σ/√n). Many candidates divide by n instead of √n, which produces a much smaller and incorrect result.
Step 1, Identify the wrong approach
A common error: dividing the population standard deviation directly by n, not by √n. For n = 36, dividing by n gives 6% / 36 = 0.167%. That is the wrong formula and the wrong number. The correct formula uses √n.
Step 2, Calculate standard error for n = 36
Standard error = σ / √n = 6% / √36 = 6% / 6 = 1.00%. The spread of the distribution of sample means, when each sample contains 36 managers, is 1.00%.
Step 3, Calculate standard error for n = 576
Standard error = σ / √n = 6% / √576 = 6% / 24 = 0.25%
Step 4, Compare and interpret
Increasing the sample size from 36 to 576 is a 16-fold increase in n. The standard error falls from 1.00% to 0.25%, a fourfold reduction. This is the √n relationship: to cut the standard error in half, you must quadruple the sample size, not double it.
Step 5, Sanity check
The standard error must always be smaller than the population standard deviation (6%), because averaging across a sample smooths out extreme values. Both answers (1.00% and 0.25%) are smaller than 6%. The larger sample gives the smaller standard error. Both pass. ✓ Answer: Standard error is 1.00% for n = 36 and 0.25% for n = 576. Quadrupling the sample size cuts the standard error in half. This is a direct consequence of the √n relationship in the standard error formula.
⚠️
Watch out for this
The "divide by n, not √n" trap. A candidate who divides the population standard deviation directly by the sample size gets 6% / 36 = 0.167% for Rafael's first scenario, instead of the correct 1.00%. The correct standard error is σ / √n = 6% / √36 = 1.00%. Candidates make this error because the formula looks like a simple fraction, and the instinct is to divide by the whole number n, when the denominator actually requires the square root of n. Before finalising a standard error answer, always write √n explicitly as a number first, then divide.
🧠
Memory Aid
ACRONYM
S.N.A.P — the four things the CLT locks in about the sampling distribution of the sample mean.
S
Shape — Approximately normal when n ≥ 30, regardless of the population's shape.
N
Number (sample size) — Must be at least 30 for the approximation to hold.
A
Average (mean) — The mean of the sampling distribution equals the true population mean μ.
P
Precision (standard error) — Spread equals σ divided by the square root of n, not n itself.
When a question describes a skewed or unusual population and asks what the distribution of the sample mean looks like, run through S.N.A.P. in order. If you hesitate on the P step, write the square root sign before you write n. That one habit prevents the most common numerical error on this LO.
Practice Questions · LO2
3 Questions LO2
Score: — / 3
Q 1 of 3 — REMEMBER
According to the central limit theorem, the sampling distribution of the sample mean is approximately normal when which condition is met?
CORRECT: B

CORRECT: B, The central limit theorem states that the sampling distribution of the sample mean approaches normality when n ≥ 30, regardless of the shape of the underlying population. This is the theorem's most powerful feature: it does not require the population to be normal, and it does not depend on whether σ is known.

Why not A? This is the most common misconception about the CLT. The population does not need to be normal for the sampling distribution to be approximately normal. A highly skewed, bimodal, or uniform population will still produce an approximately normal sampling distribution of means, as long as n ≥ 30. If normality of the population were required, the theorem would be almost useless in practice, because most real-world data is not normally distributed.

Why not C? Knowing the population standard deviation affects how you calculate the standard error: you use σ/√n instead of s/√n. But it has no bearing on the shape condition stated in the CLT. The CLT's shape guarantee holds whether σ is known or unknown. Knowing σ is a convenience for computation, not a requirement for the theorem to apply.

---

Q 2 of 3 — UNDERSTAND
An analyst draws repeated samples of size 50 from a population with a strongly right-skewed distribution. Which statement best describes the distribution of the resulting sample means?
CORRECT: C

CORRECT: C, This is the CLT in action. With n = 50, which exceeds the n ≥ 30 threshold, the sampling distribution of the sample mean is approximately normal regardless of the population's skew. Its centre equals the true population mean μ, and its spread, the standard error, equals σ/√n. All three elements (shape, centre, spread) are determined by the CLT and the standard error formula together.

Why not A? This is the intuitive-but-wrong answer. It assumes the sample mean distribution inherits the shape of the population distribution. In fact, the process of averaging smooths out extreme values. Each sample of size 50 produces a mean that is already an average of 50 observations, which pulls extreme values toward the centre. The CLT quantifies exactly how this smoothing produces a normal shape.

Why not B? The shape of the sampling distribution can be described: it is approximately normal, and the CLT guarantees this. What μ and σ do affect is where the distribution is centred and how wide it is, not whether it is approximately normal. The shape guarantee holds even when the precise values of μ and σ are unknown.

---

Q 3 of 3 — APPLY
Fatima Al-Rashid is a quantitative analyst at Meridian Investment Group in Dubai. She is studying the monthly expense ratios of equity mutual funds. The population standard deviation of expense ratios is 0.45%. Fatima draws a random sample of 81 funds. What is the standard error of the sample mean?
CORRECT: B

CORRECT: B, Standard error = σ / √n = 0.45% / √81 = 0.45% / 9 = 0.050%. The key step is taking the square root of the sample size first. √81 = 9, so the standard error is 0.45 divided by 9, giving 0.050%.

Why not A? The result 0.056% comes from dividing 0.45% by √64 = 8, which would be correct if n were 64, not 81. This is a computational slip: using the wrong value of n before taking the square root. Always confirm which n the question provides before computing √n.

Why not C? The result 0.006% comes from dividing the population standard deviation by n itself rather than √n: 0.45% / 81 ≈ 0.006%. This is the most common formula error on this topic, and it is exactly the trap named above. The denominator in the standard error formula is always √n. Dividing by n instead of √n produces a result far too small. In Rafael's earlier example, the same error turned 1.00% into 0.167%. Here it turns 0.050% into 0.006%. The wrong number will appear as an answer choice precisely because it is such a common mistake.

---

Glossary
sampling distribution of the mean
The probability distribution of all possible sample means you could compute by drawing repeated samples of equal size from the same population. Think of it like repeatedly rolling a handful of dice and recording the average each time: the collection of all those averages has its own distribution, which is what the central limit theorem describes.
standard deviation
A measure of how spread out individual data points are around the mean of a dataset. A small standard deviation means the data clusters tightly around the mean; a large one means the data is widely scattered. Think of it as the average distance each data point sits from the centre of the distribution.
central limit theorem
The statistical result stating that the sampling distribution of the sample mean is approximately normal when the sample size n is at least 30, regardless of the shape of the underlying population distribution. It is the foundation for most statistical inference in practice.
standard error of the sample mean
The standard deviation of the sampling distribution of the mean, calculated as σ/√n (when the population standard deviation σ is known) or s/√n (when estimated from a sample). It measures how much uncertainty surrounds a sample mean as an estimate of the true population mean. Smaller standard error means a more precise estimate.
normal distribution
A symmetric, bell-shaped probability distribution fully described by its mean and standard deviation. Most people's heights in a country follow a roughly normal distribution: most cluster around the average, with fewer very tall or very short people at the edges.
sample size
The number of individual observations included in a sample, denoted n. Larger sample sizes reduce the standard error by a factor of √n, making estimates more precise. This is why political polls with 1,000 respondents are more trustworthy than polls with 100.
population
The entire group about which you want to draw conclusions. In statistics, the population has fixed but often unknown parameters (mean μ, standard deviation σ). Studying the entire population is usually impossible or impractical, like trying to weigh every fish in the ocean.
sample mean
The arithmetic average of observations in a sample, written as x̄ (x-bar). Just as one thermometer reading estimates the true temperature of a room, the sample mean estimates the true population mean from a subset of observations.

LO 2 Done ✓

Ready for the next learning objective.

🔒 PRO Feature
How analysts use this at work
Real-world applications and interview questions from top firms.
Quantitative Methods · Estimation and Inference · LO 3 of 3

When you have only one sample but need to know how much you can trust it, what do you do?

Treat your one sample as if it were the population, resample it thousands of times, and let the resamples reveal what the true sampling distribution looks like.

Why this LO matters

Treat your one sample as if it were the population, resample it thousands of times, and let the resamples reveal what the true sampling distribution looks like.

INSIGHT
You will never know the true population parameter. But your one sample is a micro version of the entire population. If you resample from it thousands of times, the variation you see in those resamples mirrors the variation you would see if you could sample from the real population over and over again. That variation is the sampling distribution. Bootstrap lets you build it without ever touching the population itself.

What resampling actually does

When you flip a coin once and get heads, that single flip tells you nothing reliable about the true probability of heads. Flip it a hundred times and you learn something. But what if you only have one sample of data and you cannot go back for more?

That is the problem resampling solves.

Instead of sampling from the population again (which you cannot do), you sample repeatedly from the sample you already have. You treat your single sample as if it were the population. You draw thousands of mini-samples from it, compute your statistic each time, and watch the pattern emerge. The distribution of those statistics becomes your estimate of the true sampling distribution.

This is how you estimate how much error is baked into your one sample.
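A minimal bootstrap sketch in Python, using only the standard library (the return figures are invented for illustration):

```python
import random
import statistics

random.seed(7)  # reproducible illustration

# The one sample you actually have -- invented monthly returns (%).
sample = [1.4, -2.1, 0.8, 3.2, -0.5, 1.1, 2.7, -1.3, 0.4, 1.9]

# Treat the sample as the population: draw B resamples of the same
# size WITH replacement and compute the statistic on each one.
B = 5000
boot_medians = [
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(B)
]

# The spread of the resampled medians estimates the standard error of
# the sample median -- a statistic with no closed-form formula.
print(round(statistics.stdev(boot_medians), 3))
```

Replace `statistics.median` with any statistic you care about, a percentile, a custom ratio, and the procedure is unchanged.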

Resampling Methods
1
Sampling distribution. The probability distribution of a statistic if you repeatedly drew samples of the same size from the population. You will never observe the true sampling distribution directly, but resampling approximates it from the data you have.
2
Bootstrap resampling. You draw many new samples from your original sample, with replacement, meaning the same data point can appear more than once in a single resample. Each resample is the same size as the original. Compute your statistic (mean, median, standard deviation) on each resample, and the distribution of those statistics approximates the true sampling distribution.
3
Jackknife resampling. You create resamples by leaving out one observation at a time from the original sample, without replacement. For a sample of size n, you produce exactly n resamples. Use this when you need to reduce bias or when bootstrap results show too much random variation across runs.
4
Standard error (resampling definition). The standard deviation of your resampled statistics across all B resamples. It measures how much the statistic varies from sample to sample. Lower standard error means your sample statistic is a more reliable estimate of the true population parameter.
5
Model-free resampling. A synonym for bootstrap. It is called model-free because you do not assume any particular distribution for the population. You let the actual data distribution speak for itself. This makes bootstrap especially useful when no analytical formula exists for the statistic you need.
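The jackknife's leave-one-out structure can be sketched just as briefly; the return figures below are invented for illustration:

```python
import statistics

# Invented monthly returns (%), for illustration only.
sample = [1.4, -2.1, 0.8, 3.2, -0.5, 1.1, 2.7, -1.3, 0.4, 1.9]
n = len(sample)

# Jackknife: exactly n resamples, each leaving out one observation.
jack_means = [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)]

# No randomness: the same sample always produces the same n resamples,
# which is why jackknife output is perfectly consistent across runs.
print(len(jack_means), round(statistics.mean(jack_means), 2))  # 10 0.76
```

Contrast this with the bootstrap, where each run draws different random resamples unless you fix a seed.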

Why resampling exists: when formulas run out

The reason resampling exists is simple. Most statistics have no analytical standard error formula.

You know the formula for the standard error of the sample mean: s/√n.

The sample median? No formula exists. The 75th percentile? No formula. A custom ratio you invented? No formula.

Bootstrap solves this. It works for any statistic. Jackknife is the alternative when you need perfectly consistent output across runs, or when you need to measure estimator bias explicitly.

The wrong move is reaching for s/√n out of habit and applying it to a statistic it was never derived for. The right move is asking first: does a closed-form formula exist for this statistic? If yes, use it. If no, use bootstrap.

FORWARD REFERENCE
Sampling distribution, what you need for this LO only
A sampling distribution is the probability distribution of a statistic (like the sample mean) if you drew infinite random samples of the same size from the population. It represents the pattern of variation you would observe across many independent samples. For this LO, you only need to recognise that resampling approximates the sampling distribution by mimicking repeated sampling from the population using the data you already have. Full treatment: Quantitative Methods Module 2.
→ Quantitative Methods
FORWARD REFERENCE
Confidence interval, what you need for this LO only
A confidence interval is a range of values built around a sample statistic that likely contains the true population parameter. Bootstrap sampling distributions can be used to construct confidence intervals without assuming a normal distribution. For this LO, you only need to understand that resampling creates the empirical distribution used to estimate where the true parameter lies. Full treatment: Quantitative Methods Module 4.
→ Quantitative Methods

How to apply resampling: worked examples

The following examples build from concept recognition to numerical calculation. Work through them in order. Each one adds a layer to the skill you will need on exam day.

Worked Example 1
Choosing the right method for a statistic with no formula
Priya Nair is a junior analyst at Meridian Asset Management in Singapore. She has 60 months of return data for a thinly traded infrastructure fund and wants to estimate the standard error of the sample median return. Her colleague suggests using the formula s/√n. Priya suspects this is wrong but cannot immediately explain why.
🧠Thinking Flow — bootstrap vs analytical formula for the median
The question asks
Which method correctly estimates the standard error of the sample median?
Key concept needed
Model-free resampling (bootstrap). The analytical formula s/√n applies only to the sample mean. Candidates who apply s/√n to the median produce a number that looks plausible but rests on a derivation that simply does not exist for the median.
Step 1, Name the wrong approach first
Many candidates reach for s/√n immediately. This formula is the standard error of the sample mean, derived from the Central Limit Theorem. It works because the sampling distribution of the sample mean is well-characterised analytically. The sample median has no equivalent closed-form result. Applying s/√n to the median is like using a recipe for beef stew on a fish dish. The instructions exist, but they apply to the wrong thing. The result is a number, but not a statistically valid one.
Step 2, Apply the correct concept
Bootstrap resampling is designed precisely for this situation. Priya should treat her 60-month dataset as the population, draw thousands of resamples of size 60 with replacement, compute the median of each resample, and then calculate the standard deviation of those medians using the bootstrap standard error formula. This produces an empirically grounded estimate of the standard error of the median, requiring no analytical formula at all.
Step 3, Sanity check
Ask: does a closed-form formula exist for this statistic? The median has no standard error formula. Therefore bootstrap must be used. If the question had asked about the sample mean, either method would be valid. But the question specifies the median, so only bootstrap applies.
Answer
Bootstrap is the correct method. The analytical formula is wrong for the median because the Central Limit Theorem's standard error result applies only to the sample mean. Exam answer: B.
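Priya's procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the return series is hypothetical (randomly generated), and the seed and resample count are arbitrary choices, not part of the example above.

```python
import random
import statistics

def bootstrap_se_median(returns, n_resamples=2000, seed=42):
    """Estimate the standard error of the sample median by bootstrap.

    Treat the original sample as the population, draw resamples of the
    same size with replacement, compute the median of each, and take the
    standard deviation of those medians.
    """
    rng = random.Random(seed)
    n = len(returns)
    medians = [
        statistics.median(rng.choices(returns, k=n))  # draw WITH replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(medians)  # stdev already uses the B-1 denominator

# Hypothetical 60 months of fund returns, for illustration only
random.seed(0)
monthly_returns = [random.gauss(0.01, 0.04) for _ in range(60)]
print(bootstrap_se_median(monthly_returns))
```

Note that no formula specific to the median appears anywhere: the same function works unchanged for any statistic, which is exactly why bootstrap rescues Priya here.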
Worked Example 2
Computing bootstrap standard error from given resample data
Carlos Romero works in the risk analytics team at Albatross Capital in São Paulo. He runs a bootstrap procedure on 108 months of return data for a small-cap equity fund, drawing 200 resamples with replacement, each of size 108. The program reports that the mean of the 200 resample means is 0.0261, and the sum of squared deviations of the resample means from that average is 0.835. Carlos must calculate the bootstrap standard error of the sample mean.
🧠Thinking Flow — applying the bootstrap standard error formula
The question asks
What is the bootstrap standard error of the sample mean, given B, the mean of resample means, and the sum of squared deviations?
Key concept needed
The bootstrap standard error formula. The most common error is using B (here 200) in the denominator instead of B−1 (here 199). This produces 0.0646 versus the correct 0.0648. The values are close enough to be tempting, but the denominator is wrong.
Step 1, Identify the formula and its components
The bootstrap standard error formula is:

s = √[ (1 / (B−1)) × Σ(θ̂_b − θ̄)² ]

where:
- B = number of resamples = 200
- Σ(θ̂_b − θ̄)² = sum of squared deviations of resample means from the grand mean = 0.835
- B−1 = 199 (the denominator, not B)
Step 2, Calculate the quantity inside the square root
Divide the sum of squared deviations by B−1: 0.835 ÷ 199 = 0.004196
Step 3, Take the square root
√0.004196 = 0.0648
Step 4, Sanity check
The standard error should be small relative to the returns themselves. A monthly return standard error of approximately 6.5% is plausible for a small-cap fund. Now check the wrong-denominator result: 0.835 ÷ 200 = 0.004175, √0.004175 = 0.0646. That value is close, but it uses B not B−1. This is the same conceptual error as using n instead of n−1 when computing sample variance. Both are wrong for the same reason: the denominator corrects for degrees of freedom. ✓
Answer
0.0648. Exam answer: B.
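Carlos's calculation, including the wrong-denominator trap, can be checked with a two-line function:

```python
import math

def bootstrap_se(sum_sq_dev, n_resamples):
    """Bootstrap standard error: sum of squared deviations over B-1, then root."""
    return math.sqrt(sum_sq_dev / (n_resamples - 1))

correct = bootstrap_se(0.835, 200)  # divides by B-1 = 199
wrong = math.sqrt(0.835 / 200)      # trap: divides by B = 200

print(round(correct, 4))  # 0.0648
print(round(wrong, 4))    # 0.0646
```

The two results differ only in the fourth decimal place, which is precisely why the trap is tempting: both numbers look plausible, and only the degrees-of-freedom check separates them.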
Worked Example 3
Bootstrap vs jackknife, identifying which description fits which method
Fatima Al-Rashid is preparing for her CFA Level 1 exam and reviewing two resampling techniques in her study notes. She reads two method descriptions and must match each to its correct name. Method X leaves out one observation at a time, always produces the same output when run again, and requires exactly n repetitions. Method Y draws samples randomly with replacement, produces slightly different results each run, and lets the analyst choose how many repetitions to use.
🧠Thinking Flow — distinguishing bootstrap from jackknife by their defining properties
The question asks
Which method is bootstrap and which is jackknife?
Key concept needed
The three distinguishing properties of each method: replacement rule, consistency across runs, and number of repetitions.
Step 1, Identify the signal words
"Leaves out one observation at a time" and "always the same output" are the defining signals for jackknife. "With replacement" and "produces different results each run" are the defining signals for bootstrap. Method X: leaves out one observation, always the same output, exactly n repetitions → jackknife. Method Y: random draws with replacement, different results each run, analyst-chosen repetitions → bootstrap.
Step 2, Apply the logic of each method
Jackknife is systematic. For a sample of size n, you leave out observation 1 and compute the statistic, restore it, leave out observation 2, and so on. You do this exactly n times. Because there is no randomness, you get the same result every time. Bootstrap is random. Each resample is drawn by picking data points at random, with replacement. Because the selection is random, two runs of bootstrap on the same data produce slightly different resample means and slightly different standard errors. You choose how many resamples to draw. More is better, and 1,000 is a common starting point.
Step 3, Sanity check
Ask: does the number of repetitions depend on the sample size, or does the analyst choose it? For jackknife: repetitions = n. Fixed by the data. For bootstrap: repetitions = B, chosen by the analyst. These two rules cannot be swapped.
Answer
Method X is jackknife. Method Y is bootstrap. Jackknife is mechanical (n steps, no randomness). Bootstrap is simulation-based (B steps, analyst-chosen, random each time).
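The deterministic-versus-random contrast can be demonstrated directly. In this sketch the data are hypothetical and the seeds and resample count are arbitrary; the point is only that jackknife repeats exactly while bootstrap runs differ.

```python
import random
import statistics

def jackknife_means(data):
    """Leave-one-out resample means: exactly n of them, fully deterministic."""
    n = len(data)
    total = sum(data)
    return [(total - x) / (n - 1) for x in data]  # mean with observation x removed

def bootstrap_se_of_mean(data, n_resamples, rng):
    """Standard error from B random resamples drawn with replacement."""
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)

rng = random.Random(1)
data = [rng.gauss(0.01, 0.05) for _ in range(75)]  # hypothetical 75 observations

# Jackknife: same output every run, always exactly n = 75 resamples
jk1 = jackknife_means(data)
jk2 = jackknife_means(data)
print(jk1 == jk2)        # True
print(len(jk1))          # 75

# Bootstrap: two runs (different random draws) give slightly different answers
b1 = bootstrap_se_of_mean(data, 500, random.Random(10))
b2 = bootstrap_se_of_mean(data, 500, random.Random(20))
print(b1, b2)            # close, but almost surely not identical
```

This mirrors the exam signal words: jackknife's repetitions are fixed by the data (n), while bootstrap's B is the analyst's choice and its output carries simulation noise.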
Worked Example 4
Why bootstrap is called model-free resampling
Tariq Osei is a quantitative analyst at Riverstone Partners in Accra. He is explaining to a junior colleague why bootstrap is sometimes called "model-free" or "non-parametric" resampling. The colleague asks: "Does model-free just mean we don't use a computer model?" Tariq needs to give a precise answer.
🧠Thinking Flow — understanding what model-free resampling means
The question asks
What does "model-free" mean in the context of bootstrap resampling, and why is this property valuable?
Key concept needed
Model-free resampling. The term means the method does not assume a specific probability distribution (normal, uniform, exponential, etc.) for the population. This is different from saying no computation is involved.
Step 1, Name the wrong interpretation first
Many candidates read "model-free" and think it means "no formula is needed" or "no computer is needed." Both are wrong. Bootstrap requires substantial computation, typically thousands of resampled calculations. "Model-free" refers to the absence of a distributional assumption, not the absence of a formula or a machine.
Step 2, Apply the correct meaning
Conventional statistical formulas (like z-statistics or t-statistics) assume the population follows a specific distribution, usually normal. If that assumption is wrong, the formula gives misleading results. Bootstrap makes no such assumption. It builds the sampling distribution directly from the data itself. Whatever distribution the data actually follows, bootstrap reflects it, without the analyst having to specify it in advance. This is why bootstrap is particularly valuable for:
- Statistics with no analytical formula (like the median, trimmed mean, or a custom ratio)
- Populations whose distribution is unknown or non-normal
- Complex estimators where deriving a formula analytically would be prohibitively difficult
Step 3, Sanity check
If a population is known to be perfectly normal and the sample is large, bootstrap and the analytical formula should give very similar standard error estimates. The difference is that bootstrap works even when normality fails. The analytical formula does not. Model-free means broadly applicable, not computationally simple.
Answer
"Model-free" means bootstrap does not impose a distributional assumption on the population. It lets the data's own structure determine the sampling distribution. This makes it applicable to any statistic, whether or not an analytical formula exists, unlike parametric methods that require the population to follow a specific assumed distribution.
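The sanity check in Step 3 can be run as a small experiment: on data where normality does hold, the model-free bootstrap and the analytical formula should nearly agree. The data, seed, and resample count below are arbitrary illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical normal sample
n = len(data)

# Analytical standard error of the mean: s / sqrt(n), valid for the mean
analytical = statistics.stdev(data) / math.sqrt(n)

# Model-free bootstrap estimate: no distributional assumption used anywhere
boot_means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(2000)]
bootstrap = statistics.stdev(boot_means)

print(round(analytical, 3), round(bootstrap, 3))  # the two should be close
```

The agreement is the point: bootstrap did not need to be told the data were normal, yet it recovered essentially the same answer, and it would keep working if the data were heavily skewed while s/√n for other statistics would not.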
⚠️
Watch out for this
The B vs B−1 denominator trap. A candidate who divides the sum of squared deviations by B (the number of resamples, for example 200) gets a bootstrap standard error of 0.0646 instead of the correct answer of 0.0648. The correct approach divides by B−1 (here 199), giving √(0.835 ÷ 199) = 0.0648. Candidates make this error because they treat the resamples as a complete population and reach for the population variance formula (divide by n), when the bootstrap standard error formula uses B−1 to correct for degrees of freedom, exactly as sample variance uses n−1 instead of n. Before submitting, confirm the denominator is B−1, not B, and that you have taken the square root of the entire expression, not just the fraction.
🧠
Memory Aid
FORMULA HOOK
Subtract one from the resamples, divide the squared gaps, then root the whole thing.
Practice Questions · LO3
6 Questions LO3
Q 1 of 6 — REMEMBER
In bootstrap resampling, each resample is drawn from the original sample in which of the following ways?
CORRECT: B

CORRECT: B, Bootstrap resampling draws with replacement. This means any data point in the original sample can appear zero, one, or multiple times in a single resample. Each resample is the same size as the original sample, but its composition varies randomly because selected observations are returned to the pool before the next draw. The variation across resamples is exactly what allows bootstrap to estimate the sampling distribution.

Why not A? This describes sampling without replacement. In bootstrap, observations are not removed after selection. Sampling without replacement would exhaust the original data set after n draws and produce a resample identical to the original, defeating the purpose of generating variation across resamples. Removing after selection is how jackknife works in spirit (though jackknife removes systematically, not randomly).

Why not C? Systematically leaving out one observation at a time is the defining procedure of jackknife resampling, not bootstrap. Jackknife produces exactly n resamples for a sample of size n, one for each possible single-observation exclusion. Bootstrap produces B resamples where B is chosen by the analyst, typically in the hundreds or thousands, and each resample is drawn randomly rather than systematically.

---

Q 2 of 6 — UNDERSTAND
An analyst wants to estimate the standard error of the sample median for a portfolio of 80 monthly returns. A colleague argues that the formula s/√n is sufficient. Why is the colleague wrong?
CORRECT: C

CORRECT: C, The formula s/√n is derived from the Central Limit Theorem and describes the standard error of the sample mean. That derivation does not extend to the sample median. No analytical closed-form expression exists for the standard error of the sample median under general conditions. This is precisely the situation bootstrap resampling was designed for: estimating the standard error of any statistic when no formula is available.

Why not A? Sample size is not the issue here. The formula s/√n would apply to the sample mean regardless of whether n is 80 or 800. The problem is not the size of the sample. It is the identity of the statistic. The median is structurally different from the mean, and no version of s/√n, adjusted or otherwise, correctly estimates its standard error.

Why not B? While it is true that s/√n performs best under normality, the deeper problem is more fundamental. Even if normality could be established, s/√n still would not estimate the standard error of the median. It estimates the standard error of the mean. Normality is a secondary concern. The formula's limited scope is the primary issue, and bootstrap resolves it by making no distributional assumption at all.

---

Q 3 of 6 — APPLY
Yuki Tanaka is a research analyst in Tokyo. She has 120 months of hedge fund return data and uses bootstrap resampling with 500 resamples to estimate the standard error of the sample mean. The mean of all 500 resample means is 0.0312, and the sum of squared deviations of the individual resample means from 0.0312 is 1.195. What is the bootstrap standard error of the sample mean?
CORRECT: A

CORRECT: A, The bootstrap standard error formula is √[(1/(B−1)) × Σ(θ̂_b − θ̄)²]. Here B = 500, so B−1 = 499. Dividing: 1.195 ÷ 499 = 0.002394. Taking the square root: √0.002394 = 0.04893, which rounds to 0.0489. The key step is using B−1 = 499 in the denominator, not B = 500, matching the degrees-of-freedom correction used in sample variance.

Why not B? Option B (0.0488) comes from dividing by B = 500 instead of B−1 = 499: 1.195 ÷ 500 = 0.002390, √0.002390 = 0.04889, which truncates to 0.0488. The denominator is wrong. Bootstrap standard error uses B−1, not B, for the same reason sample variance uses n−1: the resamples are treated as a sample of estimates, not a complete population, and the degrees-of-freedom correction applies.

Why not C? Option C reflects confusing the original sample size n = 120 with the number of resamples. Any such substitution produces a result far from the correct answer: for example, 1.195 ÷ 120 = 0.009958 and √0.009958 ≈ 0.0998. The original sample size determines how large each resample is, not the denominator in the standard error formula. The formula's denominator is always B−1, where B is the number of resamples.

---

Q 4 of 6 — APPLY+
Gabriela Ferreira is a risk manager at a pension fund in Lisbon. She has 90 quarterly returns for a real estate investment trust and wants to estimate both the standard error of the sample mean and the standard error of the interquartile range. She runs two analyses: Analysis 1 uses the standard analytical formula. Analysis 2 uses bootstrap with 1,000 resamples. Which of the following correctly describes the appropriate use of each analysis?
CORRECT: B

CORRECT: B, The analytical formula s/√n applies specifically to the sample mean, where the Central Limit Theorem provides the derivation. The interquartile range is an order statistic with no standard closed-form standard error formula under general conditions. Bootstrap is the correct tool here because it estimates the sampling distribution of any statistic by treating the original sample as the population and drawing resamples with replacement. Using bootstrap for the mean would also work, but the analytical formula is equally valid and simpler when its conditions are met.

Why not A? Both statistics being summaries of the same dataset is irrelevant to whether the analytical formula applies. The formula s/√n is derived for the mean specifically. Applying it to the interquartile range has no theoretical basis. The result would be a number, but not a statistically valid standard error for that statistic. The scope of a formula is determined by its derivation, not by what data it is applied to.

Why not C? Bootstrap is not universally more accurate than the analytical formula. When an analytical formula exists and its assumptions are met, it delivers the standard error directly, with no simulation noise. Bootstrap is an approximation that converges toward the true sampling distribution only as B increases. For the sample mean of a large dataset, the analytical formula is typically at least as accurate as bootstrap and requires far less computation. The reason to use bootstrap is the absence of an analytical formula, not a belief that simulation always outperforms theory.

---

Q 5 of 6 — ANALYZE
An analyst compares bootstrap and jackknife resampling on the same dataset of 75 observations. She runs bootstrap twice and gets standard error estimates of 0.0412 and 0.0419. She runs jackknife twice and gets estimates of 0.0415 and 0.0415. Which of the following best explains this pattern?
CORRECT: C

CORRECT: C, Bootstrap draws resamples randomly, with replacement, each time it runs. Because the selection is random, two runs on the same data will almost always produce slightly different sets of resample statistics and therefore slightly different standard error estimates. Jackknife eliminates all randomness: it leaves out observation 1, computes the statistic, restores it, leaves out observation 2, and so on, the same n resamples every time. For a sample of 75, jackknife always produces exactly 75 resamples with exactly the same composition, giving identical output every run.

Why not A? Effective sample size is not the reason for bootstrap's variation across runs. Each bootstrap resample is the same size as the original sample (here 75 observations), so there is no reduction in effective sample size. The source of variation is randomness in which observations are selected, not any reduction in the amount of data used per resample.

Why not B? Jackknife's consistency is not due to having more resamples than bootstrap. For this dataset, jackknife always produces exactly n = 75 resamples. A bootstrap run might use B = 1,000 resamples, far more. The consistency of jackknife comes from its fully deterministic procedure, not from averaging over a large number of repetitions. More bootstrap repetitions reduce run-to-run variation but never reduce it to zero. Jackknife achieves perfect consistency through systematic exhaustion of all single-exclusion possibilities.

---

Q 6 of 6 — TRAP
Diego Castillo is an analyst in Buenos Aires. He runs a bootstrap procedure on a fund's return data, drawing 300 resamples with replacement. After recording the mean of each resample, he calculates that the sum of squared deviations of the 300 resample means from their grand mean is 1.272. He divides 1.272 by 300 and takes the square root, arriving at 0.0651. A second analyst reviews his work and says the answer should be 0.0652. Which answer is correct, and what error did Diego make?
CORRECT: B

CORRECT: B, The bootstrap standard error formula divides the sum of squared deviations by B−1, not by B. Here B = 300, so the correct denominator is 299. Correct calculation: 1.272 ÷ 299 = 0.004254, √0.004254 = 0.0652. Diego divided by B = 300 instead: 1.272 ÷ 300 = 0.004240, √0.004240 = 0.0651. The difference is small but the denominator is wrong. The B−1 correction is the same degrees-of-freedom adjustment used in sample variance (n−1 instead of n): the resamples are a sample of estimates, not the full population of all conceivable resamples.

Why not A? Diego's error is real, not arithmetic. Dividing by B treats the 300 resample means as if they were a complete population, which would call for a population variance formula. But the 300 resample means are themselves a sample of the possible bootstrap outcomes. They are not the full population of all conceivable resamples. The degrees-of-freedom correction requires B−1 = 299, and the second analyst is right to flag this.

Why not C? The original sample size plays no role in the bootstrap standard error formula's denominator. The formula uses B−1, where B is the number of resamples drawn by the analyst. The original sample size determines how large each resample is (each resample matches the original sample in size), but it does not appear in the variance computation across resample statistics. Substituting the original sample size for B or B−1 confuses two entirely separate quantities in the procedure and would produce an incorrect result.

---

Glossary
Sampling distribution
The probability distribution of a statistic (like the sample mean) if you repeatedly drew samples of the same size from a population. If you flip a coin 100 times, record the number of heads, and repeat that experiment thousands of times, the distribution of all those head-counts is a sampling distribution.
Bootstrap resampling
A method where you draw many new samples from your original sample, with replacement, meaning the same data point can appear more than once in a single resample. Think of it as shuffling a deck of cards, drawing one card, writing it down, putting it back, shuffling again, and repeating thousands of times. Each round of draws becomes a new resample.
Jackknife resampling
A method where you create resamples by leaving out one observation at a time from the original sample, without replacement. For a dataset of 100 points, you produce exactly 100 resamples, one for each way of removing a single point. Because the procedure is systematic and deterministic, jackknife always produces identical results when applied to the same data.
Standard error
The standard deviation of a statistic (like the sample mean) across many repeated samples. It measures how much a statistic varies from sample to sample. If you draw 1,000 different samples from a population and compute the mean of each, the standard deviation of those 1,000 means is the standard error of the mean.
Model-free resampling
A resampling method that does not assume the population follows any particular probability distribution (normal, uniform, exponential, etc.). The method lets the actual data distribution speak for itself. Bootstrap is model-free. A formula like a z-statistic that assumes normality is not.
Jackknife
See Jackknife resampling. Jackknife systematically removes one observation at a time, producing exactly n resamples for a sample of size n, with no randomness and identical output across runs.
Confidence interval
A range of values, typically built around a sample statistic, that is expected to contain the true population parameter with a specified probability (usually 95%). If you surveyed 100 people and found the average age was 35, a 95% confidence interval might be 33 to 37, meaning you are 95% confident the true population average falls within that range. Bootstrap sampling distributions can be used to construct confidence intervals without assuming a normal distribution.


You have completed all learning objectives for this module.
