Data collection, bias, interpretation - Mathematics: Applications & Interpretation IB Study Notes

Overview
# Data Collection, Bias, and Interpretation

This lesson examines systematic approaches to collecting and analysing data, emphasizing the identification and mitigation of sampling bias, measurement error, and confounding variables. Students learn to distinguish between population and sample, evaluate sampling methods (random, stratified, quota, convenience), and critically assess how data collection methodology affects statistical validity. The content is essential for Paper 1 and Paper 2 examinations, where students must demonstrate the ability to critique statistical studies, identify potential sources of bias in given scenarios, and justify appropriate data collection strategies. These skills are frequently assessed through context-based questions worth 4 to 6 marks that require both technical understanding and written justification.
Core Concepts & Theory
Data Collection Methods form the foundation of statistical analysis. Primary data is information collected firsthand for a specific purpose (surveys, experiments), while secondary data is pre-existing information from other sources (databases, published research).
Sampling Techniques determine how we select subjects:
- Random sampling: Every member of the population has an equal chance of being selected
- Systematic sampling: Selecting every nth member from an ordered list
- Stratified sampling: Dividing the population into groups (strata) and randomly sampling from each stratum in proportion to its size
- Quota sampling: Non-random selection meeting predetermined quotas
- Convenience sampling: Selecting easily accessible subjects
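The three probability-based methods in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the student IDs, and the 120/80 split between year groups are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every member has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member of the ordered list."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Randomly sample from each stratum in proportion to its size.

    Rounding means the group sizes may not always add up to exactly n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical school of 200 students: 120 in Year 12 and 80 in Year 13.
students = [f"S{i:03d}" for i in range(200)]
strata = {"Year 12": students[:120], "Year 13": students[120:]}

srs = simple_random_sample(students, 20, seed=1)
sys_sample = systematic_sample(students, 20, seed=1)
strat = stratified_sample(strata, 20, seed=1)
print({name: len(chosen) for name, chosen in strat.items()})
# {'Year 12': 12, 'Year 13': 8}
```

Quota and convenience sampling are left out on purpose: they involve no random selection, so there is no sampling mechanism to simulate.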
Bias represents systematic error causing unrepresentative results. Selection bias occurs when sampling methods favour certain groups. Response bias happens when respondents give inaccurate answers, for example because of leading question wording or a wish to please the interviewer. Non-response bias emerges when certain groups don't participate.
Key Formulas:
- Sample size for proportion: n = (z²pq)/E², where z is the z-score for the chosen confidence level, p is the estimated proportion, q = 1 - p, and E is the margin of error (round n up to the next whole number)
- Margin of error: E = z√(pq/n)
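As a quick check of the two formulas above, the sketch below computes the sample size needed for a chosen margin of error and then confirms it by substituting back. The function names are hypothetical, and the values z = 1.96 (95% confidence), p = 0.5 and E = 0.03 are assumed for illustration.

```python
import math

def sample_size_for_proportion(z, p, E):
    """n = z^2 * p * q / E^2, rounded up to a whole number of subjects."""
    q = 1 - p
    return math.ceil(z**2 * p * q / E**2)

def margin_of_error(z, p, n):
    """E = z * sqrt(p * q / n)."""
    q = 1 - p
    return z * math.sqrt(p * q / n)

z = 1.96  # assumed z-score for 95% confidence
n = sample_size_for_proportion(z, p=0.5, E=0.03)
print(n)  # 1068

# Substituting n back in gives a margin just under the 0.03 target.
print(margin_of_error(z, p=0.5, n=n))
```

Using p = 0.5 is the cautious choice when no estimate is available, because pq is largest there and so gives the largest required n.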
Variables are classified as:
- Quantitative: Numerical (discrete or continuous)
- Qualitative: Categorical (nominal or ordinal)
Memory Aid - BRASS: Bias ruins accuracy, Random reduces bias, Adjust for non-response, Stratify for diversity, Sample size matters!
Data interpretation requires examining context, identifying patterns, and recognizing limitations. Always question whether conclusions are justified by the data presented.
Detailed Explanation with Real-World Examples
Understanding data collection is like being a detective gathering evidence—how you collect determines what you can conclude.
Real-World Application: Political Polling
Imagine predicting election results. Using convenience sampling (polling only people at a shopping mall) introduces selection bias: you miss working professionals, rural residents, and homebound elderly people. A stratified sample divided by age, region, and income is likely to give more accurate predictions. The 1936 Literary Digest poll famously predicted Roosevelt's defeat after drawing its mailing list largely from telephone directories and car registrations, at a time when both skewed towards wealthier households. It is a classic case of selection bias, made worse by a low response rate.
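A small simulation can make the shopping-mall problem concrete. Everything here is invented for illustration: the electorate of 10,000, the share who visit the mall, and the support rates in each group are assumptions, not real polling data.

```python
import random

rng = random.Random(42)

# Hypothetical electorate, stored as (is_mall_shopper, supports_candidate).
# Mall shoppers: 3,000 people, 70% support. Everyone else: 7,000 people, 40% support.
mall = [(True, True)] * 2100 + [(True, False)] * 900
others = [(False, True)] * 2800 + [(False, False)] * 4200
population = mall + others

def support_rate(people):
    """Proportion of the given people who support the candidate."""
    return sum(supports for _, supports in people) / len(people)

true_rate = support_rate(population)          # 0.49 by construction
convenience = rng.sample(mall, 500)           # only people found at the mall
random_sample = rng.sample(population, 500)   # drawn from the whole electorate

print(f"True support:        {true_rate:.2f}")
print(f"Convenience sample:  {support_rate(convenience):.2f}")
print(f"Random sample:       {support_rate(random_sample):.2f}")
```

The convenience estimate lands near the mall shoppers' 70% rather than the true 49%, and taking a larger mall sample would not fix this: the error is systematic, not random.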
Medical Research Example
Testing a new medication requires careful data collection. Randomly allocating participants to treatment and control groups helps balance confounding variables between the groups. If researchers only recruit from urban hospitals, results may not apply to rural populations. Response bias occurs if patients exaggerate symptom improvement to please doctors. Double-blind studies reduce this bias: neither the patient nor the person administering the treatment knows who receives the actual drug.
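In a trial like this, the usual safeguard against confounding is to allocate participants to treatment and control groups at random, so that age, lifestyle and other factors are spread roughly evenly between the groups. A minimal sketch, assuming a hypothetical list of participant IDs and a simple half-and-half split:

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants, then split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = participants[:]     # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs P01 to P10.
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = randomly_assign(participants, seed=7)
print(groups)
```

Because chance alone decides who gets the drug, neither the researchers nor the patients can steer healthier people into one group.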
Social Media Analytics
Companies analyzing customer satisfaction from Twitter comments face self-selection bias: typically only customers with extreme opinions (very happy or very angry) post. This secondary data can misrepresent the silent majority.
Analogy: The Cookie Jar
Imagine estimating chocolate chip distribution in a jar. Taking only cookies from the top (convenience sampling) might miss how they settle. Random selection throughout gives better estimates. If different layers represent demographics, stratified sampling ensures you taste each layer proportionally.
Census vs. Sample
A census surveys every member of the population (expensive, time-consuming), while a sample studies a subset. The UK Census occurs every 10 years: comprehensive but costly. Opinion polls typically use samples of about 1,000 to 2,000 people, which gives a margin of error of roughly ±2 to ±3 percentage points at 95% confidence.
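The poll figures can be checked with the margin of error formula from the Key Formulas list, assuming 95% confidence (z = 1.96) and the worst case p = q = 0.5:

```latex
E_{1000} = 1.96\sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.031
\qquad
E_{2000} = 1.96\sqrt{\frac{0.5 \times 0.5}{2000}} \approx 0.022
```

Doubling the sample from 1,000 to 2,000 only shrinks the margin from about 3.1% to about 2.2%, which is one reason pollsters rarely go much larger.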
Worked Examples & Step-by-Step Solutions
**Example 1: Identifying Bias** (6 marks)
*A researcher surveys students about homework time by standing outside the library at 8 PM. Identify potential biases and suggest improvements.*
**Solution:**
**Selection bias** [2 marks]: Only students at the library are surveyed, excluding those studying a...
Key Concepts
- Data: Facts or pieces of information, like numbers, words, or observations, that you collect to learn something.
- Population: The entire group of people, objects, or events that you want to study or learn about.
- Sample: A smaller group chosen from the population that you actually collect data from, hoping it represents the whole population.
- Bias: A systematic error or tendency in data collection or interpretation that makes the results unfairly lean in a certain direction, not truly reflecting the population.
Exam Tips
- Always identify the **population** and **sample** clearly in any problem involving data collection.
- When asked to identify bias, don't just say 'it's biased'; explain *what kind* of bias it is (e.g., sampling bias, question bias) and *why* it makes the data unfair.