MCS Guide To The Datasets 022014 PDF
February 2014
The Centre for Longitudinal Studies (CLS) is an ESRC Resource Centre based at the
Institute of Education. It provides support and facilities for those using the three
internationally-renowned birth cohort studies: the National Child Development Study (1958),
the 1970 British Cohort Study and the Millennium Cohort Study (2000). CLS conducts
research using the birth cohort study data, with a special interest in family life and parenting,
family economics, youth life course transitions and basic skills. The views expressed in this
work are those of the author(s) and do not necessarily reflect the
views of the Economic and Social Research Council. All errors and omissions remain those
of the author(s).
Table of Contents
Acknowledgements ...................................................................................1
Overview of document ..............................................................................3
PART ONE: BACKGROUND ......................................................................4
PART TWO: SAMPLING .............................................................................5
1. Sample Design of MCS1 ................................................................................... 5
2. The achieved sample at MCS1 .......................................................................... 7
3. Sampling at MCS2 ........................................................................................... 10
4. The MCS2 Achieved Sample ........................................................................... 10
5. Sampling at MCS3 ........................................................................................... 12
6. The MCS3 Achieved Sample ........................................................................... 12
7. The MCS4 Sample .......................................................................................... 13
8. The MCS5 Sample .......................................................................................... 14
PART THREE: SURVEY DEVELOPMENT ...............................................25
1. Development and Piloting of MCS1 ................................................................. 25
2. Development and Piloting of MCS2 ................................................................. 26
3. Development and Piloting of MCS3 ................................................................. 26
4. Development and Piloting of MCS4 ................................................................. 27
5. Development and Piloting of MCS5 ................................................................. 28
PART FOUR: SURVEY CONTENT ...........................................................30
PART FIVE: FIELDWORK ........................................................................38
1. Fieldwork for MCS1 ......................................................................................... 38
2. Fieldwork for MCS2 ......................................................................................... 42
3. Fieldwork for MCS3 ......................................................................................... 43
4. Fieldwork for MCS4 ......................................................................................... 45
5. Fieldwork for MCS5 ......................................................................................... 47
PART SIX: THE DATA .............................................................................50
1. Structure of the Datasets ................................................................................. 50
2. How to Link the Datasets ................................................................................. 52
3. The Household Grid ......................................................................................... 54
4. Cohort Member Cognitive Assessments .......................................................... 62
5. BAS Verbal Similarities (MCS5) ....................................................................... 65
6. Cohort Member Behavioural Development ...................................................... 77
7. Cohort Member Physical Measurement ........................................................... 80
8. Income data ..................................................................................................... 82
9. Feed Forward Data .......................................................................................... 92
10. Data Cleaning ................................................................................................ 92
11. Coding and Editing ........................................................................................ 93
12. Geographically Linked Data including IMD & Rural Urban Indicators ............. 96
13. Educational datasets...................................................................................... 97
PART SEVEN: ETHICAL CONSIDERATIONS ......................................................98
1. MREC for MCS1 .............................................................................................. 98
2. MREC for MCS2 .............................................................................................. 98
3. MREC for MCS3 .............................................................................................. 98
4. MREC for MCS4 .............................................................................................. 98
5. MREC for MCS5 .............................................................................................. 99
6. Codes of Practice ............................................................................................ 99
7. Consents ......................................................................................................... 99
List of Tables
Table 1: MCS1 Sample Size – Clusters, Children, Families, by Country .................... 7
Table 2: Response Rates by Stratum and Country for MCS1 .................................... 9
Table 3: MCS2 Overall response............................................................................. 11
Table 4: MCS2 Overall response for families that were productive in MCS1 ........... 11
Table 5: Overall Response for the New Families ..................................................... 12
Table 6: MCS3 Overall response............................................................................. 13
Table 7: MCS4 Overall response............................................................................. 14
Table 8: MCS5 Overall response............................................................................. 15
Table 9: MCS Cases by stratum and country .......................................................... 16
Table 10: MCS sample design weights by stratum and country (weight1) ............... 16
Table 11: MCS sample design weights by stratum for the UK (weight2) .................. 16
Table 12: Response rates in all MCS sweeps. ........................................................ 19
Table 13: Response rates by country in MCS5........................................................ 20
Table 14: Response rates by stratum in MCS5. ...................................................... 20
Table 15: monotone vs. non-monotone response in MCS5. .................................... 21
Table 16: The logit response model. ....................................................................... 22
Table 17: EOVWT1, Wave5 overall weight for single country analysis. ................... 23
Table 18: EOVWT2, S5 overall weight for whole of the UK analysis. ....................... 23
Table 19: MCS1 – Summary of MCS1 Survey Elements. ........................................ 30
Table 20: MCS2 – Summary of MCS2 Survey Elements ......................................... 31
Table 21: MCS3 – Summary of MCS3 Survey Elements ......................................... 32
Table 22: MCS4 – Summary of MCS4 Survey Elements. ........................................ 34
Table 23: MCS5 – Summary of MCS5 Survey Elements. ....................................... 36
Table 24: Fieldwork timetable for MCS1. ................................................................. 39
Table 25: MCS1 Data Collection Errors ................................................................... 41
Table 26: Fieldwork timetable for MCS2 .................................................................. 42
Table 27: Fieldwork timetable for MCS3 – Main Survey .......................................... 44
Table 28: Fieldwork timetable for MCS3 – Teacher Survey in Wales, Scotland and
Northern Ireland ...................................................................................................... 45
Table 29: Fieldwork timetable for MCS4 – Main Survey .......................................... 46
Table 30: Fieldwork timetable for MCS4 – Teacher Survey ..................................... 47
Table 31: Fieldwork timetable for MCS5 – Main Survey .......................................... 48
Table 32: Fieldwork timetable for MCS5 – Teacher Survey (England and Wales only) 48
Table 33: MCS1 Parent interview response by sex of respondent and relationship to
cohort member ........................................................................................................ 57
Table 34: MCS2 Parent interview response by sex of respondent and relationship to
cohort member ........................................................................................................ 58
Table 35: MCS3 Parent interview response by sex of respondent and relationship to
cohort member ........................................................................................................ 59
Table 36: MCS4 Parent interview response by sex of Main respondent and
relationship to cohort member ................................................................................. 60
Table 37: MCS5 Parent interview response by sex of Main respondent and
relationship to cohort member ................................................................................. 61
Table 38: Income data collection across the cohort studies
Table 39: Completeness of MCS banded household net income data (number of
families)................................................................................................................... 84
Table 40: OECD household equivalence scales ...................................................... 86
Table 41: Conventions for Suffixes in Variable Names ............................................ 90
Table 42: Abbreviations used in Variable Labels ..................................................... 91
Table 43: Consents at each sweep of MCS ............................................................. 99
Acknowledgements
We are grateful for the entirely voluntary co-operation of the children who form the
Millennium Birth Cohort and their mothers, fathers and other family members.
We wish to acknowledge the initiation and funding of the survey by the Economic
and Social Research Council, and the consortium of Government Departments.
The work could not have been accomplished without the involvement of a large
number of advisers drawn from academe, policy-makers and funders, whom we
consulted throughout the design of the surveys.
We also thank the members of the Centre for Longitudinal Studies (CLS) tracing team,
IT team, database, survey and research teams.
The MCS internal team who contributed to the deposit and documentation of MCS1,
2, 3, 4 and 5, as listed below:
Tina Roberts Data Officer
We are also deeply grateful to the late Prof. Neville Butler who made an
invaluable contribution to the Millennium Cohort Study until his final year (2007).
Overview of document
This document provides an overview of the MCS data from the first to the fifth
sweep. The document is laid out as follows:
2. Part Two explains the sample, the achieved samples at different sweeps and
related issues of weighting.
PART ONE: BACKGROUND
1. Introduction
A renewed interest in child wellbeing in the late 1990s in the UK provided the context
for the development of a new and distinctive child cohort study, after a gap of 30
years (since the 1970 British Cohort Study, the 1958 National Child Development
Study and the 1946 National Survey of Health and Development). The Millennium
Cohort Study (MCS) was developed as a multidisciplinary survey which could
capture the influence of early family context on child development and outcomes
throughout childhood, into adolescence and subsequently through adulthood.
MCS’s field of enquiry covers such diverse topics as parenting; childcare; school
choice; child behaviour and cognitive development; child and parental health;
parents’ employment and education; income and poverty; housing, neighbourhood
and residential mobility; and social capital and ethnicity.
To date there have been five surveys: the first (MCS1) when the children were around
9 months old, the second (MCS2) when the children were 3 years of age, the third
(MCS3) when they were 5, the fourth (MCS4) when they were 7 and, most recently,
the fifth (MCS5) when the children were 11 years old.
PART TWO: SAMPLING
For a more comprehensive discussion of the sampling procedure used, please refer
to the MCS Technical Report on Sampling (4th Edition) (Plewis 2007).
Unlike its predecessor studies, which followed the same survey design (a systematic
random sample of all children born in a particular week), the MCS had a new sample
design. Firstly, the sample (fully discussed in Plewis, 2007) is drawn from a
population of children born between 1 September 2000 and 31 August 2001 (for
England and Wales), and between 24 November 2000 and 11 January 2002 (for
Scotland and Northern Ireland) who were living in the UK at nine months of age and
whose families were eligible to receive Child Benefit at that age. Sampling births
across a 16-month period rather than a particular week not only makes fieldwork
easier for agencies, by spreading interviews across a longer, less intense period, but
also means the MCS is well placed to identify any season-of-birth effects, which
have been shown in other studies to be important for a range of outcomes including
academic achievement.
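The date-of-birth criterion described above can be sketched as a simple check. This is only an illustration: the function and dictionary names are hypothetical, the window endpoints are assumed to be inclusive, and the other eligibility conditions (UK residence at nine months, Child Benefit eligibility) are not modelled.

```python
from datetime import date

# Birth-date windows as stated in the text, keyed by country.
# Treating both endpoints as inclusive is an assumption.
BIRTH_WINDOWS = {
    "England": (date(2000, 9, 1), date(2001, 8, 31)),
    "Wales": (date(2000, 9, 1), date(2001, 8, 31)),
    "Scotland": (date(2000, 11, 24), date(2002, 1, 11)),
    "Northern Ireland": (date(2000, 11, 24), date(2002, 1, 11)),
}


def in_birth_window(country: str, date_of_birth: date) -> bool:
    """Return True if the date of birth falls inside the country's window."""
    start, end = BIRTH_WINDOWS[country]
    return start <= date_of_birth <= end
```

For example, a child born on 25 December 2001 falls inside the window for Scotland but outside the window for Wales.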
Secondly, the MCS is the first British birth cohort to include all four countries of the
UK, reflecting increasing moves towards devolution and allowing researchers, for the
first time, not only to look at relationships within each country but also to make
comparisons between the countries.
Thirdly, the MCS oversampled children from deprived backgrounds, so that the
effects of disadvantage on children’s outcomes could be better addressed. Fourthly,
and finally, the MCS set out to reflect the increasing diversity of the UK, and clear
evidence of differential health, educational and social outcomes across ethnic
groups, by oversampling from areas of relatively high ethnic minority concentration.
To take account of this design the population was stratified. In England, the
population was divided into three strata. The first is an 'ethnic minority' stratum:
children living in wards where the proportion of ethnic minorities in the 1991 Census
was at least 30 per cent. The second is a 'disadvantaged' stratum: children living in
wards, other than those falling into the 'ethnic minority' stratum, which fell into the
poorest 25 per cent of wards using the Child Poverty Index for England and Wales.1
The third is an 'advantaged' stratum: children living in wards other
than those above.
1
For more information on the CPI and Indices of Deprivation in general see:
https://www.gov.uk/government/publications/english-indices-of-deprivation-2010
For Wales, Scotland and Northern Ireland, due to the low percentages of ethnic
minority groups (around 1% of the population) (Plewis, 2007), there were only two
strata: a 'disadvantaged' stratum of children living in wards (known as Electoral
Divisions in Wales) that fell into the poorest 25 per cent of wards using the Child
Poverty Index, and an 'advantaged' stratum of children living in other wards in these
countries.
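The stratification rules for the four countries can be summarised in a short sketch. The function name and its parameters are hypothetical: `ethnic_minority_share` stands for the ward's 1991 Census proportion of ethnic minorities, and `in_poorest_quarter` for whether the ward falls into the poorest 25 per cent on the Child Poverty Index.

```python
def assign_stratum(country: str, ethnic_minority_share: float,
                   in_poorest_quarter: bool) -> str:
    """Assign a ward to a stratum following the rules described in the text."""
    # The 'ethnic minority' stratum exists in England only and takes precedence.
    if country == "England" and ethnic_minority_share >= 0.30:
        return "ethnic minority"
    # Remaining wards are split on the Child Poverty Index.
    if in_poorest_quarter:
        return "disadvantaged"
    return "advantaged"
```

Because the 'ethnic minority' test is applied first, a poor English ward with at least 30 per cent ethnic minority residents is placed in the 'ethnic minority' stratum, not the 'disadvantaged' one.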
It is important to bear in mind that both the ethnic minority indicator and the Child
Poverty Index used for stratification purposes are area-level measures. That means
the design is good at identifying people who are disadvantaged or from an ethnic
minority background when they live in areas with others from a similar background,
but is less likely to find people who belong to these groups but do not live in such
areas. Indeed, focusing on families in poverty, Plewis (2007) found that in England
in 1998 about 37 per cent of disadvantaged families with a child under 16 were living
in advantaged wards, 54 per cent were in disadvantaged wards and 10 per cent in
ethnic minority wards.
The MCS sample was randomly selected within each stratum in each country
producing a disproportionately stratified cluster sample. This means that the sample
is not self-weighting and so weighted estimates of means, variances etc. are needed
(Plewis, 2007).
Once the sample wards were selected, a list of all children turning nine months old
during the 16 month survey window and living in those wards was generated from
the Child Benefit register provided by the Department of Social Security (DSS),
subsequently the Department for Work and Pensions (DWP). At that time, Child
Benefit was a universal provision, payable (usually to the mother) from the child's
birth.2 The DWP wrote to all eligible families asking the Child Benefit recipient to
opt out if they did not want to be included in the survey. An opt-out procedure tends
to be more inclusive of marginal and low-literacy respondents than an opt-in procedure
and also results in higher response rates. The DWP withdrew sensitive cases from the issued
sample. These included families where children had died or had been taken into
local authority care by that point or where there was an investigation into benefit
fraud within the family.3 Families that had already taken part in the DWP Families
and Children Survey (FACS) were also excluded from the sample.4
2
Child Benefit claims cover virtually all of the child population except those ineligible due to
recent or temporary immigrant status.
It was recognized that the Child Benefit records would not reveal all families who had
moved into the sample wards as the child approached 9 months of age, so
Health Visitors were approached to find these families. It was thought that, as
local community health professionals, Health Visitors would be aware of families
transferring into areas. They were asked to see if families meeting the eligibility
criteria who had recently moved into survey wards were willing to be recruited.
Health Visitors reported 220 cohort families moving into the selected areas with
children over 6 months of age; however, only 56 had not been found by DWP.5, 6
The MCS1 survey reached 18,552 families, which, after allowance for 246 sets of
twins and 10 sets of triplets, amounted to 18,818 cohort children. Six families have 2
singletons in the sample. The table below shows how these respondents are
distributed over the 4 countries of the UK. Further details by stratum appear in the
Technical Report on Sampling (4th Edition) (Plewis 2007).
Table 1: MCS1 Sample Size – Clusters, Children, Families, by Country
Columns: Number of sample 'wards' *; Target sample as boosted; Achieved Responses **
(Children; Families interviewed)
3
This represents less than 3 per cent of cases (Hansen, 2012).
4
This affected only 40 cases.
5
There were several problems which may explain the rather disappointing result of this
exercise. First, helping with the survey was not part of the Health Visitors’ already demanding
normal duties. Second, Health Visitors’ caseloads do not neatly coincide with electoral wards.
Third, there is no central list of Health Visitors for easy contact.
6
DWP also discovered 1,389 new families in England who were living in the sample wards at sweep 1,
but their addresses reached DWP too late to be included in the first survey so they were added to the
sample at sweep 2.
Response Rates
It is arguable that the eligible population should also include families who do not
claim Child Benefit; but we make the simplifying assumption that numbers of such
families who are permanently resident in the UK are negligible. The Technical Report
on Sampling (4th Edition) (Plewis 2007) makes two alternative assumptions about
how many undetected in-movers there are. The estimates quoted in Table 2 below
assume that there is an undetected in-mover for every detected out-mover, on
average, in each stratum.
The alternative estimate sets undetected in-moves to zero, which raises all overall
response rates (except Northern Ireland) above the target or assumed response rate
set in the design of the survey and shown in the first column. Table 2 shows that,
when undetected in-migration is counted as leakage, the overall response rate is
68 per cent for the (unweighted) UK sample, modestly below the 71 per cent
expected. It is below target in every stratum except the advantaged wards of Wales,
but only markedly so in Northern Ireland, where the overall response rate in the
combined strata was 63 per cent against a target of 71 per cent. That target had been
set somewhat optimistically, given the lack of a tradition of such surveys in that
country. Northern Ireland is also the only country where inability to assign Child
Benefit claimants to a ward was a significant problem. Survey work in the ethnic
minority areas of England was also something of an unknown quantity: a cautious
target of 65 per cent was missed by 3 percentage points.
Table 2: Response Rates by Stratum and Country for MCS1
Columns: Expected Overall Response Rate; Achieved Overall Response Rate; In-scope
Response Rate (Fieldwork)
Of the cases issued to the field, some were deemed ineligible because they were
known or thought to have moved out of the survey area before the child reached 9
months of age. Of the remaining eligible or ‘in-scope’ sample, the response in
fieldwork averaged 82 per cent giving at least one interview. It varied by stratum as
expected, but more so. The ethnic minority wards, as anticipated, had the lowest rate
(76 per cent) and the advantaged areas of Wales the highest (89 per cent), with both
strata in Northern Ireland being below the stratum average for Great Britain.
3. Sampling at MCS2
The survey attempted to follow all the 18,553 families who took part in MCS1 where
the child was still alive and living in the UK. It also attempted to make contact with
another 1,389 ‘New Families’ in England who appeared to have been living in
sample wards at the time of MCS1, but whose addresses reached DWP records too
late to be included in the first survey.
There are two components to the MCS2 issued sample: families that were
productive in MCS1 and the so-called new families. There were 18,552 productive
families in the first survey of the Millennium Cohort Study. The new families were
families that, although eligible, did not participate in MCS1. These were identified
through DWP, and 1,389 of them were eligible to be issued for MCS2 fieldwork.
The issued sample should therefore have been 19,941 (18,552 + 1,389), but 71 of the
MCS1 productive families were not issued to the field for various reasons. Their
outcomes were known and recorded before the start of the fieldwork. The MCS2
issued sample was therefore 19,870: 18,481 families that were productive in MCS1
and the 1,389 new families.
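The arithmetic behind the MCS2 issued sample can be laid out as a quick check. The variable names are illustrative only; the figures are those quoted in the paragraph above.

```python
# Figures quoted in the text for the MCS2 issued sample.
productive_mcs1 = 18_552   # families productive at MCS1
new_families = 1_389       # 'New Families' identified through DWP
not_issued = 71            # MCS1 productive families not issued to the field

eligible = productive_mcs1 + new_families        # families considered eligible
issued = eligible - not_issued                   # families issued to the field
issued_from_mcs1 = productive_mcs1 - not_issued  # issued MCS1 productive families

print(eligible, issued, issued_from_mcs1)
```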
All response frequencies in this report are unweighted. The outcome codes were
derived as:
Productive              All families with some data from one of 6 data collection
                        instruments other than what was carried forward. The 6 data
                        collection instruments were: Main Interview, Partner Interview,
                        Proxy Partner Interview, BAS, Bracken, Height and Weight.
Uncertain Eligibility   Families that were away temporarily and those whose eligibility
                        was uncertain, including untraced movers.
All MCS2 Families Response
There were 19,941 families originally considered eligible for the MCS2 survey; 15,590
of these were productive in the survey, which is 78 per cent of all MCS2 families.
There were 15,808 cohort members in the 15,590 productive families.
Table 4 below shows that a slightly higher proportion of those that were productive in
MCS1 (80 per cent) took part in MCS2 compared to the overall proportion in Table 3,
which also includes New Families.
Table 4: MCS2 Overall response for families that were productive in MCS1
The New Families Response
Table 5: Overall Response for the New Families (partial)
Outcome          N     %
Ineligible **    88    6.3
Other            25    1.8
5. Sampling at MCS3
The sample issued for MCS3 comprised all those who had responded to the survey
at least once, i.e. to MCS1 or the 631 additional cases who had responded to MCS2
in the New Families, less those known to have become ineligible through the death
or emigration of the cohort child, and also less those deemed to have made a
permanent refusal (also excluding the one case in the original MCS1 total of 18,553
subsequently discovered to have been invalid). Thus nearly all non-respondents to
the second survey who had been interviewed in the first survey were given the
opportunity to rejoin the survey at age 5. The non-respondents to the New Families
sample were not reissued.
There were 19,244 families potentially eligible for inclusion in the issued sample.
These were 18,552 families who were productive at MCS1 and 692 ‘New Families’
who were productive at MCS2. However, 718 families were not issued to the field
due to ineligibility (death or emigration), permanent refusal and sensitive family
circumstances. Their outcomes were known and recorded before the start of the
fieldwork. The families not issued due to sensitive family circumstances are recorded
as ‘unproductive other’. Two families who were not productive at either MCS1 or
MCS2 were issued in error.
Therefore, the MCS3 issued sample was 18,528 (19,244 – 718 + 2).
This section provides MCS3 response for the 19,244 families, i.e. including the 718
families not issued and excluding the two families issued in error.
All response frequencies here are unweighted. The outcome codes in this report
were again derived as for MCS2. There were 19,244 families potentially eligible for
the MCS3 survey; 15,246 of these were productive in the survey, which is 79.2 per
cent of all MCS3 families. There were 15,459 cohort children in the 15,246
productive families.
7. The MCS4 Sample
There were 19,244 families potentially eligible for inclusion in the issued sample at
MCS4. However, 2,213 families were not issued to the field because of ineligibility
through death or emigration (n=362), permanent refusal (n=1,705), permanent
untraceability (n=136) and sensitive family circumstances (n=10). Their outcomes
were known and recorded before the start of the fieldwork. The families not issued
due to sensitive family circumstances are recorded as ‘unproductive other’.
This section provides the MCS4 response for the 19,244 families, i.e. including the
2,213 families not issued. All response frequencies in this report are unweighted.
Outcome codes are:
Productive              All families with some data from one of five data collection
                        instruments other than what was carried forward. The 5 data
                        collection instruments were: Main Interview, Partner Interview,
                        Proxy Partner Interview, Cohort Child Cognitive Assessments and
                        Cohort Child Physical Measurements.
Uncertain Eligibility   Families that were away temporarily and those whose eligibility
                        was uncertain, including untraced movers.
8. The MCS5 Sample
There were 19,244 families potentially eligible for inclusion in the issued sample at
MCS5. However, 2,851 families were not issued to the field because of ineligibility
through death or emigration (n=545), permanent refusal (n=2,215), permanent
untraceability (n=86) and sensitive family circumstances (n=5). Their outcomes were
known and recorded before the start of the fieldwork. The families not issued due to
sensitive family circumstances are recorded as ‘unproductive other’.
Therefore, the MCS5 issued sample was 16,393 (19,244 less 2,851). The response
for all the 19,244 families, i.e. including the 2,851 families not issued, is shown in
Table 8. Response frequencies are unweighted.
In total, 13,287 families were productive in the survey, which is 69 per cent of all
MCS families and 81 per cent of all families issued. There were 13,469 cohort
members in the 13,287 productive families.
Other 11 0.01
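The MCS5 figures quoted above can be reproduced with a few lines of arithmetic. The variable names are illustrative; the counts are taken from the text.

```python
# Families not issued at MCS5, by reason (counts from the text).
not_issued_by_reason = {
    "death or emigration": 545,
    "permanent refusal": 2_215,
    "permanently untraced": 86,
    "sensitive family circumstances": 5,
}

eligible = 19_244    # families potentially eligible
productive = 13_287  # families productive at MCS5

total_not_issued = sum(not_issued_by_reason.values())
issued = eligible - total_not_issued

# Response rates as percentages of all families and of issued families.
rate_of_all = 100 * productive / eligible
rate_of_issued = 100 * productive / issued

print(total_not_issued, issued, round(rate_of_all), round(rate_of_issued))
```

The two rates differ only in the denominator: the first uses all 19,244 families, the second only the families actually issued to the field.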
Weighting
As discussed above, the sample of births selected for the first survey of the MCS was
clustered geographically and disproportionately stratified to over-represent areas
with high proportions of ethnic minorities in England, residents of areas of high child
poverty and residents of the three smaller countries of the UK. The
distribution of the cases in the dataset across strata for each country is given in
Table 9 below.
Table 9: MCS Cases by stratum and country
                England           Wales             Scotland          Northern Ireland  UK
Strata          N       %         N       %         N       %         N       %         N       %
Advantaged      4828    39.49     832     30.14     1145    49.04     723     37.69     7528    39.12
Disadvantaged   4806    39.31     1928    69.86     1191    50.96     1200    62.31     9125    47.42
Ethnic          2591    21.19     n/a     n/a       n/a     n/a       n/a     n/a       2591    13.46
Total           12225   100.00    2760    100.00    2336    100.00    1923    100.00    19244   100.00
The sample design weights or probability weights can be used to correct for MCS
cases having unequal probabilities of selection that result from the stratified cluster
sample design. The sample weights to be used depend on whether the analysis is
confined to data relating to a single country, see Table 10 for country-specific
weights, or whether the analysis covers all countries of the UK, see Table 11 for UK
weights.
Table 10: MCS sample design weights by stratum and country (weight1)
Ethnic 0.24
Table 11: MCS sample design weights by stratum for the UK (weight2)
Ethnic 0.37
Further details are included in The Millennium Cohort Study: Technical Report on
Sampling, 4th Edition. Plewis, I. (Ed.) July 2007.
Two variables have been included in the dataset to facilitate such weighting by
providing the sample weights attached to each case. These are:
weight1: This variable should be used when your analysis is within one
country only.
weight2: This variable should be used when your analysis covers the whole
of the UK.
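As a minimal sketch of how the two weight variables might be applied, the snippet below computes a weighted mean and picks weight1 or weight2 according to the scope of the analysis. The helper names and the example records are hypothetical; only the values 0.24 and 0.37 come from the Ethnic rows of Tables 10 and 11. It gives a point estimate only: variances also need to reflect the clustering and stratification, which is not attempted here.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def estimate(cases, single_country):
    """Use weight1 for single-country analyses and weight2 for whole-UK analyses."""
    weight_name = "weight1" if single_country else "weight2"
    return weighted_mean([c["outcome"] for c in cases],
                         [c[weight_name] for c in cases])


# Hypothetical records; the outcome values and the second row's weights are made up.
cases = [
    {"outcome": 10.0, "weight1": 0.24, "weight2": 0.37},
    {"outcome": 20.0, "weight1": 1.00, "weight2": 2.00},
]
print(estimate(cases, single_country=True))
print(estimate(cases, single_country=False))
```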
One way of adjusting for possible biases generated by systematic unit non-response
is to use non-response weights. Plewis (2007) studied unit non-response in MCS1 and
non-response from MCS1 to MCS2, and used the correlates of non-response to
produce non-response weights. For MCS2, there are three different types of
weights to consider:
1. the sample design weights;
2. the non-response weights at sweep 1, which when multiplied by the sample weights
produce the overall weights at sweep 1 (see Table 11.1 in The Millennium Cohort
Study: Technical Report on Sampling, 4th Edition. Plewis, I. (Ed.) July 2007); and
3. the non-response weights at sweep 2, which when multiplied by the overall weights
at sweep 1 produce the overall weights at sweep 2 (see Table 3 in Plewis (2007) for
the mean and standard deviation of these weights by stratum for whole UK analyses
as well as further technical details on their calculation).
Note that the sample at sweep 2 was
supplemented by New Families who were eligible at MCS1, but excluded because
their addresses held by the Child Benefit Office were not up to date. For these new
families, the non-response weight at sweep 2 is defined to be 1. There were 97
sweep 2 productive families that were not used to generate non-response weights
due to missing data on the variables used in the response model. These 97
productive families were given a non-response weight of 1.
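The way the overall weights are built up can be sketched as a running product. The function name and the example values are hypothetical; the sketch only illustrates that the overall weight at a sweep is the design weight multiplied by the non-response weight of each sweep up to that point, and that a non-response weight of 1 leaves the weight unchanged.

```python
def overall_weight(design_weight, nonresponse_weights):
    """Multiply the design weight by each sweep's non-response weight in turn."""
    weight = design_weight
    for nr_weight in nonresponse_weights:
        weight *= nr_weight
    return weight


# Hypothetical family: design weight 2.0, non-response weight 1.5 at sweep 1
# and 1.0 at sweep 2 (the value assigned, for example, to New Families).
sweep1_overall = overall_weight(2.0, [1.5])
sweep2_overall = overall_weight(2.0, [1.5, 1.0])
print(sweep1_overall, sweep2_overall)
```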
All family level weights and response level variables are in a file called:
mcs longitudinal family level information. (The user needs to link this file to
other files.)
The relevant variable names and value labels are below, where s1, s2, s3 and s4
denote sweeps 1, 2, 3 and 4 respectively and ‘inc nr adjustment’ denotes including
non-response adjustment:
pttyp2 stratum within country fieldwork point number inc. superwards
weight1 mcs weight to use on single country analyses
weight2 mcs weight to use on whole uk analyses
aovwt1 s1: overall weight (inc nr adjustment) single country analysis
aovwt2 s1: overall weight (inc nr adjustment) whole uk analysis
bovwt1 s2: overall weight (inc nr adjustment) single country analysis
bovwt2 s2: overall weight (inc nr adjustment) whole uk analysis
Weighting (including non-response adjustment) for MCS3
Weighting methods to compensate for attrition are available for monotone patterns of
non-response. For a monotone pattern, a sequential weighting procedure is typically
used. The longitudinal weight at sweep 1 is defined as the sample (design) weight.
For each sweep thereafter, the longitudinal weight is the product of the longitudinal
weight at the previous sweep multiplied by a non-response weight for the current
sweep. Typically, at each sweep the non-response weight is the estimated inverse of
the probability of responding based on a logistic regression model. These logistic
models use data from previous sweeps to predict response at the current sweep.
However, for non-monotone patterns of non-response, some cases have missing
data for previous sweeps and therefore the standard approach cannot be easily
applied. For MCS, 1,444 unproductive families at MCS2 were recovered at MCS3,
thus yielding a non-monotonic pattern of non-response.
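The sequential procedure for a monotone pattern can be illustrated with a minimal sketch. The function name and the input probabilities are hypothetical; in practice the response probabilities would come from the logistic regression models described above, which are not reproduced here.

```python
def longitudinal_weights(design_weight, response_probs):
    """Return the longitudinal weight at each sweep for one family.

    design_weight:  sample (design) weight, used as the sweep 1 weight.
    response_probs: estimated probabilities of responding at sweeps 2, 3, ...
                    (hypothetical inputs for illustration only).
    """
    weights = [design_weight]
    for p in response_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("response probability must be in (0, 1]")
        # Non-response weight is the inverse of the estimated probability.
        nonresponse_weight = 1.0 / p
        # Longitudinal weight = previous longitudinal weight x non-response weight.
        weights.append(weights[-1] * nonresponse_weight)
    return weights
```

For example, a family with a design weight of 2.0 and estimated response probabilities of 0.8 and 0.5 at the next two sweeps would carry weights of approximately 2.0, 2.5 and 5.0.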
In order to calculate non-response weights for MCS3, multiple imputation was used
to impute the required missing data at sweep 2 for the logistic regression model for
the probability of responding. With the missing data ‘filled in’, the pattern of non-
response was monotone and then the standard sequential weighting procedure
could be used to estimate non-response weights. Note that imputation of missing
values was only done for variables found in earlier non-response analyses to be
related to non-response, not for all variables in the MCS2 with missing values.
Multiple imputation was used to impute missing values at sweep 2 due to unit non-
response for unproductive cases and item non-response for productive cases. For
example, for the 1,444 unproductive families at MCS2 which were recovered at
MCS3, missing housing tenure at MCS2 was imputed using their housing tenure at
MCS1 and MCS3 along with other predictor variables in the imputation model. We
expect the imputation of missing values of housing tenure at MCS2 to be ‘good’
as the imputation model ‘loosely speaking’ involves ‘interpolation’ of the values at
MCS1 and MCS3. Further detail on the non-response predictor variables and
imputation models used will be provided in the Second Edition of the Technical
Report on Response.
At sweep 3 all families in the MCS ‘active’ sample, the 1,922 families had a non-
response adjusted weight at sweep 2 and therefore we didn’t have to deal with
missing weights at sweep 2. As a result of using multiple imputation, all 18,526
issued cases were used in the logistic modelling of response at sweep 3. Missing
values were imputed 10 times and a logistic model of responding at sweep 3 was
estimated 10 times, once for each imputed dataset. This yielded 10 estimated non-
response weights at sweep 3 and the weights issued for sweep 3 were the average
of the 10 weights. The overall weights, including non-response adjustment, for single
country analysis and whole UK analysis are:
covwt1 s3: overall weight (inc nr adjustment) single country analysis
covwt2 s3: overall weight (inc nr adjustment) whole uk analysis.
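The averaging step can be sketched as follows. This is an illustration only: the function name and data layout (one list of weights per imputed dataset, with families in the same order in each) are assumptions, not the code used to produce the issued weights.

```python
def average_weights(weights_by_imputation):
    """Average non-response weights across imputed datasets.

    weights_by_imputation: list of M lists, one per imputed dataset, each
    holding one estimated weight per family in the same family order.
    """
    m = len(weights_by_imputation)
    if m == 0:
        raise ValueError("at least one imputed dataset is required")
    n = len(weights_by_imputation[0])
    if any(len(w) != n for w in weights_by_imputation):
        raise ValueError("all imputed datasets must cover the same families")
    # The issued weight for each family is the mean of its M estimated weights.
    return [sum(w[i] for w in weights_by_imputation) / m for i in range(n)]
```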
respond were small compared to the unequal selection probabilities built into the
sample design. The logistic modelling of sweep 3 non-response also found that
these differences in the probability to respond were small compared to the unequal
selection probabilities built into the sample design. It is, therefore, unlikely that any
weighting adjustment for wave 3 non-response would have a substantial effect on
most analyses.
As a result of using multiple imputation to deal with missing data, all issued cases at
sweep 4 were used in the logistic modelling of response at sweep 4. Missing values
were imputed 10 times and a logistic model of responding at sweep 4 was estimated
10 times, once for each imputed dataset. This yielded 10 estimated non-response
weights at sweep 4 and the weights issued for sweep 4 were the average of the 10
weights.
In Table 12, response and non-response rates are presented by category. The table
shows that the proportion of productive cases dropped over time from 96.4% in
MCS1 to 69% in MCS5. The proportions in all other categories rose as the
proportion of non-respondents grew.
Ineligible: includes child deaths, sensitive cases and temporary and permanent
emigrants.
Categories                MCS1           MCS2           MCS3           MCS4           MCS5
                       Freq.      %   Freq.      %   Freq.      %   Freq.      %   Freq.      %
Non-Contact                0    0.0     930    4.8     546    2.8     123    0.6     438    2.3
Other unproductive         0    0.0     131    0.7     290    1.5     408    2.1       6    0.0
Total                 19,244  100.0  19,244  100.0  19,244  100.0  19,244  100.0  19,244  100.0
Table 13 shows that response rates were very similar across all four countries with
the highest response rate being in England.
Table 14 shows that the response rates vary across ward types within country.
Advantaged households systematically have higher response rates than
disadvantaged ones while the ethnic stratum in England has a relatively high
response rate.
Adv: Advantaged ward. Dis: Disadvantaged ward. Ethn: Ethnic minority ward.
Table 15 shows that 54.3% of all respondents participated in all waves of MCS. In
contrast, 23.9% have interrupted response patterns. In other words, they participated
in a number of waves then dropped out before participating again in subsequent
waves. 21.9% of all respondents have monotone response patterns. In other words,
they participated in a number of waves before dropping out and not returning.
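The three categories can be expressed as a simple classification of each family's response history. The sketch below is illustrative: the function name and labels are hypothetical, and it assumes one flag per wave at which the family was issued, so a New Family that joined at wave 2 would need its history to start at wave 2.

```python
def classify_pattern(responded):
    """Classify a response history given one True/False flag per wave, in order."""
    flags = [bool(r) for r in responded]
    if all(flags):
        return "all waves"
    # Look at everything from the first missed wave onwards.
    first_gap = flags.index(False)
    if any(flags[first_gap:]):
        # The family came back after missing a wave: non-monotone.
        return "interrupted"
    # The family never returned after its first missed wave.
    return "monotone"
```

A history of response, non-response, response is classed as interrupted, whereas response followed only by non-response is classed as monotone.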
Table 15: monotone vs. non-monotone response in MCS5.
The same procedure used for predicting non-response at wave 4 was again used at
sweep 5. Missing data for predictor variables due to non-monotone non-response or
item missingness were imputed using simple and multiple imputations. Sweep 5 non-
response predictors were mostly the same as at sweep 4. Multiple imputations were
carried out using the MI command in Stata 12.
As a result of the use of simple and multiple imputations, the sample used in the logit
response model consisted of 16393 observations (i.e. the issued sample in MCS5).
Weights were constructed for all respondents in MCS5. The dependent variable in
the logit model is binary (1 for response and 0 otherwise) and the predictors are: the
cohort member’s gender, mother’s age at first live birth, ethnicity, housing tenure,
accommodation type, national vocational qualification, breastfeeding, main
respondent’s work status, whether the household is a new family which joined the
survey in wave 2, and income item non-response. These variables came from all
four previous waves.
Simple imputations: ethnicity, accommodation type and NVQ were imputed using the
most recent available data from previous waves with simple replacement
imputations. The questions on accommodation type and NVQ were only asked if
accommodation or NVQ had changed since the last wave of data collection.
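Simple replacement imputation of this kind amounts to carrying forward the latest observed value. A minimal sketch, assuming missing values are coded as None and values are listed in wave order (the function name is hypothetical):

```python
def carry_forward(values):
    """Return the most recent non-missing value from earlier waves.

    values: observations for one family in wave order, with None for missing.
    Returns None if no wave has an observed value.
    """
    for v in reversed(values):
        if v is not None:
            return v
    return None
```

For instance, a family whose accommodation type was recorded as "flat" at wave 1 and "house" at wave 3, and not asked at waves 2 and 4, would be assigned "house".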
Multiple imputations: main respondent’s work status and housing tenure were
missing for 2744 observations. Breastfeeding was missing for the new families (617
observations). These three variables were imputed using 10 multiple imputations.
Different imputation procedures were used depending on the nature of the variable:
a logit procedure for work status and breastfeeding and a multinomial logit for
housing tenure. The explanatory variables for the imputation of work status and
housing tenure in wave 4 were the exact same variables from the previous three
sweeps. For the imputation of breastfeeding, different variables related to social
class were used as explanatory variables: ethnicity, NVQ, number of parents in
household, and type of accommodation.
It should be noted that some variables such as cohort member’s gender and whether
the household is a new family did not have any missing values and therefore did not
require any imputation. Income item non response was constructed as a binary
variable which takes the value of 1 if the respondent did not answer the income
question. Mother’s age at first live birth was missing for only 49 observations; these
were replaced by the average age of the non-missing cases.
Table 16 shows the odds ratios of the response logit model estimated using the 10
imputed datasets. The linear predicted values were generated from this model then
an inverse-logit transformation was carried out to transform the predicted values into
predicted probabilities. The non-response weights at sweep 5 were constructed as
the inverse of the predicted probabilities. Two overall weights were constructed by
multiplying the aforementioned non-response weights with the same weights from
sweep 4. These overall weights adjust for both sampling and attrition. The weights
are:
In tables 17 and 18, the means, minimums and maximums of the two weights are
presented by ward type and for the UK as a whole.
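The construction of the sweep 5 weights from the fitted model can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the linear predictor is taken as given, and the sketch is written in Python rather than the Stata code actually used.

```python
import math


def inverse_logit(x):
    """Map a linear predictor onto a probability in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow for large negative x.
    z = math.exp(x)
    return z / (1.0 + z)


def sweep5_overall_weight(linear_predictor, sweep4_overall_weight):
    """Overall sweep 5 weight = sweep 4 overall weight x sweep 5 non-response weight."""
    p_respond = inverse_logit(linear_predictor)
    nonresponse_weight = 1.0 / p_respond  # inverse of the predicted probability
    return sweep4_overall_weight * nonresponse_weight
```

A linear predictor of 0 corresponds to a predicted response probability of 0.5, so a family with a sweep 4 overall weight of 1.5 would receive a sweep 5 overall weight of 3.0.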
Table 17: EOVWT1, Wave5 overall weight for single country analysis.
Recommendations
PART THREE: SURVEY DEVELOPMENT
For a more comprehensive discussion of survey development, please refer to the
MCS1 Technical Report on Fieldwork (NatCen 2004) or the MCS2 Technical Report
on Fieldwork (NOP 2006) or the MCS3 Technical Report on Fieldwork (NatCen
2007) or the MCS4 Technical Report on Fieldwork (NatCen 2010) or the MCS5
Technical Report on Fieldwork (Ipsos MORI 2013).
The questionnaire was developed by the CLS team with input from 55 potential
users of the dataset from academe and government departments who attended a
consultation meeting on 11 October 2000. An instrument was initially piloted in
January 2001 and redeveloped into a shorter version for the second Dress
Rehearsal Pilot in April 2001.
First Pilot
The first pilot in January 2001 was conducted as a paper interview and computer-
aided self-completion interview (CASI) in order to assess the timing of the instrument
before the major work to convert the interview schedule into computer-aided
personal interview (CAPI) format. The sample size was boosted from 30 to 60 thanks
to the ONS consortium funding. Further details are in the NatCen Technical Report
on Fieldwork (NatCen 2004).
The second pilot took place during April 2001 and was fully computer-based (CAPI
and CASI). As a ‘dress rehearsal’ for the main stage, all the contact and
administrative processes were tested as well as the near final form of the survey
instruments. Thirteen wards were selected for this pilot, including one in each of
Wales and Scotland. The wards in England and Wales were chosen from those that
were to be used in the main stage. As the Scottish wards had not yet been selected,
a large deprived ward was purposively picked.
The DWP sampling route was tested with letters sent from the DWP at Newcastle to
parents of babies born between 12 June and 22 July 2000 on the Child Benefit
register in the chosen wards. The use of an advance letter sent by interviewers was
also piloted.
In addition, Health Visitors (HVs) were approached in the 12 English and Welsh
wards in order to pilot their contribution. Two HV supervisors declined to help, as we
had not received Multi-centre Research Ethics Committee (MREC) approval at that
time.
2. Development and Piloting of MCS2
The questionnaire was developed by the CLS team with input from a team of
external MCS2 collaborators. The questionnaire development was discussed at a
consultative meeting on 22 April 2002. An instrument was initially piloted in May
2003, and redeveloped for the second Dress Rehearsal Pilot in June 2003.
First Pilot
The first pilot in May 2003 was carried out as CAPI and CASI interviews of around
30 families in order to establish the time taken to carry out the early drafts of the
interview, self-completion and child assessments. It was also designed to identify
other problems such as flow, question wording, recall and filtering.
The dress rehearsal for the study took place in June 2003. All of the procedures
planned for main-stage sampling and fieldwork were tested, including the taking of
saliva samples from the children; home and neighbourhood observations; and the
self-completion questionnaire for older siblings. The sample used for the MCS2
dress rehearsal consisted of respondents from the MCS1 dress rehearsal. Forty-
eight families were interviewed in 13 wards in England, Wales and Scotland.
The questionnaire was developed by the CLS team with input from a team of
external MCS3 collaborators. The questionnaire development was discussed at a
consultative meeting in July 2004. An instrument was initially piloted in May 2005,
and redeveloped for the second Dress Rehearsal Pilot in September/October 2005.
First Pilot
The first pilot in May 2005 was carried out as CAPI and CASI interviews of 49
families in order to establish the time taken to carry out the early drafts of the
interview, self-completion and child assessments and measurements. It was also
designed to identify other problems such as flow, question wording, recall and
filtering. The sample was a quota sample recruited by interviewers.
The dress rehearsal for the study took place in September/October 2005. All of the
procedures planned for main-stage sampling and fieldwork were tested.
The sample used for the MCS3 dress rehearsal consisted, in England, Scotland and
Wales, of respondents from the MCS1 dress rehearsal and additional families
sampled for MCS3. Northern Ireland was included in the dress rehearsal for the first
time at MCS3; and all families in Northern Ireland were newly sampled for MCS3.
The dress rehearsal sample was drawn from Child Benefit records in 14 wards of the
UK and 109 families were interviewed.
The dress rehearsal also included a postal teacher survey in Wales, Scotland and
Northern Ireland. This was in order to collect data equivalent to the Foundation
Stage Profile in England (which was obtained through data linkage for consenting
families in England).
The data collection instruments were developed by the CLS team with input from a
team of external MCS4 advisors. The development work started with a consultative
conference in July 2006 at which the convenors of the MCS4 advisory groups
presented their recommendations. A consultation on the first draft questionnaire for
parents and cohort members took place in January/February 2007 and on the first
draft teacher questionnaire in February/March 2007. The first pilot took place in
March-June 2007 and the Dress Rehearsal Pilots for families and teachers in July-
August 2007 and October-December 2007, respectively.
First Pilot
The first pilot in March/April 2007 was carried out as CAPI and CASI interviews of 38
families in order to establish the time taken to carry out the early drafts of the parent
interviews and self-completion; child self-completion; and child assessments and
measurements. It was also designed to identify other problems such as flow,
question wording, recall and filtering. Of the 38 interviewed families, 26 had
previously been interviewed at MCS3 pilot 1, and 12 were newly recruited by
interviewers. It was a quota sample and covered Great Britain only.
The teacher survey pilot took place in May-June 2007. Of the families who took part
in the main pilot, 32 gave consent for their child’s teacher to be approached. Of
these, 23 returned a questionnaire after 2 reminders, giving a response rate of
around 72 per cent.
The dress rehearsal for the study took place in July/August 2007. All of the
procedures planned for main-stage sampling and fieldwork were tested.
The longitudinal dress rehearsal sample, drawn from Child Benefit records in 14
wards of the UK, consisted, in Great Britain, of respondents sampled for the MCS1
dress rehearsal and additional families sampled for MCS3. In Northern Ireland it
consisted of respondents sampled at MCS3. In total, 102 families were interviewed.
This was in excess of the target sample of 100 families.
The dress rehearsal also included a postal teacher survey which was carried out in
October-December 2007. In all, 84 teachers were approached (consenting families
in the main dress rehearsal) and 38 questionnaires were returned after 2 reminders,
giving a response rate of 45 per cent.
The data collection instruments were developed by the CLS team with input from a
team of external MCS5 advisors. The development work started with a consultative
conference in July 2010 at which the convenors of the MCS5 advisory groups
presented their recommendations. A consultation on the first draft questionnaires
took place in November 2010-January 2011. The first pilot took place in March-April
2011 for families and in May-June 2011 for teachers, and the Dress Rehearsal Pilot for
families and teachers in August-September 2011 and October-November 2011, respectively.
First Pilot
The first pilot in March/April 2011 was carried out as CAPI and CASI interviews of 45
families in order to establish the time taken to carry out the early drafts of the parent
interviews and self-completion; child self-completion; and child assessments and
measurements and to test the feasibility of saliva sample collection. It was also
designed to identify other problems such as flow, question wording, recall and
filtering. It took place in five areas in Great Britain only. All of the families were
newly recruited by interviewers using quota sampling.
The teacher survey pilot took place in August-September 2011 covering England
and Wales only. Of the 37 families in England and Wales who took part in the main
pilot, 31 gave consent for their child’s teacher to be approached. Of these, 19
returned a questionnaire after reminders, giving a response rate of around 61 per
cent.
The dress rehearsal for the study took place in August/September 2011. All of the
procedures planned for main-stage sampling and fieldwork were tested.
The longitudinal dress rehearsal sample, drawn from Child Benefit records in 14
wards of the UK, consisted, in Great Britain, of respondents sampled for the MCS1
dress rehearsal and additional families sampled for MCS3. In Northern Ireland it
consisted of respondents sampled at MCS3. Additional families in England were
sampled through the Department for Education’s National Pupil Database and in
Wales through the Welsh Government’s record of pupils in Wales.
families were interviewed. This was in excess of the target sample of 100 families.
The dress rehearsal also included a postal teacher survey which was carried out in
October-November 2011. In all, 103 teachers were approached (consenting families
in the main dress rehearsal in England and Wales) and 56 questionnaires were
returned after reminders, giving a response rate of 54 per cent.
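The teacher survey response rates quoted in this part follow directly from the counts of questionnaires returned and teachers approached. A minimal check, using only figures given in the text (the function name is hypothetical):

```python
def response_rate(returned, approached):
    """Response rate as a whole-number percentage of those approached."""
    if approached <= 0:
        raise ValueError("approached must be positive")
    return round(100 * returned / approached)


# Counts quoted in the text: (returned, approached)
mcs4_first_pilot = response_rate(23, 32)
mcs4_dress_rehearsal = response_rate(38, 84)
mcs5_first_pilot = response_rate(19, 31)
mcs5_dress_rehearsal = response_rate(56, 103)
```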
PART FOUR: SURVEY CONTENT
The chart below shows the content of the MCS surveys at a glance.
[Chart not reproduced. Surviving labels: Teachers; Teachers (E & W only).]
Tables 19-23 below show in detail elements included at each sweep of the MCS. For
more details of the content for all surveys, please refer to the respective
questionnaires.
Respondent        Mode             Summary of content
                                   - Domestic tasks
                                   - Previous pregnancies
                                   - Mental health
                                   - Attitudes to relationships, parenting, work, etc
                  Interview        Module J: Employment, income, education
                                   Module K: Housing and local area
                                   Module L: Interests
Father/Partner*   Interview        Module B: Father’s involvement with baby
                                   Module C: Pregnancy, labour and delivery (where applicable)
                                   Module F: Grandparents and friends
                                   Module G: Parental health
                  Self-completion  Module H: Self-completion
                                   - Baby’s temperament & behaviour
                                   - Relationship with partner
                                   - Previous partners
                                   - Previous children
                                   - Mental health
                                   - Attitudes to marriage, parenting, work, etc
                  Interview        Module J: Employment and education
                                   Module L: Interests
* In the majority of cases, the Main interview was undertaken by the mother/mother figure while the
Partner interview was undertaken by the father/father figure. See Table 20.
Respondent        Mode             Summary of content
                                   Module N: Older siblings
Father/Partner*   Interview        Module B: Father’s involvement with baby
                                   Module C: Pregnancy, labour and delivery (where applicable)
                                   Module F: Grandparents and friends
                                   Module G: Parent’s health
                  Self-completion  Module H: Self-completion
                                   - Baby’s temperament & behaviour
                                   - Relationship with partner
                                   - Previous partners
                                   - Previous children
                                   - Mental health
                                   - Attitudes to marriage, parenting, work, etc
                  Interview        Module J: Employment and education
                                   Module L: Interests
Interviewer       Observations     Home environment
                                   Neighbourhood
Child             Assessments      BAS Naming Vocabulary
                                   Bracken Basic Concept Scale
                                   Height and weight
                                   Oral fluids
Older sibling     Self-completion**
* In the majority of cases, the Main interview was undertaken by the mother/mother figure and the
Partner interview was undertaken by the father/father figure. See Table 21.
** England only.
Respondent        Mode             Summary of content
                                   - Relationship with partner
                                   - Previous relationships, children living elsewhere,
                                     non-resident parents
                                   - Attitudes and ethnic identity
                                   - Racial harassment and discrimination
                                   - Work-life balance and life satisfaction
                                   - Older Siblings’ temperament and behaviour
                  Interview        Module OS: Older siblings
                                   Module Z: Consents and contact information
Father/Partner*   Interview        Module FC: Family context
                                   Module ES: Early education, schooling and childcare (some)
                                   Module PA: Parenting activities
                                   Module PH: Parental health
                                   Module EI: Employment, education and income
                                   Module OM: Other Matters
                  Self-completion  Module SC: Self-completion
                                   - Parenting and parent-child relationship
                                   - Mental health and drug-taking
                                   - Relationship with partner
                                   - Previous relationships, children living elsewhere
                                   - Attitudes and ethnic identity
                                   - Racial harassment and discrimination
                                   - Work-life balance and life satisfaction
                  Interview        Module Z: Consents and contact information
Table 22: MCS4 – Summary of MCS4 Survey Elements.
Respondent        Mode             Summary of content
Interviewer       Observations     Cognitive assessment
Child             Assessments      Story of Sally and Anne
                                   British Ability Scales: Word Reading
                                   British Ability Scales: Pattern Construction
Table 23: MCS5 – Summary of MCS5 Survey Elements.
Respondent Mode Summary of content
Interviewer Observations Cognitive assessment
Cohort Member Assessments British Ability Scales: Verbal Similarities
Future education
Parents
Class groupings & setting
Child’s class
Teacher profile
* In the majority of cases, the Main interview was undertaken by the mother/mother figure and the
Partner interview was undertaken by the father/father figure. See Table (n) below.
PART FIVE: FIELDWORK
For a more comprehensive discussion of fieldwork please refer to the MCS1
Technical Report on Fieldwork (NatCen 2004) or the MCS2 Technical Report on
Fieldwork (NOP 2006) or the MCS3 Technical Report on Fieldwork (NatCen 2007) or
the MCS4 Technical Report on Fieldwork (NatCen 2010) or the MCS5 Technical
Report on Fieldwork (Ipsos MORI 2013).
Following a competitive tender process NatCen was appointed to carry out the
fieldwork for MCS1. The fieldwork in Northern Ireland was sub-contracted by NatCen
to the Central Survey Unit of NISRA (the Northern Ireland Statistics and Research
Agency). For the most part it took place in 2002, having started in England and
Wales in June 2001, and in Scotland and Northern Ireland in September 2001. It
finished in January 2003.
Briefings
Briefings for the 232 interviewers who were to work in England and Wales were held
in 17 regional one-day meetings between 31 May and 15 June 2001. A further 42
interviewers working in Scotland were briefed at 4 sessions between 29 August and
6 September. These training sessions were conducted jointly by researchers from
NatCen and CLS. In Northern Ireland, some 50 interviewers were briefed at 4
sessions between 17 and 28 August.
Fieldwork Timetable
The fieldwork for MCS1 (and MCS2) was carried out in 17 consecutive waves. Each
issued wave of fieldwork contained babies born in a 4-weekly birth cycle (apart from
the last), with the first wave covering the births between 1 and 28 September 2000 in
England and Wales. This rhythm of recruiting the sample was dictated by the cycle
of DWP procedures, scanning the Child Benefit database every 4 weeks.
Interviewers arranged interviews as soon as possible after the addresses were
issued, aiming to reach the families while the baby was as close as possible to 9.5
months of age. Interviews with partners could be delayed until the child’s first
birthday (as were some main interviews where the address had been issued late).
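The 4-weekly birth cycle can be illustrated with a short sketch that derives the birth-date window for a given fieldwork wave. It assumes every wave covers exactly 28 days of births counted from 1 September 2000; the function and constant names are hypothetical, and the sketch does not reproduce the longer final waves or the country-specific start dates described in this section.

```python
from datetime import date, timedelta

FIRST_BIRTH_DATE = date(2000, 9, 1)  # first eligible birth date, wave 1
CYCLE_DAYS = 28                      # 4-weekly scan of the Child Benefit database


def birth_window(wave):
    """Return (first, last) birth date covered by a fieldwork wave.

    Assumes a strict 28-day cycle; the last wave in each country was
    longer than this and is not handled here.
    """
    if wave < 1:
        raise ValueError("wave numbers start at 1")
    start = FIRST_BIRTH_DATE + timedelta(days=CYCLE_DAYS * (wave - 1))
    end = start + timedelta(days=CYCLE_DAYS - 1)
    return start, end
```

Under these assumptions, wave 1 covers births from 1 to 28 September 2000 and wave 2 covers births from 29 September to 26 October 2000.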
The process for drawing each wave of the DWP sample was as follows:
Prior to fieldwork, the DWP sent opt-out letters to all parents of children with an
eligible birth date who were registered (for Child Benefit purposes) as living within
one of the sampled wards, apart from any cases flagged as sensitive. Batches of
letters, including an information leaflet, were sent every 4 weeks to families whose
babies were approximately 7 months old. The letter invited parents to take part in the
study and gave them the opportunity to opt out of the study by telephoning or writing
to the DWP. Any parents who opted out of the study were then removed from the
sample.
The final stage was for the DWP to remove cases which they discovered had
subsequently moved out of the sampled wards and to update the addresses for
cases which had moved within or between sampled wards. At this stage any late opt-
outs or newly sensitive cases were also removed.
The data were sent by the DWP to CLS in two stages, a week apart, in order to
ensure that any late opt-outs or changes of addresses could be notified as near to
the start of fieldwork as possible. After the final data were received serial numbers
were assigned to each valid case and the data were sent to NatCen, for issue to the
field.
The fieldwork timetable for the project detailing the dates of birth and fieldwork is
shown in the table below.
Waves 1-13 of fieldwork took place in England and Wales from June 2001 to July
2002. The last wave in England and Wales, wave 13, which included babies born on
31 August, was delayed by 4 weeks for operational reasons, so this wave contained
interviews mostly conducted at 10 rather than 9 months for these 2 countries. The
last wave in Scotland and Northern Ireland, wave 17, was the extended sample
spanning 7 weeks of births. The latest interview (with a partner) took place in
Northern Ireland on the last-but-one eligible day, 10 January 2003. Fieldwork in
Scotland (and with all main informants) finished before the end of 2002.
The aim was that the fieldwork for each wave should be as self-contained as
possible, with the minimum amount of overlap. Interviewers were briefed to interview
families when the baby was 9 months and 15 days old, ideally, in order to
standardise the data being collected as far as possible. Allowing for delayed
interviewing due to tracing problems, the window of opportunity to interview was
brief, up to 11 months of the babies’ age for the main interview and up to 12 months
for the partner.
Seventy-five per cent of main interviews took place while the baby was aged 9
months. A further 3,579 (19 per cent) took place at 10 months and 541 (3 per cent)
at 8 months, the latter representing babies born towards the end of the 4-week span
who were interviewed early in the fieldwork period. However, 479 interviews took
place late: 475 at 11 months and only 4 in months 12-13. Seventeen families were not
interviewed because the time window had expired by the time they were found. They
are included in the ‘other ineligible’ category in Table 7.2 of the Technical Report on
Sampling (4th Edition) (Plewis 2007).
Languages
In-field Tracing
Error type, number of cases (N) and action taken:

1. Proxy module done in error, i.e. the proxy section of the Main interview was
   completed about a partner who was not eligible to be interviewed by proxy.
   N: 117
   Action taken: Data deleted from proxy module, household outcome code
   re-classified to ‘partial household’ and partner outcome code re-classified to
   unproductive.

2. Partner interview done by proxy in error, i.e. the main respondent has
   completed the partner interview on behalf of partner. Partner should have done
   the interview him/herself.
   N: 42
   Action taken: Data deleted from partner interview, household outcome code
   re-classified as ‘partial household’ and partner outcome code re-classified to
   unproductive.

3. Partner answered proxy in person, should have done normal partner interview,
   i.e. the partner completed the proxy module in person (about him/herself).
   N: 6
   Action taken: Data transferred from proxy section to equivalent variables in
   partner interview, household outcome code re-classified as ‘main and partner in
   person’ and partner outcome code re-classified to ‘partial interview in person’.

4. Main interview done by father, partner interview by mother, i.e. the data
   indicate that the mother did the main interview and the father did the partner
   interview but the main interview was actually conducted with the father (in
   error) and the partner interview was actually conducted with the mother (in
   error).
   N: 2
   Action taken: NONE

5. Father did both main and partner interviews, i.e. the data indicate that the
   mother completed the main interview and the father completed the partner
   interview but actually the father conducted both interviews (should have only
   done the partner interview).
   N: 1
   Action taken: NONE

6. Main interview done by partner, no other interview, i.e. the data indicate that
   the mother completed the main questionnaire and the father did not respond to
   the partner questionnaire but actually the father completed the main interview
   (in error) and there was no partner interview.
   N: 1
   Action taken: NONE
Following a competitive tender process, the fieldwork for MCS2 was carried out by
NOP Research. The work in Northern Ireland was sub-contracted to Millward Brown
Ulster. This survey was conducted mainly during 2004. The main-stage started in
England and Wales in September 2003, and in Scotland and Northern Ireland in
December 2003. Fieldwork finished in early 2005.
Briefings
Interviewers who were to work in England and Wales were briefed before the start of
fieldwork in 13 regional 3-day meetings. Interviewers working in Scotland were
briefed at 3 additional sessions. These training sessions were conducted jointly by
researchers from NOP and CLS. In Northern Ireland, some interviewers were briefed
in just one session by Millward Brown and CLS researchers. There were 5 further
briefings during the course of fieldwork as new interviewers were added.
Some 150 interviewers were initially briefed to work on the survey; but by the time
fieldwork was complete around 200 interviewers had worked on the survey. Further
details may be found in the NOP Technical Report on Fieldwork (NOP 2006).
Fieldwork Timetable
Fieldwork started in September 2003 in England and Wales and finished in April 2005. In
Scotland and Northern Ireland, fieldwork started in December 2003 and finished in
January 2005.
Fieldwork Wave Baby’s Date of Birth Fieldwork Period
Languages
In-field Tracing
Families who had moved from the issued address were traced in the field by NOP
interviewers. Families who could not be successfully traced by interviewers were
returned to CLS for additional tracing by the Tracing team. Details of in-field tracing
activities can be found in the Technical Report on Fieldwork (NOP 2006).
Following a competitive tender process, NatCen was appointed to carry out the
fieldwork for MCS3. The fieldwork in Northern Ireland was sub-contracted by NatCen
to the Central Survey Unit of NISRA (the Northern Ireland Statistics and Research
Agency). The main stage of this fieldwork took place within the calendar year of
2006, starting in England and Wales in January 2006, and in Scotland and Northern
Ireland in April 2006. The survey also included a follow-on survey of teachers outside
England extending into 2007.
Briefings
Interviewers were briefed in 3-day training sessions. These sessions were conducted
jointly by researchers from NatCen and CLS. For further details see NatCen (2007).
Fieldwork Timetable
The fieldwork timetable for MCS3 was driven by the requirement to interview the
family during the child’s first year of compulsory schooling (Reception Class in
England and Wales and Primary One in Scotland and Northern Ireland). As a result,
fieldwork was compressed into school years. In England and Wales, the cohort’s
birth dates span a single school year. However, in Scotland and Northern Ireland the
birth dates are spread over more than one school year. In England, Wales and
Northern Ireland, school year is normally determined by date of birth. In Scotland,
school year is determined by parental preference in addition to date of birth. For this
reason, school year was known with less certainty in advance in Scotland. During
the first wave of fieldwork in Scotland, interviewers were asked to find out, before
conducting the interview, whether the child had started school. If the child had not
yet started school, the interview was deferred until the second wave of fieldwork.
Table 28: Fieldwork timetable for MCS3 – Teacher Survey in Wales, Scotland
and Northern Ireland
Teacher Wave Country Main Fieldwork Wave Teacher Fieldwork
Languages
In-field Tracing
Families who had moved from the issued address were traced in the field by NatCen
interviewers. Families who could not be successfully traced by interviewers were
returned to CLS for additional tracing by the Tracing Unit. Details of in-field tracing
activities can be found in the Technical Report on Fieldwork (NatCen 2007).
Following a competitive tender process, NatCen was appointed to carry out the
fieldwork for MCS4. This was a planned extension to their existing contract for
MCS3. The fieldwork in Northern Ireland was sub-contracted by NatCen to the
Central Survey Unit of NISRA (the Northern Ireland Statistics and Research
Agency). The first wave of the main stage fieldwork commenced in England and
Wales in January 2008 and in Scotland and Northern Ireland in April 2008. The
survey also included a follow-on survey of teachers extending into 2009.
Briefings
Interviewers new to the study were briefed in 3-day training sessions. Interviewers
who had worked on MCS3 were briefed in 2-day training sessions. Some of these
sessions were large ‘conference style’ briefings. These sessions were conducted
jointly by researchers from NatCen and CLS (see NatCen 2010).
Fieldwork Timetable
The fieldwork timetable for MCS4 was driven by the requirement to interview the
family during the child’s third year of compulsory schooling (Year 2 in England and
Wales, and Primary Three in Scotland and Northern Ireland). As at MCS3, fieldwork
was compressed into school years. In England and Wales, the cohort’s birth dates
span a single school year. However, in Scotland and Northern Ireland the birth dates
are spread over more than one school year. In England, Wales and Northern Ireland,
school year is normally determined by date of birth. In Scotland, school year is
determined by parental preference in addition to date of birth.
Table 30: Fieldwork timetable for MCS4 – Teacher Survey
Teacher Wave | Country | Main Fieldwork Wave | Teacher Fieldwork
Wave 1 | England and Wales | Interviews in E1, E2, W1, W2 up to end-Apr 2008 | Jun-Nov 2008
Wave 2b | England and Wales | Interviews in E1, E2, W1, W2 up to end-May 2008 | Jul-Dec 2008
Wave 3 | England, Wales, Scotland, Northern Ireland | Interviews in E1, E2, W1, W2, S1, N1 up to end-Aug 2008 | Oct 2008-Feb 2009
Languages
In-field Tracing
Families who had moved from the issued address were traced in the field by NatCen
interviewers. Families who could not be successfully traced by interviewers were
returned to CLS for additional tracing by the Cohort Maintenance Team. Details of in-
field tracing activities can be found in the Technical Report on Fieldwork (NatCen
2010).
Following a competitive tender process, Ipsos MORI was appointed to carry out
the fieldwork for MCS5. The first wave of the main stage fieldwork commenced in all
countries in January 2012.
Briefings
All interviewers had a 3-day training session. In total, 23 briefings were conducted.
19 were conducted for Wave 1 (between January 2012 and February 2012). An
additional 2 briefings were conducted for Wave 2 (in August 2012) and 2 mop up
briefings were conducted (one in March 2012 and one in May 2012). In total, 325
interviewers were briefed. The size of the briefings varied between regions and
attendance ranged from 13 to 21 interviewers. These sessions were
conducted jointly by researchers from Ipsos MORI and CLS (see Ipsos MORI 2013).
Fieldwork Timetable
The fieldwork timetable for MCS5 was driven by the requirement to interview the
family during the child’s last year of primary schooling (Year 6 in England and Wales,
and Primary Seven in Scotland and Northern Ireland). As at MCS3 and MCS4,
fieldwork was compressed into school years. In England and Wales, the cohort’s
birth dates span a single school year. However, in Scotland and Northern Ireland the
birth dates are spread over more than one school year. In England, Wales and
Northern Ireland, school year is normally determined by date of birth. In Scotland,
school year is determined by parental preference in addition to date of birth.
Table 32: Fieldwork timetable for MCS5 – Teacher Survey (England and Wales
only)
Teacher Wave Teacher Fieldwork
Languages
In-field Tracing
Families who had moved from the issued address were traced in the field by Ipsos
MORI interviewers. Families who could not be successfully traced by interviewers
were returned to CLS for additional tracing by the Cohort Maintenance Team. Details
of in-field tracing activities can be found in the Technical Report on Fieldwork (Ipsos
MORI 2013). Additional tracing using administrative data was carried out by CLS.
PART SIX - THE DATA
There are two sets of 9 data files (one set of SPSS data files and one of STATA data
files):
This file contains one row for all families in the longitudinal sample: that is
families who have taken part in MCS1 or MCS2 (n=19,244 (18,552+692)).
2) The cross-sectional data from the Household Questionnaire, Main, Partner and
Proxy Interviews:
MCS1 Parent Interview Data
MCS2 Parent Interview Data
MCS3 Parent Interview Data
MCS4 Parent Interview Data*.
These files contain one row for each productive family at that sweep.
*The three bracketed income datasets have been separated out from the main
and partner data at MCS4 to reduce the size of the main and partner interview.
The summary information derived is deposited in the main, partner and proxy
information. If you wish to explore this further the full data are available in these
datasets. The MCS4 CAPI Questionnaire section 1.2.8 explains the way in
which these data were collected.
MCS5 Parent Interview Data
MCS5 Parent Interview Unfolding Brackets Data (in preparation)
MCS5 Proxy Interview Data
MCS5 Proxy Interview Unfolding Brackets Data (in preparation)
These files contain one row for each person in the household grid in productive
families at that sweep.
4) Child Assessment and Measurement Files:
MCS2 Child Measurement Data
MCS2 Child Assessment Data
MCS3 Child Measurement Data
MCS3 Child Assessment Data
MCS4 Child Measurement Data
MCS4 Child Assessment Data
MCS4 Child Our Adventures (Wales) Data
MCS4 Child Self Completion Data
MCS5 Child Measurement Data
MCS5 Child Assessment Data
MCS5 Child Self Completion Data
These files contain one row for each older sibling reported upon by the
main respondent, and for each older sibling who completed the paper
self-completion questionnaire.
These files contain one row for each visit to the productive families at that
sweep.
9) Derived Variables:
MCS1 Derived Variables
MCS2 Derived Variables
MCS3 Derived Variables
MCS4 Derived Variables
MCS5 Family Level Derived Variables.
These files contain one row for each productive family at that sweep.
These files contain one row for each productive main, partner or CM respondent at
that sweep.
Data can be linked on mcsid, which is a unique identifier for each family. Family-
level files can be linked on this identifier only.
Data that can be linked using mcsid alone include information that spans
sweeps, such as weights and the variables needed for analysis by stratum, which are
held on the MCS Longitudinal Family Level Information file and the Parent Interview
files.
The data also contain unique, longitudinally consistent individual identifiers for cohort
members and other people in the household.
The individual identifier for cohort members is cnum: cohort member number (to
identify separately twins and triplets) and the individual identifier for all other people
in the household is pnum: person number. These variables appear on all the data
files (except the Longitudinal Family Level Information file).
In the Parent Interview data, they are ahcnuma0 (cohort member 1), ahcnumb0
(cohort member 2), ahcnumc0 (cohort member 3) at MCS1. At each later sweep only the
leading letter changes and all other characters remain the same: it is ‘b’ at MCS2,
‘c’ at MCS3, ‘d’ at MCS4 and ‘e’ at MCS5.
In the Household Grid and Child Measurement and Child Assessment data they are
ahcnum00 (MCS1), bhcnum00 (MCS2), chcnum00 (MCS3), dhcnum00 (MCS4) and
ehcnum00 (MCS5).
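The naming pattern described above lends itself to generating variable names programmatically. The following Python sketch is illustrative only: the helper names and the SWEEP_PREFIX mapping are assumptions introduced here, not part of the MCS documentation, and the names it produces should be checked against the data dictionary for each sweep (particularly at MCS5, where the parent data are stacked).

```python
# Hypothetical helper: sweep number -> leading letter of variable names.
SWEEP_PREFIX = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}


def parent_interview_cnum_vars(sweep):
    """Cohort member number variables on the Parent Interview file
    (one each for cohort members 1, 2 and 3)."""
    prefix = SWEEP_PREFIX[sweep]
    return [f"{prefix}hcnum{slot}0" for slot in "abc"]


def grid_cnum_var(sweep):
    """Cohort member number variable on the Household Grid,
    Child Measurement and Child Assessment files."""
    return f"{SWEEP_PREFIX[sweep]}hcnum00"


mcs2_parent_vars = parent_interview_cnum_vars(2)
mcs4_grid_var = grid_cnum_var(4)
```

For example, parent_interview_cnum_vars(2) gives the three MCS2 names quoted in the text (bhcnuma0, bhcnumb0, bhcnumc0).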
In order to provide data which can be used for a variety of different purposes, a
separate file (the household grid) has been supplied to enable linkage by cohort
member or other respondent or member of the household, e.g. older sibling at each
sweep.
As indicated above, the data at Sweeps 1 to 4 have been produced at one row per
family. To facilitate better longitudinal linkage and to make cohort member analysis
straightforward, data at MCS5 have been produced at one row per respondent, with
the cohort member specific variables separated out from the parental variables as
the focus of the study moves towards the cohort members themselves.
mcsid, ahcnum00, bhcnum00, chcnum00 or dhcnum00 from the household grid with:
mcsid and ahcnuma0, ahcnumb0, ahcnumc0 – MCS1
mcsid and bhcnuma0, bhcnumb0, bhcnumc0 – MCS2
mcsid and chcnuma0, chcnumb0, chcnumc0 – MCS3
mcsid and dhcnuma0, dhcnumb0, dhcnumc0 – MCS4
at MCS5 the cohort member dataset is available using mcsid and eccnum00
mcsid, ahpnum00, bhpnum00, chpnum00 or dhpnum00 from the household grid with:
mcsid and ampnum00 – MCS1 – main respondent
mcsid and bmpnum00 – MCS2 – main respondent
mcsid and cmpnum00 – MCS3 – main respondent
mcsid and dmpnum00 – MCS4– main respondent
mcsid and appnum00 – MCS1 – partner respondent
mcsid and bppnum00 – MCS2 – partner respondent
mcsid and cppnum00 – MCS3 – partner respondent
mcsid and dppnum00 – MCS4– partner respondent
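The person-level linkage listed above amounts to matching on the pair (mcsid, person number). The sketch below shows one possible way to do this in Python for MCS1; the toy records, the "sex" column and the function name link_respondent are hypothetical and stand in for rows read from the real files.

```python
# Toy household grid rows: one row per person (values are invented).
grid = [
    {"mcsid": "F001", "ahpnum00": 1, "sex": "F"},
    {"mcsid": "F001", "ahpnum00": 2, "sex": "M"},
    {"mcsid": "F002", "ahpnum00": 1, "sex": "F"},
]
# Toy parent interview rows: one row per family (values are invented).
parent = [
    {"mcsid": "F001", "ampnum00": 1, "appnum00": 2},
    {"mcsid": "F002", "ampnum00": 1, "appnum00": None},
]


def link_respondent(grid_rows, parent_rows, grid_key, parent_key):
    """Return {mcsid: grid row of the respondent}, matching on the
    family identifier plus the person number."""
    index = {(g["mcsid"], g[grid_key]): g for g in grid_rows}
    return {
        p["mcsid"]: index.get((p["mcsid"], p[parent_key]))
        for p in parent_rows
    }


main_rows = link_respondent(grid, parent, "ahpnum00", "ampnum00")
partner_rows = link_respondent(grid, parent, "ahpnum00", "appnum00")
```

Families with no partner respondent simply map to None, which keeps the family in the result while making the missing link explicit.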
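* Merge the family-level Parent Interview files on the family identifier.
* The active dataset (*) and every file listed must already be sorted by mcsid.
MATCH FILES /TABLE=*
  /FILE='mcs1 parent interview.sav'
  /FILE='mcs2 parent interview.sav'
  /FILE='mcs3 parent interview.sav'
  /FILE='mcs4 parent interview.sav'
  /BY mcsid.
* Sort the merged file by family identifier and save it.
SORT CASES BY mcsid.
SAVE OUTFILE='Family Level.sav'.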
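For readers working outside SPSS, the same idea (one merged row per family, keyed on mcsid) can be sketched in plain Python. This is an assumption-laden illustration rather than a translation of the syntax above: it keeps every family found in any input (an outer join), the function name merge_family_files is hypothetical, and the variable names a_example and b_example are invented placeholders.

```python
def merge_family_files(*files):
    """Combine family-level tables (lists of dicts) into one row per mcsid.

    Every family present in any input is kept; later files add their
    variables to the row already started for that family.
    """
    merged = {}
    for rows in files:
        for row in rows:
            record = merged.setdefault(row["mcsid"], {})
            record.update(row)
    # Return the rows ordered by family identifier.
    return [merged[key] for key in sorted(merged)]


# Invented toy data standing in for two sweeps of family-level files.
mcs1 = [{"mcsid": "F002", "a_example": 1}, {"mcsid": "F001", "a_example": 2}]
mcs2 = [{"mcsid": "F001", "b_example": 3}]
family_level = merge_family_files(mcs1, mcs2)
```

A family that took part at only one sweep ends up with a row that lacks the other sweep's variables, so downstream code should treat absent keys as missing data.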
The household grid files contain two types of information: individual identifiers and
identifying characteristics (number, sex and date of birth) and cross-sectional
variables collected about everyone in the household (e.g. relationships between
household members).
At MCS2, the household grid was collected independently from MCS1, i.e. the MCS1
grid was not fed forward. In order to create longitudinally consistent individual
identifiers, the two household grids were matched. This involved matching people
using their individual identifying characteristics (name, sex and date of birth). All
people present in the household at MCS1 retained their original person number (see footnote 7) and
any new entrants were given the next available person number.
At MCS2, with information only available from two sweeps, it was not always
possible to determine which data were correct when information was inconsistent.
With MCS3 we were in a better position to resolve these issues. Our approach has
been to clean data only where it is clear that the corrections can be made with
certainty. The sex variable was checked by reference to names collected at MCS2
and MCS3. Cleaning of relationships was restricted to differences in report which
straddled the adult/child boundary, e.g. grandparent / grandchild, father / son. Other
relationships which are possible, even where unlikely, such as step-parent / other
non-relative or natural / adopted / foster, were not changed.
The household grid contains one record for each person who has ever appeared in
the household for each family that participated in that sweep.
There is a variable which indicates for each person whether or not they are present
at any particular sweep: ahcprs00, bhcprs00, chcprs00, dhcprs00, ehcprs00, for
cohort members in MCS1, MCS2, MCS3, MCS4 and MCS5, respectively; and
ahpres00, bhpres00, chpres00, dhpres00 and ehpres00 for all other people in
MCS1, MCS2, MCS3, MCS4 and MCS5, respectively. These can be used to identify
people moving in, out and back into the household by merging the household grid
files across sweeps. For cases where the main interview was not conducted at MCS2 (i.e. only
a partner interview was conducted) and a main interview was completed at MCS1,
bhpres00 was labelled as ‘Not Known’.
Footnote 7: Except for part-time partners who at MCS1 were all assigned a person number of 12. These
people were assigned the next available person number in the household at MCS1.
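Once the presence flags for one person have been lined up across sweeps, moves in and out of the household can be read off from changes between consecutive known values. The Python sketch below assumes the flags have already been recoded to True (present), False (not present) or None (not known); that recoding, and the function name household_moves, are assumptions made for illustration and do not reflect the actual value labels on the files.

```python
def household_moves(presence):
    """List (sweep, event) pairs from presence flags ordered by sweep.

    presence: sequence of True / False / None, one entry per sweep.
    Unknown (None) sweeps are skipped, so events are based on the last
    known state.
    """
    events = []
    previous = None        # last known presence state
    ever_present = False   # has the person been seen in the household yet?
    for sweep, present in enumerate(presence, start=1):
        if present is None:
            continue
        if present and previous is False:
            label = "moved back in" if ever_present else "moved in"
            events.append((sweep, label))
        elif not present and previous is True:
            events.append((sweep, "moved out"))
        if present:
            ever_present = True
        previous = present
    return events


leaver_and_returner = household_moves([True, False, True, True, False])
new_entrant = household_moves([False, True, None, True])
```

Skipping unknown sweeps is a design choice: it avoids inventing a move where the flag is simply ‘Not Known’, at the cost of possibly missing a short absence.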
The other information on the household grid file (relationships and other cross-
sectional information) is retained as reported at that sweep (with the exception of
some limited cleaning of relationships longitudinally to attempt to correct for mis-
keying).
The individual details and cross-sectional information from the household grid which
relates to cohort members, main and partner respondents, appears on the Parent
Interview Data. This means that any derived variables using the sex and date of birth
of cohort members, main and partner respondents and/or relationships of other
household members to cohort members, main and partner respondents, can be
derived solely from the Parent Interview files.
Any derived variables using the sex and date of birth of other people in the
household or relationships between other people in the household, and any detailed
analysis of change in household composition (see footnote 8), must be done using the household
grid data.
At MCS1, the household grid had to be completed before carrying out any interviews
and was collected for all families (including those in which a Main interview was not
done). At MCS2, the household grid was collected as part of the Main interview. As a
result, it was not completed in households in which there was not a main interview.
For households which were not the subject of a Main interview at MCS2 but took part
in MCS1, the household grid contains the individual details for everyone who was in
the household at MCS1. In order for these families to be interviewed, the interviewer
would have established that the cohort member was present. So, for those
households the cohort member present flag (bhcprs00) indicates that they were
present. Also, in these households, we have indicated that the person who was the
main respondent at MCS1 was present at MCS2 (bhpres00) (and that they were
eligible for the Main interview) although the cross-sectional variables (relationships,
etc) are not available in these families. In addition, if a partner or partner proxy
interview has been conducted in these households, the person who completed this
interview is indicated as present (bhpres00), although the cross-sectional variables
(relationships, etc) are not available in these families. For all other people in these
households bhpres00 indicates that we do not know whether or not they are present.
Some of the families in which the Main interview did not take place were ‘New
Families’, i.e. those that did not take part in MCS1 because they moved into the
eligible areas too late to be included in the initial survey. For such families, the
individual details of the Child Benefit claimant and, if applicable, the person who
completed the partner or partner proxy interview appear in the household grid.
These people are coded as present (bhpres00), and the Child Benefit claimant is coded as
eligible for the Main interview.
Footnote 8: There are derived variables about change in household composition relating to
parents/carers.
At MCS3, MCS4 and MCS5, the procedure was the same as for MCS1. The
household grid was completed before carrying out any interviews and was collected
for all families (including those in which a Main interview was not done).
At each of MCS1, 2, 3, 4 and 5 there were three different parent interviews which
could be completed with up to two different people per family. The three interviews
were: Main, Partner and Partner Proxy. The selection of household members for the
different interviews was done by the CAPI program, based on relationship to the
cohort member and relationships between different household members. In general,
any parents (including step, foster and adoptive) of cohort members and partners
(including same-sex partners) of parents were eligible for interview. If there were no
parents in the household, the main carer of the cohort member (and their partner)
was selected for interview. In each household, there should always have been
someone selected for the Main interview. A different person would have been
selected for the Partner interview. If the person selected for the Partner interview
was away for the fieldwork period or incapacitated, they became eligible for the
Partner Proxy interview instead of the Partner interview, which was completed by the
Main respondent on behalf of their partner.
At Sweep 1 there was a priority for the natural mother, if present, to do the main
interview as it contained questions about pregnancy and delivery. In the few cases
where mothers did the Partner interview it was due to language problems. At sweep
2 the preference was for the same person who had done the Main interview at
sweep 1 to do it again if possible. If the Main respondent from sweep 1 was no
longer in the cohort child’s household, but at least one biological parent of the child
was, then that person was selected as the new Main informant, even if he or she
was not the main carer of the child. If there was no biological parent in the household
then whoever was the main carer for the cohort child was selected for the main
parent questionnaire. At sweeps 3, 4 and 5 the presumption again was that the
natural mother, the natural father in her absence, the previous Main informant or the
main carer, in that order, would be selected as the main informant. But families could
elect to follow other arrangements where, for example, the father was the main carer
and the mother chose not to do either interview. Tables 33-37 summarise the
different combinations of Parent interviews at each sweep.
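The default order of preference described for sweeps 3, 4 and 5 can be sketched as a simple priority search. This is only an illustration of the stated presumption, not the CAPI program's actual logic (which is not reproduced in this guide and which families could override); the role labels, the "roles" field and the function name select_main are hypothetical.

```python
# Default order of preference for the Main informant at sweeps 3 to 5,
# as described in the text. Role labels are invented for this sketch.
PRIORITY = ("natural_mother", "natural_father", "previous_main", "main_carer")


def select_main(people):
    """Return the pnum of the first person matching the highest-priority
    role present in the household, or None if nobody qualifies."""
    for role in PRIORITY:
        for person in people:
            if role in person["roles"]:
                return person["pnum"]
    return None


two_parent = [
    {"pnum": 1, "roles": {"natural_father"}},
    {"pnum": 2, "roles": {"natural_mother", "main_carer"}},
]
no_mother = [
    {"pnum": 3, "roles": {"main_carer"}},
    {"pnum": 1, "roles": {"natural_father", "previous_main"}},
]
chosen_two_parent = select_main(two_parent)
chosen_no_mother = select_main(no_mother)
```

Because the real selection also depended on relationships between household members and on family preference, the derived respondent variables on the files remain the authoritative record of who was interviewed.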
At MCS1, there was a Main interview in 18,532 of the 18,552 families. There was
someone eligible for a Partner interview in 15,358 families and an interview was
completed in 13,225 of these cases. Proxy data were collected on 216 partners (of
the 235 who were eligible); but interview data are completely missing for 1,917 two-‘parent’
families (adresp00). Table 33 also shows that the vast majority of the Main
respondents were female. 18,524 (out of 18,552) were natural mothers. There were
28 male Main respondents, all natural fathers, 18 of whom were lone fathers. All but
61 (99.6 per cent) of the 15,358 partners identified in the families visited were natural
fathers. Since the Main respondent was asked questions about pregnancy and
delivery the presumption was that, wherever possible, the natural mother should be
the main informant. Some of the cases where roles were reversed were because of
language problems.
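The MCS1 Partner figures quoted above can be reconciled with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are introduced here for illustration.

```python
# MCS1 Partner interview figures as quoted in the text.
eligible = 15_358            # families with someone eligible for a Partner interview
in_person = 13_225           # Partner interviews completed
by_proxy = 216               # partners with proxy data
missing = 1_917              # two-'parent' families with no partner data

# The three outcomes account for every eligible family.
accounted_for = in_person + by_proxy + missing

# Share of identified partners who were natural fathers (all but 61).
not_natural_father = 61
natural_father_pct = round(100 * (eligible - not_natural_father) / eligible, 1)
```

The three outcome counts sum to the 15,358 eligible families, and the natural-father share rounds to the 99.6 per cent given in the text.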
Natural Natural
Frequency Per cent mother Other father Other
At MCS2, there was someone eligible for the Main interview in 15,588 of the 15,590
productive families (see footnote 9) and an interview was completed in 15,448 cases. There was
someone eligible for a Partner interview in 12,856 families and an interview was
completed in 10,479 of these cases, with data by proxy in 233. There were 2,154
two-‘parent’ families with data missing on the partner, and 63 with data missing from
the Main. There were also 79 families with some data (e.g. child assessments) but
no interview data from either a Main or a Partner respondent. The Main respondents
were again overwhelmingly female, but the number of them who were not natural
mothers increased since MCS1 from 9 to 55. The number of male Main respondents
increased from 28 at MCS1 to 187 (2 of whom were not natural fathers). Part of this
change was an increase of lone-father informants (to 62), but it was mostly due to a
rise in the number of two-parent families where the Main response was collected
from the father (97 per cent of the partners were natural fathers).
Footnote 9: There were 2 families in which the person who should have been eligible for the Main
interview actually completed the Partner interview, and there was no-one else eligible for
interview.
Table 34: MCS2 Parent interview response by sex of respondent and
relationship to cohort member
1. Main respondent in person (no-one eligible for partner) | 2,655 | 17.03 | 2,574 | 19 | 61 | 1 | 0
5. No main interview, partner interviewed in person | 61 | 0.39 | 3 | 1 | 1 | 0 | 56
6. No main interview, partner interviewed by proxy | 2 | 0.01 | 0 | 0 | 0 | 0 | 2
At MCS3, a Main interview was conducted in 15,210 of the 15,246 families. There
was someone eligible for a Partner interview in 12,189 families and an interview was
completed in 10,475 cases, with proxy data collected in a further 287. Information
was not collected on partners in 1,408 couples, and from main respondents in 19
families where the partner responded. In 36 cases there were no interviews in the
dataset from any parent. The proportion of Main informants who were natural
mothers again dropped, to 97 per cent (14,792). The number of female Main
respondents who were not natural mothers hardly changed from MCS2 (58). But the
number of Main respondents who were men rose substantially. The
number of natural fathers completing the main interview was 394 (more than double
the 185 at the age 3 survey). Seventy-two were lone fathers and the rest were part of
a couple. The switch to a male informant would have arisen in cases where the
natural mother no longer lived with the child, and where the father elected to be
treated as the main carer.
Natural Natural
Frequency Per cent mother Other father Other
At MCS4, a Main interview was conducted in 13,797 of the 13,857 families. There
was someone eligible for a Partner interview in 10,687 families and an interview was
completed in 9,180 of these, with proxy data collected in a further 249 cases.
Information was not collected from partners in 1,484 couples where the Main
responded. A further 19 families had information from the partner but not from the
Main respondent. In 41 cases there were no Parent interviews. The proportion of
informants eligible to respond as Main, and who were natural mothers, dropped
slightly from 97.0 per cent at MCS3 to 96.6 per cent (13,392). The number of natural
fathers eligible to complete the Main interview at MCS4 was 392, which hardly
changed from MCS3. Ninety-nine of these (compared with only 72 at MCS3) were
lone fathers and the rest were part of a couple.
Table 36: MCS4 Parent interview response by sex of Main respondent and
relationship to cohort member
At MCS5, a Main interview was conducted in 13,212 of the 13,287 families. There
was someone eligible for a Partner interview in 10,031 families and an interview was
completed in 8,843 of these, with proxy data collected in a further 119 cases.
Information was not collected from partners in 1,188 couples where the Main
responded. The proportion of informants eligible to respond as Main, and who were
natural mothers, dropped slightly from 96.6 per cent at MCS4 to 95.2 per cent
(12,657). The number of natural fathers eligible to complete the Main interview at
MCS5 was 508, up from 392 at MCS4. Of these, 172 (compared with only 99 at MCS4) were lone
fathers and the rest were part of a couple.
Table 37: MCS5 Parent interview response by sex of Main respondent and
relationship to cohort member
Implications
In the vast majority of cases at all sweeps, the natural mother did the Main interview
and the natural father the Partner interview. There are derived variables on the
Parent interview data which give details of the identity and interview status for Main
and Partner respondents: admres00, adpres00, bdmres00, bdpres00, cdmres00,
cdpres00, ddmres00, ddpres00. At MCS5, as the parent interview data is now
stacked, there is a single equivalent variable, eddres00.
On the Household Grid files, the Main and Partner respondents and their interview
status are identified by the variables ahelig00 and ahresp00 (MCS1), bhelig00 and
bhresp00 (MCS2), chelig00 and chresp00 (MCS3), dhelig00 and dhresp00
(MCS4), ehelig00 and ehresp00 (MCS5).
At MCS1, 3, 4 and 5, the identity of the person eligible for the Main and Partner
interviews was derived from the household grid and available for all families
(regardless of whether or not the individual interviews were completed). At MCS2,
the identity of the individuals eligible for the Main and Partner interviews was not
known if the interview was not conducted. As discussed above, where the main
interview was not carried out at MCS2, we indicated that the Main respondent from
MCS1 was present and eligible for the Main interview. Where the Main interview was
not done at MCS2, household composition information was not collected; so unless
a partner interview was done, there was no-one recorded as eligible for the Partner
interview. In households in which the Main interview was done but there was no
Partner interview, the person eligible for the Partner interview was derived using
relationships between household members. In these families, the Partner was
assumed to be eligible for interview in person (rather than by proxy). This explains
why the number eligible and responding to the Partner Proxy interview are identical.
A number of assessments have been administered to the MCS children since they
were aged 3. The following assessments were administered to the MCS children at
different sweeps:
MCS Sweep
Assessment
MCS 2 MCS 3 MCS 4 MCS5
4.1 The British Ability Scales
Following consultation with advisers and piloting, the BAS Naming Vocabulary scale
was administered by interviewers to cohort members during the MCS2 data
collection.
The Naming Vocabulary is a verbal scale for children aged 2 years 6 months to 7
years 11 months. It assesses the spoken vocabulary of young children. The test
items consist of a booklet of coloured pictures of objects which the child is shown
one at a time and asked to name. The scale measures expressive language ability,
and successful performance depends on the child’s previous development of a
vocabulary of nouns. Picture recognition is also crucial; however, the pictures are
large and brightly coloured and are unlikely to cause problems except for children
with major visual impairments or with no experience of picture books. The items
require the child to recall words from long-term memory rather than to recognise or
understand the meaning of words or sentences.
Scores
Variable Description
cdnvabil S3 COG: Naming Vocabulary ability score
Children are shown a row of 4 pictures on a page and asked to place a card with a
fifth picture under the picture most similar to it. This assessment measures children’s
problem solving abilities.
Variable Description
The child constructs a design by putting together flat squares or solid cubes with
black and yellow patterns on each side. The child’s score is based on accuracy and
speed. This assessment tests spatial awareness but can also be used to observe
dexterity and coordination, as well as traits like perseverance and determination.
Variable Description
Word Reading is an assessment from the British Ability Scales: Second Edition (BAS
2) which assesses children’s English reading ability.
The child reads aloud a series of words presented on a card. The assessment
consists of 90 words in total. The words are organised into 9 blocks of 10 words in
ascending order of difficulty. The child is asked to read each word in a block out loud
to the interviewer. The number of blocks of words the child is asked to attempt to
read is dependent on the child’s performance during the assessment. This
assessment is designed to be used with children aged from 5 years to 17 years and
11 months. All of the children in MCS4 started at the first item, as this was the
starting point for children of their age.
Variable Description
In Wales a different test was carried out (see Section 4.3 below).
Verbal Similarities is an assessment from the British Ability Scales: Second Edition
(BAS 2) which assesses children’s verbal reasoning and verbal knowledge.
The interviewer reads out three words to the child who must then say how the three
things are similar or go together.
This assessment is designed to be used with children aged from 5 years to 17 years
and 11 months. All of the children in MCS5 start at the 16th item, as this is the
starting point for children of their age. There are decision points after items 28 and
33 where the child’s performance so far decides whether the test stops or continues
to the next set of questions. The test stops at the decision point unless the child has
fewer than three failures on all items so far. In this case they are routed to the next set
of questions. If the child has obtained fewer than three passes, however, they are
routed back to the previous starting point (e.g. item 8).
After five consecutive failures the test is automatically stopped provided that at least
three items have been passed prior to this, otherwise they are routed back to the
previous starting point.
If the child fails either of the first two items administered they are provided with
teaching to help them to understand the concept of the test. If the child subsequently
gives a correct answer to the same question it is acknowledged but they do not
receive a point for that question.
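The routing rules for Verbal Similarities described above can be written out as two small decision functions. This Python sketch follows the wording of the text only; the function names and return labels are assumptions, and where the text does not say which rule takes precedence the sketch checks the failure count first.

```python
def at_decision_point(passes, failures):
    """Outcome at a decision point (after items 28 and 33).

    passes, failures: counts over all items administered so far.
    """
    if failures < 3:
        return "continue"      # routed to the next set of questions
    if passes < 3:
        return "route back"    # back to the previous starting point
    return "stop"


def after_five_consecutive_failures(passes_before):
    """Outcome once five consecutive items have been failed."""
    return "stop" if passes_before >= 3 else "route back"


strong = at_decision_point(passes=11, failures=2)
weak = at_decision_point(passes=2, failures=11)
middling = at_decision_point(passes=8, failures=5)
```

With 13 items between the starting point (item 16) and the first decision point (item 28), a child cannot have both fewer than three passes and fewer than three failures, so the order of the first two checks does not matter there.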
The dataset provides the following scores:
Variable Description
There are three types of score provided for each scale of the BAS: raw score, ability
score and T-scores or standardised scores. Each type has its uses and limitations.
Raw Scores
Raw scores are simply the number of items the cohort member child answered
correctly. They do not take into account the stop and start points of the items
administered; for this reason, the raw scores have little meaning and should not be
used.
Ability Scores
The ability scores are a transformation of the raw scores that take into account the
specific item set administered. They are not adjusted for anything else, so are the
scores to consult for unadjusted cognitive scores.
There are some issues to keep in mind when using ability scores. The first is that it is
not a truly continuous scale. The table below shows the correspondence between
some example raw scores and ability scores. As can be seen from this table, there
are ability scores that cannot be obtained.
Raw Score Ability Score
8 40
9 43
For convenience, the ability scores for each scale start with a value of 10, which
reflects a raw score of 0 on the easiest possible set of items in a scale. The upper
limit of ability scores varies from scale to scale. Because the ability scale uses an
arbitrary numbering system, comparing ability scores from different scales is not
meaningful, just as comparing raw scores from different scales is not meaningful.
The other issue is that the ability scores are not adjusted for age. Children of a large
range of ages take the same BAS tests, and the general trend is that older children
score higher. When using ability scores, one should control for child age. The issue
of age and the BAS scales is discussed in further detail in the section below on BAS
Scales and Age.
Also available for all scales are T-scores or a standardised score. These scores are
adjusted for the cohort member child’s age group and for the mean scores of the
BAS norming group. They are computed using the BAS manual’s conversion tables.
For each 3-month age group, there is a table showing the conversion of ability
scores to T-scores or standardised scores. The T-scores have a mean of 50 and
standard deviation of 10 within the norming sample of a given age group. A cohort
child who has an ability score that is the same as the mean for the norming group in
his or her age group will have a T-score of 50. A child with a T-score of 60 had an
ability score that was one standard deviation above the norming sample mean for his
or her age group.
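The interpretation of T-scores given above (mean 50, standard deviation 10 within the norming sample for an age group) amounts to the following conversion. The function name is hypothetical and the sketch applies to T-scores only, not to the Word Reading standardised score.

```python
def t_score_to_sd_units(t_score: float) -> float:
    """Distance of a T-score from the norming-sample mean, in standard deviations.

    Uses the mean of 50 and standard deviation of 10 stated in the text.
    """
    return (t_score - 50.0) / 10.0


print(t_score_to_sd_units(60))  # 1.0
```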
All of the scales used with the MCS sample in sweeps 2 through 4 have T-scores,
with the one exception of Word Reading at MCS 4. That scale has a standardised
score rather than a T-score. The only difference between the standardised score and
the T-scores is that the former does not have a mean of 50 and standard deviation of
10. It is otherwise computed the same as the T-score, adjusting for age group and
norming sample mean and standard deviation.
There are pros and cons to using T-scores or standardised scores. While these
scores take into account child age, they are based on 3-month age groupings of the
norming sample. They do not take into account the score variation within each group of
3 months. They are also based on the relationship between age and score in the
norming sample rather than within the MCS sample. Controlling for the age of the
children in the MCS sample being analysed will be a more accurate adjustment for age than
using the T-scores (see the section on age equivalence below for more information).
However, if one is looking at univariate relationships and cannot control for MCS
child age, it could be beneficial to use the T-scores or standardised scores,
especially in cases in which the variables of interest may be related to child age.
As the T-scores and standardised scores remove the mean and standard deviation
of the norming sample from each score, they may hide differences in variance at
different ages. If one is interested in how variance in BAS scores differs across age
or sweep, one may want to avoid using the T-scores or standardised scores so that
the actual variance in the sample is clear.
As was the case for the ability scores, the T-scores and standardised scores are not
truly continuous.
Below is a list of variables by MCS sweep for the different score types:
Further information
Elliott, C.D., Smith, P, and McCulloch, K (1996). British Ability Scales Second
Edition (BAS II): Administration and Scoring Manual. London: NFER-Nelson.
Elliott, C.D., Smith, P, and McCulloch, K (1997). British Ability Scales Second
Edition (BAS II): Technical Manual. London: NFER-Nelson.
The Bracken Basic Concept Scale – Revised (BBCS-R) is used to assess the basic
concept development in children in the age range of 2 years 6 months to 7 years 11
months. BBCS–R measures the comprehension of 308 functionally relevant
educational concepts in 11 subtests or concept categories. Following consultation
with advisers and piloting, only subtests 1-6 were administered by interviewers to the
members of the cohort during the MCS2 data collection.
The sub-tests administered together form the Bracken School Readiness
Assessment (BSRA) which evaluates 88 concepts relating to:
1. Colours: represents both primary colours and basic colour terms.
2. Letters: measures knowledge of both upper- and lower-case letters.
3. Numbers/Counting: measures recognition of single- and double-digit
numbers, and samples the ability to assign a number value to a set of
objects.
4. Sizes: includes concepts that describe one, two, and three dimensions.
5. Comparisons: measures ability to match and/or differentiate objects based
on one or more of their salient characteristics.
6. Shapes: includes one-, two-, and three-dimensional shapes. The one-
dimensional category includes linear shapes; two-dimensional shapes are
represented by concepts such as the circle, square, and triangle; and three-
dimensional shapes include concepts such as the cube and pyramid.
Scores
Raw Scores: The total number of correct answers for each of the six BSRA sub-tests.
Variable Description
Percentage mastery: The raw score as a percentage of the maximum possible score
for each sub-test.
Variable Description
Age-adjusted scores
The following variables are derived from bdbsrc00 (School Readiness Composite)
which is the total number of correct answers adjusted for age.
School Readiness Composite Standard Score: The total number of correct answers
on all six sub-tests (bdsrcs00).
Normed scores: Derived from standard tables in the BSRA manual and defined with
reference to the standardisation sample used in developing the assessments. The
standardisation sample was composed of 1,100 children aged between 2 years 6
months and 8 years 0 months representative of the general US population and was
stratified by age, gender, race/ethnicity and parental education:
Further information
5.2 NFER Number Skills (MCS4)
This test was adapted from the NFER Progress in Maths test, which is aimed at
7-year-olds and was originally developed and nationally standardised in the UK in 2004. The
whole test has a maximum raw score of 28. The national mean raw score in 2004
was 19.3 with a standard deviation of 5.3. The scores were nationally age
standardised to a mean of 100 and SD of 15.
The edition of this test used in the MCS is an adaptive version of the test created by
Cres Fernandes of NFER. All children have to complete an initial test and based on
their score they are routed to easier, medium or harder sections. The sections were
devised to save administration time, as it means each child completes around half
the original number of questions.
An item response scaling method (Rasch) was used to scale the results of the easy,
medium and hard subtest scores to the equivalent original raw scores. The variable
maths7scale can be considered to be the estimated raw score based on the original
test. The variable maths7sas is the standardised age adjusted score based on the
national standardisation lookup tables in 2004.
Variable Description
Our Adventures is part of the All Wales Reading Test, which was developed in
Wales to assess the reading skills of children in Welsh schools. The test is available
in Welsh and English.
In MCS4, parents of children in Wales were given the option of having their child’s
reading skills assessed in either Welsh or English. The Welsh version of Our
Adventures was used for children whose parents opted for the Welsh medium to be
used, and the Word Reading assessment was used for children whose parents
opted for the English medium to be used.
It was decided to use the Welsh-medium All Wales Reading Test rather than a
Welsh translation of the Word Reading assessment, because the Word Reading
assessment is designed only to assess English reading ability and the results
would not be valid if it were translated.
The Our Adventures assessment is a paper booklet that shows a story in pictures
and words; underneath each picture is a sentence that has one missing word, and a
list of words that can complete the sentence. The child has to circle the word that
best completes the sentence. There are a total of 59 items, and the assessment has
a time limit of 30 minutes. The assessment continues until the time limit has been
reached, or the child completes the last item.
This assessment is designed to be used with children from age 6 years 10 months to
9 years 9 months.
Variable Description
DCCSEX00 S4 CM Cohort member Sex
DCOQ0100 S4 CM Picture 1
DCOQ0200 S4 CM Picture 2
DCOQ0300 S4 CM Picture 3
DCOQ0400 S4 CM Picture 4
DCOQ0500 S4 CM Picture 5
DCOQ0600 S4 CM Picture 6
DCOQ0700 S4 CM Picture 7
DCOQ0800 S4 CM Picture 8
DCOQ0900 S4 CM Picture 9
DCOQ1000 S4 CM Picture 10
DCOQ1100 S4 CM Picture 11
DCOQ1200 S4 CM Picture 12
DCOQ1300 S4 CM Picture 13
DCOQ1400 S4 CM Picture 14
DCOQ1500 S4 CM Picture 15
DCOQ1600 S4 CM Picture 16
DCOQ1700 S4 CM Picture 17
DCOQ1800 S4 CM Picture 18
DCOQ1900 S4 CM Picture 19
DCOQ2000 S4 CM Picture 20
DCOQ2100 S4 CM Picture 21
DCOQ2200 S4 CM Picture 22
DCOQ2300 S4 CM Picture 23
DCOQ2400 S4 CM Picture 24
DCOQ2500 S4 CM Picture 25
DCOQ2600 S4 CM Picture 26
DCOQ2700 S4 CM Picture 27
DCOQ2800 S4 CM Picture 28
DCOQ2900 S4 CM Picture 29
DCOQ3000 S4 CM Picture 30
DCOQ3100 S4 CM Picture 31
DCOQ3200 S4 CM Picture 32
DCOQ3300 S4 CM Picture 33
DCOQ3400 S4 CM Picture 34
DCOQ3500 S4 CM Picture 35
DCOQ3600 S4 CM Picture 36
DCOQ3700 S4 CM Picture 37
DCOQ3800 S4 CM Picture 38
DCOQ3900 S4 CM Picture 39
DCOQ4000 S4 CM Picture 40
DCOQ4100 S4 CM Picture 41
DCOQ4200 S4 CM Picture 42
DCOQ4300 S4 CM Picture 43
DCOQ4400 S4 CM Picture 44
DCOQ4500 S4 CM Picture 45
DCOQ4600 S4 CM Picture 46
DCOQ4700 S4 CM Picture 47
DCOQ4800 S4 CM Picture 48
DCOQ4900 S4 CM Picture 49
DCOQ5000 S4 CM Picture 50
DCOQ5100 S4 CM Picture 51
DCOQ5200 S4 CM Picture 52
DCOQ5300 S4 CM Picture 53
DCOQ5400 S4 CM Picture 54
DCOQ5500 S4 CM Picture 55
DCOQ5600 S4 CM Picture 56
DCOQ5700 S4 CM Picture 57
DCOQ5800 S4 CM Picture 58
The Memory task is a touch-screen assessment that tests the child's ability to retain
spatial information and to manipulate remembered items in working memory. It also
assesses use of strategy. The aim of this test is that, by process of elimination, the
child should find one blue ‘token’ in each of a number of coloured boxes displayed
on the screen and use them to fill up an empty column (black hole) on the right hand
side of the screen. To see if a blue token is beneath a coloured box, the child has to
touch it with their index finger. If a blue token is revealed to be beneath a coloured
box, the child moves it to the black hole by touching the black hole with their index
finger. Touching any box in which a blue token has already been found is an error,
as is touching any box which has been found to be empty while searching for the
same token. The child decides the order in which the boxes are searched.
Performance at the harder levels of this task is enhanced by the use of a search
strategy. The number of boxes is gradually increased from three to eight boxes. The
colour and position of the boxes used are changed from trial to trial to discourage the
use of the same search strategies from trial to trial.
The child’s overall score is calculated from three different aspects of their
performance: errors, strategy and latency. Their performance is scored on each of
the assessed trials.
Errors are the number of times the child revisits a box which has
previously been found to be empty or in which a token has been
previously found.
Strategy is the order in which the child decides to search the boxes. On
the harder levels the child will perform better if they make use of a search
strategy.
Latency is calculated from three different measures of ‘time taken’. They
are the average time the child takes to first touch the screen when a new
trial is presented, the average time the child takes between when they
place the token in the black hole and the next time they touch a box and
the average time it takes the child to find the final token from the time each
trial was presented on screen.
Variable Description
SWMTTIME SWM Test Duration (seconds)
5.5 CANTAB Cambridge Gambling Task (MCS5)
Variable Description
General influences on test scores
It is important to note that the child’s performance may have been affected by
influences extraneous to those that the assessment is intended to measure. The
conditions listed below can lead either to a higher or lower score than would normally
be obtained.
6. Cohort Member Behavioural Development
The respondent is asked to comment on the following statements with: Not true,
Somewhat true or Certainly true.
v) Pro-social Scale
1. Considerate of others’ feelings
2. Shares readily with others
3. Helpful if someone is hurt, upset or ill
4. Kind to younger children
5. Often volunteers to help others.
* Denotes items that are reversed – when generating sub scales on behaviour
problems.
Each of the 5 sub-scales can be used alone or together to create:
1-4 taken together generate a total difficulties score.
1 and 4 create an internalising problem score.
2 and 3 create an externalising conduct score.
5 alone measures pro-social behaviour.
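The combinations listed above can be sketched as follows. This is a minimal illustration, assuming the five sub-scale scores have already been computed (with any reversed items handled) and are keyed by their sub-scale number 1 to 5; the function and key names are hypothetical.

```python
from typing import Dict


def sdq_composites(subscales: Dict[int, int]) -> Dict[str, int]:
    """Combine the five sub-scale scores (keyed 1..5) into the summary scores."""
    return {
        # Sub-scales 1-4 together give the total difficulties score.
        "total_difficulties": sum(subscales[i] for i in (1, 2, 3, 4)),
        # Sub-scales 1 and 4 give the internalising score.
        "internalising": subscales[1] + subscales[4],
        # Sub-scales 2 and 3 give the externalising score.
        "externalising": subscales[2] + subscales[3],
        # Sub-scale 5 alone measures pro-social behaviour.
        "pro_social": subscales[5],
    }


example = sdq_composites({1: 3, 2: 2, 3: 5, 4: 1, 5: 8})
```

With the made-up scores in `example`, the total difficulties score is 11, the internalising score 4, the externalising score 7 and the pro-social score 8.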
Sweep and description Variable names
Data format has changed at MCS5 and child variables are now stacked:
Further information
For more information about the scoring and interpretation of the Strengths and
Difficulties Instrument see:
Goodman, R. (1997). ‘The Strengths and Difficulties Questionnaire: A
Research Note.’ Journal of Child Psychology and Psychiatry. 38: 581-586.
Goodman, R. (2001), ‘Psychometric properties of the Strengths and
Difficulties Questionnaire (SDQ).’ Journal of the American Academy of Child
and Adolescent Psychiatry. 40: 1337-1345.
Goodman, R., Meltzer, H. and Bailey, V. (1998). ‘The Strengths and
Difficulties Questionnaire: A pilot study on the validity of the self-report
version.’ European Child and Adolescent Psychiatry. 7: 125-130.
Height
The original height variables – byhtcm00 and byhtmm00 (MCS2); cyhtcm00 and
cyhtmm00 (MCS3); and dchtcm00 (MCS4) – have not been edited.
Where interviewer notes gave clear warnings that the height values entered
were incorrect, the values were removed from bdhcmc00, bdhmmc00,
cdhcmc00, cdhmmc00 and dchtdv00.
Where the interviewer notes gave a value to replace an incorrect entry, these
were changed in bdhcmc00, bdhmmc00, cdhcmc00, cdhmmc00 and
dchtdv00.
The variables bdhtam00 and cdhtam00 are flags to show if any changes
were made. There were very few interviewer comments at MCS4 relating to
measurements.
At MCS4, the variable dchtis00 indicates whether “measurement circumstances”
(dchtrz0a to dchtrz0d) and/or “other information” (dchtex0a and dchtex0b) was
given in relation to the height measurement, and flags up the highest and lowest 100
or so values where no other circumstances are mentioned.
Weight
Copies of the variables were made – bdwtkc00 and bdwtgc00 (MCS2); cdwtkc00
and cdwtgc00 (MCS3); and dcwtdv00 (MCS4) – and appropriate changes were
made to them as follows:
Where interviewer notes gave clear warnings that the weight values entered
were incorrect, the values were removed from bdwtkc00, bdwtgc00,
cdwtkc00, cdwtgc00 and dcwtdv00.
Where the interviewer notes gave a value to replace an incorrect entry, these
were changed in bdwtkc00, bdwtgc00, cdwtkc00 and cdwtgc00. There
were very few interviewer comments at MCS4 relating to measurements.
The variables bdwtam00 (MCS2) and cdwtam00 (MCS3) are flags to show if
any changes were made.
The amended height and weight variables were used to calculate BMI.
The formula to compute BMI is weight (in kilos) divided by height squared (height
measured in metres). This is computed for cases where we have a valid value given
for both height and weight, and will be missing if either or both measurements are
missing.
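As an illustration of the formula, a minimal sketch is given below. It assumes height is supplied in centimetres and that a missing measurement is represented by `None`; the function and argument names are hypothetical and are not MCS variable names.

```python
from typing import Optional


def bmi(weight_kg: Optional[float], height_cm: Optional[float]) -> Optional[float]:
    """Weight in kilos divided by the square of height in metres.

    Returns None (missing) if either measurement is missing.
    """
    if weight_kg is None or height_cm is None:
        return None
    height_m = height_cm / 100.0  # convert centimetres to metres
    return weight_kg / height_m ** 2
```

For example, a child weighing 25 kg with a height of 125 cm has a BMI of 25 / 1.25² = 16.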
Outliers
All height and weight observations have been included in the data, even where they
might be considered outliers. All observations have been used to calculate the BMI
measure. We leave it to individual researchers to take decisions on whether they
consider any of the measurements to be outliers and what they do with such
observations. Users should be warned that the dataset contains a few values that
other users have considered implausible.
8. Income data
The MCS has collected income in a number of different ways over the different
sweeps. At sweeps 1-5 income data were collected in a single banded question in
addition to a set of detailed questions which collected information on a range of
different measures detailed in the Table below.
Income MCS 1 MCS 2 MCS 3 MCS 4 MCS 5
Main / Partner
Gross Earnings
Net Earnings
Usual net Earnings
Earnings from second job
Irregular earnings from occasional work
Earnings from Self-employment
Housing benefit
Child benefit - -
Guardian’s Allowance - -
Carer’s allowance - -
State pension - -
Widow’s pension - -
War disablement allowance - -
Severe Disablement Allowance - -
Disability Allowance - -
Job seekers allowance - -
Pension credit - -
Income support - -
Incapacity benefit - -
Working tax credit
Child tax credit
Child care tax credit
Statutory sick pay - -
Grant from the social fund for maternity expenses - -
Other social fund grant - -
Maternity Allowance - -
Banded data
Respondents were shown a card with weekly, monthly and annual bands of total
take-home income from all these sources and earnings after tax and other
deductions. These ‘sources’ implicitly included state benefits, which had been the
subject of more detailed previous questions. Note that, unlike other state benefits,
there was no attempt to ascertain the amounts of housing benefit and council tax
benefit received as separate components, so they may well have been omitted from
estimates of total net income as reported. Bands of different sizes were used for lone
and ‘couple’ families.
Missing income data (item non-response)
Analysis of the collected data shown in the Table below indicates that more than
1,500 MCS families at each sweep did not provide banded income data, either
because they said they did not know their family income or because they refused.
Table 39: Completeness of MCS banded household net income data (number
of families)
* There were 144 families at MCS2 where there was no response to the banded income question.
# We are unable to differentiate refusals from don’t knows at MCS5.
We imputed income for the cases where it was missing using interval regression
(Stewart 1983). This method allowed us to impute a continuous value within a band
where income band was available, rather than assuming that all cases in a band had
the same midpoint income. This was achieved using Stata’s INTREG command
(StataCorp 2007; Conroy 2005). INTREG fits a model of y=[dependent variable 1,
dependent variable 2] on independent variables where in our case, dependent
variable 1 was the log lower income band and dependent variable 2 was log upper
income band. Note that the left-hand-side bound for the lowest band is 0 and the
right-hand-side bound for the top band is the 100th income percentile in the UK. The
predictors are given in the following table.
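To make the two dependent variables concrete, the sketch below builds the log lower and log upper bounds for a single income band. It is an illustration only: the band limits are hypothetical, and because a lower bound of 0 has no logarithm it is returned as `None` here. Stata's INTREG treats a missing lower bound as left-censored, but whether the lowest band was coded that way in the MCS imputation is an assumption of this sketch.

```python
import math
from typing import Optional, Tuple


def log_band_bounds(lower: float, upper: float) -> Tuple[Optional[float], float]:
    """Return (log lower bound, log upper bound) for one income band.

    A lower bound of 0 is returned as None (assumed open-ended), since
    log(0) is undefined.
    """
    log_lower = math.log(lower) if lower > 0 else None
    return log_lower, math.log(upper)


# Hypothetical weekly bands, for illustration only.
lowest_band = log_band_bounds(0, 100)
next_band = log_band_bounds(100, 200)
```

An interval regression of these pairs on the predictors then yields a predicted log income for each family, constrained to lie within the band where the band is known.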
Variable: Categories
Main respondent’s age at interview: Continuous
Housing tenure: Own; Private renting; Renting from Local Authority or Housing Association; Other
DV combined labour market status of main and partner respondents: Both in work/on leave; Main in work/on leave, partner not in work/on leave; Partner in work/on leave, main not in work/on leave; Both not in work/on leave; Lone parent in work/on leave; Lone parent not in work/on leave
Point type: Advantaged; Disadvantaged; Ethnic
DV interview government office region: North East; North West; Yorkshire and the Humber; East Midlands; West Midlands; East of England; London; South East; South West; Wales; Scotland; Northern Ireland
Receipt of state benefit?: No; Yes
Main respondent's ethnic group – 6 category census classification (UK): White; Mixed; Indian; Pakistani and Bangladeshi; Black or Black British; Other ethnic group (inc. Chinese and other Asian)
DV combined education highest NVQ: NVQ level 1; NVQ level 2; NVQ level 3; NVQ level 4; NVQ level 5; Overseas qual only; None of these
Main type of accommodation: A house or bungalow; A flat or maisonette; A studio flat
Number of children including cohort child: 1; 2; 3; 4+
DV summary of parents/carers in household: Two parents/carers; One parent/carer
Equivalisation
We used modified OECD scales for equivalisation. Each scale sets the family’s
needs relative to those of a couple with no children whose scale is set equal to 1. In
the modified OECD scale, a family of one parent and one child under 14 has a scale
of 0.87; one parent and two such children 1.07; and so on. This is shown below.
Spouse 0.33
Dependent child aged 14 to 18 years (16 to 18 for McClements) 0.33
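The worked figures in the text can be reproduced with a short sketch. Only the spouse and older-child weights (0.33 each) appear in the table above; the first-adult weight of 0.67 and the under-14 weight of 0.20 are inferred here from the statements that a couple with no children has a scale of 1 and that one parent with one child under 14 has a scale of 0.87. Dividing income by the scale is the usual equivalisation convention and is assumed rather than stated. All names are hypothetical.

```python
FIRST_ADULT = 0.67     # inferred: couple (1.00) minus spouse (0.33)
SPOUSE = 0.33          # from the table
CHILD_14_TO_18 = 0.33  # from the table
CHILD_UNDER_14 = 0.20  # inferred: 0.87 minus 0.67


def oecd_scale(has_spouse: bool, children_under_14: int,
               children_14_to_18: int = 0) -> float:
    """Modified OECD scale relative to a couple with no children (= 1)."""
    scale = FIRST_ADULT + (SPOUSE if has_spouse else 0.0)
    scale += CHILD_UNDER_14 * children_under_14
    scale += CHILD_14_TO_18 * children_14_to_18
    return round(scale, 2)


def equivalised_income(income: float, scale: float) -> float:
    """Income divided by the family's equivalence scale (assumed convention)."""
    return income / scale
```

Under these assumptions, `oecd_scale(False, 1)` gives 0.87 and `oecd_scale(False, 2)` gives 1.07, matching the examples in the text.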
The average, minimum and maximum of the imputed income variable are given in
the following table.
Average income by income quantile.
References
Conroy, R.M. (2005). ‘Stings in the tails: Detecting and dealing with censored data.’
Stata Journal. 5: 395-404.
Hansen, K. (2008). Millennium Cohort Study First, Second and Third Surveys: A
Guide to the Datasets, Third Edition.
(http://www.cls.ioe.ac.uk/studies.asp?section=00010002000100110002; accessed on 16/05/2008)
HBAI Team, Information Directorate, Department for Work and Pensions (2005).
Households below average income statistics: Adoption of new equivalence scales.
(http://www.dwp.gov.uk/asd/hbai/nsfr_newequiv.pdf; accessed 22 May 2008)
StataCorp. (2007). Stata Statistical Software: Release 10. College Station, TX:
StataCorp Lp.
Stewart, M.B. (1983). ‘On least-squares estimation when the dependent variable is
grouped.’ Review of Economic Studies. 50: 737-753.
As the cohort members grow up, the survey focuses increasingly on
them. From MCS4 it was appropriate to gain the cohort members’ own views on
their developing lives. The cohort members were given their own self-completion
questionnaire at MCS4 (age 7). This was a short, easy-to-read, 8-page paper
questionnaire which the interviewer gave to them during the home visit. It took
around 20 minutes to complete. At age 7 the age-appropriate topics covered
included hobbies, friends and family, feelings and school.
At MCS5 (age 11) the self-completion questionnaire was extended significantly to
reflect the greater complexity of the cohort members’ lives and their ability to answer
a longer, more complex instrument. The age 11 questionnaire was 28 pages long
and took around 30 minutes to complete. Once again it was a self-completion paper
questionnaire. The topics at age 11 included:
Activities outside school
Internet & social networking
Life satisfaction
Happiness
Self esteem
Friends
Unsupervised time
Pocket money
Family financial position & materialism
Anti-social behaviours
School
Secondary school
Attitudes
Other children (incl. bullying)
Risky behaviours (incl. smoking & alcohol)
Mental health
Future ambitions
At MCS5 the questionnaire was offered in audio-assisted mode, using an MP4 player,
to cohort members who had lower levels of literacy. However, fewer than 2% (1.8%)
of cohort members completed the self-completion questionnaire with audio support.
The explanation of relationship between question names (in the questionnaire) and
variable names (in the data) is given below:
where:
Prefix1: Indicates the sweep; a= MCS1; b=MCS2; c=MCS3; and so on.
Question name – the 4-letter question name in the instrumentation.
Suffix1: Identifies the iteration, i.e. where the same question is repeated for
different events/individuals, 0=no iteration; a=iteration 1; b=iteration 2;
c=iteration 3; and so on.
Hence, the variable names on the dataset have the following form:
E.g. bmfcin00 holds information from MCS2 (b), given by the main respondent (m),
in response to a question named ‘fcin’, which was not repeated (0) or multi-coded
(0).
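The naming convention illustrated by bmfcin00 can be sketched as a small parser. This is an illustration under stated assumptions: it expects an 8-character name made up of sweep letter, instrument letter, 4-letter question name, iteration suffix and multi-code suffix; the sweep letters d and e are assumed to continue the a, b, c pattern for MCS4 and MCS5; and the function and key names are hypothetical. Variables that do not follow the convention (for example maths7scale) are rejected rather than parsed.

```python
from typing import Dict

# a-c are given in the text; d and e are assumed to continue the pattern.
SWEEPS = {"a": "MCS1", "b": "MCS2", "c": "MCS3", "d": "MCS4", "e": "MCS5"}


def parse_variable_name(name: str) -> Dict[str, str]:
    """Split an 8-character variable name into its component parts."""
    if len(name) != 8:
        raise ValueError("expected an 8-character variable name")
    name = name.lower()
    return {
        "sweep": SWEEPS.get(name[0], "unknown"),
        "instrument": name[1],   # e.g. 'm' for the main respondent
        "question": name[2:6],   # 4-letter question name
        "iteration": name[6],    # '0' = no iteration, 'a' = iteration 1, ...
        "multicode": name[7],    # '0' = not multi-coded
    }


parts = parse_variable_name("bmfcin00")
```

Here `parts` identifies the sweep as MCS2, the instrument as `m`, the question as `fcin`, and both suffixes as `0`.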
Please note, the prefixes identifying sweep and instrument are not included in the
variable names given below.
Table 41: Conventions for Suffixes in Variable Names
Repeated for child, not event, single-coded
Suffix1: ‘a’ for the first child, ‘b’ for the second and ‘c’ for the third. The maximum number of cohort children is 3.
Suffix2: 0
Repeated for child, not event, multi-coded
Suffix1: ‘a’ for the first child, ‘b’ for the second and ‘c’ for the third. The maximum number of cohort children is 3.
Suffix2: ‘a’-‘z’ depending on the number of possible responses
Repeated for event, not child, single-coded
Suffix1: Starting with ‘a’ for the first event and using subsequent letters of the alphabet for successive events.
Suffix2: 0
Repeated for event, not child, multi-coded
Suffix1: Starting with ‘a’ for the first event and using subsequent letters of the alphabet for successive events.
Suffix2: ‘a’-‘z’ depending on the number of possible responses
Repeated for event and child, single-coded
Suffix1: Starting with ‘a’ for the first event for child 1 and using subsequent letters of the alphabet for successive events. The first event for child 2 will be indicated by the next letter of the alphabet after that used for the last event for child 1, and so on. In this situation the letters will not indicate which child the variable relates to.
Suffix2: 0
Repeated for event and child, multi-coded
Suffix1: Starting with ‘a’ for the first event for child 1 and using subsequent letters of the alphabet for successive events. The first event for child 2 will be indicated by the next letter of the alphabet after that used for the last event for child 1, and so on. In this situation the letters will not indicate which child the variable relates to.
Suffix2: ‘a’-‘z’ depending on the number of possible responses
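The lettering used when a question is repeated for both event and child can be illustrated as follows. This is a sketch only, assuming the number of events recorded for each child is known; the function name is hypothetical.

```python
from string import ascii_lowercase
from typing import List


def event_suffixes(events_per_child: List[int]) -> List[List[str]]:
    """Assign iteration letters to events, running on from child to child."""
    if sum(events_per_child) > len(ascii_lowercase):
        raise ValueError("more events than available letters")
    letters = iter(ascii_lowercase)
    # Each child's events take the next unused letters of the alphabet.
    return [[next(letters) for _ in range(n)] for n in events_per_child]


suffixes = event_suffixes([2, 3])
```

With two events for child 1 and three for child 2, child 1 receives `a` and `b` and child 2 receives `c`, `d` and `e`, which is why the letter alone does not identify the child.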
Variable labels
Variables have been labelled in a consistent manner to aid navigation within the
datasets. Labels have abbreviated descriptions to indicate, sweep, instrument and
position in loops.
Table 42: Abbreviations used in Variable Labels
Abbreviation Description
S1 Sweep 1
S2 Sweep 2
S3 Sweep 3
DV Derived Variable
OS Older Siblings
NA Neighbourhood Assessment
R These appear at the end of labels and indicate an event loop such as
pregnancy R1, R2, R3, where R1 means first pregnancy, R2 means
second pregnancy, R3 means third pregnancy, etc.
The labelling will indicate whether this is a multicoded variable (MC1, MC2 etc.) or a
repetition (R1, R2) of a question (e.g. pregnancy, employment spells).
Value labels
The value labels are likewise derived from the CAPI program and have been
reviewed and, where necessary, modified in an effort to ensure that labels are
comprehensible and accurate.
Some information was fed forward from earlier sweeps. The feed forward data were
associated with the Main respondent and the Partner respondent from the previous
sweep. It was fed forward into the MCS2, MCS3, MCS4 or MCS5 interview if the
interviewer indicated that the Main respondent was the same as at the previous
sweep or that the Partner respondent was the same as at the previous sweep. The
name of the Main respondent and Partner at the previous sweep was made available
for interviewers. In some cases, the interviewer-coded variable is discrepant with the
derived variable. There are derived variables indicating this, and it implies
that information was fed forward into the Main/Partner interview because the CAPI
thought the respondents were the same when in fact the respondents were different.
Details of coding and editing activities can be found in the Codebook and Edit
Instructions prepared by NatCen, included in this deposit and their Technical Report
on Fieldwork (NatCen 2004).
MCS2 data were received from NOP in SPSS format. The data went through an
extensive process of restructuring to produce the current datasets.
Because the Household Grid information was not fed forward from MCS1,
construction of the current household grids had to be carried out by a process of
matching each individual recorded at MCS2 with that at MCS1. Twenty-eight per
cent of individuals did not require matching: either the family did not take part in
MCS2, or they were new families entering the study for the first time, or they were
younger siblings of the cohort member.
Of the remaining, 37 per cent matched on name, sex and date of birth. Cohort
members matching on full name accounted for another 18 per cent (date of birth of
cohort members was not re-collected unless there was a discrepancy with that fed
forward). Of the remaining 17 per cent, 9 per cent were either new entrants or
leavers from the household and 6 per cent matched on full name only. The
remaining 3 per cent matched on less reliable measures. The full list of matches was
checked by eye to reveal any discrepancies (false positives and false negatives). A
fuller analysis of how this compared to the final cleaned data used in the deposited
data will appear in due course.
Data for child assessments, child measurements and home observations also
needed to be matched as their number in the household was not passed between
instruments. As there are only a relatively small number this was done by hand.
Household grid information was fed forward from the interview at MCS1 and MCS2.
Where responses conflicted, the value used was the majority value where responses from the
previous two surveys were available, or the latest where only one previous interview
was available for comparison. Checks were also applied to investigate implausible or
unlikely values, such as grandmothers under 30 or natural siblings more than 30 years
apart.
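The rule for resolving conflicting household grid values can be sketched as below. This is an interpretation rather than the procedure actually used: it assumes the majority is taken over the current response together with the two previous ones, and that the latest value is used when no value has a majority, which the text does not specify. The function name is hypothetical.

```python
from collections import Counter
from typing import List, TypeVar

T = TypeVar("T")


def reconcile(current: T, previous: List[T]) -> T:
    """Pick a value from the current response and those fed forward.

    `previous` holds the values from earlier sweeps, oldest first.
    """
    if len(previous) >= 2:
        values = previous + [current]
        value, count = Counter(values).most_common(1)[0]
        if count > len(values) / 2:
            # A strict majority exists across the available sweeps.
            return value
    # Only one previous interview, or no majority (assumed): use the latest.
    return current
```

For instance, `reconcile("F", ["M", "M"])` returns `"M"`, whereas `reconcile("F", ["M"])` returns the latest value, `"F"`.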
Essentially the same strategy was employed as at MCS3, but with the addition of
more checks on inter-family relationships with an emphasis on relationship to the
cohort child and the main and partner respondents.
The data collection was split across three instruments for the main and partner
respondent elements. This was initially reconciled by IPSOS-MORI and further
integrity checks were then conducted by CLS. This was complicated by the use of a
different person ID to that used at CLS, resulting in a mapping exercise between
those used in the data collection and that seen in the output data. The unintended
consequence of this was that the household grid and subsequent relationships
needed to be re-organised; this was done at CLS. Checks were constructed to
ensure that the people present in the household are longitudinally consistent,
through checks on date of birth, sex, and relationship to Cohort Member. As is the
case in self report of relationships, in many cases this led to correction of
relationships previously collected, and where the data collection asked for
confirmation of a change being required, this was accepted as being the correct
relationship. In some cases, e.g. where a relationship is corrected from partner to
married, it has not been possible to reconcile whether this requires historical
correction or just applies to the existing data collection.
a) MCS1
Details of coding and editing activities can be found in the Codebook and Edit
Instructions prepared by NatCen, included in this deposit, and in their Technical
Report on Fieldwork (NatCen 2004). Special thanks to Professor Neville Butler who was
tireless in developing coding frames for the open-text answers to health questions,
and in supervising the ICD10 coding at CLS of responses on mothers’ and fathers’
longstanding illness.
b) MCS2
Details of coding and editing activities can be found in the NOP Technical Report on
Fieldwork (NOP 2006).
In 2007, ONS were commissioned to re-code the occupation variables for MCS2.
Coding Approach
Automated Coding
In total, ONS received 52,868 records. The first stage of coding for ONS was to run
the entire sample through its corporate automated coding tool “ACTR” (Automated
Coding by Text Recognition). ACTR automatically coded 24,281 records, leaving
28,587 records.
Manual Coding
The 28,587 records not automatically coded were distributed equally between the
coders in ONS, who were asked to make a variety of assumptions, as follows:
• Where the job title is nondescript, code using the job description.
• Where the job title and the job description differ, code to the job title.
• If the job title is not sufficiently detailed to assign a SOC 2000 code at the
  unit group (4-digit) level, code to the most detailed level possible.
• Where there are two possible codes for the job title and a subjective
  judgement is called for, always code to the lowest level. For example,
  “Armed Forces” were coded to “Other rank” rather than “Officer”.
Quality Checking
Automated Coding
At present, ACTR is tuned to code an ONS survey, for which its accuracy has been
adjudicated as 99.80 per cent. As the MCS was new to ACTR, all records coded by
ACTR were checked; they were found to be 98 per cent accurate, and incorrect records
were manually changed. The reduced quality for the MCS was due to ACTR not being
tuned for the survey, as it was the first time ACTR had seen it. Information from the
MCS will be used to tune ACTR so the quality of ACTR will be enhanced should the
MCS be automatically coded in the future.
Manual Coding
Once the manual coding was completed, a 10 per cent sample of the manually
coded records was drawn by the ONS Methodology Division. The sample selected
maintained the SOC code distribution, and was checked by someone other than the
coder who initially coded the record. Coder accuracy was found to be 95 per cent,
with queried records changed where appropriate.
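The source does not say how the ONS Methodology Division drew its sample. One common way to draw a 10 per cent sample that preserves a code distribution is proportional stratified sampling, sketched below. The function name, the record layout with a `soc` key, the minimum of one record per code and the fixed seed are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, fraction=0.10, seed=0):
    """Draw roughly `fraction` of the records within each SOC code.

    Hypothetical sketch: each record is a dict with a "soc" key.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_code = defaultdict(list)
    for rec in records:
        by_code[rec["soc"]].append(rec)
    sample = []
    for group in by_code.values():
        k = max(1, round(len(group) * fraction))  # at least one per code
        sample.extend(rng.sample(group, k))
    return sample
```

Sampling within each code keeps the share of every SOC code in the checked sample close to its share among the manually coded records.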
Where there was ambiguity as to how to code a record, it was decided that these
would be coded after the majority of coding was completed, in a “committee” format,
with all coders discussing and then coding the record together.
The final quality check involved grouping the job titles and showing all the different
SOC codes associated with them. This allowed the coders to identify areas of
inconsistency and make changes accordingly.
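The final check can be pictured as grouping records by job title and listing every title that has received more than one SOC code. The sketch below is an illustration under stated assumptions, not the ONS procedure: the function name, the normalisation of titles by trimming and lower-casing, and the example codes are all hypothetical.

```python
from collections import defaultdict


def inconsistent_titles(records):
    """Map each job title to its SOC codes, keeping titles with more than one.

    `records` is an iterable of (job_title, soc_code) pairs.
    """
    codes_by_title = defaultdict(set)
    for title, soc in records:
        codes_by_title[title.strip().lower()].add(soc)
    return {t: sorted(c) for t, c in codes_by_title.items() if len(c) > 1}


# Illustrative records only; the codes are placeholders for the example.
coded = [("Nurse", "3211"), ("nurse ", "3211"),
         ("Teacher", "2314"), ("teacher", "2315")]
print(inconsistent_titles(coded))  # prints {'teacher': ['2314', '2315']}
```

Titles returned by such a listing would then be reviewed by the coders, as described above.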
c) MCS3
Details of coding and editing activities can be found in the NatCen Technical Report
on Fieldwork (NatCen 2007).
d) MCS4
Details of coding and editing activities can be found in the NatCen Technical Report
on Fieldwork (NatCen 2010).
e) MCS5
Details of coding and editing by IPSOS-MORI can be found in the MCS5 Coding and
Editing Report (IPSOS-MORI, 2013). Further editing was conducted at CLS. For
example, the value labels for multi-coded questions received from IPSOS-MORI
included the response within the value label; this has been moved to the variable
label and the value description changed to Yes/No as appropriate.
Linked data
MCS has collected consents to link to a range of other data. A detailed guide, MCS
Ethical review and Consent, has been produced and is available from
www.cls.ioe.ac.uk/mcssample
The following linked datasets are available from the UK Data Service:
The deciles were created using the rank for each sub-measure provided. As a
practical example, in England there were 32,482 LSOAs, each decile containing
3,248 or 3,249 LSOAs. These data were then linked to the address at interview at
Lower Super Output Area level. The IMD measures used were based on the following
definitions:
The websites for ONS, Welsh Assembly, Scottish Assembly and NISRA have
specific details:
England: www.communities.gov.uk/documents/communities/pdf/131209.pdf
Wales: http://wales.gov.uk/topics/statistics/theme/wimd/2005/?lang=en
Scotland: www.scotland.gov.uk/Publications/2004/10/20089/45181
Northern Ireland: www.nisra.gov.uk/archive/deprivation/nimdm2005fullreport.pdf
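Returning to the decile construction described above, the arithmetic for England can be checked with a short sketch. The allocation rule used here (integer division of the rank) is an assumption: the source gives only the resulting decile sizes, not the exact rule or which end of the ranking is decile 1.

```python
from collections import Counter


def decile(rank, n_areas):
    """Map a rank in 1..n_areas to a decile in 1..10 using integer arithmetic."""
    if not 1 <= rank <= n_areas:
        raise ValueError("rank out of range")
    return (rank - 1) * 10 // n_areas + 1


N_LSOA_ENGLAND = 32482  # number of LSOAs in England quoted in the text
sizes = Counter(decile(r, N_LSOA_ENGLAND) for r in range(1, N_LSOA_ENGLAND + 1))
print(sorted(set(sizes.values())))  # prints [3248, 3249]
```

Because 32,482 is not divisible by 10, two deciles hold 3,249 LSOAs and the other eight hold 3,248, which matches the sizes quoted in the text.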
MCS postcodes have been classified into different types of rural and urban areas.
Again these are country specific. An overview is provided by ONS at
http://www.ons.gov.uk.
The data for the Rural Urban measures were linked at Lower Super Output Area
Level and used the following definitions:
The Birkbeck definition of Rural Urban in England is that used by DEFRA. More
information on this is available from ONS at the above URL.
Linked education records were obtained from the National Pupil Database (England and
Wales), and the Attendance, Absence, Pupil Census and School Meals Survey in Scotland.
The data are available from the UK Data Service under SN6862, SN7414 and SN7415. There
is no comparable national dataset available from Northern Ireland.
Part 7. ETHICAL CONSIDERATIONS
The process of gaining medical research ethical approval proved a major hurdle. As had
been the practice with the previous cohort studies, medical research ethical clearance was
sought from the National Health Service Ethical Authority (in February 2001,
MREC/01/6/19). This was as a general precaution for future health data collection and was
specifically required because of the proposal to involve Health Visitors. Any research
involving NHS staff needs to be given such clearance. We were directed to the South West
Multi-Centre Research Ethics Committee in March 2001, who felt that opt-out sampling could
be coercive and might fail to obtain properly informed consent. They did, however, accept
that written opt-ins would tend to exclude vulnerable people, so procedures were devised in
consultation with the Committee to give potential respondents more information before they
committed themselves for interview. Advance letters introducing the interviewer were sent
shortly before her/his first visit, and interviewers were asked to arrange interviews
generally after their first visit, whose main purpose was to give information. A
simplified information sheet was produced and translated into several languages.
For MCS2, ethical approval was again sought for the pilot and main surveys – on this
occasion from the London Multi-Centre Research Ethics Committee. Following their
deliberations, the members of the Committee sought additional information on various
aspects of the survey, commented on aspects of tracing procedures adopted for families
discovered to have moved, and requested that a number of specific changes be made to
information leaflets and consent forms. Ethical approval was given in September 2004
(MREC/03/2/022).
Both pilot surveys and the main survey of MCS3 were considered by the London
Multi-Centre Research Ethics Committee of the NHS. A favourable ethical opinion for
the Economic and Social Research Council Millennium Cohort Study Third Survey 2005:
Dress Rehearsal and Main Survey 2nd amendment (12 December 2005) was granted by
letter on 15 December 2005, with the REC Reference No. 05/MRE02/46.
Both pilot surveys and the main survey of MCS4 were considered by the Northern and
Yorkshire Multi-Centre Research Ethics Committee of the NHS. A favourable ethical
opinion for the Economic and Social Research Council Millennium Cohort Study Fourth
Survey: Dress Rehearsal and Main Survey 2nd amendment (3 January 2008) was granted by
letter on 5 February 2008, with the REC Reference No. 07/MRE03/32.
5. MREC for MCS5
Ethical approval for Pilot 1 was obtained on 24th March 2011 from the Northern and
Yorkshire REC (Ref: 11/H0903/3/). For the Dress Rehearsal and Main Stage, approval was
granted by the Yorkshire and Humber REC on 29th July 2011 (Ref: 11/YH/0203). On 13th
December 2011, confirmation of a favourable opinion was received in relation to a
substantial amendment put to the Yorkshire and Humber REC covering the addition of the
DWP data linkage consent collection to the study.
6. Codes of Practice
7. Consents
At each sweep of the survey a series of consents was sought from the respondents.
These are detailed below.
Saliva Sample
MCS3 Written Parent1 MCS3 Technical Report on
Fieldwork - appendices
Main interview & self-completion Consent 1: data collection
parent 1 and cohort child
Survey Consent Who from Elements Document
Child assessments & measurements    Consent 1: data collection parent 1 and cohort child
Linkage to NHS medical records and accidents (birth to age 7 – IF NOT GOT AT 3)    Consent 3: cohort child health records
Centre for Longitudinal Studies
Institute of Education
20 Bedford Way
London WC1H 0AL
Tel: 020 7612 6860
Fax: 020 7612 6880
Web: www.cls.ioe.ac.uk