Table 1 Data field description

From: Assessing the accuracy of record linkages with Markov chain based Monte Carlo simulation approach

Data field

Value

RECID (Record identifier)

7 alphanumeric characters, ranging from ‘A000001’ to ‘A400000’

SA1 (Statistical Area 1)

The upper level of a hypothetical two-level geographical location system. Each SA1 contains exactly 400 records. The values are 5-digit codes numbered from 10,001 to 11,000

MB (Meshblock)

Every SA1 consists of exactly 5 Meshblocks (MB). Each Meshblock contains 80 records in file \(Y\) and 10 records in file \(X\). The values are 7-digit codes ranging from 1,000,101 to 1,100,009

BDAY (Birth Day)

Birth dates span 20,000 consecutive days, from 1 January 1955 to 3 October 2009. BDAY values are numeric and range from 1 to 366

BYEAR (Birth Year)

Value is numeric and ranges from 1955 to 2009

SEX (Male/Female)

The values 1 and 2 represent male and female, respectively. Exactly 50% of all records are male and the remaining 50% are female

EYE (Eye Colour)

Values are numbered from 1 to 5 and are evenly distributed

COB (Country of Birth)

75% of all records are assigned the value ‘1101’ for ‘Born in Australia’. The remaining 25% of records are randomly assigned one of about 300 country codes, according to the corresponding proportion of people in the 2006 Census
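
The counts quoted in the table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the article: the constant names and the helper `bday_byear` are hypothetical, BDAY is assumed to be the day of the year (which the 1 to 366 range suggests but the table does not state), and the 400,000 RECID values are assumed to describe file \(Y\), since 5 Meshblocks of 80 file \(Y\) records give the 400 records per SA1.

```python
from datetime import date, timedelta

# Figures quoted in Table 1 (constant names are illustrative, not from the article)
N_RECORDS = 400_000                 # RECID runs from A000001 to A400000
SA1_CODES = range(10_001, 11_001)   # 5-digit SA1 codes, 10,001 to 11,000 inclusive
RECORDS_PER_SA1 = 400
MB_PER_SA1 = 5
Y_RECORDS_PER_MB = 80
X_RECORDS_PER_MB = 10

FIRST_DAY = date(1955, 1, 1)
LAST_DAY = date(2009, 10, 3)

# Inclusive number of days in the birth-date window
n_days = (LAST_DAY - FIRST_DAY).days + 1

# Record counts implied by the geography (file Y reading is an assumption)
n_y_records = len(SA1_CODES) * MB_PER_SA1 * Y_RECORDS_PER_MB
n_x_records = len(SA1_CODES) * MB_PER_SA1 * X_RECORDS_PER_MB


def bday_byear(offset: int) -> tuple[int, int]:
    """Map a 0-based day offset in the window to (BDAY, BYEAR).

    Assumes BDAY is the day of the year (1 to 366); hypothetical helper.
    """
    d = FIRST_DAY + timedelta(days=offset)
    return d.timetuple().tm_yday, d.year


print(n_days, n_y_records, n_x_records)
print(bday_byear(0), bday_byear(n_days - 1))
```

Under these assumptions the date window contains exactly 20,000 days, the SA1 and Meshblock figures reproduce the 400,000 records implied by the RECID range, and file \(X\) would hold 50,000 records.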