From: Assessing the accuracy of record linkages with Markov chain based Monte Carlo simulation approach
Data field | Value |
---|---|
RECID (Record identifier) | 7 alphanumeric characters, ranging from ‘A000001’ to ‘A400000’ |
SA1 (Statistical Area 1) | The upper level of a hypothetical two-level geographical location system. Each SA1 contains exactly 400 records. The values are 5-digit codes numbered from 10,001 to 11,000 |
MB (Meshblock) | Every SA1 consists of exactly 5 Meshblocks (MBs). Each Meshblock contains 80 records of file \(Y\) and 10 records of file \(X\). The values are 7-digit codes ranging from 1,000,101 to 1,100,009 |
BDAY (Birth Day) | Birth dates span 20,000 consecutive days from 1 January 1955 to 3 October 2009. BDAY values are numeric and range from 1 to 366 (day of the year) |
BYEAR (Birth Year) | Value is numeric and ranges from 1955 to 2009 |
SEX (Male/Female) | The values 1 and 2 represent male and female, respectively. Exactly 50% of all records are male and the remaining 50% are female |
EYE (Eye Colour) | Values are numbered from 1 to 5 and are evenly distributed |
COB (Country of Birth) | 75% of all records are assigned the value ‘1101’ for ‘Born in Australia’. The remaining 25% of records are randomly assigned one of about 300 country codes, in proportion to the corresponding share of people in the 2006 Census |
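
The field definitions above can be illustrated with a minimal Python sketch that generates records for file \(Y\) only. It is not the authors' generator. Several points are assumptions made here for illustration: records are allocated to SA1s and Meshblocks in RECID order, birth dates are spread evenly over the 20,000 days, BDAY is read as day of the year, and the names `balanced`, `generate` and `MB_INDEX` are hypothetical. The 7-digit MB code is not constructed because the table does not state its encoding, and COB is omitted because the 2006 Census proportions are not given.

```python
import random
from collections import Counter
from datetime import date, timedelta

N_RECORDS = 400_000        # RECID runs from A000001 to A400000
RECORDS_PER_SA1 = 400      # from the table
MB_PER_SA1 = 5             # from the table (5 x 80 = 400 records of file Y)
N_DAYS = 20_000            # consecutive birth dates starting 1 January 1955
FIRST_BIRTH_DATE = date(1955, 1, 1)


def balanced(values, n, rng):
    """Return n values in random order, each value appearing equally often."""
    values = list(values)
    assert n % len(values) == 0, "n must be a multiple of the number of values"
    pool = values * (n // len(values))
    rng.shuffle(pool)
    return pool


def generate(n=N_RECORDS, seed=0):
    """Build n synthetic records (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    sex = balanced([1, 2], n, rng)               # exactly 50% male, 50% female
    eye = balanced([1, 2, 3, 4, 5], n, rng)      # five eye colours, equal counts
    # Assumption: every one of the 20,000 days receives the same number of records.
    day_offsets = balanced(range(N_DAYS), n, rng)

    records = []
    for i in range(n):
        birth = FIRST_BIRTH_DATE + timedelta(days=day_offsets[i])
        records.append({
            "RECID": f"A{i + 1:06d}",
            "SA1": 10_001 + i // RECORDS_PER_SA1,
            # Position of the Meshblock within its SA1 (1..5), not the 7-digit code.
            "MB_INDEX": (i % RECORDS_PER_SA1) // (RECORDS_PER_SA1 // MB_PER_SA1) + 1,
            "BDAY": birth.timetuple().tm_yday,   # day of the year, 1..366
            "BYEAR": birth.year,
            "SEX": sex[i],
            "EYE": eye[i],
        })
    return records


if __name__ == "__main__":
    recs = generate()
    print(recs[0]["RECID"], recs[-1]["RECID"], recs[-1]["SA1"])
    print(FIRST_BIRTH_DATE + timedelta(days=N_DAYS - 1))  # last birth date
    print(Counter(r["SEX"] for r in recs))
```

Drawing each categorical field from a shuffled, pre-balanced pool rather than sampling independently is what makes the proportions exact (for example, exactly 50% of records with SEX equal to 1), as the table requires. The date arithmetic also confirms that 20,000 consecutive days starting on 1 January 1955 end on 3 October 2009.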