Table 1 List of input parameters obtained for machine learning and missing instances

From: Predicting oral cancer risk in patients with oral leukoplakia and oral lichenoid mucositis using machine learning

| Input feature | Type | Missing instances (Hong Kong Cohort only) | Handling technique for missing data |
| --- | --- | --- | --- |
| Age | Continuous | 0 | NA |
| Sex | Boolean | 0 | NA |
| Tobacco smoking | Boolean | 2 | Binarization of variables during feature engineering |
| Alcohol drinking | Categorical (nominal) | 33 | Binarization of variables during feature engineering |
| Risk habit indulgence following diagnosis | Categorical (nominal) | 0 | NA |
| Previous malignancy | Categorical (nominal) | 0 | NA |
| Charlson Comorbidity Index (CCI) | Continuous | 0 | NA |
| Hypertension status | Boolean | 0 | NA |
| Diabetes Mellitus status | Boolean | 0 | NA |
| Hyperlipidemia status | Boolean | 0 | NA |
| Autoimmune disease status | Boolean | 0 | NA |
| Viral hepatitis status | Boolean | 0 | NA |
| Type of lesion | Boolean | 0 | NA |
| Clinical subtype of lichenoid lesion | Categorical (nominal) | 0 | NA |
| Tongue/FOM involved | Boolean | 0 | NA |
| Labial/buccal mucosa involved | Boolean | 0 | NA |
| Retromolar area involved | Boolean | 0 | NA |
| Gingiva involved | Boolean | 0 | NA |
| Palate involved | Boolean | 0 | NA |
| Number of lesions | Categorical (ordinal) | 0 | NA |
| Presence of ulcers or erosions | Boolean | 0 | NA |
| Presence of induration | Boolean | 0 | NA |
| Treatment at diagnosis | Categorical (nominal) | 0 | NA |
| Recurrence after surgical excision | Boolean | 0 | NA |
| Number of recurrences | Categorical (ordinal) | 0 | NA |
| Oral epithelial dysplasia at diagnosis | Categorical (nominal) | 0 | NA |
| Oral epithelial dysplasia detected during follow-up | Categorical (nominal) | 0 | NA |

All predictors were used for modeling; the variables in bold were the predictors included in the 15-feature models in this study. Note that ‘Tobacco smoking’ and ‘Alcohol drinking’ were binarized into a single variable.
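
As a rough illustration of how two habit variables with missing entries might be collapsed into a single binary feature, the Python sketch below sets one flag whenever either habit is positively recorded. The rule itself, the treatment of missing values, the category labels, and the function name `combine_risk_habits` are all assumptions made for illustration; the source does not describe the exact procedure.

```python
from typing import Optional

# Hypothetical label for "does not drink"; the actual category levels
# used in the study are not given in the source.
NON_DRINKER = "never"


def combine_risk_habits(smoker: Optional[bool], alcohol: Optional[str]) -> bool:
    """Collapse two habit variables into one Boolean flag.

    Assumption: a missing value (None) counts as "no recorded habit",
    so it never sets the flag on its own.
    """
    smokes = smoker is True
    drinks = alcohol is not None and alcohol != NON_DRINKER
    return smokes or drinks


# Toy records invented for this example (not study data).
records = [
    {"smoker": True, "alcohol": None},
    {"smoker": None, "alcohol": "regular"},
    {"smoker": False, "alcohol": "never"},
    {"smoker": None, "alcohol": None},
]

flags = [combine_risk_habits(r["smoker"], r["alcohol"]) for r in records]
print(flags)  # [True, True, False, False]
```

Under this assumed rule the combined feature has no missing entries, because a gap in one variable is absorbed by the other or defaults to False. Whether that default is appropriate depends on how the missing entries arose.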