Paper | Domain (data set) | Quality evaluation | Tools/technologies | Quality level spec. | Quality metric spec. | Quality policy spec. |
---|---|---|---|---|---|---|
Ludwig [18] | Social media (Twitter, Facebook, Maps) | Weighted score: link, credibility, up-to-datedness, dissemination, quality of coordinates | OpenSocial format, MSSQL database | Weighting of quality attributes | – | – |
Reuter [16] | Social media (Facebook) | Based on metadata, content, message, classification, and scientific methods | Stanford NER, Classifier4J, Open Thesaurus, Gisgraphy Geocoder | Weighting of parameters | – | – |
Taleb [9] | Medical (EEG data) | Accuracy, consistency | Hadoop MapReduce | XML file: targeted data quality | XML file: data cleansing algorithm | Data quality profile |
Serhani [10] | Medical (SHRS) | Completeness, consistency, accuracy | Talend and Trifacta Wrangler. ML Vagrant | – | – | – |
Immonen [11] | Social media (Twitter) | Timeliness | Cassandra | – | – | Static |
Our approach | Social media (Twitter) | Timeliness, relevancy, popularity | Spark, Cassandra, Word2Vec | Ranges specified in SearchFilteringPolicy | JEngineRules/XML file | Dynamic quality policies |