Data Cleansing and De-Duplication
Investigations show, however, that many such applications fail to work successfully. The failures have many causes, such as poor system infrastructure design or poor query performance.
WinPure Clean and Match uses a three-step approach to finding duplicates in a given list or database.
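To make the idea of duplicate finding concrete, here is a minimal generic sketch in Python. It is not WinPure's method, whose three steps are not described here. It assumes a simple pipeline of normalizing each record, comparing every pair, and keeping pairs above a similarity threshold. The function names, the sample records, and the 0.85 threshold are all illustrative assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(kept.split())


def find_duplicates(records: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every record pair scoring at or above threshold.

    The threshold is an assumed value; a real project would tune it on its own data.
    """
    cleaned = [normalize(r) for r in records]
    matches = []
    # Pairwise comparison is quadratic, which is acceptable only for small lists.
    for i, j in combinations(range(len(cleaned)), 2):
        score = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical sample records, invented for illustration.
records = [
    "Acme Corp., 12 High St.",
    "ACME Corp 12 High St",
    "Bolt Industries, 4 Mill Lane",
]
pairs = find_duplicates(records)
```

With these sample records, the first two entries normalize to the same string and are reported as a duplicate pair, while the third is left alone. For larger lists, the records would usually be grouped by a blocking key first so that not every pair has to be compared.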
In small studies, a single outlier will have a greater distorting effect on the results. Some screening methods, such as examination of data tables, will be more effective, whereas others, such as statistical outlier detection, may become less valid with smaller samples.
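Both effects can be illustrated with a short Python sketch. It assumes the sample mean as the summary statistic and a z-score rule based on the sample standard deviation as the outlier detector. Neither choice is specified in the text, and the numbers are invented for illustration. The second function relies on a known bound: in a sample of size n, no single point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean.

```python
import math
import statistics


def mean_shift(n: int, baseline: float, outlier: float) -> float:
    """Change in the sample mean when one of n baseline values is replaced by outlier."""
    clean = [baseline] * n
    contaminated = clean[:-1] + [outlier]
    return statistics.fmean(contaminated) - statistics.fmean(clean)


def max_z_score(n: int) -> float:
    """Largest z-score any single point can reach in a sample of size n.

    Uses the sample standard deviation (divisor n - 1).
    """
    return (n - 1) / math.sqrt(n)


def z_score_of_last(values: list[float]) -> float:
    """Z-score of the last value relative to the whole sample."""
    mean = statistics.fmean(values)
    return (values[-1] - mean) / statistics.stdev(values)


# The same outlier moves the mean 100 times further in the small sample.
shift_small = mean_shift(10, baseline=100.0, outlier=200.0)
shift_large = mean_shift(1000, baseline=100.0, outlier=200.0)

# An extreme value in a sample of 10 still cannot exceed the bound.
extreme = z_score_of_last([0.0] * 9 + [1_000_000.0])
bound = max_z_score(10)
```

Under these assumptions, a sample of 10 cannot produce a z-score above roughly 2.85, so the commonly used cutoff of 3 would never flag a value, however extreme it is. This is one concrete way in which statistical outlier detection loses validity in small samples, and why inspecting the data table directly can work better there.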
Although the data need to be analyzed quickly, data cleansing is a complex and time-consuming process, because it must ensure that the cleansed data are of better quality. The role of the domain expert in data cleansing is undeniable, since verification and validation are the main concerns with cleansed data.