Outliers

Outliers are data values that are extreme or atypical, that is, markedly different from the rest of the data.

Outliers can alter the results of data analysis and hence need to be detected. An outlier influences the mean more than it influences the median or the mode.

For example, let's consider this data set of retirement ages:

54, 54, 54, 57, 58, 60, 60

Here, the median is 57 and the mean is 56.7.

Next, let's replace the last value 60 with 80. This value of 80 is an outlier as it is much higher than the other values. 

The new data set becomes 54, 54, 54, 57, 58, 60, 80. Here, the median remains 57, but the mean rises from 56.7 to 59.6. This is because every value in a data set, including the outlier, is used to calculate the mean, whereas the median depends only on the middle value of the ordered data. The outlier has therefore pulled the mean of the distribution upward.

In such cases, the median may be a more appropriate measure to use than the mean.
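The retirement-age figures above can be checked with a short Python sketch. It uses the standard library's statistics module; the variable names original and with_outlier are only illustrative and do not come from any particular source.

```python
from statistics import mean, median

# Retirement ages from the example above
original = [54, 54, 54, 57, 58, 60, 60]
# Same data with the last value (60) replaced by the outlier 80
with_outlier = [54, 54, 54, 57, 58, 60, 80]

for label, ages in (("original", original), ("with outlier", with_outlier)):
    # The mean uses every value; the median only looks at the middle one
    print(f"{label}: mean = {mean(ages):.1f}, median = {median(ages)}")

# original: mean = 56.7, median = 57
# with outlier: mean = 59.6, median = 57
```

A single extreme value shifts the mean by almost three years, while the median does not move at all.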


