
Re: SAP HANA Anomaly function - PAL

Hi Hasan,

 

As Paul mentioned, there is no single preferred way to do outlier detection. From my point of view, a distance-based approach like k-means tends to form spherical clusters, which doesn't seem to fit your original data. Although GMM works on your dataset, it assumes that the data are generated from a mixture of normal distributions, and this is not universally true. When the data don't follow a normal distribution, we might need a density-based algorithm like DBSCAN. In many real cases there are also categorical variables, and there GMM might not be a good choice either, because of the same distribution assumption.
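
To make the density-based idea a bit more concrete, here is a minimal sketch in plain Python. It is not PAL's implementation and not your data: the ring-shaped sample, the helper names (region_query, dbscan) and the values eps=1.5 and min_pts=3 are all assumptions I picked for illustration. The point at the centre of the ring sits exactly where a single centroid would be, so a distance-to-centroid rule would treat it as the most typical point, while a density-based rule flags it because it has no close neighbours.

```python
import math

NOISE = -1        # label for points that belong to no cluster (outlier candidates)
UNVISITED = None  # label for points not processed yet


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1           # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point too, keep growing
    return labels


# Synthetic, non-spherical data: 24 points on a ring of radius 5,
# plus one isolated point at the centre of the ring.
ring = [(5 * math.cos(2 * math.pi * k / 24), 5 * math.sin(2 * math.pi * k / 24))
        for k in range(24)]
points = ring + [(0.0, 0.0)]

labels = dbscan(points, eps=1.5, min_pts=3)
outliers = [points[i] for i, label in enumerate(labels) if label == NOISE]
print(outliers)  # prints [(0.0, 0.0)]
```

In practice eps and min_pts have to be tuned to the scale and density of the real data, so please read this only as an illustration of the principle.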

 

In my experience with machine learning, there is no clear evidence that GMM is better than k-means in general. Again, we need to choose the right algorithm according to the data and the problem.

 

BTW, GMM is included in PAL as well. You are welcome to use it.

 

Best regards,

 

Xingtian

