The categorical features table summarizes several statistics for every categorical feature in the data, so that all of them can be investigated at once.
The statistics presented in the table include:
Uniques - Number of unique categories in each categorical feature in the data-set.
N/A - Number of missing values in each categorical feature in the data-set.
Least frequent - the least frequent category and its frequency of appearance, as a percentage, for each categorical feature in the data-set.
Most frequent - the most frequent category and its frequency of appearance, as a percentage, for each categorical feature in the data-set.
Normalized entropy - the entropy of a feature represents the amount of "information" or "surprise" inherent in the feature's values. The more balanced the distribution across the different categories, the more uncertain the outcome of a random pick. The number of possible outcomes also adds uncertainty, so we use normalized entropy, which accounts for the number of categories and allows comparing features with different numbers of categories. Normalized entropy ranges from 0 to 1, where 0 means a low amount of information and a very homogeneous feature, and 1 means a high amount of information, i.e., all categories appear equally often.
Potential outlier categories - indices of some samples that are suspected to be outliers based on their value for this specific feature.
Comments
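To make the statistics above concrete, here is a minimal sketch of how they could be computed for a single categorical column using only the Python standard library. It assumes missing values are represented as `None`, that frequency percentages are taken over the non-missing values, and that normalized entropy is Shannon entropy divided by log(k), where k is the number of observed categories. This is a common normalization, but the table's exact definitions are not stated here. The function names `summarize_categorical` and `normalized_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def normalized_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of the category distribution divided by log(k).

    Assumption: k is the number of observed categories, so a feature with a
    single category (or none) gets 0 and a perfectly balanced one gets 1.
    """
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(k)


def summarize_categorical(values: Sequence[Optional[Hashable]]) -> dict:
    """Hypothetical per-feature summary mirroring the table's columns.

    Assumption: missing values are None; percentages use non-missing values only.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    summary = {
        "uniques": len(counts),          # number of distinct categories
        "n/a": len(values) - total,      # number of missing values
    }
    if counts:
        # most_common() sorts by descending count; equal counts keep first-seen order.
        ranked = counts.most_common()
        most, most_n = ranked[0]
        least, least_n = ranked[-1]
        summary["most_frequent"] = (most, 100 * most_n / total)
        summary["least_frequent"] = (least, 100 * least_n / total)
    summary["normalized_entropy"] = normalized_entropy(list(counts.values()))
    return summary


if __name__ == "__main__":
    print(summarize_categorical(["red", "red", "blue", None, "green", "red"]))
```

Dividing by log(k) is what lets a 2-category feature and a 50-category feature be compared on the same 0 to 1 scale: a feature whose categories are all equally frequent scores 1 regardless of how many categories it has.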
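The section does not say how potential outlier samples are detected. One simple heuristic, shown here purely as an illustrative assumption and not as the table's actual method, is to flag samples whose category is rarer than a chosen relative-frequency threshold. The function `rare_category_indices` and its `min_freq` parameter are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def rare_category_indices(
    values: Sequence[Optional[Hashable]], min_freq: float = 0.01
) -> List[int]:
    """Return indices of samples whose category frequency is below min_freq.

    Hypothetical heuristic: both the rule and the default threshold are
    assumptions. Missing values (None) are ignored and never flagged.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return []
    # Categories whose share of the non-missing values is below the threshold.
    rare = {cat for cat, n in counts.items() if n / total < min_freq}
    return [i for i, v in enumerate(values) if v in rare]


if __name__ == "__main__":
    data = ["a"] * 99 + ["b"]
    print(rare_category_indices(data, min_freq=0.02))
```

A fixed threshold is easy to reason about but sensitive to dataset size; a real profiler may use a different or more adaptive rule.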