Data skew is a statistical property describing an asymmetric distribution of values in a dataset: most values cluster on one side of the distribution's range, leaving a long tail on the opposite side. Skew can be positive (the tail extends to the right, and the mean is typically greater than the median) or negative (the tail extends to the left, and the mean is typically less than the median). The mean-median relationship is a rule of thumb rather than a guarantee, and it can fail for multimodal or discrete distributions. Detecting skew is a fundamental step in data profiling because it reveals underlying data generation processes and potential quality issues that can degrade downstream machine learning model performance and analytical accuracy.
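
As a rough illustration of how skew might be checked during profiling, the sketch below computes the moment-based (Fisher-Pearson) skewness coefficient, g1 = m3 / m2^1.5, using only the Python standard library. The function name `moment_skewness` and the sample lists are hypothetical, chosen for this example. Returning 0.0 for constant data is a convention assumed here, since skewness is strictly undefined when the variance is zero.

```python
from statistics import mean, median


def moment_skewness(values):
    """Return the Fisher-Pearson skewness g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments, each divided by n
    (the biased, population-style estimate).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values to estimate skewness")
    mu = mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    if m2 == 0:
        # Assumption for this sketch: treat constant data as having no skew.
        return 0.0
    return m3 / m2 ** 1.5


# Hypothetical sample: one large value creates a long right tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
# Mirroring the values moves the tail to the left.
left_tailed = [-x for x in right_tailed]
# Evenly spaced values are symmetric around their mean.
symmetric = [1, 2, 3, 4, 5]

for name, data in [
    ("right_tailed", right_tailed),
    ("left_tailed", left_tailed),
    ("symmetric", symmetric),
]:
    print(name, "skewness:", moment_skewness(data),
          "mean:", mean(data), "median:", median(data))
```

A positive coefficient indicates a right tail, a negative one a left tail, and a value near zero suggests rough symmetry. In this sample the right-tailed list also has a mean above its median, which matches the rule of thumb, though the sign of the coefficient is the more reliable signal.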




