Data analysis

Data analysis is the process of extracting information from large amounts of structured data: identifying patterns, detecting anomalies, and evaluating data quality.
The Data Analysis feature in EasySheet Pro automatically analyzes datasets in Excel and generates a detailed report on data quality that highlights potential issues.

Notes
If the dataset contains many missing values, inconsistencies, or formatting errors, the analysis may be inaccurate. It is recommended to clean and verify the data before running the analysis.

This feature is useful for:

Key Features

Automatic Data Type Recognition
The module automatically identifies whether a column contains numerical, textual, or date values. However, if the dataset has inconsistent formatting, misclassification may occur.
It is advisable to review the results to ensure columns have been classified correctly.
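
To illustrate why inconsistent formatting leads to misclassification, here is a minimal sketch of column type recognition in Python. It is not EasySheet Pro's implementation: the function names, the list of accepted date formats, and the 90% agreement threshold are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Assumed date formats for this sketch; the real module may accept others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def classify_value(value):
    """Classify one cell as 'numeric', 'date', 'text', or 'empty'."""
    text = str(value).strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "numeric"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def classify_column(values, threshold=0.9):
    """Return (label, counts) for a column.

    The label is the dominant type if at least `threshold` of the
    non-empty cells agree on it, otherwise 'mixed' (hypothetical rule).
    """
    counts = Counter(classify_value(v) for v in values)
    counts.pop("empty", None)
    if not counts:
        return "empty", counts
    kind, hits = counts.most_common(1)[0]
    label = kind if hits / sum(counts.values()) >= threshold else "mixed"
    return label, counts


label, counts = classify_column(["2024-01-05", "05/01/2024", "N/A"])
print(label, dict(counts))
```

In this sketch a single stray "N/A" is enough to push a date column below the threshold and mark it as mixed, which is the kind of result worth reviewing by hand.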

Outlier Identification Based on the IQR Method
Outliers in numerical data are identified using the interquartile range (IQR), the difference between the third and first quartiles. This method is effective for detecting anomalies, but it is not foolproof.
If the data is widely dispersed or contains legitimate extreme values, some valid data points may be classified as outliers.
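
The IQR rule can be sketched as follows. This is an illustration only: the conventional multiplier of 1.5 and the "inclusive" quartile method are assumptions, since the exact parameters used by the module are not documented here, and the function name is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower, upper)) using IQR fences.

    k=1.5 is the conventional multiplier, assumed for this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


data = [10, 12, 11, 13, 12, 11, 95]
outliers, (lower, upper) = iqr_outliers(data)
print(outliers, lower, upper)
```

Because the fences depend only on the middle half of the data, a legitimate but extreme value such as a single very large order is flagged in exactly the same way as a typing error.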

Data Quality Score Calculation
The module assigns a Quality Score (%) to each column based on:

A low score indicates that the column needs thorough review.

Possible False Positives in Anomaly Detection
Some data patterns may trigger false problem alerts (e.g., a column with unique numerical IDs may be flagged for too many outliers).
It is advisable to manually review warnings before correcting or removing data.
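
As an aid to manual review, one possible pre-check is to test whether a flagged column looks like an identifier before acting on its outlier warnings. The heuristic below (all values are distinct integers) and the function name are assumptions for illustration, not part of EasySheet Pro.

```python
def looks_like_id_column(values):
    """Heuristic: True if every non-empty value is a distinct integer."""
    cleaned = [v for v in values if v is not None and v != ""]
    if len(cleaned) < 2:
        return False
    all_integers = all(
        isinstance(v, int) and not isinstance(v, bool) for v in cleaned
    )
    all_unique = len(set(cleaned)) == len(cleaned)
    return all_integers and all_unique


print(looks_like_id_column([1001, 1002, 1050, 987654]))
print(looks_like_id_column([3, 3, 4]))
```

A column that passes this check is probably a key rather than a measurement, so outlier warnings on it can usually be ignored.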

Performance Impact with Large Datasets
For very large datasets (>100,000 rows), analysis may take longer. To optimize performance, it is recommended to:
Run the analysis on a data subset before processing the entire dataset.
Close other workbooks to reduce Excel’s workload.
Enable manual calculation in Excel to speed up processing.

Limitations in Date Analysis
The module attempts to detect temporal gaps and anomalies in date-type data but cannot distinguish between normal absences and data errors (e.g., a missing week in a weekly dataset might be incorrectly flagged as an issue).
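
The limitation is easier to see in a minimal sketch of gap detection. The function name and the fixed expected interval are assumptions; the module's actual logic is not described here.

```python
from datetime import date, timedelta


def find_gaps(dates, expected=timedelta(days=7)):
    """Return (previous, next) pairs whose spacing exceeds `expected`."""
    ordered = sorted(set(dates))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > expected:
            gaps.append((prev, curr))
    return gaps


weekly = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 22), date(2024, 1, 29)]
for prev, curr in find_gaps(weekly):
    print(f"gap between {prev} and {curr}")
```

The check reports the missing week of 15 January, but nothing in the dates themselves says whether that week was a holiday closure or lost data; only someone who knows the dataset can decide.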

Guidelines for Interpreting the Quality Heatmap
The module generates a heatmap in which a visual bar represents each column’s quality score.
Color References: