Increase Tax Base: Identifying tax evaders with data-driven insights
Project Overview
In many countries, significant tax evasion reduces government revenue. This project used data from taxation departments to identify individuals who are liable to pay taxes but evade them, with the aim of improving tax compliance and boosting revenue.
Problem Statement
The challenge was obtaining accurate data from various government departments to identify non-filers, with a target of a 75% match with taxable individuals. The available data was incomplete and contained errors, so it required extensive cleaning and transformation before analysis.
Key Findings
- Data Availability: A significant barrier to identifying non-filers was the lack of accessible and accurate data from government departments.
- Data Quality: Raw and unfiltered data from various departments required substantial cleaning and processing to ensure accuracy.
- Taxpayer Identification: Efficiently identifying taxable individuals required thorough data modeling to categorize and track entities from diverse sources.
Implemented Solution
To tackle these challenges, the project utilized a structured and comprehensive data approach:
- Initial Loading and Filtering: The first phase focused on filtering out problematic data, removing errors and incomplete information before type conversion.
- Data Processing: The second phase cleaned the data, assigned unique identifiers to entities, established relationships between various data points, and performed the necessary transformations.
- Entity Mapping: The final step involved mapping the processed entities to the data model, ensuring consistency while accounting for variations across data sources.
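The three phases can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the department names, the field names (`national_id`, `value`), the `Entity` model, and the cleaning rules are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical raw rows from two departments; every field name is an assumption.
RAW_ROWS = [
    {"source": "vehicles", "national_id": " 12345-678 ", "name": "ali khan", "value": "1500000"},
    {"source": "vehicles", "national_id": "", "name": "No Id", "value": "900000"},
    {"source": "property", "national_id": "12345678", "name": "ALI KHAN", "value": "2500000"},
    {"source": "property", "national_id": "87654321", "name": "Sara Malik", "value": "4200000"},
    {"source": "property", "national_id": "55555555", "name": "Bad Value", "value": "n/a"},
]


@dataclass
class Entity:
    """Illustrative common data model for one person across sources."""
    entity_id: int
    national_id: str
    name: str
    sources: list = field(default_factory=list)
    total_value: int = 0


def is_loadable(row):
    """Phase 1: reject rows with a missing ID or non-numeric value before type conversion."""
    return bool(row["national_id"].strip()) and row["value"].strip().isdigit()


def normalize_id(raw_id):
    """Phase 2: keep digits only so the same person matches across sources."""
    return "".join(ch for ch in raw_id if ch.isdigit())


def build_entities(rows):
    """Phases 2 and 3: clean rows, assign identifiers, and map them onto one entity model."""
    entities = {}
    for row in filter(is_loadable, rows):
        key = normalize_id(row["national_id"])
        if key not in entities:
            # Assign a sequential unique identifier the first time an ID is seen.
            entities[key] = Entity(len(entities) + 1, key, row["name"].title())
        entity = entities[key]
        entity.sources.append(row["source"])          # relationship: entity -> source records
        entity.total_value += int(row["value"].strip())  # type conversion happens after filtering
    return list(entities.values())


entities = build_entities(RAW_ROWS)
for entity in entities:
    print(entity)
```

In this sketch the two rows for the same person are merged into one entity even though their IDs and names are formatted differently, while the rows with a missing ID or an unparseable value are dropped in the first phase.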
Results
The project achieved match rates of 97% to 99% for taxpayer data, although two departments reported lower match rates of around 85% and 89%. All of these exceed the original 75% target. The matching allowed approximately 2 million individuals to be flagged for potential tax evasion, strengthening tax compliance efforts and contributing to increased government revenue.
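The source does not define how the match rate was calculated. One plausible reading, sketched below as an assumption, treats it as the share of a department's distinct identifiers that can be linked to a reference registry of taxable individuals. The function names, the sample identifiers, and the registry are hypothetical.

```python
TARGET_MATCH_RATE = 0.75  # the 75% target stated in the problem statement


def match_rate(department_ids, registry_ids):
    """Share of a department's distinct IDs found in the reference registry (assumed definition)."""
    distinct = set(department_ids)
    if not distinct:
        return 0.0
    return len(distinct & set(registry_ids)) / len(distinct)


def potential_non_filers(department_ids, registry_ids, filer_ids):
    """Matched individuals who do not appear in the list of people who filed a return."""
    matched = set(department_ids) & set(registry_ids)
    return sorted(matched - set(filer_ids))


# Hypothetical sample data.
registry = {"A1", "A2", "A3", "A4"}
filers = {"A1"}
department = ["A1", "A2", "A3", "A3", "X9"]  # includes a duplicate and an unmatched ID

rate = match_rate(department, registry)
flagged = potential_non_filers(department, registry, filers)
print(f"match rate: {rate:.0%}, meets target: {rate >= TARGET_MATCH_RATE}")
print("flagged as potential non-filers:", flagged)
```

Counting distinct identifiers rather than raw rows keeps duplicate records from inflating the rate, which is one reason the cleaning phase matters before matching.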