Increase Tax Base: Boosting government revenue by identifying non-filers through structured, integrated, and streamlined data
Project Overview
This project supported government efforts to reduce tax evasion by using data analytics to identify individuals who were liable to pay tax but were not filing returns. By integrating datasets from multiple taxation and civil departments, the initiative aimed to improve tax compliance, expand the tax base, and ultimately increase public revenue. This required extensive data engineering to unify, cleanse, and transform fragmented datasets so that they could support reliable analysis and decision-making.
Problem Statement
The core challenge was obtaining accurate data from multiple government departments and linking it to identify non-filers, with a target match rate of 75% against taxable individuals. The source data was often incomplete or erroneous and required extensive cleaning and transformation before it could be analyzed.
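The source does not define how the match rate was computed. A minimal sketch, assuming it is the share of a department's records that could be linked to a known taxable individual, is shown below. `DepartmentResult`, the department names, and the counts are hypothetical illustrations, not project figures.

```python
from dataclasses import dataclass

TARGET_MATCH_RATE = 0.75  # the 75% target from the problem statement


@dataclass
class DepartmentResult:
    """Hypothetical per-department tally of linked vs. total records."""
    name: str
    total_records: int
    matched_records: int

    @property
    def match_rate(self) -> float:
        # Guard against empty extracts to avoid division by zero.
        if self.total_records == 0:
            return 0.0
        return self.matched_records / self.total_records

    def meets_target(self, target: float = TARGET_MATCH_RATE) -> bool:
        # Assumption: the target is a floor, so hitting it exactly counts as a pass.
        return self.match_rate >= target


if __name__ == "__main__":
    # Illustrative numbers only; not figures from the project.
    results = [
        DepartmentResult("dept_a", total_records=1_000, matched_records=980),
        DepartmentResult("dept_b", total_records=1_000, matched_records=700),
    ]
    for r in results:
        status = "meets" if r.meets_target() else "below"
        print(f"{r.name}: {r.match_rate:.1%} ({status} target)")
```

Treating the target as a per-department floor makes it easy to see which sources need more cleaning before their records can be trusted for matching.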
Key Findings
- Limited Access to Accurate Government Data: Reliable data was fragmented across departments and often hard to access, which was a fundamental obstacle to detecting individuals and entities that were liable for tax but not filing.
- Poor Data Quality from Source Systems: Raw data often contained duplicates, missing fields, and inconsistent formats, which demanded rigorous cleaning and transformation before it could yield actionable insights.
- Complex Entity Identification Across Sources: Matching individuals and businesses across datasets required modeling that assigns each entity a unique identifier and traces the relationships between records scattered across sources.
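The source does not say which modeling technique was used to assign unique identifiers. One common approach, swapped in here for illustration, is union-find (disjoint sets): any two records that share a normalized identifier value are linked, links are followed transitively, and each resulting cluster receives one entity ID. The field names (`national_id`, `phone`) and the `E000001` ID format are hypothetical, and real entity resolution usually adds fuzzy matching on names and addresses, which this sketch omits.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[int, int] = {}
        self.size: dict[int, int] = {}

    def add(self, x: int) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def resolve_entities(records: list[dict], keys: tuple[str, ...]) -> list[str]:
    """Assign an entity ID to each record; records sharing any key value are linked."""
    uf = UnionFind()
    first_seen: dict[tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        uf.add(i)
        for key in keys:
            value = (rec.get(key) or "").strip().lower()
            if not value:
                continue  # a missing identifier cannot link records
            if (key, value) in first_seen:
                uf.union(i, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = i
    # Number clusters in order of first appearance so IDs are deterministic.
    cluster_ids: dict[int, str] = {}
    result = []
    for i in range(len(records)):
        root = uf.find(i)
        if root not in cluster_ids:
            cluster_ids[root] = f"E{len(cluster_ids) + 1:06d}"  # hypothetical ID format
        result.append(cluster_ids[root])
    return result


if __name__ == "__main__":
    # Hypothetical records from different departments.
    records = [
        {"national_id": "A123", "phone": "555-0100"},
        {"national_id": "", "phone": "555-0100"},   # linked to record 0 via phone
        {"national_id": "a123 ", "phone": None},    # linked to record 0 via national_id
        {"national_id": "B999", "phone": "555-0199"},
    ]
    print(resolve_entities(records, ("national_id", "phone")))
    # -> ['E000001', 'E000001', 'E000001', 'E000002']
```

The transitive linking is what "tracing relationships" buys: record 1 and record 2 share no identifier with each other, yet both resolve to the same entity through record 0.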
Implemented Solution
To address these challenges, the project followed a structured, three-stage data approach:
- Initial Loading and Filtering: Invalid or incomplete records were filtered out before any transformation took place, which ensured that type conversions were safe and removed data that would otherwise distort the analysis.
- Data Processing: The data was cleaned, unique identifiers were assigned to track entities, relationships were established across sources, and the remaining transformations needed for analysis were applied.
- Entity Mapping: Processed entities were mapped into a unified data model, reconciling discrepancies between departments and standardizing attributes so that non-filers could be categorized accurately.
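The filter-then-map ordering described in these steps can be sketched as follows, assuming each department delivers rows as string-valued dictionaries. The required fields (`national_id`, `full_name`, `annual_income`), the `UnifiedPerson` model, and the normalization rules are hypothetical; the project's actual schema is not described. The key point is that rows are rejected before transformation, and a value that fails type conversion causes the row to be dropped rather than coerced into something misleading.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("national_id", "full_name", "annual_income")  # hypothetical schema


@dataclass(frozen=True)
class UnifiedPerson:
    """Hypothetical target record in the unified data model."""
    national_id: str
    full_name: str
    annual_income: Decimal
    source: str


def to_decimal(raw: str | None) -> Decimal | None:
    """Convert a raw income string to Decimal; return None if it is not a valid amount."""
    if raw is None:
        return None
    cleaned = raw.replace(",", "").strip()
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Reject NaN/Infinity and negative incomes (checked in this order so NaN is never compared).
    return value if value.is_finite() and value >= 0 else None


def load_and_map(rows: list[dict[str, str]], source: str) -> list[UnifiedPerson]:
    """Drop invalid or incomplete rows first, then map survivors to the unified model."""
    mapped = []
    for row in rows:
        # Stage 1: filter before transforming; skip rows missing any required field.
        if any(not (row.get(f) or "").strip() for f in REQUIRED_FIELDS):
            continue
        income = to_decimal(row["annual_income"])
        if income is None:
            continue  # conversion failed, so the row would distort the analysis
        # Stage 2: normalize department-specific formatting into one convention.
        mapped.append(
            UnifiedPerson(
                national_id=row["national_id"].strip().upper(),
                full_name=" ".join(row["full_name"].split()).title(),
                annual_income=income,
                source=source,
            )
        )
    return mapped


if __name__ == "__main__":
    rows = [
        {"national_id": " ab123 ", "full_name": "jane   DOE", "annual_income": "52,000.50"},
        {"national_id": "CD456", "full_name": "", "annual_income": "10000"},       # incomplete
        {"national_id": "EF789", "full_name": "John Roe", "annual_income": "n/a"},  # bad type
    ]
    for person in load_and_map(rows, source="dept_a"):
        print(person)
```

`Decimal` is used instead of `float` so that monetary amounts are parsed exactly; only the first row survives, normalized to `AB123` and `Jane Doe`.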
Results
The initiative exceeded its 75% target: most departments achieved match rates between 97% and 99%, and two other departments reached 85% and 89%. As a result, approximately 2 million individuals were flagged for further tax compliance investigation. These findings allowed tax enforcement units to pursue non-filers with far greater precision. Beyond strengthening compliance frameworks, the project laid the groundwork for long-term increases in public revenue through targeted outreach and policy reinforcement.
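The flagging step can be read as an anti-join: keep the entities that appear liable for tax and drop every entity that has a filing record. The sketch below assumes entity IDs have already been resolved and that eligibility is a simple income threshold; both `FILING_THRESHOLD` and that rule are hypothetical, since the source does not state how eligibility was determined.

```python
from __future__ import annotations

from decimal import Decimal

# Hypothetical eligibility rule: anyone at or above this annual income must file.
FILING_THRESHOLD = Decimal("12000")


def flag_non_filers(
    income_by_entity: dict[str, Decimal],
    filer_ids: set[str],
    threshold: Decimal = FILING_THRESHOLD,
) -> list[str]:
    """Anti-join: keep entities that look liable for tax but have no filing record."""
    return sorted(
        entity_id
        for entity_id, income in income_by_entity.items()
        if income >= threshold and entity_id not in filer_ids
    )


if __name__ == "__main__":
    # Illustrative resolved entities and incomes; not project data.
    incomes = {
        "E000001": Decimal("52000"),
        "E000002": Decimal("8000"),
        "E000003": Decimal("30000"),
    }
    filers = {"E000003"}
    print(flag_non_filers(incomes, filers))  # -> ['E000001']
```

Holding filer IDs in a set keeps each membership check constant-time on average, so the pass is linear in the number of entities. At the scale of millions of records, the same anti-join would more typically run inside a database or dataframe engine.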