Entity Scan: Secure, data-driven platform for real-time identity checks and compliance in finance and border control
Project Overview
Entity Scan is a platform that provides data on wanted individuals from sources worldwide, used by governments and financial institutions for identity verification. It supports both name-based and image-based searches. I contributed to backend development, focusing on web scraping, API integrations, and data workflows.
Problem Statement
Before Entity Scan, there was no centralized platform capable of providing quick, detailed searches on wanted individuals across multiple sources. Existing systems struggled to handle large data sets, which hindered real-time verification and the tracking of individuals using diverse search criteria like images.
Key Findings
- Real-Time Data Retrieval: Security teams required a system that could deliver reliable, real-time information to conduct critical checks without delays or manual bottlenecks.
- Data Automation Needs: Handling vast, continuously growing datasets called for scalable automation to minimise human intervention and ensure operational efficiency.
- Image-Based Search Demand: Demand was rising for hybrid search that accepts both textual data and image inputs, giving investigators more ways to identify an individual.
Implemented Solution
To effectively address these requirements, the following solutions were implemented:
- Automated Web Scraping: Utilised Scrapy and Scrapyd to develop automated scrapers that retrieve updated data daily from multiple trusted public sources, ensuring freshness and accuracy of intelligence.
- Backend API Development: Built secure RESTful APIs using Django REST Framework, offering real-time data access to authorised institutions including law enforcement and financial regulators.
- Scraper Monitoring Portal: Created a web-based admin dashboard with Django to track scraper activity, status, and logs in real time, ensuring system uptime and proactive issue resolution.
- ETL Pipelines: Implemented Pandas-based ETL pipelines to clean, structure, and process scraped data efficiently. High-priority data triggers automated email alerts for instant response.
- Database Management: Integrated PostgreSQL for high-performance querying, supporting both standard searches and advanced image-based lookups to meet diverse verification needs.
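To illustrate the ETL step described above, here is a minimal Pandas sketch of a clean-and-flag pass. It is not the production pipeline: the column names (`name`, `source`, `notice_level`), the `"red"` priority rule, and both function names are assumptions made for this example, since the actual schema and alert criteria are not described here.

```python
import pandas as pd


def clean_records(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalise names, drop rows without a name, de-duplicate per source.

    Column names are hypothetical; adapt them to the real scraped schema.
    """
    df = raw.copy()
    # Collapse repeated whitespace, trim, and title-case so that
    # " John   DOE " and "John Doe" compare as the same name.
    df["name"] = (
        df["name"]
        .astype("string")
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
        .str.title()
    )
    df = df.dropna(subset=["name"])
    df = df[df["name"] != ""]
    # Keep the most recently scraped row for each (name, source) pair.
    df = df.drop_duplicates(subset=["name", "source"], keep="last")
    return df.reset_index(drop=True)


def high_priority(df: pd.DataFrame, levels=("red",)) -> pd.DataFrame:
    """Return rows whose notice level is in `levels` (an assumed rule)."""
    return df[df["notice_level"].str.lower().isin(list(levels))]


# Made-up sample rows standing in for one day's scraped output.
raw = pd.DataFrame(
    {
        "name": [" John   DOE ", "John Doe", None, "jane roe"],
        "source": ["source_a", "source_a", "source_b", "source_b"],
        "notice_level": ["Red", "Red", "Yellow", "yellow"],
    }
)

cleaned = clean_records(raw)
alerts = high_priority(cleaned)
print(cleaned)
print(alerts)
```

In a setup like this, the rows returned by `high_priority` would be what the alerting step consumes, for example by passing them to an email sender, while `cleaned` is what gets loaded into the database.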
Results
Entity Scan changed how financial and governmental bodies conduct security and compliance checks. Automated scraping and Pandas-based ETL pipelines keep the data refreshed daily, and clients get immediate access to global verification data, which reduces manual overhead and speeds up onboarding and approval workflows. The operations dashboard adds transparency into scraper activity and status, supporting the platform's role in risk management and regulatory compliance.