Entity Scan: Automating secure account and travel verifications
Project Overview
Entity Scan is a platform that provides data on wanted individuals worldwide. Governments and financial institutions use it for identity verification. It supports both name-based and image-based searches and returns results quickly and reliably. I contributed to backend development, focusing on web scraping, API integrations, and data workflows.
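The overview mentions name-based search but does not describe the matching logic. Watchlists often spell the same person differently, with varying accents, word order, or punctuation. The following is a minimal sketch that assumes normalization followed by fuzzy string similarity from Python's standard `difflib` module. The function names and the 0.85 threshold are hypothetical illustrative choices, not values taken from Entity Scan.

```python
import difflib
import unicodedata


def normalize_name(name: str) -> str:
    """Strip accents and punctuation, casefold, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    # Sorting tokens makes "Doe, John" and "John Doe" normalize to the same string.
    return " ".join(sorted(cleaned.casefold().split()))


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return difflib.SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def search_by_name(query: str, names: list, threshold: float = 0.85) -> list:
    """Return names scoring at or above the (hypothetical) threshold, best first."""
    scored = [(name_score(query, n), n) for n in names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for score, n in scored if score >= threshold]
```

For example, `search_by_name("John Doe", ["Jane Smith", "Doe, Jöhn"])` keeps only `"Doe, Jöhn"`. Token sorting handles reordered names cheaply. It does not handle transliteration differences between scripts, which a production screening system would also need to address.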
Problem Statement
Before Entity Scan, no centralized platform offered fast, detailed searches on wanted individuals across multiple sources. Existing systems struggled with large datasets. This hindered real-time verification and made it hard to search for individuals by criteria other than names, such as images.
Key Findings
- Real-Time Data Retrieval: There is a strong need for accurate, real-time data to help authorities and institutions conduct security checks.
- Data Automation Needs: Managing large data volumes requires automation to ensure efficiency and reduce manual workload.
- Image-Based Search Demand: Users increasingly want to search by both text and images, which calls for more flexible search functionality.
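The findings note demand for image-based search, but the document does not say how images are matched. Perceptual hashing is one common, simple technique. The sketch below implements an average hash (aHash) with a Hamming-distance lookup, purely as an illustration. It assumes each photo has already been decoded, converted to grayscale, and downscaled to an 8x8 grid. The function names and the `max_distance=6` cutoff are hypothetical and do not describe Entity Scan's actual method.

```python
from __future__ import annotations


def average_hash(pixels: list[list[int]]) -> int:
    """64-bit fingerprint: bit is 1 where a pixel is brighter than the grid mean.

    Assumes `pixels` is an already-downscaled 8x8 grid of 0-255 grayscale values.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def search_by_image(query_hash: int, index: dict[str, int], max_distance: int = 6) -> list[str]:
    """Return record IDs whose stored hash is within max_distance bits, closest first."""
    hits = sorted((hamming(query_hash, h), rid) for rid, h in index.items())
    return [rid for distance, rid in hits if distance <= max_distance]
```

A uniform brightness change shifts every pixel and the mean by the same amount, so a lightly brightened copy of a photo keeps the same fingerprint. This lookup scans every stored hash, so its cost grows linearly with the number of records. A real deployment would need an indexing strategy, and likely a stronger matching model, on top of this.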
Implemented Solution
To effectively address these requirements, the following solutions were implemented:
- Automated Web Scraping: Developed Scrapy spiders, deployed and run through Scrapyd, to collect fresh data daily from trusted sources so that the platform always held the most up-to-date information available.
- Backend API Development: Built RESTful APIs with Django REST Framework to support real-time, secure data retrieval for both governmental agencies and financial institutions.
- Scraper Monitoring Portal: Designed a Django-based portal to monitor scraper activity. The portal provides real-time status updates that help keep operations smooth and downtime minimal.
- ETL Pipelines: Built Pandas-based ETL pipelines to process and clean scraped data, and automated email reports for high-priority updates.
- Database Management: Used PostgreSQL for data storage and management, supporting fast, accurate queries, particularly for image-based searches.
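The ETL step is described only at a high level. The following dependency-free sketch uses only the standard library. It assumes a hypothetical record schema (`source`, `source_id`, `full_name`, `listed_on`, `priority`) and treats "high-priority" as a `priority` field equal to `"high"`. In the project itself this work was done with Pandas, where the deduplication and filtering below would correspond to `DataFrame.drop_duplicates` and boolean indexing. Sending the message, for example through `smtplib`, is left out.

```python
from __future__ import annotations

from datetime import date
from email.message import EmailMessage


def clean_records(raw: list[dict]) -> list[dict]:
    """Normalize and deduplicate scraped records (hypothetical schema)."""
    seen = set()
    cleaned = []
    for rec in raw:
        name = " ".join(rec.get("full_name", "").split())  # collapse stray whitespace
        if not name:
            continue  # drop rows where the scraper could not extract a name
        key = (rec["source"], rec["source_id"])
        if key in seen:
            continue  # same entry scraped twice from the same source
        seen.add(key)
        cleaned.append({
            "source": rec["source"],
            "source_id": rec["source_id"],
            "full_name": name,
            "listed_on": date.fromisoformat(rec["listed_on"]),
            "priority": rec.get("priority", "normal").strip().lower(),
        })
    return cleaned


def build_priority_report(records: list[dict], sender: str, recipient: str) -> EmailMessage | None:
    """Build (but do not send) an email listing high-priority records, or None if there are none."""
    urgent = [r for r in records if r["priority"] == "high"]
    if not urgent:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"{len(urgent)} high-priority update(s)"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{r['listed_on'].isoformat()}  {r['full_name']}  ({r['source']})" for r in urgent]
    msg.set_content("\n".join(lines))
    return msg
```

Returning `None` when nothing is urgent lets a scheduler call the report step after every run without sending empty emails.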
Results
Entity Scan has become a key tool for governmental and financial institutions, streamlining compliance checks for account openings and travel authorizations. The automated scraping system refreshes the data daily with minimal manual intervention, and the monitoring portal gives the team visibility into scraper operations. The Pandas-based ETL pipelines transform scraped data and deliver it reliably, so institutions can access critical information promptly.