Our challenge: develop tools that ingest data streams from multiple sources, and deploy dashboards that give different stakeholders real-time insight into the piracy of Telugu films.
Project Miaro is an anti-piracy platform developed for the Indian film industry as part of an effort to establish an Anti-Piracy Grid. It automates a number of tasks, including data collection (both automated and manual). It also provides stakeholder-specific dashboards and supports workflows that let stakeholders take effective action: gathering evidence, initiating user-deterrence measures, and reporting infringements to ISPs and law enforcement agencies.
We developed an automated crawling engine that monitors online conversations on websites red-listed for active piracy, and integrated it with platforms such as Facebook and YouTube to discover pirated content. Data from these sources is ingested and surfaced in dashboards built with Microsoft Power BI. We used Neo4j to build semantic graphs, which let us create APIs and dashboards rapidly.
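A minimal sketch of the crawl-and-graph idea follows, assuming the crawler has already fetched page text. It only scans pages on red-listed domains, records which film titles appear where, and stores the result as Film-to-Site edges. `RED_LIST`, `FILM_TITLES`, the `AVAILABLE_ON` relationship and the Cypher string are hypothetical; an in-memory dictionary stands in for Neo4j so the example runs without a database.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical red list and film catalog, for illustration only.
RED_LIST = {"piracy-site.example", "leaks.example"}
FILM_TITLES = ["Film Alpha", "Film Beta"]

# Cypher a Neo4j loader might run per match (hypothetical schema; not executed here).
MERGE_MATCH = (
    "MERGE (f:Film {title: $title}) "
    "MERGE (s:Site {domain: $domain}) "
    "MERGE (f)-[:AVAILABLE_ON {url: $url}]->(s)"
)


def domain_of(url: str) -> str:
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return host.removeprefix("www.")


def scan_page(url: str, text: str) -> list:
    """Return (title, domain, url) triples for titles mentioned on a red-listed page."""
    domain = domain_of(url)
    if domain not in RED_LIST:
        return []  # only red-listed sites are monitored
    found = []
    for title in FILM_TITLES:
        # Whole-word, case-insensitive match on the film title.
        if re.search(rf"\b{re.escape(title)}\b", text, re.IGNORECASE):
            found.append((title, domain, url))
    return found


class FilmSiteGraph:
    """In-memory stand-in for the (Film)-[:AVAILABLE_ON]->(Site) graph."""

    def __init__(self):
        self.edges = defaultdict(set)  # title -> {(domain, url), ...}

    def add(self, title: str, domain: str, url: str) -> None:
        self.edges[title].add((domain, url))

    def sites_for(self, title: str) -> list:
        return sorted({domain for domain, _ in self.edges[title]})


graph = FilmSiteGraph()
pages = {
    "https://www.piracy-site.example/t/123": "Film Alpha full movie HD download link",
    "https://news.example/review": "Film Beta review",  # not red-listed, ignored
}
for url, text in pages.items():
    for triple in scan_page(url, text):
        graph.add(*triple)

print(graph.sites_for("Film Alpha"))  # ['piracy-site.example']
```

Modelling matches as graph edges is what makes questions such as "which sites host this film?" a single lookup, which is the kind of query that can back an API or dashboard tile directly.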
Our engineering approach
Data arrives from different sources in different ways. Most of the time it is unstructured, structured differently from one source to the next, or different in meaning. It may also arrive synchronously or asynchronously. At its core, Project Miaro was a data engineering problem: we developed a custom data ingestion tool and used ML to normalize the data.
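The ingestion pattern can be sketched as concurrent producers feeding one queue, followed by a normalizer that maps each source's field names onto a common schema and resolves noisy titles against a catalog. This is a sketch under stated assumptions: the source names, field names, `CATALOG` and `Sighting` schema are hypothetical, and `difflib` fuzzy matching is a simple stand-in for the ML normalization step, not the model we used.

```python
import asyncio
import difflib
from dataclasses import dataclass
from typing import Optional

# Hypothetical canonical catalog that noisy titles are resolved against.
CATALOG = ["Film Alpha", "Film Beta", "Film Gamma"]


@dataclass(frozen=True)
class Sighting:
    """Common schema that every source is normalized into (hypothetical fields)."""
    source: str
    title: Optional[str]  # canonical title, or None if it could not be resolved
    url: str


def canonical_title(raw: str) -> Optional[str]:
    """Resolve a noisy title to the catalog; fuzzy matching stands in for ML."""
    words = raw.replace(".", " ").replace("_", " ").split()
    # Drop tokens containing digits (years, resolutions such as 720p).
    cleaned = " ".join(w for w in words if not any(c.isdigit() for c in w)).title()
    matches = difflib.get_close_matches(cleaned, CATALOG, n=1, cutoff=0.8)
    return matches[0] if matches else None


def normalize(source: str, record: dict) -> Sighting:
    # Each source names its fields differently; map them onto one schema.
    if source == "crawler":
        raw_title, url = record["page_title"], record["page_url"]
    elif source == "social":
        raw_title, url = record["post_title"], record["link"]
    else:  # manual entries keyed in by analysts
        raw_title, url = record["title"], record["url"]
    return Sighting(source, canonical_title(raw_title), url)


async def produce(source: str, records: list, queue: asyncio.Queue) -> None:
    for record in records:
        await asyncio.sleep(0)  # stand-in for awaiting network I/O
        await queue.put((source, record))


async def ingest(feeds: dict) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    # All producers run concurrently and feed one queue; here we normalize after
    # they finish, whereas a streaming consumer would drain the queue as it fills.
    await asyncio.gather(*(produce(s, recs, queue) for s, recs in feeds.items()))
    sightings = []
    while not queue.empty():
        source, record = queue.get_nowait()
        sightings.append(normalize(source, record))
    return sightings


feeds = {
    "crawler": [{"page_title": "Film.Alpha.2019.720p", "page_url": "https://leaks.example/1"}],
    "social": [{"post_title": "film_beta_1080p", "link": "https://video.example/v/2"}],
    "manual": [{"title": "Unknown Movie", "url": "https://forum.example/3"}],
}
for sighting in asyncio.run(ingest(feeds)):
    print(sighting)
```

Unresolved titles come out as `None` rather than being forced onto the nearest catalog entry, so they can be routed to an analyst for manual review.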
At the core of our engineering is building for regression testing and scale. A solid regression suite enables true agility: we can add new features without breaking existing software. Our approach combines best-in-class open-source and cloud solutions to build systems that are robust and, at the same time, cost-optimized.
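One common way to build for regression is a golden-case check: record the outputs of a known-good release, then fail any change whose outputs drift from them. The sketch below illustrates that idea only; `find_regressions`, `normalize_domain` and the `GOLDEN` cases are hypothetical and not taken from Miaro's codebase.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse


def find_regressions(fn: Callable[[Any], Any], golden: Dict[Any, Any]) -> List[str]:
    """Run fn on every recorded input and report outputs that no longer match."""
    failures = []
    for case_input, expected in golden.items():
        actual = fn(case_input)
        if actual != expected:
            failures.append(f"{case_input!r}: expected {expected!r}, got {actual!r}")
    return failures


def normalize_domain(url: str) -> str:
    """Hypothetical function under test: reduce a URL to its bare, lowercase domain."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


# Golden cases recorded from a known-good release (hypothetical values).
GOLDEN = {
    "https://www.leaks.example/a": "leaks.example",
    "http://LEAKS.example:8080/b": "leaks.example",
    "not a url": "",
}

print(find_regressions(normalize_domain, GOLDEN))  # []

# A "simplified" rewrite that keeps the www. prefix, letter case and port is caught:
print(find_regressions(lambda u: urlparse(u).netloc, GOLDEN))
```

In practice the same check would run in CI (for example as a parametrized test), so every sprint's changes are compared against the recorded behaviour before release.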
We use Atlassian Jira to run our Scrum sprints and Kanban for incremental updates.