How We Work in Big Data & Data Science
Improving user engagement is our number one priority while developing our products. To give users what they want and keep them engaged, we need to understand their interests, their habits and how they interact with the product. That is not an easy task, but it is essential to building a product that users love. We listen to our users all the time by analyzing their behavior and generating insights into what works and what doesn't.
Delivering on this promise requires a two-fold approach: (i) building the data pipeline and SDKs that do the heavy lifting of collecting and processing data for our products, and (ii) democratizing data access within the company so that everyone can slice and dice the data we accumulate.
Our Big Data team is responsible for building end-to-end data pipelines and delivering the processed data to the target services so that our Data Science team can enable everyone to generate actionable insights.
However, technology alone isn't enough to create impact: the team has to embrace it. Luckily, we've got a team with a never-ending appetite for learning and experimentation, which ultimately empowers a data-driven culture.
Our approach to building data services is simple: technology for the sake of technology is meaningless. Technology is most effective when it is purpose-driven, so every service we build starts from a clear purpose. We always aim to maximize the value we create when developing new services, and that's how we make technology truly work for our users.
As our user base grows, the volume of data we handle grows even faster, and our big data platform has evolved to keep pace. We currently collect billions of events and five terabytes of behavioral data each day. Our architecture consists of mobile SDKs, APIs, Go/Java/Python based services and a data warehouse that stores and processes close to a petabyte of user data. Our Data Science team creates data models and uses visualization tools to make the data accessible to both technical and non-technical people.
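To make the shape of the pipeline concrete, here is a minimal sketch of the ingestion step, assuming a hypothetical JSON event payload and an in-memory queue standing in for the real message bus; the field names and the `ingest_event` helper are illustrative, not our production schema.

```python
import json
import queue
import time
import uuid

# Hypothetical in-memory buffer standing in for the real message bus
# that feeds the data warehouse.
event_bus: "queue.Queue[dict]" = queue.Queue()

REQUIRED_FIELDS = {"user_id", "event_name", "client_ts"}

def ingest_event(raw_payload: str) -> dict:
    """Validate a behavioral event sent by an SDK and enqueue it."""
    event = json.loads(raw_payload)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event is missing required fields: {sorted(missing)}")

    # Enrich with server-side metadata before it travels downstream.
    event["event_id"] = str(uuid.uuid4())
    event["server_ts"] = time.time()

    event_bus.put(event)
    return event

if __name__ == "__main__":
    payload = json.dumps(
        {"user_id": "u-42", "event_name": "level_completed", "client_ts": 1700000000}
    )
    print(ingest_event(payload))
```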
We have several practices to keep the data flowing:
- All of our services are designed to be fault tolerant and highly available. Every aspect of the infrastructure is built with failure in mind, because at this scale anything can fail (see the retry sketch after this list).
- Our software is verified by automated tests and deployed through continuous deployment tools. A reproducible testing and deployment pipeline lets us ship to production several times a day without fear of breaking things (see the test sketch below).
- After the code is deployed, we monitor our services to make sure everything keeps running smoothly. Extensive monitoring lets us measure the performance of every service and track failures at any stage of the pipeline (see the monitoring sketch below).
- Once the data arrives at its final destination, our data warehouse, it is ready for further processing by the Data Science team. At this stage we build the necessary data models and crunch the data down into bite-sized, easily digestible layers that turn this enormous volume into meaningful insights, alongside heavily customizable dashboards our teams can use to break the data down however they like (see the rollup sketch below).
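To illustrate the "built with failure in mind" point, here is a minimal retry sketch: a hypothetical `send_with_retries` wrapper around any downstream call that can fail transiently, with backoff parameters chosen for illustration rather than taken from our production settings.

```python
import random
import time

def send_with_retries(send, event, max_attempts=5, base_delay=0.5):
    """Call `send(event)`, retrying with exponential backoff and jitter.

    `send` stands in for any downstream call (message bus, warehouse
    loader) that may fail transiently; after the final attempt the
    error is re-raised so the caller can dead-letter the event.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(event)
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)
```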
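For the testing point, a pytest sketch along these lines verifies the ingestion behavior shown earlier; it assumes the `ingest_event` sketch above lives in a hypothetical `ingestion` module on the service's import path.

```python
import json

import pytest

# Assumes the ingest_event sketch above is importable from a
# hypothetical `ingestion` module.
from ingestion import ingest_event

def test_valid_event_is_enriched_and_accepted():
    payload = json.dumps(
        {"user_id": "u-42", "event_name": "level_completed", "client_ts": 1700000000}
    )
    event = ingest_event(payload)
    assert event["event_name"] == "level_completed"
    assert "event_id" in event and "server_ts" in event

def test_event_missing_required_fields_is_rejected():
    payload = json.dumps({"event_name": "level_completed"})
    with pytest.raises(ValueError):
        ingest_event(payload)
```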
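For the monitoring point, here is a minimal sketch of per-stage instrumentation, assuming a hypothetical in-process metrics store; in production these numbers would be exported to a monitoring system rather than kept in a dictionary.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process metrics store, one entry per pipeline stage.
metrics = defaultdict(lambda: {"ok": 0, "errors": 0, "total_seconds": 0.0})

def monitored(stage_name):
    """Decorator that records call counts, failures and latency per stage."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                metrics[stage_name]["ok"] += 1
                return result
            except Exception:
                metrics[stage_name]["errors"] += 1
                raise
            finally:
                metrics[stage_name]["total_seconds"] += time.perf_counter() - started
        return wrapper
    return decorator

@monitored("enrich_event")
def enrich_event(event):
    # Placeholder for a real pipeline stage.
    return {**event, "enriched": True}

if __name__ == "__main__":
    enrich_event({"user_id": "u-42"})
    print(dict(metrics))
```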
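And for the last point, this sketch shows the kind of rollup that turns raw events into a bite-sized daily engagement layer a dashboard can query directly; the field names and the `build_daily_engagement_layer` helper are hypothetical, not our actual data model.

```python
from collections import defaultdict
from datetime import datetime, timezone

def build_daily_engagement_layer(raw_events):
    """Roll raw behavioral events up into one row per (day, user).

    The output mirrors the kind of digestible layer a dashboard can
    query directly instead of scanning the raw event stream.
    """
    layer = defaultdict(lambda: {"events": 0, "sessions": set()})
    for event in raw_events:
        day = datetime.fromtimestamp(event["client_ts"], tz=timezone.utc).date()
        key = (day.isoformat(), event["user_id"])
        layer[key]["events"] += 1
        layer[key]["sessions"].add(event.get("session_id"))
    return [
        {"day": day, "user_id": user, "events": stats["events"], "sessions": len(stats["sessions"])}
        for (day, user), stats in sorted(layer.items())
    ]

if __name__ == "__main__":
    raw = [
        {"user_id": "u-1", "client_ts": 1700000000, "session_id": "s-1"},
        {"user_id": "u-1", "client_ts": 1700000300, "session_id": "s-1"},
        {"user_id": "u-2", "client_ts": 1700000600, "session_id": "s-9"},
    ]
    for row in build_daily_engagement_layer(raw):
        print(row)
```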
Needless to say, this is only the tip of the iceberg. We constantly assess and reflect on our previous decisions to improve our architecture, because our products, our users and the technology itself are always changing. We are a part of that change too.