
Push Analytics — Getting Customers The Analytics They Need Without the Pains of ETL

Hana Mohan Last updated on September 26, 2024

In the world of Notifications, analytics is king. When sending a notification to your users, you would like to know things like:

  • Of all our users, how many received the notification?
  • How many users took action on the notification?
  • How many notifications do we send a day?
  • How do notifications impact the usage of our product?
  • Which channels had the best response rate?

The list feels endless, and the analytics questions MagicBell customers need answered vary as much as the types of notifications they send. Our customers need their data, and they need it in their own data warehouse.

Keeping It Simple

MagicBell has succeeded in keeping our RESTful API minimal but full-featured. We accomplished this by being vigilant about not exposing too many endpoints. So, how do we apply the same design approach to providing analytics data to our customers: a minimal but full-featured solution?

ETL Isn’t Simple

Extract, Transform, and Load (ETL) is inherently not straightforward. ETL, frankly, is not fun to manage: it is often brittle and expensive to maintain. Most ETL solutions have the user pull data from a data source, which is cumbersome to set up.

Finding a solution for our customers that does not involve ETL enables MagicBell to continue with our vision of simple APIs. A push solution is preferable, as it requires minimal effort on the part of the customer.

Compliance and PII

MagicBell takes data governance and security very seriously. Even for MagicBell Magicians, many steps are necessary to get even minimal access to our customers’ data: we lock ourselves out.

So, here are some requirements we have to meet before we can even consider a solution to get our customers’ data into their warehouses:

  • How do we get data to our customers while also staying SOC 2 compliant?
  • How do we ensure a customer receives only their own data?
  • How do we continue to limit access to our database servers, which have no public IP address?

Scaling

MagicBell handles a lot of notifications. It isn’t out of the question to have to synchronize millions of notifications initially and then continuously synchronize thousands of notifications per day per customer.

On top of the need to scale data synchronization, we also need the ability to scale the number of customers who need analytics data. Generally, for ETL, a user would be the consumer of one or more sources. In the case of MagicBell, we need to be the producer for many consumers.

The Requirements

Our primary requirements are:

  • We need something that is crazy easy for our customers to set up: we want the API to be easy to use.
  • We need to limit a customer’s access to only their data.
  • We need a solution that is SOC 2 compliant.
  • We need the ability to push analytics data to all of our enterprise customers who signed up for this feature.
  • We need something that handles schema changes and propagates those changes to our customers.
  • We need to support, at the very least, PostgreSQL and Snowflake as destination data warehouses.
  • We need a solution that lives in the same region as our database servers.

Buy Or Build

With requirements in hand, it’s now necessary to decide whether to buy or build.

Can we Build It?

Our initial thought for an MVP was to extend our existing APIs to provide analytics data based on the HTTP Accept type. For our fetch-notifications API, an Accept type of text/jsonl plus a creation-time range would extract notifications from MagicBell, which customers could then load into their data warehouse.
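As a rough sketch of that idea (which we never built), a customer’s pull might have looked something like the snippet below. The endpoint, header names, and query parameters are illustrative assumptions, not a documented MagicBell API.

    import requests

    # Hypothetical pull-based export: ask the fetch-notifications endpoint for
    # JSON Lines within a creation-time window. The Accept value, auth headers,
    # and parameter names are assumptions for illustration only.
    resp = requests.get(
        "https://api.magicbell.com/notifications",
        headers={
            "Accept": "text/jsonl",
            "X-MAGICBELL-API-KEY": "<api-key>",
            "X-MAGICBELL-API-SECRET": "<api-secret>",
        },
        params={
            "created_after": "2024-09-01T00:00:00Z",
            "created_before": "2024-09-02T00:00:00Z",
        },
        timeout=30,
    )
    resp.raise_for_status()

    # Each line is one notification record the customer still has to transform
    # and load into their warehouse.
    for line in resp.iter_lines():
        if line:
            print(line)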

However, calling a RESTful API puts a lot of unnecessary burden on our customers.

  • Our customers must keep track of when they last pulled the data.
  • Our customers must spin up compute resources to pull data at intervals.
  • Our customers must transform and then load the data into their warehouse.

On our end:

  • MagicBell would need to rate-limit the pulling of data.
  • MagicBell would need to support changes to our database schema and notify all of our customers of the changes.
  • MagicBell customers would then need to update their ETL solution.

Building a solution would not be easy.

Can we Buy It?

The build-it option wasn’t looking very easy. There are a lot of products available to support ETL. We looked into many of them (a process beyond the scope of this article), and we found one! We found Prequel (YC W21): an easy-to-use product that matched our requirements exactly, especially our rigorous security standards, including SOC 2 certification.

Implementation Using Prequel

We don’t need to detail how to set up and use Prequel in this article, as they document it well. However, we did need to make some changes to meet the requirements above, described below.

1) Bastion Host

MagicBell’s databases have no public IP address, so there is no way for Prequel to access our database directly. Prequel offers an on-premise solution, but we wanted to get something spun up quickly. So, we set up a bastion host with a single white-listed IP address.
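Purely as an illustration (our actual infrastructure details are out of scope here), allowlisting a single IP on such a bastion host might look like the following if the host ran on AWS; the cloud provider, security group ID, and IP address are all assumptions.

    import boto3

    # Sketch only, assuming an AWS-hosted bastion: allow SSH from one
    # allowlisted IP (the address Prequel connects from). The group ID and
    # CIDR are placeholders, not real values.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.10/32",
                          "Description": "Prequel SSH tunnel"}],
        }],
    )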

2) Database Schema

We wanted to ensure that Prequel did not have access to any PII or any data other than the data our customers needed for analytics. We solved this by:

  • Creating an exportable schema.
  • Creating a prequel_reader user with access only to the exportable schema.
  • Creating views in exportable that expose only the data our customers need for analytics, and none of the PII.
  • Ensuring every synced table has a customer id, which Prequel uses to filter rows of data. Most of the tables our customers needed did not have a customer id column, so the views provide one (via INNER JOINs).

The SQL for all of this came to only around 120 lines!
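A minimal sketch of the shape of that SQL is below (run here through psycopg2); the schema, view, table, and column names are illustrative assumptions, not our actual schema.

    import psycopg2

    # Illustrative only: table, view, and column names are assumptions, not
    # MagicBell's real schema. The idea: a read-only "exportable" schema of
    # PII-free views, each carrying the customer id Prequel filters rows by.
    ddl = """
    CREATE SCHEMA IF NOT EXISTS exportable;

    -- Expose only non-PII notification columns, joining in the customer id
    -- that Prequel uses to decide which rows belong to which customer.
    CREATE OR REPLACE VIEW exportable.notifications AS
    SELECT n.id,
           n.category,
           n.seen_at,
           n.read_at,
           n.created_at,
           p.customer_id
    FROM notifications n
    INNER JOIN projects p ON p.id = n.project_id;

    -- A reader user locked down to the exportable schema only.
    CREATE ROLE prequel_reader LOGIN PASSWORD 'change-me';
    GRANT USAGE ON SCHEMA exportable TO prequel_reader;
    GRANT SELECT ON ALL TABLES IN SCHEMA exportable TO prequel_reader;
    """

    conn = psycopg2.connect("postgresql://admin@db.internal/magicbell")
    with conn, conn.cursor() as cur:
        cur.execute(ddl)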

3) Some Prequel API Calls

With the bastion host and our exportable schema set up, we only needed to make an API call to Prequel to create a source.
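Roughly, that call looked like the sketch below. The base URL, auth header, endpoint path, and payload fields are assumptions for illustration; Prequel’s API reference is the authority on the exact shape.

    import requests

    PREQUEL_API = "https://api.prequel.co"        # assumed base URL
    HEADERS = {"X-API-KEY": "<prequel-api-key>"}  # assumed auth header

    # Illustrative sketch of registering our exportable schema as a Prequel
    # source; field names are assumptions, not Prequel's documented payload.
    resp = requests.post(
        f"{PREQUEL_API}/sources",
        headers=HEADERS,
        json={
            "name": "magicbell-exportable",
            "vendor": "postgres",
            "host": "<bastion-or-db-host>",
            "port": 5432,
            "database": "magicbell",
            "schema": "exportable",
            "username": "prequel_reader",
            "password": "<password>",
        },
        timeout=30,
    )
    resp.raise_for_status()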

MagicBell uses MagicBell to send notifications, so to test the feature, we set up a destination pointing at our own warehouse.

A call to test the destination connection verified the setup.

Finally, a call to begin a transfer (not strictly necessary, as Prequel starts syncing on its own) resulted in close to 4 million notifications showing up in our warehouse!
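Put together, the destination, connection test, and first transfer amounted to a handful of calls along the lines of the sketch below; again, the endpoint paths and fields are assumptions rather than Prequel’s documented API.

    import requests

    PREQUEL_API = "https://api.prequel.co"        # assumed base URL
    HEADERS = {"X-API-KEY": "<prequel-api-key>"}  # assumed auth header

    # Illustrative flow only: create a destination for our own warehouse,
    # test the connection, then kick off a transfer (Prequel would also start
    # syncing on its own schedule). Paths and fields are assumptions.
    dest = requests.post(
        f"{PREQUEL_API}/destinations",
        headers=HEADERS,
        json={
            "name": "magicbell-internal-warehouse",
            "vendor": "snowflake",
            "recipient_id": "<our-own-customer-id>",  # only our rows sync here
        },
        timeout=30,
    )
    dest.raise_for_status()
    destination_id = dest.json()["id"]

    # Verify connectivity before syncing.
    requests.post(
        f"{PREQUEL_API}/destinations/{destination_id}/test-connection",
        headers=HEADERS, timeout=60,
    ).raise_for_status()

    # Optionally trigger the first transfer immediately.
    requests.post(
        f"{PREQUEL_API}/transfers",
        headers=HEADERS,
        json={"destination_id": destination_id},
        timeout=30,
    ).raise_for_status()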

Conclusion

Giving our customers the ability to do their own analytics was paramount in helping them understand their notifications. We needed a crazy simple way to provide customers with their data while meeting strict SOC 2 compliance requirements. Building a solution in-house was not desirable, and Prequel was exactly the solution we needed.

This article and the feature described above were created by Eric Hosick.