Principles for the Connected Application

Luke Ambrosetti
Jan 19, 2022

For the past decade, SaaS providers have made big leaps in helping customers sync data to the provider’s multi-tenant data warehouse. We are far beyond the days of FTP. Now, customers are starting to say goodbye to self-managed API integrations and hello to ETL integration services (e.g. Fivetran, Stitch) and Reverse-ETL providers (e.g. Hightouch, Census). While these new tools have greatly reduced data friction, they haven’t eliminated it. A better solution would be to eliminate ETLs (reverse or otherwise) altogether.

Over the last few months, there’s been quite the buzz over a “new” B2B SaaS architecture. (It’s not that new, but we’ll get to that another time.) Here is a great write-up from Patrick Chase, Partner at Redpoint Ventures, that has generated awareness about the topic. Until recently, I’ve heard it mostly referred to as the Hybrid-SaaS or No-ETL architecture, but Snowflake has popularized the name Connected Application, or connected-app for short.

What is a Connected Application?

While this model works for any data warehouse, Snowflake gives a great introduction to what they see as a connected-app in a two-part blog post where they talk about the many benefits of the architecture. This section will be a little bit redundant if you’ve already read those, but I feel it’s necessary to reiterate parts of it here as well.

Managed Applications

First, let’s look at how the relationship between the Customer and Provider (i.e. SaaS application) works in a modern data stack today.

Currently, customers have to sync data back and forth with their providers. While there are many newer tools out there to help with both sides of the ETL process, the customer’s data is effectively copied to the provider’s multi-tenant data warehouse in a schema that the provider’s app understands. With both ETL processes, the burden is on the customer to sync the data in both directions.

Connected Applications

A connected-app is a SaaS application where the Provider connects to their Customers’ internal data warehouse directly instead of loading data into the Provider’s managed data platform. This effectively removes both the ETL and the Reverse-ETL processes.

This model removes the customer’s burden of syncing data between environments. It helps remove data silos, and keeps the customer’s data securely in their environment. With a managed application, the provider’s multi-tenant database can also be difficult to scale for performance. This model lets the customer decide on how fast they want to go — after all, they’re the one paying the compute costs!

Principles for the Connected-App

The following are suggested principles for building a connected-app, and they are certainly up for debate! There are also many ways to implement a connected-app, and some implementations may not need to worry about one or two of the principles below.

The Customer Data Warehouse is the Source of Truth

This doesn’t mean another data store isn’t leveraged for data processing. It just means that the customer is working from their own data instead of a synced copy that the provider holds. In fact, many providers will likely use a transactional application database or even a data lake as an intermediary data store. This is the most important principle; the others are built on it!

The connected-app is schema agnostic

One of the biggest challenges of the managed application model is fitting the customer’s schema within the provider’s multi-tenant platform. If the customer is going to bring their own data warehouse, then they need to bring their own data models as well. This doesn’t mean that the provider can’t have some requirements, but if it does, it needs to give the customer the tools to meet them.

One solution would be to provide a way to store prepared statements that the customer can define/customize. Another might be to allow the user to create application-managed views on the customer’s data warehouse.
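As a rough illustration of the application-managed view idea, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the customer’s data warehouse. Everything named here is an assumption made for the example, not part of any real product: the `crm_contacts` table, the `app_audience` view, the `REQUIRED_COLUMNS` set, and the `create_app_view` helper are all hypothetical. The customer supplies a SELECT that maps their own columns onto the few names the app needs, and the app only checks that those names are present.

```python
import sqlite3

# Hypothetical column names the provider's app needs (an assumption for this sketch).
REQUIRED_COLUMNS = {"recipient_id", "email"}


def create_app_view(conn, view_name, customer_select):
    """Create an application-managed view from a customer-defined SELECT,
    then verify that it exposes the columns the app requires.

    view_name and customer_select are interpolated directly into SQL, so this
    sketch assumes they come from a trusted, authenticated customer admin.
    """
    conn.execute(f"DROP VIEW IF EXISTS {view_name}")
    conn.execute(f"CREATE VIEW {view_name} AS {customer_select}")
    # Select zero rows: we only want the view's column names.
    cursor = conn.execute(f"SELECT * FROM {view_name} LIMIT 0")
    columns = {description[0] for description in cursor.description}
    missing = REQUIRED_COLUMNS - columns
    if missing:
        conn.execute(f"DROP VIEW {view_name}")
        raise ValueError(f"view is missing required columns: {sorted(missing)}")
    return sorted(columns)


# Stand-in for the customer's warehouse, using the customer's own schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crm_contacts (contact_key INTEGER, email_addr TEXT, region TEXT)"
)
conn.execute("INSERT INTO crm_contacts VALUES (1, 'a@example.com', 'EMEA')")

# The customer maps their columns onto the names the app expects.
customer_select = (
    "SELECT contact_key AS recipient_id, email_addr AS email, region "
    "FROM crm_contacts"
)
view_columns = create_app_view(conn, "app_audience", customer_select)

# The app reads through the view; the underlying table is never copied.
rows = conn.execute("SELECT recipient_id, email FROM app_audience").fetchall()
```

The point of the design is that the data stays in the customer’s table and only the mapping lives with the application. A stored prepared statement would work the same way, with the customer’s SELECT run on demand instead of being saved as a view.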

Forcing the customer to put their own data into a provider’s custom schema is unnecessary data duplication. The one exception to this principle is when the provider gives data back to the customer.

The provider must seamlessly provide data back to the customer’s data warehouse

Just like extracting data, if the provider is providing data back to the customer, then it must do so without putting the burden on the customer. Ideally, data is loaded continuously and as close to real-time as possible; however, that might not be necessary in many use-cases. Overall, the customer should be able to dictate how often data is loaded.
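One way to picture the customer dictating load frequency is a small, customer-owned schedule setting that the provider’s loader consults before each run. The sketch below is only an assumption about how that could look; `LoadSchedule` and `is_load_due` are hypothetical names, and a real loader would use whatever scheduling the data warehouse or orchestration tool already offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class LoadSchedule:
    """Customer-chosen cadence for loading provider data back (hypothetical)."""

    interval: timedelta


def is_load_due(
    schedule: LoadSchedule, last_loaded_at: Optional[datetime], now: datetime
) -> bool:
    """Return True when the next load into the customer's warehouse should run."""
    if last_loaded_at is None:
        # Nothing has been loaded yet, so the first load is always due.
        return True
    return now - last_loaded_at >= schedule.interval


# Two customers with different needs: near real-time versus a nightly batch.
near_real_time = LoadSchedule(interval=timedelta(minutes=1))
nightly = LoadSchedule(interval=timedelta(hours=24))

last_load = datetime(2022, 1, 19, 8, 0, 0)
now = datetime(2022, 1, 19, 8, 5, 0)

due_near_real_time = is_load_due(near_real_time, last_load, now)
due_nightly = is_load_due(nightly, last_load, now)
```

With these example values, five minutes have passed since the last load, so the near real-time customer is due for another load while the nightly customer is not.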

Snowflake leads the way in this category with its Secure Data Sharing, and I expect we’ll see this soon with other modern data warehouses as well. Until then, using each data warehouse’s preferred loading method will need to suffice.

The application must ensure data security

In the connected-app model, data security is both a tremendous advantage and a challenge. While the customer’s data is kept safely within the data warehouse, the application must provide the necessary governance and controls to keep it that way.

The customer should have full control over the following (and more):

  • What data can the provider’s application access?
  • Who has access to the data from the application?
  • What databases, schemas, and tables can the provider write to?
  • How long, if at all, can the provider store the customer’s data?

There is some expectation that customers will need to implement portions of this on the data warehouse itself. As an example, for automated tasks / jobs, a service account should be provisioned for the application.
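The four questions above can be sketched as a customer-owned policy that the application checks before it touches anything. This is a minimal illustration under my own assumptions, not a description of any real product: `AccessPolicy` and its helper functions are hypothetical, the table and user names are made up, and in practice much of this would be enforced by roles and grants on the data warehouse itself.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class AccessPolicy:
    """Customer-controlled limits on the provider's application (hypothetical)."""

    readable_tables: FrozenSet[str]   # what data the application may access
    allowed_users: FrozenSet[str]     # who may see that data through the app
    writable_schemas: FrozenSet[str]  # where the provider may write
    max_retention: timedelta          # timedelta(0) means "do not store at all"


def can_read(policy: AccessPolicy, user: str, table: str) -> bool:
    """Both the user and the fully qualified table must be allowed."""
    return user in policy.allowed_users and table in policy.readable_tables


def can_write(policy: AccessPolicy, table: str) -> bool:
    """Writes are allowed only inside schemas the customer has opened up."""
    schema, _, _name = table.rpartition(".")
    return schema in policy.writable_schemas


def must_purge(policy: AccessPolicy, stored_for: timedelta) -> bool:
    """True once the provider has held a copy longer than the customer allows."""
    return stored_for > policy.max_retention


policy = AccessPolicy(
    readable_tables=frozenset({"analytics.crm.contacts"}),
    allowed_users=frozenset({"marketing_ops"}),
    writable_schemas=frozenset({"analytics.app_output"}),
    max_retention=timedelta(hours=24),
)

read_ok = can_read(policy, "marketing_ops", "analytics.crm.contacts")
read_denied = can_read(policy, "intern", "analytics.crm.contacts")
write_ok = can_write(policy, "analytics.app_output.send_results")
write_denied = can_write(policy, "analytics.crm.contacts")
purge_needed = must_purge(policy, timedelta(hours=30))
```

Keeping the policy in the customer’s hands, rather than hard-coding it in the application, is what lets the customer answer each of those questions for themselves.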

If implemented correctly, data security is one of the primary benefits of the connected-app model.

Next: Connected-App Methods and Examples

While working at MessageGears, I’ve seen first-hand how the connected-app model is game-changing; however, it hasn’t been perfected yet and it has a lot of growing up to do. In the next post, we’ll look at some examples of existing connected-apps, how their architecture fits the principles, and what challenges they have yet to solve.

Special thanks to Dan Roy for his input on this post! If you want to learn more about how Dan initially came up with his model for the connected-app over a decade ago, you can hear more on the IN GEAR podcast.

Luke Ambrosetti

Partner Solutions Engineer @ Snowflake. data apps + martech. sweet tea and fried chicken connoisseur. drummer’s syndrome survivor.