How Patreon Enhances Decision Making with SDF

How an industry leader simplifies database migration with SDF column-level lineage. Patreon now gets detailed insights within seconds instead of relying on tedious manual analysis of legacy queries.

Growing Customer Base, Lack of Data Visibility

In early 2024, as employees returned to the office hoping to start the new year without issues, Patreon discovered a mismatch between its actual customer signups and the statistics reported by its data warehouse. While the platform had been seeing signups from creators and fans, the data warehouse incorrectly showed the opposite: very few new creators and minimal signups from fans.

“Our internal data team saw their metrics dropping over time. It wasn’t so large that it was immediately obvious, but it was going down gradually.”
- Data Engineer, Patreon

Concerned by the gap between realized and expected signup metrics, Patreon assigned team members to investigate the anomaly. The investigation and resolution effort dragged on, consuming valuable time and resources.

When multiple datasets were found to be stale because they were not updating properly, more staff were brought in to help resolve the issue. Mark*, a data engineer, joined the group several weeks into the investigation and realized that SDF could easily identify the downstream impact of Patreon’s data quality issue.

Mark had recently integrated SDF into Patreon’s data stack and knew that SDF was a state-of-the-art SQL engine with column-level lineage, one that brings information flow theory into practice. SDF allows programmers and analysts to view their data warehouses with an unprecedented degree of granularity, which lets engineers easily identify problem areas and formulate solutions.

*Name changed for privacy.

Power of SDF, Delivering Confidence in Decisions

After a series of investigations, hypotheses, and discussions, the Patreon team pulled in the data engineering team, which had just finished integrating SDF into the data stack.

“We had an issue with stale data last week where datasets were not updating, and we knew that we needed to get this resolved.”
- Data Engineer, Patreon

Patreon’s data team inferred that a bug had been introduced while migrating a table from one data warehouse to a new data lake. Any migration at scale can lead to issues, especially when data sinks from multiple sources listen to different partitions and tables.

Patreon originally turned to SDF Labs to provide visualized data lineage of their queries, jobs, and tables. SDF works by ingesting query logical plans to visualize column-level lineage within an easy-to-access web-based console.

With searchable lineage and dependency data maps, Patreon team members are able to visualize how data flows through their data warehouse within SDF Cloud.

The data team used SDF Cloud to identify which downstream tables were affected by the bug they had already found. Within seconds of searching the column-level lineage, the team was able to produce a report for the wider group of stakeholders.

“With SDF, it took me seconds to find the broken tables and we had downstream impact identified immediately.”
- Data Engineer, Patreon
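
To illustrate the idea behind this kind of downstream impact search, here is a minimal Python sketch. It is not SDF's API or implementation: the lineage map, the table and column names, and the helper functions are all hypothetical, invented for this example. It assumes column-level lineage is available as a simple mapping from each column to the columns it feeds, and walks that mapping breadth-first to list every affected table.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its value.
# All table and column names are invented for illustration only.
LINEAGE = {
    "raw.signups.user_id": ["staging.signups.user_id"],
    "raw.signups.created_at": ["staging.signups.signup_date"],
    "staging.signups.user_id": [
        "marts.creator_growth.new_creators",
        "marts.fan_growth.new_fans",
    ],
    "staging.signups.signup_date": ["marts.creator_growth.week"],
    "marts.creator_growth.new_creators": [
        "reports.exec_dashboard.creator_signups",
    ],
}


def downstream_columns(lineage, start):
    """Return every column reachable from `start` by following lineage edges."""
    seen = set()
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in seen:  # guard against revisiting shared or cyclic nodes
                seen.add(child)
                queue.append(child)
    return seen


def impacted_tables(lineage, start):
    """Collapse impacted columns to table names (the part before the last dot)."""
    return sorted({c.rsplit(".", 1)[0] for c in downstream_columns(lineage, start)})


if __name__ == "__main__":
    # List the tables that depend, directly or indirectly, on one source column.
    for table in impacted_tables(LINEAGE, "raw.signups.user_id"):
        print(table)
```

In this toy example, a problem in the hypothetical `raw.signups.user_id` column surfaces the staging, marts, and reporting tables that depend on it, which is the kind of list a team could hand to stakeholders as an impact report.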

Conclusion

SDF not only brings immediate value by shifting left but also delivers data quality and data governance benefits. With SDF and SDF Cloud, teams can rapidly identify issues based on dependencies and export reports for action.

As the Patreon team continues its migration and grows its data warehouse, it aims to make further use of SDF for its governance needs, so that it can classify and understand its data.

A Creator Company that Lives on Data

Patreon gives creators a way to share their work and form communities with their fans, allowing them to turn their passions into lasting creative businesses. With a direct line to their communities, creators never have to worry about ads or algorithms getting in between them and their fans.

Patreon has amassed a complex stack of data products and tools that are used to manage, maintain, and secure the data warehouse. The team approached SDF Labs in 2024 to help solve data quality and data governance problems, as well as to better understand their column-level lineage.
