#65 What's a Data Contract Between Friends - Setting Expectations with Data Contracts - Interview w/ Abe Gong
Data Mesh Radio - A podcast by Data as a Product Podcast Network
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.

Abe's Twitter: @AbeGong / https://twitter.com/AbeGong
Abe's LinkedIn: https://www.linkedin.com/in/abe-gong-8a77034/
Great Expectations Community Page: https://greatexpectations.io/community

In this episode, Scott interviewed Abe Gong, the co-creator of Great Expectations (an open source data quality / monitoring / observability tool) and co-founder/CEO of Superconductive. One caveat before jumping in: Abe is passionate about the topic and has created tooling to help address it, so try to view Abe's discussion of Great Expectations as an approach rather than a commercial for the project/product.

To start the conversation, Abe shared some of his background experience living the pain of unexpected upstream data changes causing data chaos and a lot of work to recover and adapt. Part of the goal with something like data contracts is to remove the need to recover, not just to adapt, and to move towards controlled, expected adaptation. Abe believes the best framing for data contracts is to think of them as a set of expectations.

Expectations here include not just the schema but also the content of the data, such as value ranges, types, distributions, relationships across tables, etc. For instance, a column may hold rankings on a one to five scale, and then the application team changes it to a one to 10 scale.
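To make the ranking example concrete, here is a minimal sketch in plain Python of a data contract expressed as a set of expectations. It is not the Great Expectations API (that library ships its own expectations, such as expect_column_values_to_be_between); the names Expectation, DataContract, validate and the "rating" column are hypothetical, chosen only for illustration. It assumes rows arrive as a list of dictionaries.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Expectation:
    """Hypothetical: one expectation about a column, checked value by value."""
    column: str
    description: str
    check: Callable[[Any], bool]


@dataclass
class DataContract:
    """Hypothetical: a contract modeled simply as a set of expectations."""
    expectations: List[Expectation]

    def validate(self, rows: List[Dict[str, Any]]) -> List[str]:
        """Return the description of every expectation violated by at least one row."""
        failures = []
        for exp in self.expectations:
            if not all(exp.check(row.get(exp.column)) for row in rows):
                failures.append(exp.description)
        return failures


# Hypothetical contract for a "rating" column agreed on by producer and consumer.
rating_contract = DataContract(expectations=[
    # Schema-level expectation: every value is a whole number (bool excluded).
    Expectation("rating", "rating is an integer",
                lambda v: isinstance(v, int) and not isinstance(v, bool)),
    # Content-level expectation: every value stays on the agreed 1-5 scale.
    Expectation("rating", "rating is between 1 and 5",
                lambda v: isinstance(v, int) and 1 <= v <= 5),
])

old_batch = [{"rating": 1}, {"rating": 4}, {"rating": 5}]
new_batch = [{"rating": 2}, {"rating": 7}, {"rating": 10}]  # upstream moved to 1-10

print(rating_contract.validate(old_batch))  # []
print(rating_contract.validate(new_batch))  # ['rating is between 1 and 5']
```

The new batch still passes the schema-level check, because every value is an integer, but fails the range check. Keeping both kinds of check in one contract is what lets a content change like this surface as a broken expectation rather than slipping through silently.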
The schema may not be broken (the column still holds whole numbers), but the new range is not within expectations, so the contract is broken.

Currently, Abe sees the best way to avoid breaking these social expectations as getting consumers and producers in a meeting to talk about upcoming changes and prepare for them, such as with versioning. But as tooling improves, Abe sees a world where we won't even need many of those meetings, either because data pipelines can be "self-healing" and automatically adapt to upstream changes, or because metadata and context-sharing tools will reduce the need for meetings.

Abe sees two distinct use cases in general for data contracts or more specifically