Editor’s note: This interview with Zhamak Dehghani was recorded for Coding Over Cocktails, a podcast by Lonti (previously known as TORO Cloud).
As the industry’s ambitions for analytical data processing mature, more and more machine learning models emerge. To keep up, businesses are arming themselves with intelligent technologies and taking more data-driven approaches to thrive.
This works well until your centralised data model begins to fall apart because it can’t scale to the variety of data sources and the diversity of consumption models.
Historically, the only solution presented to address scale was a data warehouse, which lets users run complex queries against the data it holds.
But Zhamak Dehghani wanted to find a better way – and the concept of the distributed data mesh was born.
"The data mesh, at heart, tries to solve the problem of scale. If you fast forward life 15 or 20 years down the track in the future and imagine that everything that we do is having augmented intelligence, using data and the data that feeds those models can come from anywhere, any source on the planet – then it just doesn't make sense to have centralised solutions." Dehghani explains during an episode of Coding Over Cocktails, a podcast by TORO Cloud.
Reason 1: The centralised data model may no longer work for you
Dehghani, who is currently the Director of Next Tech Incubation at ThoughtWorks, shares how they discovered that the centralised data model within enterprises was no longer working at the micro level.
"You are dealing with a decentralised distributed system problem, dealing with problems of siloing of the data, dealing with inconsistencies and lack of integration between that data, problem of who's owning what and accountability structure."
She then explains how data mesh principles practically address these problems, which come about when users decouple and decompose the centralised analytical data solutions.
"The rest of the data mesh principles address problems that arise once you decouple and decompose the centralised analytical data solutions based on domains and the ownership of the modeling, including the data itself within those domains" she elaborates.
Reason 2: You want to delight the consumers of your data
Dehghani argues that domain teams need to change and shift their thinking from "Data is an asset that I’m collecting" to "Data is a product that I’m serving".
This "product thinking" lets a domain team promote the value of its data in a way that caters to the needs of its data-consuming users.
The goal, she says, should be "delighting the experience of the consumer around it".
"And then, perhaps to really enable those domains and not incur an exponential cost of building up infrastructure, perhaps we need to have self-serve infrastructure with a different mindset that we really want to bring the level of complexity and the cost of running and observing and monitoring these data products low." she adds.
Reason 3: You’re looking to implement a strong sense of domain ownership within your organisation
A data mesh would require an organisation to implement a governance model that works with a distributed data system rather than a centralised one.
This allows for domain teams to be more agile and have more independence in modelling the data in a way that makes sense within their particular domain.
Dehghani states that the governance model would serve "the common good" while finding an equilibrium between the autonomy of domains and the interoperability of their data. She notes that, to achieve this interoperability, domains should adhere to some form of "enforcing policies", which would be applied "at the level of each data product".
"So, what is that mechanism? What are those, you know, levers that we can put in a governance model that finds that equilibrium between centralisation and decentralisation of policy, decision-making and enforcement of those policies? And hence, the kind of computational and federated model of governance came to exist." she said.
When asked whether this concept of domain ownership and productising data is an extension of the microservice architecture, Dehghani replies that the origins of microservices and domain-driven design strongly suggest it is.
"We need a new kind of a structure that extends that microservice or set of microservices within that domain, whose responsibility is exposing this analytical data from that domain, that the domain itself can use to train its machine learning, or a review of its reports, but also other domains can use it so it's interoperable with others."
"I think that's the space for innovation and that's the white space. I have an opinion about what it should look like, but that's just the first revision of implementation of this data product, and it's contained around it, given the utility layer of technologies that we have today. And I really think and I hope that this is just the beginning of a series of innovations around that." she explains.
Reason 4: You want better security for your data products
Dehghani admits that the data mesh may be a very uncomfortable space for people who come from traditional big data.
Because of the centralised nature of most data centres, you can get by with perimeter security: putting guardrails around the body of data or around the accounts that can access it.
With a data mesh, however, "you can secure, not only every single data product, but also monitor and change those security parameters in an automated fashion," Dehghani explains.
"And then if we create this new concept of the product, which is beyond just the data itself. Yes, it has pointers to the way that data is stored and it has the management of that storage, but it also has mechanisms for policy configuration and policy enforcement. Then, the sky’s the limit. You can just keep adding new policies to that." she furthers.
Reason 5: You want more scale and speed
Finally, Dehghani shares how her clients who have implemented a data mesh have experienced improved scalability and speed.
"Initially these projects take quite a little bit of time to get to the point that you can see the scale and speed, because as I said, there is no solution to go by off the shelf. And I don't think that there will be a solution off the shelf. This is an ecosystem play, and there needs to be a combination of interoperable solutions coming together to really raise the bar." she said.
Dehghani recounts how her clients who began two years ago had to start from scratch.
"So, we had to build a lot of stuff and that takes time. And I was extremely lucky to work with organisations that were, you know, technically ambitious and we could do this. They had the investment, they have commitment to their data strategy. And part of that data strategy was building this data mesh platform."
Once teams can configure and provision data products themselves, they no longer have to rely on a single central team to do it for them.
"You’re not dependent on the data platform team, or BI team or whatever team that is. You can just start these new teams around the group of data products. And they can go off and use the platform to bootstrap themselves and start creating these data products and get them to the hands of the users." she explains, adding that this was truly helpful when COVID hit.
"One of our clients was able to react quickly, as they provided additional services to their members around consulting COVID patients, and so on, through chatbots. We had, very quickly, set up new products, to capture those conversations and use them downstream to provide a better experience for their members." she shares.
You can catch more of this conversation by listening to the Coding Over Cocktails podcast featuring Zhamak Dehghani on major podcast platforms or by visiting torocloud.com/podcast.
This podcast series tackles issues faced by enterprises as they manage the process of digital transformation, application integration, low-code application development, data management, and business process automation. It’s available for streaming on most major podcast platforms, including Spotify, Apple, Google Podcasts, SoundCloud, and Stitcher.