x

    Decentralizing Data: Principles and Practices of Data Mesh

    • LinkedIn
    • Twitter
    • Copy
    • |
    • Shares 0
    • Reads 357
    Author
    • Ali kidwaiContent Architect
      The goal is to turn data into information, and information into insights.
    15-February-2023
    Featured
    • Data Analytics
    • Data Management
    • Data Warehouse

    Editor's Note: Understand how Data Mesh Architectures can remove the roadblocks from the central data team and bring a decentralized approach that enables teams to perform cross-domain data analysis themselves.

    Introduction

    According to Gartner, only 20% of data analytics projects will deliver business outputs. Indeed, given that the current data architectures need to be better equipped to handle data's ubiquitous and increasingly challenging interconnected nature, this is not surprising. So, in a bid to address this problem, the question in every company's boardroom remains — how can businesses properly build data architecture to intensify data efficiency for the growing complexity of data and its use cases?

    First defined in 2018 by Zhamak Dehghani, the data mesh concept is the latest approach to enterprise data architecture aiming to address traditional data platforms' pitfalls. Companies seeking a data architecture to meet their ever-changing data use cases should consider the data mesh architecture to power their business workloads and analytics.

    What Is a Data Mesh?

    Data Mesh is a relatively new approach to managing data in large, complex organizations. In simple terms, it involves breaking down data silos and organizing data in a more decentralized and collaborative way.

    The idea behind Data Mesh is that data should be treated as a product, with ownership and responsibility distributed across the organization. Instead of having a central team manage all the data for the organization, each team or department is responsible for the data they create and use.

    For example, let's say you work for a large e-commerce company. In the traditional approach, there might be a central data team responsible for managing all the customer data, product data, and transaction data for the entire organization. But with a Data Mesh approach, each department would be responsible for managing the data related to their specific area.

    So the customer service department would be responsible for managing customer data, the product team would be responsible for managing product data, and the finance team would be responsible for managing transaction data. Each of these teams would create their own data products, such as a customer database or a product catalog, and make those products available to other teams within the organization.

    By breaking down data silos and distributing ownership and responsibility, the idea is that data can be more easily managed and shared across the organization, which can lead to better collaboration, faster decision-making, and more innovation.

    Want to get value from data at scale?

    We help you deploy enterprise class solutions for your critical data to unlock the true potential of enterprise data.

    How does a data mesh is different from traditional data architectures?

    Here's the table below that compares the features of traditional data management platforms to data mesh architectures.

    Traditional Data Management Platforms Data Mesh Architecture
    Serve a centralized data team that supports numerous domains Serve autonomous domain teams
    Manage code, policies, and data as a single unit Manage code and pipelines independently
    Needs separate stacks for operational and analytical workloads Offers a single platform for operational and analytic workloads
    Centralize the platform for optimized control Decentralize the platform for optimized scale
    Force domain awareness Remain domain-agnostic


    Rather than taking data as a centralized repository, a data mesh's decentralized nature allows data ownership to domain-specific teams that control, manage, and deliver data as a product, allowing easy accessibility and interconnectivity of data across the business.

    Today numerous organization' data use cases can be split into analytical and operational data. Operational data represents data from the organization's applications' day-to-day operations. For example, using an e-commerce store will mean customer, transaction, and inventory data. This operational data type is generally stored in databases and utilized by developers to create various APIs and microservices to power business applications.

    Operational vs. analytical data plane

    On the other side, analytical data represents historical enterprise data utilized to enhance business decisions. In e-commerce stores for instance, analytical data answers ques such as "how many consumers have ordered this product in the last few years?" or "what products are customers likely to purchase in the winter season?" Analytical data is typically transported from multiple operational databases using ETL techniques to centralized data stores such as - warehouses and data lakes. Data analysts and scientists utilize it to power their analytics workloads, and marketing and product teams can make impactful decisions with the data.

    data mesh Architecture


    A data mesh approach understands the core difference between the two broad types of data and tries to connect these two types of data under a different structure — a decentralized approach to data management.

    Have a look at Microsoft's offering Azure Synapse Analytics, along with both Data Lake Storage Gen2 and SQL database, as the central components for implementing a data mesh architecture.

    Data Lake Storage database

    Core Principles of a Data Mesh Approach

    A data mesh framework is based on four principles that change the way data analytics are enabled in the enterprise:

    four  data mesh principle


    Domain-driven ownership and architecture

    The foremost principle of a data mesh is shifting the power of ownership and data into the hands of the domain teams. They own the data end to end—from ensuring they have the apt sources or ingested data to work with, to maintaining and building any processing pipelines mandatory, to serving the data out for other domain teams to tap into as products (more on that later) with the excellent quality guarantees and governance controls in place. The domain teams can be defined by the department, business unit, or other similarly motivated groupings. If they are correctly implemented, new domain teams should be able to be added fluidly and significantly when data is being correlated into new data products.

    Data as a product

    As alluded to in the first principle, domain teams are not just responsible for the data and the resulting data products. And data products require to be treated like any other product. Consumers and other domain teams need to discover and use data products. The domain owner is responsible for maintaining and updating (or deprecating) these products to make sure accuracy and quality. What can this look like in practice? Imagine a supply chain team curating an inventory data product that marketing folks can tap to develop new discount campaigns or that can be utilized by regional teams for placing new orders.

    Self-service infrastructure

    The third principle is making all this self-service easy for the domain teams. Complex technologies and niche skills are not sustainable in a data mesh design. There requirement to be a common platform and set of tools that any domain team can indulge in to serve and build their data products without getting bogged down in infrastructure resource or maintenance limitations.

    Federated governance

    The last piece of a successful data mesh is governance. A data mesh architecture can't come at the expense of access controls and data protection. There must be a balance between global governance policies and controls and ensure each domain team can define and implement these policies when developing and sharing their data products. This federated governance is critical for ensuring data privacy and compliance and aiding discovery at scale.

    Combined, these principles enable a decentralized yet robust and extensive data framework designed to deliver business outcomes. The resulting analytical data architecture and operating model treat data as a product that can be owned by the teams with the most intimate knowledge of their consumption and analytics needs.

    Now, have a look at some of the significant benefits of Data Mesh in the image below.

    benefits of Data Mesh

    Best Practices for Implementing a Data Mesh

    Implementing a data mesh can effectively improve your organization's access to and control of data. However, it is crucial to consider the best practices for implementation to ensure you get the most out of this technology.

    • Start with business capabilities: Identify the business capabilities and their data needs first, and design the data mesh architecture around them.
    • Decentralize ownership: Assign data ownership to individual teams, who are responsible for the data quality, security, and governance.
    • Use domain-driven design: Apply domain-driven design principles to create domain-specific data products, which can be consumed by other teams.
    • Use APIs for data products: Create APIs for data products to make them easily discoverable and consumable.
    • Emphasize data quality: Ensure that the data is of high quality, by implementing data quality checks, monitoring data pipelines, and providing feedback to data producers.
    • Use a data catalog: Implement a centralized data catalog to provide a single source of truth for data discovery. Implement data security: Implement data security practices to ensure that sensitive data is protected from unauthorized access.
    • Implement data governance: Implement data governance practices to ensure compliance with regulations and standards.
    • Embrace cloud-native technologies: Use cloud-native technologies to enable scalability, reliability, and agility.
    • Foster a data-driven culture: Encourage a data-driven culture, where data is used to drive decision-making, and where data is considered a valuable asset.

    Hence, by following these best practices for implementing a data mesh, you can make sure that your company has the right infrastructure, goals, and needs to build an effective data mesh solution.

    How we can help you transform and grow?

    Our approach can help you to ingest, enrich, transform, and serve data via a centralized platform.

    Use cases of a data Mesh

    While distributed data mesh architectures are still gaining adoption, they're helping teams attain scalability goals for common big data use cases. These include:

    Business intelligence dashboards: As new initiatives arise; teams commonly require customized data views to understand the performance of these projects. Data mesh architectures can support this need for flexibility and customization by making data more available to data consumers.

    Customer experience: Customer data allows businesses to understand their users better, allowing them to provide more personalized experiences. This has been observed in various industries, from marketing to healthcare.

    Machine learning projects: By standardizing domain-agnostic data, data scientists can more easily stitch together data from numerous data sources, reducing the time spent on data processing. This time can accelerate the number of models which move into a production environment, allowing the achievement of automation goals.

    As businesses collect, store, and analyze more data, decentralized data ownership becomes evident. Ultimately, putting data back into the hands of those who understand is the way to go. Developing a distributed self-service architecture with centralized governance necessitates ongoing dedication.

    Transitioning from a monolithic to a distributed model necessitates enterprise reorganization and cultural shifts. However, the effort is worthwhile because this transformative paradigm has the potential to boost an enterprise's data-centric vision.

    Polestar Solutions is a leading IT consulting firm that has helped numerous organizations to transform their data operations. Data generation is increasing exponentially, and access to monolithic data sets becomes harder with time. Book a consultation today!

    About Author

    data mesh best practices principle and use ases
    Ali kidwai

    Content Architect

    The goal is to turn data into information, and information into insights.

    Generally Talks About

    • Data Analytics
    • Data Management
    • Data Warehouse

    Related blog