A new paradigm for Data Management: Data Mesh
Design a modern data platform at scale based on the decentralization of Data Governance
The key features
Why apply Data Mesh
The data mesh can help organizations whose internal business areas are highly decentralized, since its structure allows each team to manage its own data and provide the rest of the organization with only quality data, offered as a product.
The implementation of a Data Mesh approach allows domain teams greater independence in modeling and governing their data while still adhering to policies applied at the global level.
Having independent teams and data domains makes it possible to create data products faster to offer to the business.
With data mesh, each dataset is the responsibility of a single domain, which makes it possible to protect its content and monitor its usage.
The lifecycle of a data product adheres to agile principles of being short and iterative, to provide quick and incremental value to data consumers.
Our approach
1 - Make the data addressable and easily identifiable
In a data mesh solution, data access must be standardized with common rules across the various domains so that data is easily accessible. For data stored in a data lake, it is recommended that products be accessible via REST APIs that all share the same format. For data stored in a database, schemas and views can be defined according to standardized naming conventions. The data platform team handles this phase, still using a centralized approach.
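As a minimal sketch of such a convention, the helpers below build a REST address and a database view name from a domain and product identifier. The function names, URL template, and view-name pattern are illustrative assumptions, not a standard:

```python
# Hypothetical shared addressing convention: every domain exposes its data
# products under the same URL template and view-name pattern.

def data_product_url(domain: str, product: str, version: str = "v1") -> str:
    """REST address shared by all domains: /api/<version>/<domain>/<product>."""
    return f"/api/{version}/{domain}/{product}"

def data_product_view(domain: str, product: str) -> str:
    """Database view name following a single org-wide naming convention."""
    return f"{domain}_dp_{product}"

# Example: the orders domain publishes a 'daily_sales' product.
url = data_product_url("orders", "daily_sales")    # "/api/v1/orders/daily_sales"
view = data_product_view("orders", "daily_sales")  # "orders_dp_daily_sales"
```

Because every domain derives its addresses from the same template, consumers can locate any product knowing only its domain and name.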
2 - Use metadata and data catalog (discoverability)
Enrich the metadata and add a data catalog for discovery, so that anyone in the organization can find and "purchase" any data product. A single point is needed to search, discover, and "purchase" the data within the company. A way to request and grant access to data products, usable by data owners and consumers without involving a central team, is also necessary. At this stage, work is done on the features of data products, adding tests for data quality, lineage, monitoring, etc.
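An in-memory sketch can make the catalog idea concrete: each product registers metadata (owner, lineage, tags), and a keyword search supports discovery. All field and function names here are illustrative assumptions, not the schema of any specific catalog tool:

```python
# Minimal in-memory data catalog: products register metadata, consumers
# discover them by keyword. Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    domain: str
    owner: str
    description: str
    lineage: list = field(default_factory=list)  # upstream product names
    tags: list = field(default_factory=list)

catalog: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    catalog[entry.name] = entry

def discover(keyword: str) -> list[str]:
    """Search the catalog by keyword in name, description, or tags."""
    kw = keyword.lower()
    return [e.name for e in catalog.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)]

register(CatalogEntry("daily_sales", "orders", "orders-team",
                      "Daily aggregated sales figures",
                      lineage=["raw_orders"], tags=["sales", "finance"]))
```

A real catalog would add access requests and quality/lineage metadata pulled automatically from pipelines, but the registration/discovery split is the core of the idea.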
3 - Implement the domain driven design by breaking down the monolithic approach
It is necessary to assign ownership to the domain team that creates the data, moving to a decentralized ownership model. Each team must own its own data resources, ETL pipelines, quality, testing, etc. It is still necessary to rely on federated governance for the standardization, security, and interoperability of the data. Once all of this is in place, it is advisable to build these functionalities as services to create a self-service platform. At this stage, DataOps practices can be introduced and observability and self-service capabilities can be improved.
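One way to picture federated governance is as a shared contract that every domain team implements for its own product: the interface (including the quality gate) is global, while the implementation is owned by the domain. The class and method names below are assumptions for illustration:

```python
# Sketch: federated governance as a shared interface; each domain team
# owns its implementation. Names are hypothetical.
from abc import ABC, abstractmethod

class DataProduct(ABC):
    """Contract enforced globally; implementation owned by each domain."""

    @abstractmethod
    def extract(self) -> list[dict]:
        """Pull data from the domain's own sources."""

    @abstractmethod
    def quality_check(self, rows: list[dict]) -> bool:
        """Domain-specific quality rules, run before publishing."""

    def publish(self) -> list[dict]:
        rows = self.extract()
        if not self.quality_check(rows):
            raise ValueError("quality gate failed")
        return rows

class OrdersDailySales(DataProduct):
    """Owned end to end by the orders domain team."""

    def extract(self) -> list[dict]:
        return [{"day": "2024-01-01", "total": 1200.0}]

    def quality_check(self, rows: list[dict]) -> bool:
        return all(r["total"] >= 0 for r in rows)

rows = OrdersDailySales().publish()
```

The central platform team maintains only the `DataProduct` contract and the surrounding self-service tooling; adding a new product never requires its involvement.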
Which components to use
The most common way to store data in a Data Mesh is the data lake, where data products are addressable via URL and it is possible to manage access control, versioning, encryption, metadata, and observability. Another option is to use data warehouses; modern data warehouses have improved significantly and, thanks to serverless analytics, require no maintenance: they can simply be used without worrying about the operational burden.
In a Data Mesh, data processing is encapsulated within data products. There are no central ETL pipelines; instead, pipelines are hidden within each domain, so the same tools already in use can be kept. Important activities such as data quality, data cleaning, and data processing are still present, but they are no longer centralized; instead, they are managed by different teams under federated governance.
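A domain-internal pipeline step can be as simple as the sketch below: cleaning and aggregation live inside the data product rather than in a central ETL job. The function names and sample records are hypothetical:

```python
# Illustrative domain-internal pipeline: the domain team owns both the
# cleaning rule and the transformation it publishes.

def clean(records: list[dict]) -> list[dict]:
    """Drop rows with missing amounts; this rule belongs to the domain."""
    return [r for r in records if r.get("amount") is not None]

def transform(records: list[dict]) -> dict:
    """Aggregate cleaned rows into the product the domain publishes."""
    return {"total": sum(r["amount"] for r in records),
            "count": len(records)}

raw = [{"amount": 10.0}, {"amount": None}, {"amount": 5.0}]
result = transform(clean(raw))  # {"total": 15.0, "count": 2}
```

Another domain could implement the same two stages with entirely different tools; only the published output needs to respect the global standards.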
A data catalog is an organized inventory of the data resources available within the organization. It uses metadata to help organizations manage their data. Additionally, it assists data professionals in collecting, organizing, accessing, and enriching metadata to support data discovery and governance.
The data stored by each domain must be queryable in order to create aggregated products, regardless of the technology used to store it. There can be a fast data layer in a relational database and historical data in data lake storage; however, data consumers should be able to query them without being aware of the implementation details.
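The routing can be sketched as a single query entry point that picks the fast layer or the historical store depending on the requested date, so the consumer never sees which backend answered. The stores are stubbed with dictionaries; all names and the 30-day cutoff are illustrative assumptions:

```python
# Sketch: one query facade hides whether data comes from the fast
# relational layer or the historical data lake. Stores are stubbed.
from datetime import date, timedelta

FAST_STORE = {date.today(): {"total": 120.0}}     # recent data
LAKE_STORE = {date(2023, 1, 1): {"total": 80.0}}  # historical data

def query_sales(day: date) -> dict:
    """Consumers call this; routing is an implementation detail."""
    if day >= date.today() - timedelta(days=30):
        return FAST_STORE.get(day, {})
    return LAKE_STORE.get(day, {})

recent = query_sales(date.today())
historical = query_sales(date(2023, 1, 1))
```

Swapping the lake for a warehouse, or changing the cutoff, would not affect any consumer, which is exactly the decoupling the paragraph above describes.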
Data analysts are the most common data consumers; they tend to be less technical and to use different tools. BI tools are used extensively within the organization to gain insights and make better decisions.