
    Migrate and Modernize Your Data Estate: Modernize Before You Are Left in the Dust

    Author
    • David LeGrand, Sr. Vice President
    17-September-2025
    Featured
    • Data Engineering
    • Databricks

    The hype cycle for GenAI is so yesterday (insights gained through natural-language responses rather than in-depth manual research across possibly irrelevant sources). Today the hype cycle is focused on agentic AI (actions combined with human guidance performing routine and structured tasks). But not so long ago, the hype cycle was focused on retiring on-prem data infrastructure and moving workloads to the cloud.

    Why have we moved on from this topic? 

    Why talk about routine data engineering, the foundation of data science, advanced analytics, and agentic AI, when you can talk about exciting business use cases: profit optimization or top-line sales uplift through customer-lifetime-value insights for retailers, improved industrial productivity through predictive maintenance, or better commercial effectiveness for life-science enterprises through real-world evidence? Money talks!

    The reality behind cloud migration

    It's estimated that over the last seven years, cloud adoption has reached roughly 50-60%. While that percentage might suggest the move to the cloud is well along, the number is misleading. Let’s look at the contradictions:

    • Numerous sources estimate that 50-60% of IT workloads have moved to the cloud.

      But of that number, roughly 60% is data storage, which on its own produces no economic value. AWS reported that only 21% of its recent revenue relates to AI workloads.
    • Google (GCP) states that 98% of organizations are exploring GenAI (mostly chatbots).

      But only 45% of large enterprises run data + AI workloads in the cloud, due to the inertia of large, complex legacy systems. The result is an entire segment unable to leverage these productivity-enhancing tools.
    • The adoption rate of 50-60% might look solid.

      But more than 70% of enterprises use some form of hybrid/multi-cloud, creating data silos as the hyperscalers resist data sharing and commonality.

    The solution? Continue to migrate/modernize to the cloud but do it with a Databricks-centric architecture. Let’s dive into our six points of view on this subject:

    • Preparation for migration or modernization – Books have been written about this, but lessons learned show that starting with a plan built around outcomes, with business use cases already known, takes precedence over securing budget, building a core team, and so on. Assessing use-case returns and how they will be measured over time is core to success. To do this, look at the practical, quantifiable benchmarks that represent today’s status quo. The ability to measure and manage the baseline is fundamental to the accuracy of your modernization: if you can’t measure it today, advanced analytics won’t help you measure it tomorrow.
    • Data estate clean-up – Technical debt weighs heavily on most large enterprises, since data + AI modernization happens progressively over the years. Technical debt takes many forms, but the largest components we see are inefficient pipelines, high compute costs from full-table scans and long-running jobs, bloated storage for unmanaged intermediate data, and significant engineering time spent monitoring and fixing pipelines.
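To make the full-table-scan cost concrete, here is a toy sketch of why incremental (change-data-capture style) processing is cheaper than rescanning everything on each run. All names are hypothetical illustrations, not a Databricks API:

```python
# Toy sketch: incremental updates vs. repeated full-table scans.
# All names are hypothetical; real pipelines consume CDC feeds, not lists.

def full_scan_total(table):
    """Recompute an aggregate by scanning every row (O(n) per run)."""
    return sum(row["amount"] for row in table)

def incremental_total(previous_total, new_rows):
    """Update the aggregate from only the changed rows (O(changes) per run)."""
    return previous_total + sum(row["amount"] for row in new_rows)

table = [{"amount": 10}, {"amount": 20}]
total = full_scan_total(table)              # initial load: 30

new_rows = [{"amount": 5}]
table.extend(new_rows)
total = incremental_total(total, new_rows)  # 35, without rescanning the table
```

The two functions agree on the result; the difference is that the incremental path touches only the changed rows, which is the pattern behind CDC-driven pipelines.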

    Databricks’ solutions to these challenges? Delta Live Tables, and now Lakeflow, standardizes and automates pipelines: declarative pipelines simplify logic and replace ad hoc scripts; built-in data quality detects and isolates bad records automatically; change data capture reduces redundant batch jobs; and pipeline monitoring tracks performance, failures, and volume trends.
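The "detect and isolate bad records" behavior can be sketched framework-free. In Delta Live Tables this is declared via expectations; the underlying pattern is a quarantine split, shown here as a minimal sketch with hypothetical names:

```python
# Minimal quarantine pattern behind DLT-style expectations (names hypothetical).
def apply_expectation(records, predicate):
    """Split records into (valid, quarantined) instead of failing the pipeline."""
    valid, quarantined = [], []
    for rec in records:
        (valid if predicate(rec) else quarantined).append(rec)
    return valid, quarantined

rows = [{"id": 1, "qty": 4}, {"id": None, "qty": 2}, {"id": 3, "qty": -1}]
valid, bad = apply_expectation(
    rows, lambda r: r["id"] is not None and r["qty"] >= 0
)
# valid keeps the clean row; bad isolates the two rule violations for review
```

Isolating rather than dropping bad records preserves them for diagnosis, which is what cuts the engineering time otherwise spent chasing silent pipeline failures.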

    So how to streamline pipeline modernization?

    Polestar Analytics augments Delta Live Tables with Data Nexus, a low-code/no-code platform to design and orchestrate data pipelines with ease. Data Nexus delivers composable data models through automated pipeline orchestration with a visual workflow builder; version-control integration for pipeline transformations; scaling to handle varying data volumes and processing requirements; and a data-dictionary-based schema for generative code enhancement.

    • Data Integration – The next step in preparing the target use case is understanding which data sources must be incorporated into the data model that powers it, and whether a new capability needs to be developed. Questions we often see: “Can I access data from sources across my hybrid architecture (GCP, Azure, or AWS), or from S&P Global, SAP, or SFDC?” Databricks tackles this challenge with Lakehouse Federation and Delta Sharing, making data accessible without vendor lock-in and sharing existing data from source systems without replication. Governance and lineage are handled by Unity Catalog.
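Conceptually, federation puts one query surface over many external sources while the data stays where it lives. A toy facade illustrates the idea; these class and source names are hypothetical stand-ins, not the Databricks API:

```python
# Toy federation facade: one query surface over several registered sources,
# with no data replication. Names are hypothetical, not Databricks APIs.
class Federation:
    def __init__(self):
        self.sources = {}

    def register(self, name, lookup_fn):
        """Attach an external source via a callable; data stays at the source."""
        self.sources[name] = lookup_fn

    def query(self, source, key):
        return self.sources[source](key)

fed = Federation()
fed.register("sfdc", {"acct-1": "Acme"}.get)   # stand-in for a Salesforce source
fed.register("sap", {"mat-9": "Widget"}.get)   # stand-in for an SAP source
```

The caller sees one interface regardless of where each source lives, which is the property that avoids both replication and lock-in.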
    • Advanced Analytics Use Cases – Remember the discussion about the hype curve? Many want to experiment with agentic solutions that automate processes or build machine-learning-based advanced analytics, but is an agentic, automated solution necessary when a composable dashboard will suffice?

    Databricks AI/BI Genie is a conversational analytics tool within the Databricks AI/BI suite, allowing users, especially business users, to interact with their organization’s data in plain natural language, without needing SQL. Genie translates those queries into SQL, executes them, and returns results as text, tables, and visualizations. The Genie conversational chatbot can answer Q&A-style questions like “How did our sales pipeline perform last year?”
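The translation step can be sketched in miniature. A real system like Genie uses an LLM with schema awareness; this toy maps a recognized phrase to a hypothetical SQL template, just to show the natural-language-in, SQL-out shape:

```python
# Toy sketch of the NL-to-SQL idea behind conversational analytics.
# The table, columns, and phrase matching are hypothetical illustrations.
TEMPLATES = {
    "sales pipeline": "SELECT stage, SUM(value) FROM pipeline "
                      "WHERE year = {year} GROUP BY stage",
}

def to_sql(question, year):
    """Map a recognized phrase in the question to a parameterized SQL template."""
    for phrase, template in TEMPLATES.items():
        if phrase in question.lower():
            return template.format(year=year)
    raise ValueError("no template matches the question")

sql = to_sql("How did our sales pipeline perform last year?", 2024)
```

The point is the contract, not the matching: the user supplies a question, the system supplies governed, executable SQL.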

    For those who do want to move past dashboards (smart or dumb), Databricks Mosaic is the platform to turn to. Mosaic’s comprehensive suite of capabilities includes:

    Model Training & Fine-tuning

    • Train foundation models or fine-tune open-source LLMs (e.g., LLaMA, MPT, Falcon).
    • Supports parameter-efficient fine-tuning for cost savings.

    Model Serving

    • Low-latency, autoscaling endpoints to deploy models securely within Databricks.
    • Unified monitoring for latency, throughput, and drift.

    AI Agents & RAG

    • Build compound AI systems: Retrieval-Augmented Generation (RAG), tool use, and multi-step reasoning.
    • Integrates with Unity Catalog to securely connect models to governed enterprise data.
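The RAG bullets above can be sketched as a minimal loop: retrieve the most relevant governed documents, then ground the prompt in them. This toy uses keyword overlap as the retriever; production systems use vector search, and the documents here are invented examples:

```python
# Minimal RAG loop: retrieve relevant snippets, then build a grounded prompt.
# Toy keyword-overlap retriever with invented documents; real systems use
# vector search over governed enterprise data.
DOCS = [
    "Q3 revenue grew 12% driven by the CPG segment.",
    "The maintenance backlog fell after predictive alerts went live.",
]

def retrieve(question, docs, k=1):
    """Rank documents by word overlap with the question; keep the top k."""
    words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]

def build_prompt(question, docs):
    """Ground the model's prompt in the retrieved context."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How did revenue grow in Q3?", DOCS)
```

Grounding the prompt in retrieved, governed data is what lets the model answer from enterprise facts rather than its training set.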

    MosaicML

    • Optimized distributed training engine for large models (reducing cost/time by up to 10x).
    • Prebuilt training recipes for efficiency.

    ML Lifecycle Integration

    • Works with MLflow for experiment tracking, lineage, and governance. 
    • End-to-end workflow from data prep → training → deployment.

    Many have turned to Polestar Analytics’ Agenthood AI to simplify agent development and orchestration. With its natural-language control, role-based intelligence, and API-first integration, you can unify cross-functional orchestration across your entire organization with intelligent workflows and our pre-built multi-agent frameworks:

    • Pre-configured agent templates for common business processes
    • Automated handoffs between departments
    • Real-time visibility across the entire workflow lifecycle

    Its intuitive workflow designer builds complex AI workflows in minutes, not months, with a drag-and-drop interface:

    • Visual process mapping and decision trees
    • Reusable workflow components library
    • One-click deployment to production

    How to proceed?

    Polestar Analytics has adopted the Databricks Data Intelligence platform to power three innovative solution offerings for the market: 

    • Profit Pulse: An integrated suite of ML and visualization solutions addressing pricing, trade spend/promotion, and assortment optimization. The solution has seen wide adoption across the CPG landscape.
    • Data Nexus: A data engineering tool that speeds delivery of the composable data models that are the foundation of all consumption layers: visualization, machine learning, generative AI, and agents.
    • Agenthood AI: A data science solution leveraging Agent Bricks, allowing drag-and-drop creation of agents, an agent marketplace, and an agent orchestration tool.

    Along with our on-prem to cloud or Unity Catalog migration capabilities, these advanced analytics solutions can be served in two motions:

    • Service – Migration and modernization are core capabilities of Polestar Analytics. We help enterprises understand their data estate and provide the industry-centric resources, accelerators, and technology to migrate or modernize. We offer domain-specific use case implementation in Retail, CPG, HLS, MFG, and Private Equity.
    • Product – For “do-it-yourself” enterprises, Data Nexus is an excellent data engineering tool focused on cost reduction, governance, and lineage. Hosted in your environment, its drag-and-drop functionality speeds the creation of the data models that are the foundation of modern data science and advanced analytics. And once your data estate is migrated or modernized, Agenthood takes you into the agentic world, simply.

    Let’s dive into the real-life use case!

    Data Warehouse Implementation

    An example of our tailored solution delivery is below:

    Migration is a one-time event (or a series of events), but modernization is a mindset. Databricks brings intelligent scalability to this mindset, with capabilities that sunset historical processes and ecosystems, making data science, data engineering, and business intelligence simple.

    Turn to Polestar Analytics where we make data to outcomes, simple!
