We recently participated in the FIMA Boston conference, one of the many springtime financial services industry events showcasing data, analytics and AI innovations on the digital transformation odyssey. The single loudest takeaway from these events about the journey we’ve all been on for nearly a decade is that it’s a journey with no end.
No longer viewed as a destination, transforming enterprise analytics is a virtuous cycle of data decisioning and predictions, governance and security that drives greater transparency and fluidity in our pursuit of analytics excellence in the cloud. Yet despite the jellyfish-like squishiness and the uncertain risk of pain, there’s more optimism and a clearer sense of direction in an increasingly well-worn path for modernizing data analytics assets.
A look back at our work so far with customers in insurance and financial services highlights captivating insights learned from their legacy SAS asset transformations. In one assessment alone we discovered that only 30% of analysts were actively developing on the platform, a portion of those were exporting data rather than leveraging it directly in the warehouse, and nearly 20% created major security risks by placing plain-text passwords in their code.
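For the technically curious, findings like that last one come from a scripted scan of the code inventory rather than manual review. Here is a minimal sketch of that kind of scan; the directory path, file pattern, and regex are illustrative assumptions for this post, not the actual Corios AMP tooling.

```python
# Illustrative only: a minimal scan for plain-text credentials in SAS programs.
# The directory path and the patterns below are assumptions for this sketch,
# not the actual Corios AMP inventory tooling.
import re
from pathlib import Path

# Credentials commonly leak into LIBNAME statements, PROC SQL pass-through
# connect strings, and FILENAME URL/FTP options.
PASSWORD_PATTERN = re.compile(r"\b(password|pwd|pw)\s*=\s*[\"']?[^\s\"';]+", re.IGNORECASE)

def find_plaintext_passwords(code_root):
    """Yield (file, line number, line) for lines that look like hard-coded passwords."""
    for sas_file in Path(code_root).rglob("*.sas"):
        text = sas_file.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Values produced by PROC PWENCODE start with {SAS002}/{SAS003};
            # those are at least obfuscated, so flag only the rest.
            if PASSWORD_PATTERN.search(line) and "{SAS0" not in line:
                yield sas_file, lineno, line.strip()

if __name__ == "__main__":
    for path, lineno, line in find_plaintext_passwords("/sas/projects"):
        print(f"{path}:{lineno}: {line}")
```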
Those findings highlight our second big takeaway from discussions with the C-suite and on down the command line: it’s beyond time to get hands-on and transform workloads that are more and more at odds with the prevailing enterprise data strategy in the cloud. Now that the broad infrastructure and processes are in place, all eyes and budgets must focus on decades-old methods and platforms, like your legacy SAS workloads, that today mostly encumber the people and processes behind your analytics advantage.
Finding the edges of the jellyfish without getting stung
Corios was hired by a prominent insurance carrier to modernize their analytics and data practices for all things analytical: underwriting, pricing, claims, repairs, coverage, compliance, and regulatory support. They wanted to reduce the cost of data storage, to align all their analysts on a consolidated set of tools and environments, and to modernize the enterprise so they could react to climate events and other large-scale adverse events faster and more efficiently.
The Corios solution we use in these engagements is Corios AMP, which combines Corios software and a service methodology to inventory, score, prioritize and modernize our clients’ SAS data and analytics assets. After inventorying their workloads, data and teams, and interviewing leadership and subject matter experts, we recommended centralizing the workloads that relied on their primary atomic-level data warehouse (in Oracle), and moving their non-warehouse workloads and analysts to Python on Domino Data Labs for virtual analytic environment provisioning and archiving.
Then we spent the next six months modernizing the work of their 800+ analysts along this roadmap. The engagement was more than a technology exercise; its success hinged on changing the hearts and minds of their stakeholders.
Don’t touch: Change is hard
Analyst behavior and readiness to adopt change are critically important, and there are four mindsets we commonly encounter. We segment the analyst community into groups who produce similar work and determine each group’s readiness for modernization, then target our migration efforts and coaching differently for each group.
- The “Ready to Go!” group (usually 10% of the analyst community) is already bought in to the benefits of modernization and serves as an internal champion for change. They often need very little direct support other than to point them in the right direction, but we stay in contact with them to help them socialize their early successes.
- “Help Me Get Started” (20% of the analyst community) prefers to have us provide them with written and video materials, especially case studies, side-by-side examples of before-and-after, and “learn by doing” training.
- The third and largest group, “Coach Me On This Journey” (roughly 40%), needs active coaching and teaching experiences through multiple media, benefit reinforcement, peer teaming, and in some cases daily check-ins with their coach.
- Finally, every organization has members of the “Doubting Thomas” group (about 10% of the analyst community) who actively avoid change. We’ve found the best approach is to help the “Ready To Go!” group create their own success stories, to socialize those successes and lessons learned, and to encourage behavior change in this last group by demystifying the journey and making it less scary.
Take stock and prepare for surprises
The traditional means of defining the modernization roadmap and success criteria (i.e., walking the halls and interviewing analysts) is like trying to pick up a jellyfish on the beach. There are no discrete edges to grab onto and no warning of where the stinging tentacles will strike. We’ve found you need a data-driven inventory to answer the who, which, when, where, and how-many questions before stakeholder interviews can productively address the why and the how.
In this engagement we scanned the data about every analyst, every data file, every workload, and every line of code. We found, for instance, that of 800 registered analysts, only 250 were actually active in performing analytics work. More interesting still, hundreds of workloads and thousands of data files were owned by analysts who no longer worked at the company.
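To make that concrete, here is a hedged sketch of how the who-and-how-many questions can be answered once the scan output is in hand. The file names and columns are hypothetical stand-ins, not the actual Corios AMP inventory schema.

```python
# Illustrative sketch of answering the "who and how many" questions from a scan
# extract. The CSV files and column names are hypothetical stand-ins, not the
# actual Corios AMP inventory schema.
import pandas as pd

analysts = pd.read_csv("analysts.csv")      # one row per registered analyst
workloads = pd.read_csv("workloads.csv")    # one row per program/job, with owner and last_run
datafiles = pd.read_csv("datafiles.csv")    # one row per data file, with owner

# Who is actually active? Anyone who ran a workload in the last 12 months.
cutoff = pd.Timestamp.today() - pd.DateOffset(months=12)
recent = workloads[pd.to_datetime(workloads["last_run"]) >= cutoff]
active = recent["owner"].nunique()
print(f"{active} of {len(analysts)} registered analysts are active")

# Which assets are orphaned? Anything owned by someone no longer on the roster.
current_staff = set(analysts.loc[analysts["employed"], "user_id"])
orphaned_jobs = workloads[~workloads["owner"].isin(current_staff)]
orphaned_files = datafiles[~datafiles["owner"].isin(current_staff)]
print(f"{len(orphaned_jobs)} workloads and {len(orphaned_files)} data files have no current owner")
```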
Share the power to share adoption
It’s far more efficient to enable the analysts in the enterprise to do most of the migration, while we focus on the inventory, targeting, vision creation, training, coaching and change management. If you try to throw a hundred consultants at hundreds of analysts, thousands of workloads and hundreds of thousands of data files, you will end up with a chaotic and expensive mess.
Instead, we found it’s smarter to leverage the knowledge the analysts already have to ensure the quality and validation of the work they migrate, while we provide the supervision and tutelage to do it consistently and rapidly.
Meet the data where it rests
From a complete workload inventory, we found that more than 40 analysts spent most of their time querying and exporting massive amounts of data (hundreds of terabytes over five years) out of the atomic-level data warehouse and across the network into their ad hoc analytics workloads. This was true even though that work didn’t fundamentally augment the warehouse data or combine it with anything new.
As an alternative, all that work could be performed inside the database, where the data was secure, backed up, and easily validated. The analysts only had to learn how to convert their ad hoc analytics workloads to run in SQL, and then use a BI tool like Tableau, already connected to the warehouse, to build their reports. This reduced the volume of one-time and orphaned data on the file system and its archiving platform, and let the analysts spend more time actually analyzing and finding insights rather than moving data from one place to another.
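The pattern is simple to illustrate. Below is a minimal before-and-after sketch of pushing the aggregation into the warehouse instead of exporting detail rows; the table, columns, and Oracle connection details are hypothetical examples, not the client’s actual schema.

```python
# Illustrative before-and-after: push the aggregation into the warehouse instead
# of exporting detail rows. Table, column, and connection details are hypothetical.
import os

import pandas as pd
from sqlalchemy import create_engine

# Credentials come from the environment, not from the code (see the earlier
# point about plain-text passwords).
engine = create_engine(
    f"oracle+oracledb://analyst:{os.environ['DWH_PASSWORD']}@dwh-host:1521/?service_name=DWH"
)

# Before: pull every detail row across the network, then aggregate locally.
#   detail = pd.read_sql("SELECT * FROM claims_detail", engine)   # hundreds of GB
#   summary = detail.groupby(["region", "peril"])["paid_amount"].sum()

# After: let the database do the heavy lifting and return only the summary.
summary = pd.read_sql(
    """
    SELECT region, peril, SUM(paid_amount) AS total_paid
    FROM claims_detail
    WHERE loss_date >= DATE '2023-01-01'
    GROUP BY region, peril
    """,
    engine,
)

# The small result set can feed a Tableau extract or report directly.
print(summary.head())
```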
Know and go with the flow
About 10-15% of their workloads would benefit from horizontal scaling beyond what a Python/Pandas framework can easily support. Moving those workloads to Dask and Spark for horizontal scaling of compute resources became a natural next step.
When the analytics computations cannot run in the database, heavily used analytic data benefits from distributed object storage in the cloud (e.g., parquet datasets on Amazon S3), which aligns well with horizontal scaling of computation.
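As an illustration of how those two ideas fit together, here is a hedged sketch of a pandas-style aggregation running in parallel with Dask over a partitioned parquet dataset on S3. The bucket, path, and column names are assumptions for the example, and reading s3:// paths also requires the s3fs package.

```python
# Illustrative sketch: the same pandas-style aggregation, run in parallel with
# Dask over a partitioned parquet dataset in S3. Bucket, path, and column names
# are hypothetical.
import dask.dataframe as dd

# Dask reads only the columns and partitions each task actually needs.
claims = dd.read_parquet(
    "s3://analytics-curated/claims/",
    columns=["region", "peril", "paid_amount"],
)

# Familiar pandas syntax, evaluated lazily and executed across workers.
summary = claims.groupby(["region", "peril"])["paid_amount"].sum().compute()
print(summary.head())
```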
Maintain momentum with edges and fluidity
Now that our client has upgraded their analytics workloads to take better advantage of corporate resources such as their primary relational database, and their open source analytics running on their on-premise Domino Data Labs environment, some of the next steps include:
- Achieving further storage efficiency by leveraging cloud storage for their large flat file analytic data on AWS S3.
- Migrating their largest workloads that run inefficiently in Python to Spark. This can be done both in Domino Data Labs (on-premise) and on Amazon Glue, Lake Formation, and EMR (in the cloud); a minimal PySpark sketch follows this list.
- Designing and writing their net new workloads to leverage best practices from the beginning so that they don’t build up more technical debt related to their analytics data and workloads.
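For the second item above, the migrated job often ends up looking like the sketch below. It is a hedged illustration only; the paths and columns are hypothetical, and the same code runs anywhere a Spark runtime is available, whether Domino, Amazon EMR, or a Glue Spark job.

```python
# Illustrative PySpark sketch of a job migrated from a single-node Python script.
# Paths and column names are hypothetical; the same code runs wherever a Spark
# runtime is available (Domino, Amazon EMR, or a Glue Spark job).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-summary").getOrCreate()

claims = spark.read.parquet("s3://analytics-curated/claims/")

summary = (
    claims
    .where(F.col("loss_date") >= "2023-01-01")
    .groupBy("region", "peril")
    .agg(F.sum("paid_amount").alias("total_paid"))
)

# Write the result back to object storage as parquet for downstream reporting.
summary.write.mode("overwrite").parquet("s3://analytics-curated/claims-summary/")
```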
Want to learn more about these best practices? Email me today at [email protected].