Dirty data. It sounds bad, like something your mother warned you about: “you better not come home with that dirty data all over you!” What is this thing we call dirty data, and just because it is “dirty,” does that mean it holds no value?

Many investment firms have large data management and governance models in place to verify and validate the data entering their environment. While this is important for the operational datasets within the organization and those distributed for public consumption, it constrains the flexibility and speed of ingestion, as well as internal consumers’ access to new or unique datasets. This process of defining, modeling, cataloging, ingesting, standardizing, and managing the data can often take days, weeks, or even months.

The resulting gap between the speed at which a business operates and the ability to manage the data is all too often filled by “data slingers.” These folks have the audacity to bring together just enough business acumen and technical know-how to transcend the typical businessperson and become the go-to for data within individual departments. These individuals aggregate random bits of data from numerous sources, only to combine them with the firm’s pristine, mastered, governed data. This creates new insights or capabilities, exposing a fraction of the value that these datasets might provide if available to a larger audience.

Years ago, I was working with a firm that received information that could help identify sales opportunities. They wanted to correlate it with several other systems to create appropriate targets. We jokingly referred to the new dataset as the “Glengarry Leads.” We realized that the information needed to be actioned quickly, as it had a very short shelf life. Unfortunately, the established processes stood in the way of moving fast enough, and the data ended up being used outside the standard channels. There was real business value in utilizing these datasets regardless of their heritage.

This scenario plays out all too often, leaving valuable pockets or puddles of data segregated around the organization with varying degrees of muddiness, and no way to combine them with one another or leverage them on a broader scale for even greater value. The industry focuses far more on operational investment datasets than on the farther-reaching information that exists all around the environment: information sets from sales, finance, client interactions, communications, competitive analysis, transfer agency, broker-dealer, marketing, Request for Proposal (RFP) repositories, and the list goes on.

Perhaps this data could provide a view of trends within client movements, or product indicators. With the right information, you may gain a better understanding of profitability for specific functions, or at a minimum, directional cues.

We do need to understand that this data is likely imperfect and may fall short of a standard of accuracy, or there may be disagreement within the firm over how to interpret it appropriately. Achieving that level of accuracy and agreement can be very difficult and time-consuming. That said, these datasets are often good enough to provide indication or direction, and with the appropriate representation they can offer great insight and guidance to the firm. The famous French writer Voltaire put it well: “The best is the enemy of the good.”

This broader model requires us to rethink the methods that are used to land and operate data in our environments. Firms need to learn how to incorporate these puddles of data quickly and easily so that they can join the larger lake of available data across the firm. This provides a greater level of accessibility and can allow teams to realize greater value than that achieved by the “data slingers.”

What data should be allowed to join this model? That is the wrong way to look at the issue: the question is not what data should be allowed, but rather what model supports the data. I was chatting with a software engineer once, and he jokingly said to me “carpe data,” and that phrase has stuck with me ever since. Creating sandbox areas for teams within the environment lets them work with new data and easily combine it with the firm’s larger standard datasets, as in the sketch below. With the toolsets available today, we can create these playgrounds of information that enable business intelligence beyond the traditional investment accounting and performance datasets.
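To make the sandbox idea concrete, here is a minimal sketch in Python using pandas. The file paths and column names (client_name, account_id) are hypothetical, not drawn from any particular firm’s systems; the point is that a raw “puddle” dataset can be joined against governed data with only light, reversible cleanup, and imperfect rows are flagged rather than rejected.

```python
import pandas as pd

# Hypothetical locations: a team sandbox for raw "puddle" data
# and the firm's governed, mastered data store.
RAW_LEADS = "sandbox/glengarry_leads.csv"
GOVERNED_CLIENTS = "warehouse/clients.csv"

# Load the raw dataset as-is: no upfront modeling or cataloging.
leads = pd.read_csv(RAW_LEADS)
clients = pd.read_csv(GOVERNED_CLIENTS)

# Light, reversible cleanup rather than full standardization:
# normalize the join key so imperfect data can still match.
leads["client_name"] = leads["client_name"].str.strip().str.upper()
clients["client_name"] = clients["client_name"].str.strip().str.upper()

# A left join keeps every lead, even those with no governed match;
# imperfect rows still carry directional value.
combined = leads.merge(clients, on="client_name", how="left")

# Flag unmatched rows instead of discarding them, so analysts can
# judge how "muddy" the combined result is.
combined["matched"] = combined["account_id"].notna()
print(combined["matched"].value_counts())
```

The design choice worth noting is the left join plus the matched flag: unmatched rows are retained and labeled rather than thrown out, which preserves the “good enough for direction” value discussed above while making the data’s muddiness visible to anyone who uses it.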

Do not shy away from dirty data; rather, look at how it can be leveraged to your firm’s benefit. Learn from the data slingers in your organization which data types need to be planned for and incorporated. By creating an environment that embraces a broader dataset, you enable the company to react at the speed of business with greater flexibility for the future.

This article originally appeared on FundOperator.com


About Meradia

Meradia is a leader in operations and technology consulting for the global investment management industry. Since 1997, we have provided strategic advisory and implementation services that transform operational processes, performance, analytics, and reporting. Our consultants leverage decades of experience in investment performance measurement, front-to-back office operations, and technical platforms to optimize your firm’s functionality. Our extensive portfolio includes institutional asset managers, outsourced chief investment officers (OCIO), wealth, trust, banking, and insurance companies.
