Getting Your Data to Work for You
Virtually every industry records data in one way or another. Whether it’s sales figures or site traffic, this information is written and stored somewhere. However, the real value comes from using that data to gain deeper insight.
This is where data profiling comes in. Data profiling is an ongoing process of examining data from various sources and collecting meaningful metrics and statistics to gain a greater understanding of the data and its quality in the appropriate context.
When creating profiles, the first thing people tend to focus on is historical, or existing, data. This can include trends, usage statistics, and similar measures over different time frames, depending on the context. This information is usually collected from a production environment, but it is not limited to one, as profiling can be done on any data set. It gives valuable insight into user or product information that, once processed, can be used to drive decisions and change.
Profiling can be done for planned changes as well. With various releases, enhancements, and bug fixes changing the current state of a product, getting clear before-and-after pictures can be very insightful. For instance, after a new feature is released, you can collect statistics on how often it’s used, how it’s used, etc. This can be compared to other features or even used to determine future enhancements.
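As a rough illustration of a before-and-after picture, the sketch below counts how often each feature is used on either side of a release date. The event log format (pairs of feature name and date), the feature names, and the function name usage_before_after are all assumptions made for this example; a real profile would read from whatever usage data your product actually records.

```python
from collections import Counter
from datetime import date


def usage_before_after(events, release_date):
    """Count events per feature before and on/after release_date.

    events is a hypothetical log: an iterable of (feature_name, date) pairs.
    """
    before, after = Counter(), Counter()
    for feature, day in events:
        # Events on the release date itself count as "after".
        bucket = after if day >= release_date else before
        bucket[feature] += 1
    return before, after


# Made-up sample data, for illustration only.
events = [
    ("search", date(2016, 3, 30)),
    ("search", date(2016, 4, 2)),
    ("export", date(2016, 4, 2)),
    ("export", date(2016, 4, 3)),
]
before, after = usage_before_after(events, release_date=date(2016, 4, 1))
```

Comparing the two counters shows which features gained or lost usage after the release. Raw counts are only a starting point; windows of different lengths would need to be normalized, for example to events per day.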
As stated previously, profiling is not constrained solely to production data; in fact, it doesn’t even need to be constrained to a database. For example, as a project is being worked on, profiles can include the quality metrics for a data set. This information can provide details such as test coverage, types of tests, and the level of automation, to name a few aspects. Having these testing metrics provides a better picture of the project’s testing efforts and can help identify gaps.
It’s also advantageous to generate data profiles proactively for unplanned changes. This can include monitoring for changes within the data. In today’s world, it’s not unusual for teams to be siloed, concentrating only on their piece of the greater picture. This increases the risk of impacting downstream processes, especially when the flow of data might not be common knowledge. Possible checks include monitoring for new transaction types or the population rates of key fields. Having alerts in place to detect changes to the structure of the data can surface upstream impacts sooner. The quicker an issue is identified, the faster and cheaper it is to resolve.
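The two checks mentioned above, new transaction types and population rates of key fields, can be sketched in a few lines. This is a minimal example under stated assumptions, not a definitive implementation: records are plain dictionaries, the field names transaction_type and customer_id are hypothetical, and the 10-point drop threshold is arbitrary.

```python
from collections import Counter


def profile(records, key_fields, type_field="transaction_type"):
    """Summarize a batch: population rate per key field and the types seen."""
    total = len(records)
    populated = Counter()
    types = set()
    for row in records:
        for field in key_fields:
            # A field counts as populated if it is present and not None/empty.
            if row.get(field) not in (None, ""):
                populated[field] += 1
        if row.get(type_field) is not None:
            types.add(row[type_field])
    rates = {f: (populated[f] / total if total else 0.0) for f in key_fields}
    return {"rates": rates, "types": types}


def detect_changes(baseline, current, max_drop=0.10):
    """Compare two profiles and return human-readable alerts."""
    alerts = []
    for new_type in sorted(current["types"] - baseline["types"]):
        alerts.append(f"new transaction type: {new_type}")
    for field, base_rate in baseline["rates"].items():
        drop = base_rate - current["rates"].get(field, 0.0)
        if drop > max_drop:  # threshold is an assumption; tune per field
            alerts.append(f"population rate for {field} dropped by {drop:.0%}")
    return alerts


# Made-up batches, for illustration only.
yesterday = [
    {"transaction_type": "SALE", "customer_id": "c1"},
    {"transaction_type": "REFUND", "customer_id": "c2"},
]
today = [
    {"transaction_type": "SALE", "customer_id": "c3"},
    {"transaction_type": "VOID", "customer_id": None},
]
alerts = detect_changes(
    profile(yesterday, ["customer_id"]),
    profile(today, ["customer_id"]),
)
```

Run on a schedule against each new batch, a comparison like this gives a downstream team an early signal that something changed upstream, even when nobody announced the change.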
When used appropriately, data profiling can be a powerful tool. Whether it’s analyzing existing data, profiling for planned changes, or monitoring for unplanned circumstances, the insights gained can save time and money and remove potential risks. Take advantage of the data you have by uncovering the hidden truths within it.
Catherine Cruz Agosto and Shauna Ayers are presenting the session Uncover Untold Stories in Your Data: A Deep Dive on Data Profiling at STAREAST 2016.