Full Screen

Data Processing

More Free Lessons in

Data and Analytics View All →

Get cutting-edge digital marketing skills, know-how and strategy

This micro lesson is from one of our globally recognized digital marketing courses.

Start a FREE Course Preview Start a FREE Course Preview
Global Authority

The Global Authority

12 years delivering excellence


245,000+ Members

Join a global community


Associate Certification

Globally recognised


Membership Included

Toolkits, content & more

Digital Marketing - Study Notes:


There’s a number of things to think about when doing data processing.

Data validation

Understanding what data, raw data is both valid, what’s relevant, what’s clean, what is pure data in effect, is very, very important, and distinguishing that from good and bad data.

Data sorting

How do you do that? Where do you cluster up your different data sets and across different variables? Ultimately, organizations at the top level will usually sort data under customer level data or product level data and based on those types of variables you can be quite helpful. Unfortunately, it doesn’t cross-pollinize very well.

Data summarization

What does the data mean? What is the kind of aggregated thing that it’s trying to say?

Data aggregation

At a high level you need to do the aggregation bit, but then at a lower level you need to be able to go down and dig down into deeper pockets of your data to understand what key target segments may be doing and responding to your data in different ways.

So for instance, in the insurance world we would have channel splits so you could understand your overall retention rates of the number of people retaining an insurance product, but then on the other side if you dug down deeper you could understand that at a channel basis, so people who are coming into the telephone, people coming to the web, people coming through an aggregator, and then you could basically drill down and see well, what were the core insights here? Why would these people leave and why were these people staying? So it’s very important to do a high level as well as deep dive.

Now the problem with high-level versus deep dive is statistical relevance and accuracy. If you’re doing it too deep down, effectively, you may not have the volumes to be able to get accurate prediction statistically, while at the general level you may be able to do that higher up. So you got to kind of mix and match your approach, and there’s an element of pragmatism that plays in the mixer when you’re processing your data as well that you must be conscious of.

Data analysis

What do you then do with your data? How do you then transfer what your raw data is into something meaningful? And the way that you will do this is by asking yourself key related questions within the dataset, and once you have an understanding of what key questions you want to ask, it becomes more relevant and you can kind of go off to what you’re trying to seek for. It’s like if you try to look for a needle in a haystack. If you’re not asking the right questions, you’re unlikely to get many meaningful answers. While it’s asking the right questions and setting up in the preparation phase as we talked about in stage one, will perhaps give you more robust answers.

Data reporting

Once you’ve got the analysis and you’ve understood what you’re trying to go after, you need to report back on that. Standardized reporting is things that people do on a regular basis, and it enables you to churn out monthly MI reports in a consistent way based on new updated data sets. This has got a huge advantage in that your organization gets comfortable with what they’re trying to read, what the KPIs are, and then they can monitor the changing nature of those KPIs. It becomes essentially important to set up what those KPIs are from the outset though, of course, so you need to think about that.

Data classification

And then finally, how you classify your data into different segments and build it up is very important as well. So classification of your data, and then understanding what it means How do you report and how do you analyze it are some of the key facets when thinking about micro-processes?

Best practices

So when processing your data, what are some of the best practices that you can use?

  • Make use of advanced analytic techniques: Using advanced analytics techniques can actually be really helpful and really useful. If you go back to Statistics 101, to begin with, and start building up models, predictive elements, correlations, causations, as well as some of the normal raw statistics you may use – average, standard deviation – all these things can be very, very good at kind of building the foundation up, and also then from that lead on to more predictive techniques of analysis.
  • Scrub data to build quality into existing processes: You need to think about scrubbing data to build quality into the existing processes. This comes down to the preparation stage again, where you’re cleansing, you’re deduping, to build in things that you understand and have triggers when you feel things aren’t going right, when you feel the data may not be as robust as you think.
  • Shape data using flexible manipulation techniques: It’s really important that you keep the raw data but then have the ability to kind of shape it in the way that you want. And this is a little bit of a “be brave” iterative process that is important in this area. You might need to do a little bit more searching. Understand by flexing your data what comes out the other side and then you can start to think about doing raw questions or more specific structured question you need to answer.
  • Share metadata across data management analytics domains. This is fundamentally important as well. We talked about open versus private data. You can share a range of data sets and then understand how they all patch together. This enables you to gain fundamental insights about what’s going on within that market share.


Let’s think about some of the challenges of data processing. These fall into two core buckets.

Technological challenges

  • Using multiple systems: Using multiple systems from a technological perspective can be extraordinarily challenging. If you work for a big organization, you’ll know that there are multiple systems that have gotten used over time. Marrying those datasets up, later on, can be both a challenge and just so complex. Indeed, it may not even be worth your time doing, but ultimately you want to create one version of the truth. The problem with multiple systems is that they can be conflicting data sets that exist.
  • Time lags: The way that these systems get updated can be quite a problem as well. So the time lags that exists between them is something that you must consider.
  • Inaccurate data: The advent of inaccurate data as well, both of the capture phase or even at the manipulation phase or analysis phase, could be very problematic for you when processing that data. You need to watch out for inaccurate data and the best simple way of doing that is the “does it make sense” test. Intuition plus insight equals something magical.
  • Data cleansing issues: It goes back to the core of being able to come up with clean data. Being confident about your data and data cleansing forms a big part of that.

Security challenges

On the other side, you’re thinking about security challenges.

  • Physical challenges: You’ve got physical challenges of storage of data and where it’s held.
  • Personnel challenges: We live in a complex world, with staff turnover and so on. You’ve got to think about who has access to the data. Is it a need-to-know basis or is it everyone open access? A lot of the time you need to be quite restrictive in the kind of data that you can offer people. So for example, in head office of a bank, people often don’t have access to customer-level systems simply because they don’t need to have access to it.
  • Procedural challenges: How do you get access to the data? What are the challenges? How do you engage a customer analyst to be able to mine in the data? These are some of the key challenges as well, and clearly all these data requests tend to be a massive pipeline around this.
  • Technical challenges: Consider access to the systems, for example. Who can determine what data there is? And particularly when transferring data from one system to the next to do some sort of analysis, that can be completely challenging from a security perspective. Where are the leakages? Think about how that basically plays out as well.
Back to Top
Richie Mehta

Ritchie Mehta has had an eight-year corporate career with a number of leading organizations such as HSBC, RBS, and Direct Line Group. He then went on setting up a number of businesses.

Data protection regulations affect almost all aspects of digital marketing. Therefore, DMI has produced a short course on GDPR for all of our students. If you wish to learn more about GDPR, you can do so here:

DMI Short Course: GDPR

If you are interested in learning more about Big Data and Analytics, the DMI has produced a short course on the subject for all of our students. You can access this content here:

DMI Short Course: Analytics

The following pieces of content from the Digital Marketing Institute's Membership Library have been chosen to offer additional material that you might find interesting or insightful.

You can find more information and content like this on the Digital Marketing Institute's Membership Library

You will not be assessed on the content in these short courses in your final exam.


    Big Data and Analytics
    Richie Mehta
    Skills Expert

    This module dives deep into data and analytics – two critical facets of digital marketing and digital strategy. It begins with topics on data cleansing and preparation, the different types of data, the differences between data, information, and knowledge, and data management systems. It covers best practices on collecting and processing data, big data, machine learning, open and private data, data uploads, and data storage. The module concludes with topics on data-driven decision making, artificial intelligence, data visualization and reporting, and the key topic of data protection.