
Basics of Big Data


This micro lesson is from one of our globally recognized digital marketing courses.


Digital Marketing - Study Notes:

What is Big Data?

Big Data describes the huge volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. It involves datasets that are too large to store on one machine, requiring multiple computers to work together to process the volume of data. In turn, we can use the data to produce better predictions and forecasts than would be possible with smaller datasets.
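The idea of splitting a dataset across multiple workers and combining their partial results can be sketched in a few lines. This is a simplified, single-machine illustration (the dataset and its `clicked` field are invented for the example); real Big Data systems distribute the chunks across a cluster of machines rather than a local thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

def count_clicks(chunk):
    """Map step: count the records of interest in one chunk."""
    return sum(1 for record in chunk if record["clicked"])

def total_clicks(dataset, workers=4):
    # Split the dataset into roughly equal chunks, one per worker.
    chunks = [dataset[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(count_clicks, chunks))  # map step
    return sum(partial)                                 # reduce step

# Hypothetical click log: every third visitor clicked.
data = [{"clicked": i % 3 == 0} for i in range(1000)]
print(total_clicks(data))  # → 334
```

The same map-then-reduce shape underlies distributed frameworks: each machine computes a partial result over its own chunk, and only the small partial results travel over the network to be combined.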

Now, let’s think about where Big Data has come from, and some of the key technologies associated with it. We can now mesh different datasets to create core insights, and understand our customers and environment better than we could before – it’s the coming together of a number of different tools and techniques. Some of the key technologies in this area are around predictive analytics: using the data to understand how things may happen in the future.

Big Data technologies

Some technologies associated with Big Data include:

  • Predictive analytics: using large datasets to model and predict outcomes based on historical data.
  • NoSQL databases: These offer high operational speed and flexibility for data managers.
  • Search and knowledge discovery: This involves using AI to help process and sequence data for knowledge discovery.
  • Stream analytics: This filters, aggregates, and analyzes data as it arrives, and can integrate external data sources so applications can process additional data in real time.
  • In-memory data fabric: This provides a comprehensive view of business data from different sources working together through automatic orchestration.
  • Distributed file stores: You can store files on a server for access by users across a wide network, for example.
  • Data virtualization: This provides access to data without needing to know the technical details or where the data is stored, including how it is formatted, for example.
  • Data integration tools: These unify data from multiple sources, giving a view of combined data.
  • Data preparation software: This enables you to combine, structure, and organize data in preparation for data visualization, analytics, and machine learning.
  • Data quality software: This helps you to optimize the quality of your data and deploy pre-built software to users to manage data use.
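To make the "data integration" idea above concrete, here is a minimal sketch of what such a tool does: unify records from two sources on a shared key. The CRM and web analytics records (and the field names) are invented for illustration:

```python
# Two hypothetical data sources describing the same customers.
crm = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
web = [
    {"customer_id": 1, "page_views": 42},
    {"customer_id": 2, "page_views": 7},
]

def integrate(left, right, key):
    """Left-join two record lists on a shared key field."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]

combined = integrate(crm, web, "customer_id")
print(combined[0])  # {'customer_id': 1, 'name': 'Ada', 'page_views': 42}
```

Real integration tools add schema mapping, type coercion, and conflict resolution on top of this basic join, but the principle is the same: one combined view built from several sources.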

Predictive analytics

Predictive analytics comprises software and/or hardware solutions that enable firms to discover, evaluate, optimize, and deploy predictive models by analyzing Big Data sources, in order to improve business performance or mitigate risk. Effectively, this is about making predictions based on previous behavior to evaluate the likely outcome for the future.

This is very important when thinking about buyer behavior in a marketing context, for example. We need to anticipate what is likely to happen, and understand which variables have changed from the past, in order to make reasonable assumptions about the future. It’s also worthwhile applying common sense and intuition when examining the data, alongside your experience of the metrics.
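The core mechanic of predicting from previous behavior can be sketched with an ordinary least-squares trend line. The monthly sales figures here are invented, and real predictive models are far richer, but the shape is the same: fit a model to history, then extrapolate:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical sales history for months 1-6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 118, 131, 140, 152]

slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept   # extrapolate to month 7
print(round(forecast, 1))          # → 161.5
```

A straight line assumes the past trend continues unchanged, which is exactly where the common sense mentioned above comes in: if you know a variable has changed (a price rise, a new competitor), the model's assumption no longer holds.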

You can think of Big Data technologies as being like a jigsaw puzzle. It’s all about meshing technologies together. There’s an element of overlap, as well as linkages, because all these technologies need to come together to create a robust Big Data strategy. And if you don’t have these linkages, your picture might not be complete.

The four Vs

When thinking about Big Data, you can use a number of different approaches. One very useful framework involves the four Vs:

  • Volume: the size or scale of the data
  • Variety: different forms of data
  • Veracity: trustworthiness of the data
  • Velocity: frequency of incoming data


Volume

Big Data, by its very nature, draws on a larger number of data sources, and can provide greater volumes of insight on individual customers. It’s the aggregation of the information that creates the richness within the dataset. In other words, it’s not just a single dataset. It’s about meshing a number of different data sources.


Variety

Big Data includes a wider variety of data types and sources. For example, the data could be a rich combination of structured, semi-structured, and unstructured data.


Veracity

Veracity refers to the need to have robust data. Essentially, you must have some form of integrity within your data. Make it trusted, make it clean, and make it de-duplicated, because any anomalies that you have within it will eventually lead to inaccuracies that you might not notice later on. Remember, robust inputs lead to robust outputs, and this applies to Big Data as well.
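The cleaning and de-duplication steps just described can be sketched briefly. The email records, field names, and the 0–120 age range are all invented for the example; real data quality tools apply far more rules, but each one is a check of this kind:

```python
# Hypothetical raw records with the problems described above.
raw = [
    {"email": " ada@example.com ", "age": 36},
    {"email": "ADA@example.com",   "age": 36},   # duplicate after normalizing
    {"email": "grace@example.com", "age": -1},   # anomaly: impossible age
]

def clean(records):
    """Normalize, drop anomalous rows, and de-duplicate on email."""
    seen, out = set(), []
    for rec in records:
        email = rec["email"].strip().lower()     # normalize
        if not (0 <= rec["age"] <= 120):         # integrity check
            continue                             # drop anomalous rows
        if email in seen:                        # de-duplicate
            continue
        seen.add(email)
        out.append({"email": email, "age": rec["age"]})
    return out

print(len(clean(raw)))  # → 1 record survives
```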


Velocity

Velocity is the frequency of new data being entered into the set. It’s always better to work with fresh data and datasets with a high velocity of incoming data.

Jack Preston, Skills Expert

Jack Preston is a Data Scientist working within marketing analytics, with a particular focus on strategic customer loyalty. Jack has experience working in both small-scale startups and large corporates, including dunnhumby and Notonthehighstreet. He also holds an MSc in Business Analytics from UCL where he graduated with distinction.



This short course covers the principles of analytics, and demonstrates techniques and tools that you can use to develop and refine your data analytics skills.

You will learn:

  • The fundamentals of data, collecting data, and processing data, including best practices, techniques, and challenges
  • The principles of web analytics, the benefits and limitations of Google Analytics, terminology for reporting, and the legalities around consent and data privacy
  • The concepts of Big Data, the processes around data, including mining, scraping, cleansing, and de-duping, and the various languages and programs for testing your data
  • The importance of AI, Machine Learning, analysis types, the value of testing hypotheses, and forecasting based on the data available
  • How best to report and present data findings to management and the different tools available to you

Approximate learning time: 3 hours