HBX Business Blog

How to Analyze a Dataset

Posted by Melissa Barton on April 5, 2017 at 8:30 AM


We live in a world where 2.5 billion gigabytes of data are created every day. Rich data can be an incredibly powerful decision-making tool for organizations when harnessed effectively, but it can also become daunting to collect and analyze such large amounts of information.

Here are a few important things to keep in mind when diving into your next data analytics project:

1. Never assume that your dataset is clean

Clean, clean, clean. Data cleaning is the process of uncovering and then correcting, or removing altogether, inaccurate or duplicate records in your dataset. It is imperative that you work through this step before beginning any analysis. This is particularly important if you will be presenting your findings to business teams who will use them to make decisions. Teams need to have confidence that they are acting on a reliable source of information.
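As a rough illustration of what a first cleaning pass can look like, here is a minimal Python sketch. The records, the field names (customer_id, region, revenue), and the rules for what counts as a bad record are all invented for this example; a real dataset will need its own rules.

```python
# Hypothetical records; field names and values are invented for illustration.
raw_records = [
    {"customer_id": "101", "region": "East", "revenue": "250.0"},
    {"customer_id": "101", "region": "East", "revenue": "250.0"},  # exact duplicate
    {"customer_id": "102", "region": "West", "revenue": "n/a"},    # not a number
    {"customer_id": "103", "region": "", "revenue": "410.5"},      # missing region
    {"customer_id": "104", "region": "West", "revenue": "125.0"},
]


def clean(records):
    """Return (clean_rows, rejected_rows) without modifying the input."""
    seen = set()
    clean_rows, rejected_rows = [], []
    for row in records:
        # Treat a record as a duplicate only if every field matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            rejected_rows.append((row, "duplicate"))
            continue
        seen.add(key)
        if not row["region"]:
            rejected_rows.append((row, "missing region"))
            continue
        try:
            revenue = float(row["revenue"])
        except ValueError:
            rejected_rows.append((row, "invalid revenue"))
            continue
        clean_rows.append({**row, "revenue": revenue})
    return clean_rows, rejected_rows


clean_rows, rejected_rows = clean(raw_records)
print(len(clean_rows), "clean rows;", len(rejected_rows), "rejected")
```

Keeping the rejected records together with the reason for each rejection, rather than silently dropping them, makes it easier to decide later whether a record should be corrected instead of eliminated.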

2. Start with a specific question and hypothesis

Once you’ve completed the cleaning process, you may find yourself looking at the dataset with a number of questions swirling in your head. There is so much potential analysis in front of you! Just be cautious and proceed slowly. Don’t try to tackle too much at once. Make sure that you are beginning your analysis with a very focused and specific question.

If the request for analysis is coming from a business team, push them to provide explicit details about what they are hoping to learn, what they expect to learn, and how they will use the information. With that clarity, you may also be able to eliminate unnecessary variables from the dataset, which helps keep your analysis on track.

3. Don’t be biased by having a hypothesis in mind

Fantastic, your dataset is clean and you have zeroed in on a specific question! Next, make certain that you remain unbiased as you work through your analysis. Many analysts will tell you that it can be tempting to use data to tell the story that you or your colleagues want or expect to hear.

But you have to let the dataset speak for itself. Stay alert to the fact that maintaining objectivity is not as easy as it sounds. It’s okay if the data isn’t telling you what you expected to hear, because that’s a finding in and of itself!

4. Documentation is key

I find it useful to retain a log of my data analysis for various reasons. First, it is a place where I can discuss any limitations or special circumstances that I encountered along the way. Second, a colleague can more easily review and/or critique the analysis by having this guide.

Finally, if my analysis is later replicated with an updated or new dataset, I can be confident that it will be conducted in a way that allows for a true comparison with the prior work.
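An analysis log does not need special tooling. The Python sketch below shows one possible shape for it, assuming a simple list of entries; the log_step helper, its fields, and the example steps and row counts are hypothetical, not a description of any particular team's process.

```python
import json

analysis_log = []  # one entry per decision made during the analysis


def log_step(step, note, rows_in=None, rows_out=None):
    """Append a description of one analysis step to the log."""
    analysis_log.append(
        {"step": step, "note": note, "rows_in": rows_in, "rows_out": rows_out}
    )


# Hypothetical steps and row counts, for illustration only.
log_step("remove duplicates", "Dropped exact duplicate records.", 5, 4)
log_step("limitation", "Revenue is missing for one region; totals exclude it.")

# A plain-text dump that can be saved next to the results or shared for review.
print(json.dumps(analysis_log, indent=2))
```

Recording row counts before and after each step gives a reviewer, or a future version of you working with a new dataset, a quick way to check that the same steps were applied in the same order.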

5. Always investigate the whys

Finally, as you near the conclusion of your analysis, remember that this dataset is only one piece of the puzzle. It is critical to pair your quantitative findings with qualitative information, which you may capture using questionnaires, interviews, or testimonials. While the dataset has the ability to tell you what is happening, the qualitative information can often point you in the direction of why it is happening.



About the Author

Melissa Barton is a Research and Planning Analyst at HBX, providing data, research and analysis to inform strategic decision-making. She received her MBA from Harvard Business School in 2014, and prior to joining HBX, she was a Business Consultant at a Boston-based consumer insights & innovation firm. Outside of work, she can be found spending time with family, golfing, running or catching a movie.