Exploratory Data Analysis (EDA) as a Framework to Improve Learning

Image by Danni Liu

I never thought I'd go to university, as no one in my family had before me. But in year 11, a teacher changed my trajectory, and I began to believe hard work was key to academic success. Despite all the hours I put in during my undergrad, my grades were just okay. I believed others were just naturally smarter, and I had to work twice as hard to compensate because I was less intelligent. Later, when I pursued my commerce/finance degree, I doubled down on my efforts. Yes, I got better grades, but the stress took its toll, leading to alopecia areata. Thankfully, my hair grew back after the stress levels went down.

As I aim to advance in data and analytics, I realize continuous learning is essential. I can't just keep studying longer; I need to study smarter. I've adjusted my mindset and understand that I don't lack intelligence; I need to develop better strategies. This year, one of my goals is to enhance my learning efficiency.

I've started identifying smarter study tactics and habits I need to change. My plan is to avoid getting bogged down in details prematurely and to spend more time on higher-order thinking. To briefly explain, higher-order thinking involves complex processes like analysis and creation, while lower-order thinking is more about basic recall and understanding. Both are important, but focusing on higher-order thinking can lead to deeper learning.

Apart from learning how to learn, I've also been exploring exploratory data analysis (EDA) using Python, and I've found interesting parallels between EDA techniques and effective learning strategies. So, for this blog, I'd like to share:

• What is EDA?
• How EDA practices relate to learning

What is EDA?

When we talk about EDA, we refer to an approach popularized in the 1970s by John W. Tukey. It's all about understanding data in depth, not just looking for what we expect to find. One common way to describe it is through six practices, namely:

Discovering: This initial stage involves delving into data without many preconceived ideas. We're on the hunt for intriguing patterns, trends, or relationships. This might include examining column headings and data types, understanding the volume of data points, and reviewing summary statistics.
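To make the discovering step a little more concrete, here is a minimal sketch that inspects a tiny sales table using only Python's standard library. The column names and values are invented for illustration, and the `discover` helper is hypothetical; in practice a library such as pandas is the more usual tool for this.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for illustration.
RAW = """date,region,units
2024-01-03,North,12
2024-01-04,South,7
2024-01-05,North,9
2024-02-01,South,15
"""

def discover(text):
    """Return column names, row count, and summary statistics for 'units'."""
    rows = list(csv.DictReader(io.StringIO(text)))
    units = [int(r["units"]) for r in rows]
    return {
        "columns": list(rows[0].keys()),
        "n_rows": len(rows),
        "units_min": min(units),
        "units_max": max(units),
        "units_mean": statistics.mean(units),
    }

summary = discover(RAW)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Even a quick look like this tells us what columns exist, how much data there is, and roughly what range the numbers fall in.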

Structuring: At this point, we'll organize and categorize the data to simplify analysis. This may involve sorting, filtering, or aggregating data, such as compiling daily sales into monthly totals.
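The daily-to-monthly example can be sketched in a few lines of plain Python. The sales figures below are made up, and `monthly_totals` is a hypothetical helper name; it assumes dates are ISO strings like 2024-01-03.

```python
from collections import defaultdict

# Hypothetical daily sales as (ISO date, amount) pairs, invented for illustration.
daily_sales = [
    ("2024-01-03", 120.0),
    ("2024-01-17", 80.0),
    ("2024-02-01", 200.0),
    ("2024-02-14", 50.0),
]

def monthly_totals(records):
    """Sum amounts per 'YYYY-MM' month, returned in chronological order."""
    totals = defaultdict(float)
    for date, amount in records:
        month = date[:7]  # 'YYYY-MM' prefix of an ISO date string
        totals[month] += amount
    return dict(sorted(totals.items()))

result = monthly_totals(daily_sales)
print(result)
```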

Cleaning: Given data is seldom pristine, cleaning involves correcting errors, addressing missing values, and managing outliers that could skew the analysis.
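As one possible illustration of cleaning, the sketch below fills missing values with the median and flags outliers using the 1.5 × IQR rule. Both choices are assumptions made for the example, not the only reasonable options, and the readings are invented.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
readings = [10, 12, None, 11, 13, 95, 12]

def clean(values):
    """Fill missing values with the median and flag outliers via the 1.5*IQR rule."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    # Quartiles of the observed values define the outlier fences.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return filled, outliers

filled, outliers = clean(readings)
print(filled)
print(outliers)
```

Note that the function only flags outliers; whether to drop, cap, or keep them is a judgment call that depends on the question being asked.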

Joining: Enhanced insights can often be gleaned by integrating disparate datasets. Joining involves merging data from multiple sources to gain a comprehensive understanding.
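A rough sketch of a join, assuming two small tables that share a `customer_id` column. The tables and the `left_join` helper are hypothetical; the helper mimics a left join, keeping every order even when no customer matches.

```python
# Hypothetical tables, invented for illustration.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 30},
    {"order_id": 2, "customer_id": "b", "amount": 45},
    {"order_id": 3, "customer_id": "z", "amount": 10},  # no matching customer
]
customers = [
    {"customer_id": "a", "city": "Sydney"},
    {"customer_id": "b", "city": "Perth"},
]

def left_join(left, right, key):
    """Keep every left row; add right-hand columns where the key matches."""
    lookup = {row[key]: row for row in right}
    right_cols = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        extra = {c: match.get(c) for c in right_cols}  # None when unmatched
        joined.append({**row, **extra})
    return joined

merged = left_join(orders, customers, "customer_id")
for row in merged:
    print(row)
```

The unmatched order ends up with `city` set to `None`, which is itself a useful finding to carry into cleaning and validation.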

Validating: Validation is critical to ensure the data and modifications made are appropriate for our inquiry. It's about confirming that our findings are reliable and the data is fit for purpose.
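Validation can be as simple as a handful of explicit checks. The sketch below assumes two rules, unique ids and ages between 0 and 120, purely for illustration; the records and the `validate` helper are hypothetical.

```python
# Hypothetical records, invented for illustration.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 2, "age": 29},  # duplicate id
    {"id": 4, "age": -3},  # impossible age
]

def validate(rows):
    """Return a list of readable problems; an empty list means all checks passed."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        if row["id"] in seen:
            problems.append(f"row {i}: duplicate id {row['id']}")
        seen.add(row["id"])
        if row["age"] is None or not 0 <= row["age"] <= 120:
            problems.append(f"row {i}: age out of range ({row['age']})")
    return problems

issues = validate(records)
for issue in issues:
    print(issue)
```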

Presenting: The final stage involves communicating our discoveries. This includes creating visuals, summarizing key findings, and conveying the insights effectively.
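Charts are normally built with a plotting library, but the idea can be shown with a plain-text bar chart so the example stays self-contained. The monthly totals are invented and `bar_chart` is a hypothetical helper.

```python
# Hypothetical monthly totals, invented for illustration.
totals = {"Jan": 200, "Feb": 250, "Mar": 125}

def bar_chart(data, width=20):
    """Render each value as a row of '#' scaled so the largest fills `width`."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<4}{bar} {value}")
    return "\n".join(lines)

chart = bar_chart(totals)
print(chart)
```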

It's important to note that EDA is not a strictly sequential process but rather iterative. New discoveries might lead you to revisit and revise earlier stages. The objective is to thoroughly comprehend the data to inform robust statistical modelling and ensure any subsequent decisions or predictions are well-founded.

How EDA Practices Relate to Improved Learning

Discovering: This is about setting the stage for our learning. We identify resources, gather information, and start to understand what we're working with. This initial stage is crucial for setting ourselves up for more in-depth study later.

When approached thoughtfully, it can boost engagement and understanding. While it might feel like extra effort, even a brief 10-minute session can make a significant difference. I've found previewing material to grasp its main themes and structure incredibly useful. It primes our brain, setting up a framework for new information to be categorized and assimilated more effectively when we move on to more in-depth learning.

Structuring: Here, we're organizing our learning materials in a way that makes sense to us. It's about creating a framework of concepts and connections that can guide our study.

During this phase, we may also decide to set aside material that is either too complex for our current goals or covers topics we're already familiar with. Like deleting redundant columns in a dataset, we can skim or skip over known information. Additionally, hypothesizing about groupings or relationships among topics encourages us to think more deeply and engage with the material.

Cleaning: This step involves addressing any areas of confusion and ensuring our understanding is clear and accurate. I often gloss over difficult parts, choosing to set them aside with the, dare I say, pretend intention of revising them later. This usually happens because I'm eager to cover a certain amount of content or because dealing with these uncertainties can be uncomfortable. By "parking it for later", I allow myself to progress with reduced or no guilt. However, this approach can be counterproductive in the long run. It weakens the foundation for building and retaining new knowledge because the concepts aren't properly understood. We'll likely have to return to them anyway, so we're far better off tackling them head-on the first time. This is an area I need to work on consciously if I am to change my behaviour.

Joining: This is about integrating new information with what we already know. I recently discovered that we can significantly enhance our understanding and retention by forming connections between new and existing knowledge. Our brains are like an extensive network; creating more connections to new information strengthens these new pathways, enhancing learning.

Among several techniques, two that I find particularly effective are:

  • using analogies and metaphors, and
  • asking how and why questions

Analogies and metaphors help us map well-known concepts to new, unfamiliar ones, making the new information more relatable and easier to grasp.

Meanwhile, asking ourselves how and why questions about the new information encourages us to delve deeper and forge stronger connections with what we already know.

Validating: This is where we really get to see how much we've learned and pinpoint where we might need more work. There are plenty of ways to test ourselves, like diving into a project, using interactive games, or even asking ChatGPT to quiz us (but remember, take its answers with a grain of salt!). I'm not too keen on flashcards myself, as they take forever to make. But however we choose to do it, this stage is all about double-checking our understanding and filling in any gaps so we feel confident and ready to roll with our new knowledge.

Presenting: They say the best way to learn is to teach, and I totally believe it; that's the protégé effect for you. When we know we'll be explaining something to someone else, we pay more attention and dig deeper to make sure it makes sense. That's one of the reasons why I started sharing what I learn on my website and socials like LinkedIn and Instagram. It's not just about spreading the word; it's about understanding it better myself.

If we're not connected, why not give me a follow on Instagram and LinkedIn? My Instagram is @danni_dan_liu, and you can find me on LinkedIn too! Just search 'danni dan liu'.

So, there you have it! I hope you've also found some nuggets of wisdom on learning that you can take away and apply in your journey.