The popular consensus is that data scientists need algorithmic knowledge, coding skills, and the ability to visualize and interpret data. While technical skills are at the heart of data science, soft skills are just as important to a data scientist’s effectiveness. Of these, problem solving is the most important capability a data scientist needs.
In the real world, problems are often stated vaguely and broadly. For example, data scientists often receive requests from organizations saying, “We want to increase our company’s profits” or “We want to predict defaults”. Translating a vague problem into a clear, measurable, and precise problem statement is the first and most important step for business consultants and data scientists. A poorly stated problem, on the other hand, can derail an entire data science project.
Data scientists can use a simple three-step process to define a problem:
- Understanding the landscape
- Break the general problem down into small pieces
- Refine the options and finalize the problem statement
Let’s explore this process in more detail.
Understanding the landscape
The first step, before focusing on the real problem, is to understand its context. This includes identifying the stakeholders and the boundaries of the problem to be solved. Data scientists can understand the landscape by answering important questions about the industry and the current scope. They can do this partly by reading relevant documents and partly through interviews with stakeholders. Below are the essential questions for understanding the problem.
- What is the context of the industry and the company?
You need to understand how the specific industry operates and what drives it in order to identify the context of the specific problem statement. For example, someone working in the retail industry needs to know how the retail supply chain works, how stores are organized, and how customers behave and react to prices and promotions. It is equally crucial to have a clear understanding of how these trends are evolving as post-pandemic behavior shifts towards more online shopping.
- Who are the stakeholders who can give me information?
These are the business owners who are clear about their goals and actions. They can provide you with the data and clarity you need. Navigating the organization is essential to understand who the stakeholders are and how accessible they are to you. Your first point of contact or your key buyer can help you greatly in this navigation.
- What are the limits of the scope?
Sometimes data scientists don’t realize they’re working with the wrong scope until it’s too late. Therefore, it is essential to ensure that the boundaries are clear from the start. When taking on a project, ask questions such as: Are we looking at demand at a city, region, country, or global level? Which business units and products are we talking about? What should be excluded?
- What are the key criteria for success?
The main criteria for success should be agreed with the stakeholders. These include the acceptable level of profitability and the timeframe for completing the project. Stakeholders can come up with criteria for success that you might not have thought of. It is also not a problem if, in certain situations, the stakeholders do not clearly agree on some of these criteria. You can note the points of disagreement and revisit them once the problem is more precisely defined.
Once you have developed a detailed understanding of the landscape, the next step is to break down the real problem into smaller parts and examine each of them.
Break the problem down into smaller pieces
There are many frameworks that people use to explore a broadly defined problem. However, the one that has always worked best for me is a derivative of the famous “Five Whys” technique.
The trick is to keep asking “why” until you reach a point where it no longer makes sense to ask again. It’s almost like a child’s curiosity. We all know young children who keep asking “why”, followed by another “why”, followed by another “why”. This is exactly what we need to do here. Set aside your inhibitions and delve deeper into the problem to understand its underlying dimensions.
Three things are important to follow at this point:
1. Explore all the potential “whys”.
The traditional “Five Whys” technique involves asking “why” in quick succession, each time in response to the previous answer, and going deeper and deeper until you are satisfied. I generally prefer a modified version of this technique when writing a problem statement. Here’s how. Ask “why” and explore all the possible reasons why something might be a driving force (rather than moving on to the next “why” as soon as you get one satisfactory answer). Then, for each of these answers, go further and ask “why” again. This brings us to the MECE principle.
2. Follow the MECE principle.
MECE (pronounced mee-see) stands for Mutually Exclusive, Collectively Exhaustive. To illustrate this principle with an example, some potential answers when asking “why” for decreased profits could be: higher input costs, higher production costs, higher selling and marketing costs, higher administrative and general expenses, or lower sales revenues. Together these answers are collectively exhaustive, and because none of them overlaps with any other driver, they are also mutually exclusive.
3. Map them in a tree diagram for clarity.
The last step in understanding the problem is to organize it. Dividing the problem into smaller parts is a reliable way to get a fuller and clearer picture of the problem at hand. You can do this by creating a tree diagram, starting with the general problem statement as the first node. The branches below it explore the problem in progressively finer detail, each level answering the “why” of the level above. At the lowest level of the tree, apply the MECE principle when listing the potential reasons for the problem. Finally, draw these reasons on the diagram to clarify the different dimensions of the problem.
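The MECE principle from step 2 can be checked mechanically once the items to be explained are listed. The snippet below is a minimal sketch, not a standard tool: the function name `mece_report`, the ledger lines, and the assignment of lines to branches are all hypothetical and chosen only for illustration.

```python
def mece_report(items, branches):
    """Check whether `branches` split `items` in a MECE way.

    items:    set of all items that must be explained.
    branches: dict mapping a branch name to the set of items it covers.
    Returns (overlaps, gaps):
      overlaps - items claimed by more than one branch (not mutually exclusive)
      gaps     - items claimed by no branch (not collectively exhaustive)
    """
    seen = {}
    for name, covered in branches.items():
        for item in covered:
            seen.setdefault(item, []).append(name)
    overlaps = {item: names for item, names in seen.items() if len(names) > 1}
    gaps = set(items) - set(seen)
    return overlaps, gaps


# Hypothetical ledger lines, for illustration only.
ledger_lines = {"raw materials", "factory wages", "advertising", "office rent"}

# A deliberately flawed first draft of the branches.
draft_branches = {
    "higher input costs": {"raw materials"},
    "higher production costs": {"factory wages", "raw materials"},  # overlap
    "higher selling and marketing costs": {"advertising"},
    # "office rent" is not covered by any branch: a gap
}

overlaps, gaps = mece_report(ledger_lines, draft_branches)
print("Not mutually exclusive:", overlaps)
print("Not collectively exhaustive:", gaps)
```

An empty `overlaps` and an empty `gaps` would indicate that the draft branches are MECE with respect to the listed items; it says nothing about items that were never listed.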
To illustrate the steps with an example, take the problem statement “profits are lower than desired”. The next level holds elements of the problem such as “higher input costs”, “higher production costs”, and “lower sales revenues”, and potential reasons are then plotted under each element, for example under “lower sales revenues”.
It is also important to ensure that the “why” at each step is grounded in reality and not in wishful thinking. This is where understanding the industry, as well as discussions with stakeholders, becomes essential.
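The tree described in this section can also be kept as a small data structure, so that it is easy to extend and print. This is a sketch under stated assumptions: the `Node` class is hypothetical, the first-level branches are the drivers named in the text, and the two reasons under “lower sales revenues” are illustrative guesses rather than findings.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One statement in the problem tree; its children answer "why?" for it."""
    statement: str
    children: list = field(default_factory=list)

    def add(self, statement):
        """Attach a child statement and return the new child node."""
        child = Node(statement)
        self.children.append(child)
        return child

    def render(self, depth=0):
        """Return the tree as indented text lines, one per node."""
        lines = ["  " * depth + "- " + self.statement]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines

    def leaves(self):
        """Return the statements that have no further "why" below them."""
        if not self.children:
            return [self.statement]
        return [leaf for child in self.children for leaf in child.leaves()]


root = Node("profits are lower than desired")
for driver in [
    "higher input costs",
    "higher production costs",
    "higher selling and marketing costs",
    "higher administrative and general expenses",
]:
    root.add(driver)
revenue = root.add("lower sales revenues")

# Hypothetical second-level reasons, for illustration only.
revenue.add("fewer units sold")
revenue.add("lower average selling price")

print("\n".join(root.render()))
```

The leaves returned by `root.leaves()` are the candidate nodes to keep or eliminate in the next step.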
Refine the options and finalize the problem statement
After creating a detailed problem tree that looks at the problem holistically, the next step is to eliminate the nodes that are not relevant. This is where data starts to play a key role. While detailed data analysis is often not necessary, a quick summary of high-level data helps indicate which nodes are more relevant than others. This can be further validated by interviews with stakeholders.
It is essential to be critical and to question the data at this stage. You don’t always get data that you can trust right off the bat, and sometimes the data can be misleading and distracting. Therefore, this step also includes noise reduction and feature removal to ensure data quality. Once the data is sorted, you can use it to generate high-level information about the source of the problem.
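A quick high-level summary of this kind can be as simple as comparing two periods and ranking each branch by its effect on profit. The figures below are invented purely for illustration, and `profit_impact` is a hypothetical helper; a real project would first check the data quality as described above.

```python
# Hypothetical high-level figures (same currency units), for illustration only.
previous = {
    "sales revenue": 1000,
    "input costs": 400,
    "production costs": 250,
    "selling and marketing costs": 120,
    "administrative and general expenses": 80,
}
current = {
    "sales revenue": 990,
    "input costs": 470,
    "production costs": 255,
    "selling and marketing costs": 118,
    "administrative and general expenses": 82,
}


def profit_impact(previous, current, revenue_key="sales revenue"):
    """Return each line's contribution to the change in profit.

    A change in revenue counts with its own sign; an increase in any
    cost line counts as a negative contribution.
    """
    impact = {}
    for key in previous:
        delta = current[key] - previous[key]
        impact[key] = delta if key == revenue_key else -delta
    return impact


impact = profit_impact(previous, current)

# Most damaging branch first.
ranked = sorted(impact.items(), key=lambda pair: pair[1])
for name, value in ranked:
    print(f"{name:40s} {value:+d}")
```

With these made-up numbers, input costs dominate the decline, so that node would be examined first and validated with stakeholders, while nodes with negligible impact could be set aside.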
Data scientists must resist the temptation to jump into solution mode and formulate a hypothesis as soon as they receive a problem statement. Not everyone in business is good at articulating a problem with precision, and pursuing an ill-defined problem will lead to bad results down the road. Therefore, it is of the utmost importance that data scientists spend time developing their ability to define the problem precisely, making it specific and unambiguous. These initial efforts will be of tremendous benefit in the longer term.
This article is written by a member of the AIM Leadership Council. The AIM Leaders Council is an invitation-only forum for senior executives in the data science and analytics industry.