Dr. Dror

Foo is not just a "Bar"

Remarks on "Data Driven" by Patil & Mason

I recently read the paper "Data Driven - creating a data culture" by DJ Patil and Hilary Mason (link). In this post, I'd like to share some thoughts and remarks that I collected while reading.

Communication skills

Rather early on, the authors mention the importance of communication skills. For "Asking the right questions", one needs to be able to communicate with different stakeholders. As a data science team, you need to understand what the pain points of your colleagues are, and to that end you have to speak with them. You could have Spock, Lieutenant Data, Pinky and the Brain on board, but if they can only talk to each other, things won't move forward.

Data heroes

Data democratization

I totally agree that making sure the organization's data is accessible is important. But there's a bit of a chicken-and-egg problem here. Stakeholders won't look into your amazing data warehouse if it is not clean, understandable, accessible, etc. But in order to have your data ready for everyone, a lot of effort has to be invested. Having a sound foundation inside the data science team can be the solution. Make sure that the data exploration processes are as simple and straightforward as possible for the DS team to start with. This will result in a nice toy that you can show around to attract others to get their hands dirty with the data. In turn, this will create the need to extend the available data, the circle of users, etc.

Ask the right questions

First, make sure you answer not what you can but what you should. This means you have to invest a lot of time in understanding the problem. "Starting with the data", as suggested by the scientific method, is dangerous as it may mask the business need. The formulation of the question has to be derived from the business needs and not from the data. Business goals, and not merely the intellectual joy that comes from looking into the data, are the reason to do data science in industry. Despite the comments above, keep in mind that the only stupid question is the one that was not asked!

Guiding questions

In the paper there are two lists of questions that I believe are very important and helpful.

For research management

  • "What is the question we're asking?" This one is all about alignment, both inside the team and with external stakeholders
  • "How do we know we have won?" Definition of acceptance criteria
  • "Assuming we solve this problem, what will be built first?" This one should help us understand whether we're asking the right question
  • The last two questions, "If everyone in the world uses this, what is the impact?" and "What's the most evil thing that can be done with this?", are of a more general nature.

For organizational management

  • What are the short-term and long-term goals?
  • Who are the supporters and who are the opponents?
  • What conflicts are likely to arise?
  • What systems are needed to make the data scientists successful?
  • What are the costs and time horizons required to implement those systems?

General remarks

  • Dashboards are living entities. It is all about finding the balance between an empty dashboard on the one hand and an overwhelmingly full one on the other.
  • During daily stand-ups or other sync meetings, adhere to the three agile questions:
    • What have I done since last sync?
    • What shall I do today/till next sync?
    • What are the impediments?
  • Share the team's results with interested stakeholders and try to extend the circle