An Agilist's Take on Demystifying Data Science
By Kris Schroeder / 13 Sep 2022
When you work in technology, you hear phrases like data science, machine learning and Artificial Intelligence (AI) tossed around. As an Agilist working primarily on application development, I wanted a clearer understanding of these terms. When I was assigned to work with a data science team, I found that common frameworks like Scrum and Kanban did not fit the team's work well.
Even though I had many years of experience with application development teams — including working in other data domains like business intelligence, ETL, data warehousing and database architecture — none of that experience prepared me for the unique challenges of data science work. With the knowledge that enterprises are prioritizing AI and machine learning initiatives over other IT initiatives, I knew I wanted to share my experience with other non-data scientists to prepare them to be successful.
The first difference I noticed when working with a data science team was the team members' backgrounds. Rather than software engineers, this team was composed of PhDs in mathematics, mechanical engineering and electrical engineering. They had never tracked work with Jira or Microsoft DevOps. They didn't know what Agile or user stories were. Their workflow had nothing to do with building working software. Some of my peers who worked with data science teams confirmed that they also felt out of their depth when listening to PhDs discuss their work. Working software, e-commerce, sales tools and ERP systems are things I understand. Electro-mechanical systems, I do not. It's uncomfortable.
The Agile Manifesto and the 12 Agile Principles emphasize working software. But my new team wasn't building working software — they were producing mathematical algorithms. Even though I understood how these principles have been applied in domains outside of software development, the learning curve to understand the team's workflow was still steep.
It was helpful to understand what data science is — by defining what it’s not. It's not data architecture or data governance. It's not ETL. It's not cloud infrastructure or data security. It's not even Power BI or Tableau reports and dashboards.
Data science uses math to identify patterns in historical data and predict how likely those patterns are to reoccur in the present or future. Data scientists propose a hypothesis and set out to prove or disprove it. As they experiment, they naturally must iterate and try different things, which means they don't always succeed or move in a direct line. We don't know if ideas are feasible until we experiment. Statistical modeling also plays a big part in data science, and it takes domain knowledge and programming skills to do it well.
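For readers who find a concrete example easier to follow, here is a minimal Python sketch of that idea. Everything in it is an assumption made up for illustration: the temperature readings, the 80-degree cutoff and the function names do not come from any real team or dataset. It states a hypothesis (parts that run hot fail more often), measures how often failures occurred in the historical data, and checks whether the data supports the hypothesis.

```python
# Hypothetical historical records: (operating temperature in C, did the part fail?)
history = [
    (61, False), (64, False), (70, False), (72, False), (75, True),
    (78, False), (81, True), (83, True), (85, False), (88, True),
    (90, True), (93, True),
]

THRESHOLD_C = 80  # assumed cutoff for "running hot"


def failure_rate(records):
    """Share of records in which a failure occurred (0.0 if there are none)."""
    if not records:
        return 0.0
    return sum(1 for _, failed in records if failed) / len(records)


def split_by_threshold(records, threshold):
    """Separate records into (hot, cool) groups around the threshold."""
    hot = [r for r in records if r[0] >= threshold]
    cool = [r for r in records if r[0] < threshold]
    return hot, cool


hot, cool = split_by_threshold(history, THRESHOLD_C)
hot_rate = failure_rate(hot)
cool_rate = failure_rate(cool)

# Hypothesis: parts running hot fail more often than parts running cool.
hypothesis_supported = hot_rate > cool_rate

print(f"failure rate when hot:  {hot_rate:.2f}")
print(f"failure rate when cool: {cool_rate:.2f}")
print("hypothesis supported by this data:", hypothesis_supported)
```

A real team would work with far more data and use proper statistical tests before trusting a result like this, but the shape of the work is the same: a question, some historical data and an experiment that may or may not pan out.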
I dove further into my research to see what I could learn about the data science workflow. My internet searches immediately yielded several different frameworks for data science work, but the steps among them were similar.
Since we’re not data scientists, we don't need to be experts on each step. But what we do need to take away is that it's not a linear process. We're familiar with software development where we start out with analysis, move into design, development and testing. In software development, we break work into small chunks and iterate through the workflow. Meanwhile, in data science, the workflow is circular. What is discovered in the second stage of the workflow can send you back to the first stage. This makes it challenging to forecast work and complete it within a given timebox. Work doesn't pull forward through the value stream — it passes back and forth many times before it finishes, and it often fails before completing the full cycle.
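To get a feel for why this is hard to forecast, here is a small Python simulation. It is only a sketch under assumptions: the stage names are hypothetical placeholders rather than the steps of any particular framework, and the 40% chance of being sent back a stage is a number picked for illustration. It counts how many steps one piece of work takes to get through every stage, first with no rework and then with rework allowed.

```python
import random

# Hypothetical stage names; real data science frameworks label these differently.
STAGES = ["understand the problem", "prepare data", "build model", "evaluate"]


def steps_to_finish(rework_probability, rng, max_steps=1000):
    """Count the steps needed to get one piece of work through every stage.

    On any stage after the first, there is a chance that what was learned
    sends the work back to the previous stage instead of forward.
    """
    position = 0  # index of the stage currently being worked on
    steps = 0
    while position < len(STAGES) and steps < max_steps:
        steps += 1
        if position > 0 and rng.random() < rework_probability:
            position -= 1  # a discovery sends the work back one stage
        else:
            position += 1  # the work moves forward
    return steps


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
linear = [steps_to_finish(0.0, rng) for _ in range(5)]
circular = [steps_to_finish(0.4, rng) for _ in range(5)]

print("linear workflow steps:  ", linear)
print("circular workflow steps:", circular)
```

With no rework, every item takes exactly one step per stage, so the totals are identical and easy to plan around. Once rework is possible, the totals vary from item to item, which is the forecasting problem a timebox runs into.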
It can be quite a culture clash if you have never worked with a data science team before. With enterprises investing heavily in data science work and data scientists being hired at a rapid pace, it's something you want to prepare for.
To learn more about data science, take a look at our three-part series that dives deeper into the topic: