I love working with data. But does it cause me some frustration? You bet.
Having worked as a data scientist for over five years, I’ve felt the pain of trying to balance my ambitions for data with the limitations that come when operationalizing data. For instance, I left a position at a company I liked because we didn’t have the resources to get a machine learning (ML) model into production. We worked very hard to solve tasks using data and ended up not having the opportunity to impact the product. When your job becomes bogged down with organizational and operational challenges, it can be easy to grow disheartened and lose sight of why you liked working with data in the first place.
In this blog post, I’d like to talk about a data scientist’s journey:
- What motivates our interest in the world of data
- The limitations we face when expectations meet realities
- The possibilities we have to change how we operationalize data by better integrating data science into technologies like Robotic Process Automation (RPA)
- Why I’m enthusiastic about AI Fabric, a solution that integrates artificial intelligence (AI) with RPA, to drive new opportunities for making the most of data
For the love of data
I like being a creative problem solver, and data empowers creative problem-solving.
Using data to tackle difficult tasks and solve challenges that impact people’s lives felt like a natural career path for me. Many of the data scientists I’ve worked with got into the field to learn how to use data to solve problems. We’re passionate about understanding the data we have, exploring and developing ML algorithms to test on it, and then bringing new solutions to fruition through the insights our models produce.
When I decided to become a data scientist, I knew there were tasks and possible headaches that come with the job. Regardless of the type of data you work with, you’ll inevitably:
- Spend time processing and cleaning your data
- Wait a while for your models to train
- Spend time trying out different hyperparameters
The more I worked with data, the more cognizant I became of how complicated data science can become within the confines of an organization. The realities associated with being a data scientist began to overshadow my original motivations for getting into the field.
When reality strikes: Setting expectations and managing data end-to-end
Many companies are embracing a data-driven approach to development and are in the early stages of exploring ML. The role of the data scientist is still fairly rare and, in many cases, misunderstood. Different challenges can arise for data scientists when we begin to operationalize data within a company and move forward using data to solve problems.
Setting expectations about what an organization can—and can’t—do with ML is one area where we spend a lot of our time. It’s important to educate others about the nature of our roles as data scientists, where we wish to focus our time, and what we need for our projects to be successful.
One other challenge is the fact that data science operations are often siloed within organizations. This can limit the ability of data science projects to bring value to an organization.
ML models alone can’t, and don’t, do anything. They must be used alongside the work of other teams and included as part of a larger project to be successful.
In addition, it is often very hard to show the return on investment (ROI) driven by the models. Data scientists often face an uphill battle making a case for the role of ML within an organization, and we can spend many cycles arguing for the part we seek to play and what we need to make an impact.
Tackling the data itself brings its own set of unique challenges. We often spend more time gathering, consolidating, and cleaning datasets than working to understand the data and building models. Unless a continuous integration and continuous delivery (CI/CD) pipeline for our models is already built within a company, much of our time becomes devoted to creating a scalable pipeline to take a model from a local machine to staging and production. This is not only outside of our scope of work but also takes away time that we want to put toward model building and testing.
Ongoing model monitoring can also be a challenge we’re not prepared for. Do we experience any data drift over time? Are the data in production still the same as the data we used for training? Are the outputs still under control? With new data, does our model perform as well as the base model, which was built with the training set? When do we need to update the ML model?
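To make the first two questions a little more concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares how a single numeric feature is distributed in the training data versus in production. This is an illustration only, not how AI Fabric or any particular product does monitoring: the function name, the ten equal-width buckets, and the small floor used for empty buckets are all assumptions I picked for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    train: Sequence[float], prod: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training sample and a production sample of one feature.

    Hypothetical helper for illustration; bucket edges come from the
    training data's min and max, split into equal-width buckets.
    """
    lo, hi = min(train), max(train)
    # Fall back to a width of 1.0 if the training feature is constant.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Production values outside the training range land in the
            # first or last bucket.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bucket is empty (assumed value).
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bucket_shares(train)
    actual = bucket_shares(prod)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Synthetic data: production is shifted toward the upper half of the range.
train_sample = [i / 100 for i in range(100)]
prod_sample = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(train_sample, train_sample)
psi_shifted = population_stability_index(train_sample, prod_sample)
print(f"PSI, same data:    {psi_same:.3f}")
print(f"PSI, shifted data: {psi_shifted:.3f}")
```

A PSI of zero means the two samples fall into the buckets in identical proportions, and the value grows as the distributions move apart. A commonly quoted rule of thumb treats values above roughly 0.2 as worth investigating, but that threshold is a convention to tune for your own data, not a guarantee that the model needs retraining.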
To get back to what I love, I’ve sought out opportunities to work for companies that prioritize the integration of data science into broader processes and planning. Today, I’m excited to be working for a company that not only prioritizes data science internally but is actively working to help companies operationalize and consume ML models to drive better business outcomes.
Getting back to what you love with AI Fabric
As more and more organizations use RPA to streamline processes, opportunities arise for data scientists to operationalize data in new ways.
Here at UiPath, we’re committed to bringing together data science and RPA and empowering businesses to drive new outcomes using intelligent automation. By bringing data science together with RPA, we want to alleviate many of the above challenges data scientists face on a day-to-day basis in the automation world. We’re driving these efforts with AI Fabric.
We believe that data science and RPA are better when they work together. It’s crucial to make data science an integral part of an RPA Center of Excellence (CoE) by bringing in data scientists to outline what’s possible when using data and ML to enhance RPA capabilities.
Through the development of AI Fabric, we’re focused on helping organizations think about ML as a step inside the automation process. We want to help users integrate ML with RPA development more seamlessly. Using AI Fabric and RPA, data scientists can simplify data pipeline builds with tooling that focuses on preprocessing and data gathering. They can deploy models with ease, monitor them, and embrace an RPA workflow designed to make humans and ML models work together.
By integrating data science with RPA, we want to help data scientists prove out the ROI for the models built and deployed, and focus the majority of time on exploring data and refining models that solve real-world problems.
What would you do with more freedom to focus on your data?
I know from personal experience that empowering data scientists to focus on solving problems using data and integrating data science into existing processes can change the way an organization evolves and grows.
What matters most to me is helping customers drive better outcomes. In my role at UiPath, I’ve seen firsthand how companies can automate more complex processes by integrating data science with RPA. It’s rewarding to watch data scientists become liberated from common data operationalization challenges as companies build data science into RPA deployments through products like AI Fabric.
Jeremy Tederry has an M.S. in computer science, specializing in efficient and intelligent software. His master’s thesis focused on machine learning. Tederry worked for five years as a data scientist at two different startups. In both companies, he faced the same issues with operationalizing models and had to take on a data product and evangelist role because nobody else could do it. He joined UiPath at the end of 2018 as a machine learning product manager.