Deploying Machine Learning Models (is still terrible)

Designing, building, and training models are hard problems. Literature is reviewed, data is gathered (and cleaned and annotated). Hours (or perhaps days? weeks?) are spent getting code to work, fixing subtle bugs, tuning training schemes. Finally, the metrics look good. It is time to deploy. And a new world of pain is entered.

Backend services are created and deployed all the time. Doing so with an ML system is often frustratingly hard. Why?

More Moving Parts

Model weights need to match the architecture. Multiple sets of weights can be produced from a single dataset and a single architecture. In reality, neither the dataset nor the model architecture is likely to be static. There may be a feedback loop between the model’s usage and the data it is trained on. Models often involve a number of preprocessing steps. These, as well as the model itself, are likely still under active development while the system is in use.
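One way to keep these moving parts in step is to store a small manifest next to every set of weights and compare it at load time. The following is only a minimal sketch using the standard library. The field names, the `build_manifest` and `mismatches` helpers, and the example configs are all hypothetical, not taken from any particular tool.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable SHA-256 digest of some bytes."""
    return hashlib.sha256(payload).hexdigest()[:16]


def build_manifest(weights: bytes, architecture: dict, preprocessing: dict, dataset_version: str) -> dict:
    """Record what a set of weights was trained with (field names are hypothetical)."""
    return {
        "weights": fingerprint(weights),
        # sort_keys makes the digest independent of dict key order
        "architecture": fingerprint(json.dumps(architecture, sort_keys=True).encode()),
        "preprocessing": fingerprint(json.dumps(preprocessing, sort_keys=True).encode()),
        "dataset_version": dataset_version,
    }


def mismatches(trained: dict, serving: dict) -> list:
    """List the manifest fields that differ between training and serving."""
    return sorted(key for key in trained if trained[key] != serving.get(key))


# Illustrative values only: the serving side has drifted to a different resize step.
trained = build_manifest(b"fake-weights", {"layers": 4, "hidden": 256}, {"resize": 224}, "2021-03-01")
serving = build_manifest(b"fake-weights", {"hidden": 256, "layers": 4}, {"resize": 256}, "2021-03-01")
print(mismatches(trained, serving))  # ['preprocessing']
```

Refusing to start the service when `mismatches` is non-empty turns a silent accuracy problem into a loud deployment failure, which is usually the better trade.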

Performance Monitoring / Distribution Shift

Related to the above, a model being “deployed” is not the end of the story. In addition to the logging and monitoring needed for traditional backend services, the distribution of input data needs to be monitored. Model performance might decrease (or increase) as the inputs change. Keeping up might mean continually annotating a subset of new data, or continually retraining. If retraining is common, a test set of carefully selected examples might be employed to ensure the model doesn’t accidentally regress.
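As a rough illustration of monitoring an input distribution, the sketch below computes the Population Stability Index (PSI) for a single numeric feature, comparing live inputs against a reference sample from training. PSI is just one of several possible drift measures and is my choice for the example. The bin count, the `eps` floor, and the data are assumptions, and both samples are assumed to be non-empty.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # clamp so live values outside the reference range land in the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # floor each proportion at eps so the logarithm below is always defined
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


# Illustrative data: the live feature has shifted upwards by 3 units.
reference = [i / 100 for i in range(1000)]
shifted = [x + 3 for x in reference]
print(psi(reference, reference))  # 0.0
print(psi(reference, shifted))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but any threshold should be tuned per feature rather than taken on faith.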

Computationally Expensive

Software goes faster by doing less (for a given bit of hardware). ML models do a lot. This means they are slower (and usually require more memory) than normal backend services. Increasing the hardware budget can get around some of these problems (and might enable the use of things like GPUs¹).

Depending on the application, there may be hard requirements, such as running on a mobile phone and/or processing a certain number of images per second. Requirements such as these may rule out many types of model.
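A hard requirement like this can be turned into an automated check that runs before every release. The sketch below is one possible shape for such a check, assuming a single-threaded `predict` callable and a list of inputs. The helper names, the budget numbers, and the stand-in model are all made up for illustration.

```python
import math
import time


def measure_latencies_ms(predict, inputs, warmup=3):
    """Time `predict` on each input, after a few untimed warm-up calls."""
    for x in inputs[:warmup]:
        predict(x)
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def meets_budget(latencies_ms, p95_budget_ms, min_per_second):
    """Check a tail-latency budget and a sequential throughput floor."""
    throughput = 1000.0 * len(latencies_ms) / sum(latencies_ms)
    return percentile(latencies_ms, 95) <= p95_budget_ms and throughput >= min_per_second


def fake_predict(x):
    """Stand-in for a real model call (hypothetical)."""
    return sum(i * i for i in range(1000)) + x


latencies = measure_latencies_ms(fake_predict, list(range(50)))
print(meets_budget(latencies, p95_budget_ms=50.0, min_per_second=20.0))
```

The numbers only mean something when measured on the target hardware with realistic inputs; a benchmark on a development laptop says little about a mobile phone.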

None of this is different from non-ML software, but higher computational cost makes these problems harder.

Confluence of Skills

I mentioned that backend services are created and deployed all the time. This is true, but that doesn’t mean doing so is trivial. Engineers can follow best practices developed over many years, and use battle tested technology. DevOps expertise is highly valued, and a key part of deployment. The components required to deploy a system are relatively well understood, and a team is likely to have years of experience using them.

Deploying ML services requires the same knowledge, as well as an understanding of the model. In spite of this, ML/Data Science teams are often separate from “traditional” engineering teams. A model might be handed off to a different set of engineers for deployment, or the ML team may be asked to do so themselves². In either case, the team doing the work does not possess all of the required knowledge. Even worse, those missing skills will have to be learned without the help of experienced team members.

For products completely enabled by ML, a very wide range of skills might need to be employed: model design and training, low-level optimisation of C++ or CUDA kernels, higher-level system architecture that works with model limitations, DevOps, and so on. Building teams that know enough about all of these areas is not trivial.

Conclusion

Machine Learning is probably somewhere around the Trough of Disillusionment on the Gartner Hype Cycle (though perhaps others feel it is still overhyped). One of the missing pieces needed to build (useful) ML products is knowledge around deployment. This gap is currently being filled. ML Engineering roles are becoming more common, and communities like MLOps are very active. There are also an increasing number of tools that try to solve various parts of the deployment process: Kubeflow, MLflow, Metaflow, SageMaker, Weights & Biases, and (very) many more. Of course, understanding the strengths and limitations of each of these tools is a lot of work.

I am thinking of writing a series of code-first blog posts on different topics related to deployment. While there are more than enough resources to be found about training any kind of model, there are far fewer high quality resources related to the practical considerations of deployment. And fundamentally, if you don’t deploy a model after all the hard work it takes to train it, what was the point?

If there is anything you would like to see covered (or you are looking for advice), feel free to get in touch with @latkins_ on Twitter.

¹ Unfortunately, you now need to deal with CUDA.
² Fortunately, Data Scientists who are “too good” to code are nothing like as common as they were 5 years ago.