The Security Risks of Pickle in Machine Learning

Pickle has been the de facto format for ML models primarily because of its ease of use, native Python support, and (questionable) portability. But it is also inherently insecure, because deserializing a pickle can execute arbitrary code. This has been known for a long time, but as we “productionize” machine learning, it matters more than ever.

Why Pickle Is Dangerous

Pickle is the default serialization format for PyTorch model weights. The core issue is that a pickle file is not just data: it is a sequence of instructions for rebuilding objects, and those instructions can import and call arbitrary Python callables. This means a maliciously crafted model file could run harmful code on your machine the moment you deserialize it.
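To make the mechanism concrete, here is a minimal sketch using only the standard library. The class name `Payload` is hypothetical, and the callable is a harmless `print`; a real attack would return something like `os.system` with a shell command instead.

```python
import pickle


class Payload:
    """Hypothetical object whose unpickling calls a function chosen by its author."""

    def __reduce__(self):
        # pickle records "call print(...) to rebuild this object".
        # The call happens inside pickle.loads, before any of your code
        # gets a chance to inspect the result.
        return (print, ("code executed during unpickling",))


data = pickle.dumps(Payload())

# Loading runs the recorded call; the "object" you get back is just
# whatever the callable returned (None for print).
result = pickle.loads(data)
```

Note that the loader never needs the `Payload` class: the bytes only reference `builtins.print`, so the same file would execute on any machine with a Python interpreter.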

Mitigations

There are several ways to reduce this risk:

Load models from trusted sources: only use models from users and organizations you trust
Use signed commits: signatures let you verify who published a model file and that it has not been altered
Use safer formats: consider loading models from TensorFlow or JAX formats, or use the newer safetensors format, which is designed to be a simple, safe serialization format for model weights
Inspect imports: the Hugging Face Hub displays the list of imports in any pickled file so you can vet them before loading
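The import inspection in the last item can be approximated locally with the standard library's `pickletools`, which walks a pickle's opcodes without executing them. This is a rough sketch, not the Hub's actual scanner: the function name `list_pickle_imports` is an assumption, and the handling of `STACK_GLOBAL` is a heuristic that assumes the module and attribute names are the two most recently pushed strings.

```python
import collections
import pickle
import pickletools


def list_pickle_imports(data: bytes) -> set:
    """Return the 'module.name' globals a pickle references, without loading it."""
    imports = set()
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: arg is "module name", separated by a space.
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as strings beforehand.
            # Heuristic: take the two most recent string arguments.
            if len(strings) >= 2:
                imports.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


# Example: an OrderedDict pickle references collections.OrderedDict.
payload = pickle.dumps(collections.OrderedDict(a=1))
print(sorted(list_pickle_imports(payload)))
```

An entry such as `os.system` or `builtins.eval` in the result is a strong signal not to load the file. A clean result is weaker evidence, since a heuristic scan like this one can be evaded, so it complements the other mitigations rather than replacing them.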

As machine learning models increasingly make their way into production systems, treating model files as potential attack vectors is no longer optional.