Venue:
3W04
Lecturer:
Alex Schlögl - researcher at SEC
Abstract:
For a fixed trained model and fixed input data, inference results are not consistent across hardware configurations, and are sometimes not even deterministic on the same hardware configuration. We performed a deep dive into a typical machine learning inference stack and identified algorithm selection, floating point accuracy, aggregation order, data parallelism, and task parallelism as causes of numerical deviations. I will present evidence for the existence of each of these root causes, highlighting the complex interaction between TensorFlow, CUDA, and the Eigen linear algebra library. I will also show how these factors combine to yield different results across a large set of hardware configurations. Finally, I will briefly discuss the implications these deviations can have for forensics, machine learning security, and applications of machine learning.
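The aggregation-order effect mentioned in the abstract can be illustrated with a few lines of Python (a generic sketch, not taken from the talk or its underlying work): floating point addition is not associative, so summing the same values in a different order, as parallel reductions on different hardware effectively do, can yield a slightly different result.

    import random

    random.seed(0)
    # Values of mixed magnitude make rounding differences likely.
    values = [random.uniform(-1e8, 1e8) for _ in range(100_000)]

    sequential = 0.0
    for v in values:            # left-to-right aggregation
        sequential += v

    reordered = 0.0
    for v in reversed(values):  # same values, different aggregation order
        reordered += v

    print(sequential == reordered)         # typically False
    print(abs(sequential - reordered))     # small but nonzero deviation

The deviation per operation is tiny, but across the many reductions in a neural network forward pass such differences can accumulate and become observable in the final output.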