June 4, 2024

Inside Our New Simulation Engine

As we announced back in March, we recently overhauled our modeling and simulation engine, including releasing the core functionality as a standalone Python package. In this post, we want to explore that decision and dig into the design of the new engine. Why make these changes? Why choose this particular approach? Why use JAX? What even is JAX? At the heart of the answers to all these questions is our vision for the future of model-based engineering as a process of iterative optimization.

To illustrate the vision more concretely, imagine your team is tasked with developing an autonomous vertical takeoff and landing (VTOL) aircraft that can seamlessly switch between rotary-wing flight for airfield operations and fixed-wing flight for long-distance cruising. Beyond the design of the aircraft itself, you must develop a robust controller that ensures smooth transitions between these flight modes while balancing overall performance and energy efficiency. The workflow will involve several stages, each posing unique challenges. From calibrating the physical design parameters with real-world data, to iterating on design configurations and planning efficient flight paths, each step requires meticulous tuning to achieve the desired outcomes. In other words, your tools not only need to help you design the aircraft, they also need to help you optimize it.

Why optimize for optimization?

Generally speaking, a wide variety of engineering problems can be formulated as mathematical optimizations. As the fidelity of models improves by fusing traditional and data-driven modeling approaches, engineers can increasingly leverage these models to improve performance, efficiency, and reliability. For example, all of the following stages in the design of our hypothetical VTOL aircraft can be viewed as optimization problems:

  • Parameter calibration: Even a carefully derived model won’t match reality perfectly. You might need to calibrate coefficients of an analytic aerodynamics model, parameters of the rotor propulsion model, physical parameters like the moment of inertia (and its dependence on the changing configuration), damping coefficients in a structural vibration model, and so on. This involves optimizing all of these parameters against uncertain measurement data to ensure that the model accurately captures the real-world performance of the aircraft.
  • Early-stage design iterations: In the early stages of model-based systems engineering, there are often many free design parameters. Here the goal is to optimize some performance metric under the constraints of design requirements and the architecture of the model. For instance, the placement and sizing of control surfaces like ailerons and rudders are crucial: the design must balance maneuverability and stability by evaluating various configurations against performance criteria.
  • Path planning/trajectory generation: Efficiently transitioning the aircraft from high-altitude cruising to landing on a confined helipad involves solving a series of optimal control problems. This requires determining the sequence of control inputs that minimizes energy use while adhering to dynamic constraints.
  • Surrogate modeling: In many cases, design or control decisions may be informed by processes that can be simulated or tested experimentally with high fidelity, but at high cost. At the same time, the information from each such experiment is often relatively compact (e.g. pressure distributions or aerodynamic coefficients from a 3D CFD simulation or wind tunnel test). In this case you might train a fast, approximate surrogate model on the high-fidelity data as an effective way to quickly converge towards desired performance. There are two nested optimization problems here: the surrogate is trained to minimize the mismatch with the input-output map of the high-fidelity experiment, while the surrogate itself can also be used in a higher-level global optimization to efficiently identify promising new design parameters (see the sketch after this list).
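To make the nested structure concrete, here is a minimal sketch in plain JAX (not the Collimator API, and with a cheap analytic function standing in for the expensive experiment): an inner loop fits a polynomial surrogate to sampled data, and an outer loop optimizes a design variable against that surrogate.

```python
import jax
import jax.numpy as jnp

def expensive_experiment(x):
    # Stand-in for a costly high-fidelity simulation (e.g. a CFD run).
    return jnp.sin(3.0 * x) + 0.5 * x**2

# Sample the expensive process at a handful of design points.
xs = jnp.linspace(-1.0, 1.0, 20)
ys = expensive_experiment(xs)

def surrogate(coeffs, x):
    # Cheap approximate model: a degree-5 polynomial.
    return jnp.polyval(coeffs, x)

def fit_loss(coeffs):
    # Inner problem: mismatch with the experiment's input-output map.
    return jnp.mean((surrogate(coeffs, xs) - ys) ** 2)

coeffs = jnp.zeros(6)
fit_grad = jax.jit(jax.grad(fit_loss))
for _ in range(2000):                      # simple gradient descent
    coeffs = coeffs - 0.1 * fit_grad(coeffs)

# Outer problem: optimize the design variable against the cheap surrogate.
design_grad = jax.jit(jax.grad(lambda x: surrogate(coeffs, x)))
x_design = jnp.array(0.0)
for _ in range(100):
    x_design = x_design - 0.05 * design_grad(x_design)
```

In a real workflow the outer loop would also propose new design points at which to query the expensive experiment, refining the surrogate as it goes.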

As the demand for more accurate, efficient, and reliable models grows, the seamless integration of machine learning with traditional first-principles modeling also becomes increasingly important. Optimization serves as the bridge between these two paradigms, enabling engineers to harness the power of data-driven insights while preserving the interpretability and robustness of physics-based models. Modeling, simulation, and deployment become a closed-loop process in which the model predictions inform the design or use of a system, while real-world data feeds back to improve the model, a concept known as “digital twinning”.

However, current engineering tools tend to focus on only one half of this loop, with support for optimization and machine learning as secondary additions. By focusing on optimization, Collimator empowers engineers to rapidly tackle complex challenges and develop cutting-edge, high-performance solutions.

High-level requirements

What does it mean to develop a modeling and simulation tool that emphasizes optimization as a primary objective? We started with a list of requirements and worked backwards from there to a concrete design. Those requirements are:

  • Intuitive user interface: Whether writing code or using graphical abstractions, users should have to learn as little as possible to get up and running. The easy tasks should be easy, with incremental complexity only incrementally harder. Graphical interfaces should allow for easy collaboration and version control.
  • High performance and scalability: Large computational workloads are commonplace in modern workflows, so users need to be able to run a large parameter sweep on cloud-based GPU hardware with the same model and code they use to run a single simulation on a laptop.
  • Modular and extensible framework: Models of hybrid dynamical systems have never been one-size-fits-all. Users should be able to write custom blocks or components that perform just as well as the built-in versions, giving them the ability to quickly iterate on controller logic or develop high-fidelity proprietary models of their systems.
  • Automatic differentiation: The “secret sauce” of machine learning, automatic differentiation enables efficient and accurate computation of the sensitivities of an objective or constraint function with respect to model parameters, initial conditions, and neural network weights. To enable large-scale optimization problems constrained by dynamics simulations, our tool must support end-to-end algorithmic differentiation (a minimal example follows this list). For a deep dive on “autodiff”, see for instance What's Automatic Differentiation?
  • Deployment to hardware: While pure software models can be vital in early-stage design iterations, ultimately control logic needs to be deployed to hardware, and our tool has to support a path to hardware, whether the controller is a reinforcement learning agent or a simple discrete-time state machine.
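For a sense of what the autodiff requirement looks like in practice, here is a tiny JAX example (a sketch, not our engine's API) that computes exact parameter sensitivities of a toy model in a single call:

```python
import jax
import jax.numpy as jnp

def model_output(params, t):
    # Toy model: a damped oscillation with two tunable parameters.
    amplitude, damping = params
    return amplitude * jnp.exp(-damping * t) * jnp.cos(t)

def loss(params):
    t = jnp.linspace(0.0, 10.0, 100)
    measured = jnp.exp(-0.3 * t) * jnp.cos(t)   # stand-in for test data
    return jnp.mean((model_output(params, t) - measured) ** 2)

# Exact d(loss)/d(params) via autodiff: no finite-difference stencils.
sensitivities = jax.grad(loss)(jnp.array([1.5, 0.1]))
```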

Breaking down the requirements

These design requirements rule out a lot of possible architectures for our simulation engine. For instance, a common modern approach to developing scientific computing tools is to outsource the compute-intensive operations to C/C++ and write a high-level interface in a language like Python. However, this doesn’t let users write custom code on the same level as the built-in functionality unless they want to write the C/C++ code themselves. Worse, it’s often not easy to interface between this kind of architecture and popular machine learning libraries like PyTorch. Considering the implications of each of these requirements, we ended up choosing this part of our tech stack essentially by process of elimination.

First, while an intuitive user interface is ultimately a matter of personal preference, Python has emerged as a clear favorite in a wide range of application domains, with a well-developed and stable ecosystem including scientific computing, machine learning, data visualization, and beyond.

[Image credit: https://www.kaggle.com/discussions/getting-started/476476]

Moreover, if the engine itself is written in Python, then modularity and extensibility are in principle easy to achieve with the right architecture. To that end, the natural choice is the block diagram paradigm, which has dominated disciplines from systems engineering to control theory and underpins major modeling and simulation tools like Simulink and Modelica. In particular, we liked the design of Drake, a C++/Python modeling tool with a similar optimization-oriented vision applied specifically to the domain of robotics. The Drake framework is designed around a block diagram abstraction, but (in common with Modelica, and in contrast to Simulink) it insists on a rigid semantics for its models, ensuring a close correspondence between the simulated behavior and the underlying mathematical representation of a hybrid dynamical system.

[Image credit: https://drake.mit.edu/]

On the other hand, by conventional wisdom the interpreted nature of Python more or less rules out high-performance code. However, a new set of “just-in-time” (JIT) compiled Python frameworks like Numba and JAX aims for the best of both worlds, allowing users to write code in a restricted subset of Python that can be transformed into ultra-fast compiled code. Some of these frameworks even allow targeting different compute hardware, making switching between CPU and GPU as easy as setting a flag.
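As a small illustration of the JIT workflow (using only public JAX APIs), the same traced function is compiled for, and runs on, whatever device its inputs live on:

```python
import jax
import jax.numpy as jnp

@jax.jit
def step(x):
    # Arbitrary array computation; XLA compiles this on the first call.
    return jnp.tanh(x @ x.T).sum()

x = jnp.ones((1000, 1000))
step(x)  # compiled for the default backend (GPU/TPU if present, else CPU)

# Explicitly place the data on a particular device; the computation follows.
cpu = jax.devices("cpu")[0]
step(jax.device_put(x, cpu))
```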

Next, because of the crucial role that automatic differentiation plays in machine learning, the fastest route to writing a differentiable simulation engine is to use one of the popular machine learning libraries like PyTorch, TensorFlow, or JAX directly.

[Image credit: https://jax.readthedocs.io/]

Deployment to hardware is potentially the most difficult piece of the puzzle if the simulation engine is written in Python. The poor performance and unreliable timing of interpreted code, along with Python's considerable runtime requirements relative to typical microcontrollers, mean that it is almost never used for industrial control applications. We settled on two solutions, depending on the nature of the controller: (1) if the controller is constructed from low-level discrete logic blocks, we generate MISRA-compliant C code; otherwise, (2) we compile the code using tools available in the major machine learning libraries and call the compiled code from a C program. For instance, JAX can follow this path via the XLA compiler and the TensorFlow C API.
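As a rough illustration of the second path (a sketch of the idea, not our production toolchain), you can inspect the compiler-level program that JAX hands to XLA for a simple discrete-time control update; ahead-of-time tooling takes it from there to code callable from C:

```python
import jax
import jax.numpy as jnp

def pi_update(integral, error, kp=1.0, ki=0.1):
    # Hypothetical discrete-time PI controller update.
    integral = integral + error
    return integral, kp * error + ki * integral

# Lower the jitted function to the IR that XLA compiles.
lowered = jax.jit(pi_update).lower(jnp.float32(0.0), jnp.float32(0.1))
print(lowered.as_text())
```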

Summing up this landscape of sometimes conflicting requirements, the natural starting point for our simulation engine is a JAX-based, Drake-inspired block diagram framework for hybrid dynamical systems modeling and simulation in Python. With these choices, we believe the tool can be easy to use, extensible, performant, differentiable, and deployable.

Our simulation engine architecture

Outside of a somewhat niche community working on projects like training reinforcement learning agents for robotics, both JAX and Drake might be a little obscure, so it’s worth dissecting this choice a little. Why did we feel these were the natural choices, and what does the combination of the two look like in practice?

What is JAX? In the words of its documentation, “JAX is a Python library for accelerator-oriented array computation and program transformation.” In rough terms, this means you write pure Python functions using a NumPy-like syntax, which JAX “traces” to create an expression graph; that graph can then be used to generate various related functions, like the gradient (via autodiff) or a vectorized version of the original function. A key transformation is JIT-compilation, which hands the expression graph to the XLA compiler to generate optimized machine code. These compiled functions can target CPU, GPU, or TPU, and can run orders of magnitude faster than the original Python.
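To make the trace-and-transform model concrete (standard JAX APIs, nothing engine-specific), a single pure function yields its expression graph and its gradient, vectorized, and compiled variants:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) ** 2)

# The traced expression graph (a "jaxpr") that all transformations consume:
print(jax.make_jaxpr(f)(jnp.arange(3.0)))

df = jax.grad(f)         # gradient function, via autodiff
f_batched = jax.vmap(f)  # vectorized over a leading batch axis
f_fast = jax.jit(f)      # JIT-compiled through XLA
```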

One important implication of this model is that, since the original function doesn't get called (only the transformed code does), the traced code must be “pure” in the sense of functional programming; the example below shows the classic gotcha. Once you get used to this and some of the other peculiarities of JAX, the approach is incredibly useful (Google DeepMind, for one, has been increasingly relying on JAX for its R&D projects). And while JAX has produced an extensive ecosystem of machine learning-oriented tools, we think the combination of a familiar NumPy syntax with automatic differentiation and JIT-compilation also makes it a great framework for modeling and simulation, especially when combining traditional scientific computing with machine learning models. As a recent example, DeepMind ported the popular multibody simulation tool MuJoCo to JAX; now we can seamlessly embed these JAX-native MuJoCo models as a “block” in our framework.
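Specifically, Python side effects execute while JAX traces the function, not each time the compiled version runs:

```python
import jax
import jax.numpy as jnp

@jax.jit
def impure(x):
    print("tracing!")  # side effect: runs once, while JAX traces the function
    return 2.0 * x

impure(jnp.array(1.0))  # prints "tracing!" and compiles
impure(jnp.array(2.0))  # prints nothing: the cached compiled code runs instead
```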

What about Drake? Developed by Russ Tedrake and the Toyota Research Institute, Drake is a modeling and simulation framework oriented towards advanced robotics applications, but with a broadly similar philosophy of creating models that are amenable to formulating engineering problems as mathematical optimizations. At its core, Drake uses a block diagram approach inspired by Simulink: a model is a subclass of System, with individual blocks being instances of LeafSystem that can be composed into tree-structured Diagram objects of any depth. The time, state, parameters, etc. of a System are likewise contained in a Context object with the same tree structure as the System. That is, an Integrator block has a Context containing its internal state, the Context of a Gain block contains the gain value as a parameter, and a Diagram containing both blocks has a Context that contains both “subcontexts”.

But while Drake is a well-developed, carefully thought-out robotics framework, it’s not quite what we were looking for in our general-purpose simulation engine. As with many popular robotics tools, it’s written in C++ with Python bindings; while this is typically a good way to balance performance and high-level usability it does have the drawbacks we mentioned earlier that make it less than ideal for our requirements.

However, although Drake's C++ implementation does not work this way, the System + Context concept maps neatly onto the JAX functional paradigm: the System can be viewed as defining a collection of pure functions (for instance, the right-hand side of an ODE or a discrete-time update rule), and the Context is the collection of arguments to those functions. From this point of view, constructing a system model as a Diagram object is a kind of “metaprogramming” in which we build up complex functions from simple building blocks. Naturally, the implementation ends up quite different from Drake's, but we kept the naming conventions because the conceptual structure is so similar.
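Here is a deliberately stripped-down illustration of that mapping (hypothetical code, not Drake's or Collimator's actual API): the system defines pure functions, and the context is nothing but their arguments.

```python
import jax.numpy as jnp

class Pendulum:
    """A 'System': defines pure functions and holds no mutable state."""

    def create_context(self, theta=0.1, omega=0.0, g=9.81, length=1.0):
        # The 'Context': all state and parameters, as plain data.
        return {"state": jnp.array([theta, omega]),
                "params": {"g": g, "length": length}}

    def dynamics(self, context):
        # ODE right-hand side: a pure function of the context alone.
        theta, omega = context["state"]
        p = context["params"]
        return jnp.array([omega, -p["g"] / p["length"] * jnp.sin(theta)])

system = Pendulum()
context = system.create_context()
xdot = system.dynamics(context)  # safe to jit, grad, or vmap over contexts
```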

Why it’s different

A JAX-based implementation of the Drake concept would be appealing on its own, but for an optimization-centered general-purpose modeling and simulation tool, we wanted to go even further. To date, we’ve added support for:

  • An extensive block library, including over 100 foundational blocks and the capability to easily extend with custom Python code
  • Solving both ordinary differential equations (ODEs) and the more general class of semi-explicit differential-algebraic equations (DAEs)
  • Modelica-style acausal modeling for electrical/mechanical/thermal/hydraulic systems and more
  • Finite state machines to capture control logic and switching behavior
  • Built-in neural network blocks for common use cases like the multi-layer perceptron, with additional support for uploading pre-trained models from JAX, PyTorch, and TensorFlow.
  • A fully differentiable simulation runtime, with reverse-mode automatic differentiation (i.e. “backpropagation”) through entire simulation trajectories, including event handling. The same autodiff machinery can be used for efficient implementations of state estimation and predictive control algorithms (see the sketch after this list).
  • An easy-to-use interface for defining a stochastic optimization problem based on a block diagram model and solving with 50+ local and global optimization algorithms
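As a flavor of what the differentiable runtime enables (a minimal sketch in plain JAX with a hand-rolled Euler integrator, not the engine's solvers), here is backpropagation through a full trajectory to calibrate a damping parameter:

```python
import jax
import jax.numpy as jnp

DT, N = 0.01, 1000  # time step and number of integration steps

def simulate(damping, x0=jnp.array([1.0, 0.0])):
    def step(x, _):
        pos, vel = x
        acc = -pos - damping * vel                  # mass-spring-damper
        x_next = jnp.array([pos + DT * vel, vel + DT * acc])
        return x_next, x_next[0]                    # carry state, log position
    _, positions = jax.lax.scan(step, x0, None, length=N)
    return positions

target = simulate(0.4)  # "measured" trajectory with the true damping

def loss(damping):
    return jnp.mean((simulate(damping) - target) ** 2)

# Reverse-mode autodiff through all N solver steps in one call.
grad_loss = jax.jit(jax.grad(loss))
damping = jnp.array(0.1)
for _ in range(300):                  # simple gradient descent calibration
    damping = damping - 0.3 * grad_loss(damping)
```

In the engine itself, the same gradients flow through event handling and the built-in solvers, and can feed any of the bundled optimization algorithms rather than this hand-written loop.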

Again, this is all at the level of the Python simulation engine. In our browser-based app we have much more, including a clean and modern graphical user interface for constructing complex hierarchical models, running massively parallel cloud-based ensemble simulations, executing flexible optimization workflows, and support for real-time collaboration and version control. This cloud-based platform has Python at its core, essentially giving us the expressiveness and ecosystem you’ve come to expect from this language.

Another benefit of our deep integration with Python is that it gives engineers flexibility in how they architect their models. For instance, our rocket engine demo project represents a simple combustion model in two ways: first with “primitive” blocks, then using Python code.
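The full example lives in the demo project, but as a hypothetical flavor of the code-first style (names and physics simplified for illustration, not the actual demo code), a combustion stage can reduce to a pure function that the engine can trace, differentiate, and compile like any built-in block:

```python
import jax.numpy as jnp

def chamber_pressure_rate(p_chamber, mdot_in, c_star, v_chamber, a_throat):
    """Toy chamber-pressure ODE right-hand side: propellant inflow minus
    choked nozzle outflow. All names here are illustrative placeholders."""
    mdot_out = p_chamber * a_throat / c_star      # choked-flow outflow
    return (c_star**2 / v_chamber) * (mdot_in - mdot_out)
```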

Whether you prefer the visual graph approach, the lines-of-code approach, or a combination, you can choose an appropriate strategy at virtually every level. The simulation engine handles context management, zero-crossing detection, multirate discrete events, acausal equation processing, and JAX transformations, all of which quickly become very complex to implement from scratch.

What can you do with it?

So far, the bulk of our effort on the simulation engine has gone into getting the foundations right, and we're still rapidly developing the higher-level capabilities. With that said, our social media channels and website documentation have several tutorials that walk through some of the exciting capabilities unlocked by this new engine.

In addition, you can find many more example projects showcasing more complex model construction at app.collimator.ai, including a satellite formation controller, ventilator, pacemaker, wind turbine, quadrotor drone, F-16 fighter jet, and a staged-combustion rocket engine.

The future of model-based engineering

We’re still in the early stages of this project, but we believe that combining analytic models and data-driven methods with a scalable and intuitive user interface will be transformative for model-based engineering workflows. In the digital twin concept, first-principles models and controllers are continually fused with test data, ensuring both that design and operations are optimized, and that requirements are continuously verified. From a systems engineering perspective, this is a hybridization of the classic “V” diagram and the iterative development life cycle. By formulating design, analysis, and control tasks as a series of data-driven optimizations, we can begin to close the loop much earlier in the process.

While optimization problems have been actively researched for many decades and interest in digital twins has been steadily growing, our new simulation engine offers an innovative approach to your modeling and simulation workflow. The combination of Python, data-driven design, integrated optimization, and digital twin capabilities will supercharge your systems development so you can build better and faster than ever before. This new paradigm will give you a significant competitive edge in today’s rapidly evolving market.

Want to learn more? Check out the documentation and tutorials for our simulation engine at py.collimator.ai and get started with the browser-based modeling environment at app.collimator.ai.
