As we announced back in March, we recently overhauled our modeling and simulation engine, including releasing the core functionality as a standalone Python package. In this post we wanted to explore that decision and dig into the design of the new engine. Why make these changes? Why choose this particular approach? Why use JAX? What even is JAX? At the heart of the answers to all these questions is our vision for the future of model-based engineering as a process of iterative optimization.
To illustrate the vision more concretely, imagine your team is tasked with developing an autonomous vertical takeoff and landing (VTOL) aircraft that can seamlessly switch between rotary-wing flight for airfield operations and fixed-wing flight for long-distance cruising. Beyond the design of the aircraft itself, you must develop a robust controller that ensures smooth transitions between these flight modes while balancing overall performance and energy efficiency. The workflow will involve several stages, each posing unique challenges. From calibrating the physical design parameters with real-world data, to iterating on design configurations and planning efficient flight paths, each step requires meticulous tuning to achieve the desired outcomes. In other words, your tools not only need to help you design the aircraft, they also need to help you optimize it.
Generally speaking, a wide variety of engineering problems can be formulated as mathematical optimizations. As the fidelity of models improves by fusing traditional and data-driven modeling approaches, engineers can increasingly leverage these models to improve performance, efficiency, and reliability. For example, all of the following stages in the design of our hypothetical VTOL aircraft can be viewed as optimization problems:

- Calibrating the physical design parameters of the model against real-world flight data
- Iterating on candidate design configurations to balance performance and energy efficiency
- Planning efficient flight paths and smooth transitions between rotary- and fixed-wing flight modes
- Tuning the controller that executes those transitions
As the demand for more accurate, efficient, and reliable models grows, the seamless integration of machine learning with traditional first-principles modeling also becomes increasingly important. Optimization serves as the bridge between these two paradigms, enabling engineers to harness the power of data-driven insights while preserving the interpretability and robustness of physics-based models. Modeling, simulation, and deployment then become a closed-loop process in which model predictions inform the design or use of a system, while real-world data feeds back to improve the model, a concept known as "digital twinning".
However, current engineering tools tend to focus on only one half of this loop, with support for optimization and machine learning as secondary additions. By focusing on optimization, Collimator empowers engineers to rapidly tackle complex challenges and develop cutting-edge, high-performance solutions.
What does it mean to develop a modeling and simulation tool that emphasizes optimization as a primary objective? We started with a list of requirements and worked backwards from there to a concrete design. Those requirements are:

- An intuitive user interface
- Modularity and extensibility
- High performance
- Differentiability, via automatic differentiation through entire simulations
- Deployability to embedded hardware
These design requirements rule out a lot of possible architectures for our simulation engine. For instance, a common modern approach to developing scientific computing tools is to outsource the compute-intensive operations to C/C++ and write a high-level interface in a language like Python. However, this doesn’t let users write custom code on the same level as the built-in functionality unless they want to write the C/C++ code themselves. Worse, it’s often not easy to interface between this kind of architecture and popular machine learning libraries like PyTorch. Considering the implications of each of these requirements, we ended up choosing this part of our tech stack essentially by process of elimination.
First, while an intuitive user interface is ultimately a matter of personal preference, Python has emerged as a clear favorite across a wide range of application domains, from scientific computing and machine learning to data visualization and beyond, with a well-developed and stable ecosystem.
Moreover, if the engine itself is written in Python, then modularity and extensibility are in principle easy to achieve with the right architecture. To that end, the natural choice is the block diagram paradigm, which has dominated disciplines from systems engineering to control theory, including major modeling and simulation tools like Simulink and Modelica. In particular, we liked the design of Drake, a C++/Python modeling tool with a similar optimization-oriented vision applied specifically to the domain of robotics. The Drake framework is designed around a block diagram abstraction, but (in common with Modelica, and in contrast to Simulink) it insists on rigid semantics for its models, ensuring a close correspondence between the simulated behavior and the underlying mathematical representation of a hybrid dynamical system.
On the other hand, conventional wisdom holds that the interpreted nature of Python more or less rules out high-performance code. However, a new generation of "just-in-time" (JIT) compiled Python frameworks like Numba and JAX aims for the best of both worlds, allowing users to write code in a restricted subset of Python that can be transformed into ultra-fast compiled code. Some of these frameworks can even target different compute hardware, making switching between CPU and GPU as easy as setting a flag.
Next, because of the crucial role that automatic differentiation plays in machine learning, the fastest route to writing a differentiable simulation engine is to use one of the popular machine learning libraries like PyTorch, TensorFlow, or JAX directly.
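To make the idea of a differentiable simulation concrete, here is a minimal sketch (a toy linear ODE with a forward-Euler rollout, not our engine's integrator) showing how JAX can differentiate a loss on a simulation's final state with respect to a physical parameter:

```python
import jax
import jax.numpy as jnp

def simulate(k, x0=1.0, dt=0.01, steps=100):
    """Forward-Euler rollout of the toy linear ODE dx/dt = -k * x."""
    def step(x, _):
        x_next = x + dt * (-k * x)
        return x_next, x_next
    x_final, _ = jax.lax.scan(step, jnp.asarray(x0), jnp.arange(steps))
    return x_final

# The entire rollout is traced by JAX, so we can differentiate a loss on
# the final state with respect to the decay-rate parameter k.
loss = lambda k: (simulate(k) - 0.5) ** 2
dloss_dk = jax.grad(loss)
print(dloss_dk(1.0))
```

This gradient is exactly what a calibration loop needs: feed it to any gradient-based optimizer to fit `k` to measured data.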
Deployment to hardware is potentially the most difficult piece of the puzzle if the simulation engine is written in Python. The poor performance and unreliable timing of interpreted code, along with Python's considerable resource requirements relative to typical microcontrollers, mean that it is almost never used for industrial control applications. We settled on two solutions, depending on the nature of the controller: (1) if the controller is constructed from low-level discrete logic blocks, we generate MISRA-compliant C code; otherwise, (2) we compile the code using tools available in the major machine learning libraries and then call the compiled code from a C program. For instance, JAX can follow this path via the XLA compiler and the TensorFlow C API.
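As a small illustration of the second pathway (this shows only the lowering step, not our full code-generation toolchain), JAX lets you inspect the compiler-level representation of a jitted function that XLA consumes:

```python
import jax
import jax.numpy as jnp

def controller(x):
    # Toy control law: saturated proportional feedback
    return jnp.clip(-2.0 * x, -1.0, 1.0)

# Lower the jitted function for a concrete input type; the resulting
# StableHLO module text is what the XLA compiler consumes downstream.
lowered = jax.jit(controller).lower(jnp.float32(0.0))
print(lowered.as_text()[:300])
```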
Summing up this landscape of sometimes conflicting requirements, the natural starting point for our simulation engine is a JAX-based, Drake-inspired block diagram framework for hybrid dynamical systems modeling and simulation in Python. With these choices, we believe the tool can be easy to use, extensible, performant, differentiable, and deployable.
Outside of a somewhat niche community working on projects like training reinforcement learning agents for robotics, both JAX and Drake might be a little obscure, so it's worth unpacking this choice. Why did we feel these were the natural choices, and what does the combination of the two look like in practice?
What is JAX? In the words of its documentation, "JAX is a Python library for accelerator-oriented array computation and program transformation." In rough terms, this means you write pure Python functions using a NumPy-like syntax that get "traced" by JAX to create an expression graph, which can then be used to generate various related functions, like the gradient (via autodiff) or a vectorized version of the original function. A key transformation is JIT-compilation, which hands the expression graph to the XLA compiler to generate optimized machine code. These compiled functions can target CPU, GPU, or TPU, and can run orders of magnitude faster than the original Python.
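For example, a single NumPy-style function can be transformed in all of these ways:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x**2

df = jax.grad(f)         # derivative via autodiff
f_batched = jax.vmap(f)  # vectorized over a batch of inputs
f_fast = jax.jit(f)      # compiled by XLA on first call

xs = jnp.linspace(0.0, 1.0, 5)
print(df(0.5))           # the analytic derivative 2x*sin(x) + x^2*cos(x) at 0.5
print(f_batched(xs))
print(f_fast(0.5))
```

Transformations compose, too: `jax.jit(jax.vmap(jax.grad(f)))` is a compiled, batched derivative.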
One important implication of this model is that, since the original function doesn’t get called (only the transformed code), the traced code must be “pure” in the sense of functional programming. Once you get used to this and some of the other peculiarities of JAX, this approach is incredibly useful (Google DeepMind for one has been increasingly relying on JAX for their R&D projects). And while JAX has produced an extensive ecosystem of machine learning-oriented tools, we think the combination of a familiar NumPy syntax with automatic differentiation and JIT-compilation also makes it a great framework for modeling and simulation, especially when combining traditional scientific computing with machine learning models. As a recent example, DeepMind ported the popular multibody simulation tool MuJoCo to JAX; now we can seamlessly embed these JAX-native MuJoCo models as a “block” in our framework.
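To make the purity constraint concrete, here is a minimal sketch of what happens to a side effect under `jit`:

```python
import jax

trace_log = []

@jax.jit
def impure_double(x):
    # This side effect runs only while JAX traces the function,
    # not on subsequent calls to the compiled version.
    trace_log.append("traced!")
    return 2.0 * x

impure_double(1.0)  # first call: traces and compiles
impure_double(2.0)  # same shapes/dtypes: reuses the compiled code
print(trace_log)    # the log entry appears only once
```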
What about Drake? Developed by Russ Tedrake and the Toyota Research Institute, Drake is a modeling and simulation framework oriented toward advanced robotics applications, but with a broadly similar philosophy of creating models that are amenable to formulating engineering problems as mathematical optimizations. At its core, Drake uses a block diagram approach inspired by Simulink: a model is a subclass of System, with individual blocks being LeafSystem instances that can be composed into tree-structured Diagram objects of any depth. The time, state, parameters, etc. of a System are likewise contained in a Context object with the same tree structure as the System. That is, an Integrator block has a Context containing its internal state, the Context of a Gain block contains the gain value as a parameter, and a Diagram containing both blocks has a Context that contains both "subcontexts".
But while Drake is a well-developed, carefully thought-out robotics framework, it’s not quite what we were looking for in our general-purpose simulation engine. As with many popular robotics tools, it’s written in C++ with Python bindings; while this is typically a good way to balance performance and high-level usability it does have the drawbacks we mentioned earlier that make it less than ideal for our requirements.
However, although it is of course not implemented this way in C++, the System + Context concept maps neatly to the JAX functional paradigm: the System can be viewed as defining a collection of pure functions (for instance the right-hand side of an ODE or a discrete-time update rule), and the Context is the collection of arguments to those functions. From this point of view, the construction of a system model as a Diagram object is a kind of “metaprogramming” where we are building up complex functions from simple building blocks. Naturally, the implementation becomes quite different from Drake, but we opted to keep the naming conventions because the conceptual structure is fairly similar.
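As a rough sketch of this functional view (using hypothetical names, not the actual Collimator or Drake API), a "system" reduces to a pure function and its context to a pytree of arguments:

```python
import jax
import jax.numpy as jnp

def pendulum_ode(context):
    """Right-hand side of a damped pendulum, read entirely from the context."""
    theta, omega = context["state"]
    g_over_l = context["params"]["g_over_l"]
    damping = context["params"]["damping"]
    return jnp.array([omega, -g_over_l * jnp.sin(theta) - damping * omega])

context = {
    "state": jnp.array([0.1, 0.0]),
    "params": {"g_over_l": 9.81, "damping": 0.1},
}

# Because the context is a pytree of arrays, the model composes with
# jit, grad, vmap, etc. out of the box.
xdot = jax.jit(pendulum_ode)(context)
```

A diagram, in this view, is just a bigger pure function assembled from functions like this one, with a nested context to match.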
A JAX-based implementation of the Drake concept would be appealing on its own, but for an optimization-centered, general-purpose modeling and simulation tool we wanted to go even further. To date, we've added support for capabilities including acausal (equation-based) modeling, zero-crossing detection, multirate discrete events, and embedding machine learning models directly as blocks.
Again, this is all at the level of the Python simulation engine. In our browser-based app we have much more, including a clean and modern graphical user interface for constructing complex hierarchical models, running massively parallel cloud-based ensemble simulations, executing flexible optimization workflows, and support for real-time collaboration and version control. This cloud-based platform has Python at its core, essentially giving us the expressiveness and ecosystem you’ve come to expect from this language.
Another benefit of our deep integration with Python is that it gives engineers flexibility in how they architect their models. For instance, here are two examples of how to represent a simple combustion model, the first with "primitive" blocks and the second using Python code. Both are taken from our rocket engine demo project.
Whether you prefer the visual graph approach, the lines-of-code approach, or a combination, you can choose an appropriate strategy at virtually every level. The simulation engine will straightforwardly handle context management, zero-crossing detection, multirate discrete events, acausal equation processing, and JAX transformations—all of which can quickly become very complex when implemented from scratch.
So far, the bulk of our effort on the simulation engine has gone into getting the foundations right, and we're still rapidly developing the higher-level capabilities. That said, our website documentation and social media channels have several tutorials that walk through some of the exciting capabilities unlocked by this new engine:
In addition, you can find many more example projects showcasing more complex model construction at app.collimator.ai, including a satellite formation controller, ventilator, pacemaker, wind turbine, quadrotor drone, F-16 fighter jet, and a staged-combustion rocket engine.
We’re still in the early stages of this project, but we believe that combining analytic models and data-driven methods with a scalable and intuitive user interface will be transformative for model-based engineering workflows. In the digital twin concept, first-principles models and controllers are continually fused with test data, ensuring both that design and operations are optimized, and that requirements are continuously verified. From a systems engineering perspective, this is a hybridization of the classic “V” diagram and the iterative development life cycle. By formulating design, analysis, and control tasks as a series of data-driven optimizations, we can begin to close the loop much earlier in the process.
While optimization problems have been actively researched for many decades and interest in digital twins has been steadily growing, our new simulation engine offers an innovative approach to your modeling and simulation workflow. The combination of Python, data-driven design, integrated optimization, and digital twin capabilities will supercharge your systems development so you can build better and faster than ever before. This new paradigm will give you a significant competitive edge in today’s rapidly evolving market.
Want to learn more? Check out the documentation and tutorials for our simulation engine at py.collimator.ai and get started with the browser-based modeling environment at app.collimator.ai.