Research – Melkior Ornik

My research is broadly founded on exploring the interplay between control, data availability, and learning in order to design provably safe, successful, and efficient planning strategies for systems operating in complex or unknown environments. Inversely, I am interested in identifying control policies and agent intent from data, and communicating the findings to a human supervisor in an explainable manner. Some of my recent areas of interest are described below. If you want to learn more, please look at my recent publications or contact me.

Research Funding

My work is currently supported by the Air Force Young Investigator Program award with the Resilience and Guaranteed Task Completion for Partially Unknown Nonlinear Control Systems project, the DRILLAWAY: aDaptive, ResIllient Learning-enabLed oceAn World AutonomY and the Robust and Resilient Autonomy for Advanced Air Mobility projects funded by NASA, the Distributed Swarm Planning in Complex, Low-Communication Environments and Synthesizing Temporal Logic and Human Performance Models for Deception Mitigation projects funded by the Office of Naval Research, the Optimal Infrastructure Assessment and Management Through Active Learning and Data-Driven Planning project funded by the United States Army Engineer Research and Development Center, and the Optimal Planning for Ag Systems project funded by Corteva.

We have concluded the Safety-Constrained and Efficient Learning for Resilient Autonomous Space Systems project funded by NASA, the System for Avoidance and Flight-path Execution based on Risk Reduction (SAFERR) project funded by the United States Air Force’s AFWERX program, the Seedling: Synthesis of Control Protocols for Integrated Mission Planning, Resource Management and Information Acquisition project funded by the Defense Advanced Research Projects Agency, the Learning of Time-Varying Dynamics project funded by Sandia National Laboratories, the Net Zero Transportation Infrastructure project funded by the Discovery Partners Institute, the Enhancing Opportunities for Research and Training in Space Engineering project funded by the US Department of Education, and the intensive summer research project LEONA: Logic-Based Context-Aware Activity Interpreter for Geospatial Intelligence funded by the University of Illinois Urbana-Champaign’s New Frontiers Initiative, aligning with the mission and needs of the National Geospatial-Intelligence Agency.

Some Contributions

Certifiable Planning and Control for Systems with Unknown Dynamics

Faced with a mid-mission catastrophe or a significantly different environment from the one originally expected, it is often impossible for a system to complete its original task; hence, classical methods of adaptation and robustness — focused on adapting the system’s control capabilities to meet its original objective — will fail. My research develops methods that recognize whether the original task can be completed, choose an alternative task if it cannot, and ensure that the system completes the chosen task.

My group’s work introduced a guaranteed reachable set of a partially unknown system as the set of all states reachable for every system consistent with partial knowledge about new system dynamics. Underapproximating the guaranteed reachable set by exploiting the knowledge of the system’s reachable set prior to the adverse event and/or partial knowledge of the new system dynamics allows us to quickly determine tasks that the system can certifiably complete, even before we know how to complete them. After determining such a task, the remaining challenge is to perform online learning and adaptation to drive the system to succeed at its new mission.
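One way to write the definition above in symbols is sketched here. The notation is an assumption made for illustration and is not taken from this page: $\mathcal{D}$ denotes the set of dynamics consistent with the available partial knowledge, $\mathcal{U}$ the set of admissible control signals, $\phi^{F}_{u}(T; x_0)$ the state at time $T$ of the system with dynamics $F$ started at $x_0$ under control $u$, and the choice of "reachable at time $T$" rather than "by time $T$" is likewise assumed.

```latex
% Reachable set of one candidate system F (notation assumed for illustration)
\mathcal{R}^{F}(T, x_0) = \left\{ \phi^{F}_{u}(T; x_0) \;:\; u \in \mathcal{U} \right\}

% Guaranteed reachable set: states reachable for EVERY system consistent with the knowledge
\mathcal{R}^{G}(T, x_0) = \bigcap_{F \in \mathcal{D}} \mathcal{R}^{F}(T, x_0)

% An underapproximation is any computable set contained in it
\underline{\mathcal{R}}(T, x_0) \subseteq \mathcal{R}^{G}(T, x_0)
```

Under this reading, any target state in $\underline{\mathcal{R}}(T, x_0)$ is certifiably reachable regardless of which $F \in \mathcal{D}$ is the true system, which is why a task can be certified before a control law achieving it is known.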

In a scenario where the unknown dynamics are precipitated by partial failure or hostile takeover of actuators, the controller needs to adapt to undesirable actuator inputs in real time to keep the system progressing towards completing its task. The video below, made by a graduated LEADCAT member Jean-Baptiste Bouvier, illustrates the resilience of a spacecraft to adversarial inputs in successful completion of an orbital inspection mission.

In a scenario of completely unknown dynamics, our previous work on myopic control aims to continually relearn local system dynamics by applying short-time “test inputs”. Using this method, we have shown that even a significantly damaged aircraft can remain in the air. The following video, made by Steven Carr — then a PhD student at UT Austin — shows a high-fidelity simulation of a Boeing 747-200 which lost 33% of its right wing, controlled using myopic control.
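The idea of relearning local dynamics from short test inputs can be sketched in a few lines. This is a simplified illustration and not the algorithm from the published work: it assumes control-affine dynamics, uses a hypothetical two-state linear system (`hidden_dynamics`) as a stand-in for the unknown plant, and picks the bounded input that most quickly reduces the distance to a target. All function names and parameter values are assumptions.

```python
import numpy as np

def hidden_dynamics(x, u):
    """Hypothetical stand-in for the unknown plant; the controller never inspects it."""
    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    B = np.array([[1.0, 0.0], [0.5, 2.0]])
    return A @ x + B @ u

def step(x, u, dt):
    """Advance the (simulated) plant by one forward-Euler step."""
    return x + dt * hidden_dynamics(x, u)

def estimate_local_model(x, n_inputs, dt_test):
    """Apply short test inputs and estimate the local drift f and input matrix G."""
    x_cur = np.asarray(x, dtype=float)
    # Zero test input: the observed velocity approximates the drift f(x).
    x_next = step(x_cur, np.zeros(n_inputs), dt_test)
    f_hat = (x_next - x_cur) / dt_test
    x_cur = x_next
    G_hat = np.zeros((x_cur.size, n_inputs))
    for i in range(n_inputs):
        # Unit test input along actuator i: the extra velocity approximates column i of G(x).
        e = np.zeros(n_inputs)
        e[i] = 1.0
        x_next = step(x_cur, e, dt_test)
        G_hat[:, i] = (x_next - x_cur) / dt_test - f_hat
        x_cur = x_next
    return f_hat, G_hat, x_cur

def myopic_control(x0, target, n_inputs=2, steps=200, dt=0.05, dt_test=1e-3, u_max=1.0):
    """Alternate between short test inputs and a greedy, locally optimal control."""
    x = np.asarray(x0, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(steps):
        f_hat, G_hat, x = estimate_local_model(x, n_inputs, dt_test)
        # d/dt |x - target|^2 = 2 (x - target)^T (f + G u); over the box |u_i| <= u_max
        # this is minimized by u = -u_max * sign(G^T (x - target)).
        u = -u_max * np.sign(G_hat.T @ (x - target))
        x = step(x, u, dt)
    return x

x_final = myopic_control([2.0, 1.0], [0.0, 0.0])
print(np.linalg.norm(x_final))
```

The controller only ever uses `f_hat` and `G_hat`, which it re-estimates at every step, so the same loop would keep working if the hypothetical plant changed partway through the run.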

Planning, Learning, and Control in Complex, Changing or Uncertain Environments

Controlling a system in an unknown environment inevitably requires learning about the environment and the system’s interaction with it. Learning and operating in an entirely unknown environment is time-consuming and risky. However, systems rarely operate in environments that are entirely unknown. They often have access to some information collected at earlier times, information collected by agents on parallel and complementary missions, or information about physical laws of the environment. My group’s work exploits this information to learn more quickly and subsequently plan better.

Our work in this area spans multiple domains. One recent line of work seeks to optimally plan control policies in time-varying, a priori unknown environments. The video below, made by LEADCAT member Gokul Puthumanaillam, illustrates successful control of a vehicle operating in a previously unknown, time-varying stochastic environment, motivated, e.g., by changing terrain properties due to inclement weather.

Another effort focuses on fast planning for an extraterrestrial lander based on prior testing on Earth and real-time sensor data. The video below, made by LEADCAT’s Pranay Thangeda using the equipment of Kris Hauser’s research group, illustrates our recent hardware experiment on learning to scoop in an extraterrestrial environment.

Keeping with the extraterrestrial lander effort — in this case a rover operating on a model of the lunar surface — the video below, also made by Pranay Thangeda, illustrates the difference in online learning of system dynamics between an agent that learns solely from the outcomes of its actions and an agent that also collects and uses side information about similarity between dynamics in different areas.

Behavior Inference, Supervision, and Deception

To ensure success in a hostile environment, an agent being observed may want to hide its true intentions from the observer for as long as possible. Inversely, an agent observing an adversary needs to infer its important behavioral features and uncover possible deception. My current work deals with both sides of this coin. On one hand, I have been exploring methods that agents can use to seem as unpredictable as possible while ultimately satisfying their objective or, alternatively, seem to follow a particular “decoy” policy as closely as possible while also proceeding to their objective. The resulting algorithms often match the human intuition of deceptive behaviors, and have been shown to successfully fool autonomous adversaries about the agent’s intention. On the other hand, I am interested in developing a formal understanding of spatiotemporal behavior, and quantitative metrics of the “importance” of some behavior to a given mission.

Along with work on (counter)deceptive policies, we introduced the notion of counterdeceptive environment design, i.e., placement of environmental features in a way that makes it easy to uncover the adversary’s true intentions. Investigating optimal environment design spawns interesting optimization questions: motivated by a geometry-inspired simplification, one of our recent papers interprets the challenge as a classical p-dispersion optimization problem. Our novel numerical solution, motivated by physics, is the first to systematically deal with nonconvex environments in a computationally feasible manner, exceeding the best known results from previous work.
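For readers unfamiliar with p-dispersion, the discrete version of the problem asks for p points, chosen from a set of candidate locations, that maximize the smallest pairwise distance among the chosen points. The sketch below uses a standard greedy farthest-point heuristic on a hypothetical grid of candidates with Euclidean distances. It is not the physics-motivated method described above, it gives no optimality guarantee, and it ignores obstacles, so in a nonconvex environment a different distance (e.g., shortest-path length) would be needed.

```python
from itertools import combinations

import numpy as np

def greedy_p_dispersion(candidates, p):
    """Greedy farthest-point heuristic for discrete p-dispersion (assumes p >= 2)."""
    pts = np.asarray(candidates, dtype=float)
    # Pairwise Euclidean distances between all candidate locations.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Seed with the two candidates that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < p:
        # For each candidate, distance to its nearest already-chosen point.
        nearest = dist[:, chosen].min(axis=1)
        nearest[chosen] = -np.inf  # never pick the same candidate twice
        chosen.append(int(np.argmax(nearest)))
    return chosen

def min_pairwise_distance(candidates, chosen):
    """Objective value: the smallest distance between any two chosen points."""
    pts = np.asarray(candidates, dtype=float)
    return min(np.linalg.norm(pts[a] - pts[b]) for a, b in combinations(chosen, 2))

# Hypothetical example: a 5 x 5 grid of candidate locations.
grid = [(x, y) for x in range(5) for y in range(5)]
picked = greedy_p_dispersion(grid, p=4)
print([grid[k] for k in picked], min_pairwise_distance(grid, picked))
```

On this grid the heuristic selects the four corners, which is the intuitive answer for a convex square; the interesting cases are exactly the nonconvex ones that this simple sketch does not handle.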