My work is currently supported by the Safety-Constrained and Efficient Learning for Resilient Autonomous Space Systems project funded by NASA, the Seedling: Synthesis of Control Protocols for Integrated Mission Planning, Resource Management and Information Acquisition project funded by the Defense Advanced Research Projects Agency, and the Enhancing Opportunities for Research and Training in Space Engineering project funded by the US Department of Education. We recently concluded the Learning of Time-Varying Dynamics project funded by Sandia National Laboratories.
My research is founded on exploring the interplay between control, data availability, and learning in order to design provably safe, successful, and efficient control strategies for systems operating in complex or unknown environments. Some of my recent areas of interest are described below. If you want to learn more, please look at my recent publications or contact me.
Role of Side Information in Learning
Learning to optimally control a system in an entirely unknown environment is time-consuming and risky. Luckily, we rarely operate systems in environments that are entirely unknown. We often have access to information collected by previous missions in the same or a similar environment, information collected by agents on parallel and complementary missions, or information that follows from the physical laws of the environment. I am interested in exploiting this information, which may be given in a form that is vague or not directly usable by the system’s control algorithms, in order to improve learning and planning. Our work in this area spans several domains. One recent line of work seeks to optimally plan a fast and reliable public transit route for a passenger, given a priori joint probability distributions on travel times between stops and online data from all transit vehicles. Another focuses on optimally planning a control strategy for an extraterrestrial rover based on information collected from an orbiter prior to the mission and real-time sensor data. The video below, made by Pranay Thangeda from our research group, illustrates the difference in online learning of system dynamics between an agent that learns solely from the outcomes of its actions and an agent that also collects and uses side information about similarity between dynamics in different areas.
Real-Time Control of Systems with Unknown Dynamics
I am broadly interested in developing methods to retain control over a system whose dynamics have undergone a rapid, unexpected, and significant change, motivated in part by the example of an aircraft that sustains damage mid-flight. Our recent work follows two directions. The first deals with a priori resilience: designing and placing the actuators of a system in such a way that the system will certifiably survive the loss of control over some of the actuators. The second deals with a posteriori resilience: estimating the changed, possibly time-varying, system dynamics, describing the set of objectives that are provably achievable by such a changed system, and ultimately reaching one of those objectives. Using a method of myopic control, which continually relearns the local system dynamics and applies the control input that seems to be performing the best, we have shown that even a significantly damaged aircraft can remain in the air. The following video, made by Steven Carr from UT Austin, shows a high-fidelity simulation of a Boeing 747-200 that has lost 33% of its right wing, controlled using myopic control.
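As a rough illustration of the myopic idea only (it is not the algorithm used in the aircraft simulation), the Python sketch below controls a toy two-dimensional system whose dynamics are hidden from the controller. Everything in it is an assumption made for illustration: the dynamics in hidden_velocity, the probing inputs, the candidate input set, and the choice of "distance to the target after one step" as the measure of which input performs best.

```python
import math

def hidden_velocity(x, u):
    """True dynamics, unknown to the controller (hypothetical toy example).
    The second actuator is weak and couples into the first axis."""
    return (1.0 * u[0] + 0.4 * u[1] - 0.05 * x[0],
            0.3 * u[1] - 0.05 * x[1])

def advance(x, u, dt):
    """Apply input u for a duration dt (forward Euler) and return the new state."""
    vx, vy = hidden_velocity(x, u)
    return (x[0] + dt * vx, x[1] + dt * vy)

def candidate_inputs(n_angles=16, scales=(0.25, 0.5, 1.0)):
    """Finite set of admissible inputs with norm at most 1, including zero."""
    inputs = [(0.0, 0.0)]
    for s in scales:
        for k in range(n_angles):
            a = 2.0 * math.pi * k / n_angles
            inputs.append((s * math.cos(a), s * math.sin(a)))
    return inputs

def observed_velocity(x_before, x_after, dt):
    """Finite-difference velocity estimate from two observed states."""
    return ((x_after[0] - x_before[0]) / dt, (x_after[1] - x_before[1]) / dt)

def myopic_control(x, target, cycles=500, dt_probe=0.01, dt_act=0.05):
    """Repeat: probe briefly to relearn local dynamics, then act on the estimate."""
    inputs = candidate_inputs()
    for _ in range(cycles):
        # 1) Relearn local dynamics: one probe with zero input (drift),
        #    then one short probe per actuator (its local effect).
        x_prev, x = x, advance(x, (0.0, 0.0), dt_probe)
        drift = observed_velocity(x_prev, x, dt_probe)
        effects = []
        for probe in ((1.0, 0.0), (0.0, 1.0)):
            x_prev, x = x, advance(x, probe, dt_probe)
            v = observed_velocity(x_prev, x, dt_probe)
            effects.append((v[0] - drift[0], v[1] - drift[1]))

        # 2) Pick the candidate input with the smallest predicted distance
        #    to the target after one acting step (assumed goodness measure).
        def predicted_distance(u):
            px = x[0] + dt_act * (drift[0] + effects[0][0] * u[0] + effects[1][0] * u[1])
            py = x[1] + dt_act * (drift[1] + effects[0][1] * u[0] + effects[1][1] * u[1])
            return math.hypot(px - target[0], py - target[1])

        best = min(inputs, key=predicted_distance)

        # 3) Apply that input for a short time, then start over.
        x = advance(x, best, dt_act)
    return x

final_state = myopic_control((0.0, 0.0), (2.0, 1.0))
print(math.hypot(final_state[0] - 2.0, final_state[1] - 1.0))
```

The controller never calls hidden_velocity directly; it only observes how the state moves under the inputs it applies. Note that the probes themselves move the system, so the acting step has to compensate for them, which is one reason the probing time is kept short relative to the acting time in this sketch.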
Behavior Inference, Supervision, and Deception
I have been exploring methods that agents can use to hide their true intentions from an observer and, conversely, methods that observers can use to detect such deception and infer the behavior of an observed agent. These methods ultimately contribute to protocols that supervisors can use to ensure that an autonomous agent correctly understands and performs its mission. One mathematically interesting underlying problem is to define an appropriate “detectability” metric that differentiates between an agent’s actions when pursuing two objectives (the agent’s true intention vs. its stated intention). A deceptive agent wants to take actions that minimize this difference while still achieving its true objective. The supervisor wants to take actions, or design the environment, in such a way that the agent has no choice but to either pursue its stated objective or do something that obviously contrasts with that objective. We are interested both in designing optimal deceptive strategies and in optimally defending against them.
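To make the notion of a "detectability" metric concrete, the Python sketch below uses one possible candidate, assumed here purely for illustration: the Kullback-Leibler divergence between the distribution of actions the agent actually takes and the distribution an observer would expect under the stated objective. The action names and probabilities are hypothetical, and nothing here should be read as the metric used in our work.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as dicts: action -> probability.
    Assumes q[action] > 0 wherever p[action] > 0."""
    total = 0.0
    for action, p_a in p.items():
        if p_a > 0.0:
            total += p_a * math.log(p_a / q[action])
    return total

# Hypothetical action distributions in a single state.
# What the observer expects if the agent pursues its stated objective:
stated = {"left": 0.8, "right": 0.1, "wait": 0.1}
# An agent that pursues its true objective openly:
open_agent = {"left": 0.1, "right": 0.8, "wait": 0.1}
# An agent that gives up some progress in order to resemble the stated behavior:
deceptive_agent = {"left": 0.5, "right": 0.4, "wait": 0.1}

# Larger values mean the behavior is easier to tell apart from the stated one.
print(kl_divergence(open_agent, stated))
print(kl_divergence(deceptive_agent, stated))
```

Under this assumed metric, the deceptive agent scores lower than the open one, at the cost of choosing its preferred action less often. A supervisor, in turn, would prefer environments in which any behavior that achieves a different objective is forced to score high.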