Advancements in AI/ML Explainability and Safety Research

In 2023, researchers at AIS proposed an internal research and development (IRaD) project to develop novel technologies for deeply inspecting systems and algorithms based on Artificial Intelligence (AI) and Machine Learning (ML). The team takes a multidisciplinary approach, drawing on knowledge from different internal groups at AIS, including experts in AI/ML research and experts in the research and development of innovative software reverse engineering technologies. AIS’s AI/ML expertise spans the fields of Explainable AI (XAI), reinforcement learning, adversarial learning, safety and natural language processing.

“Before we can confidently roll out and leverage the power of AI/ML technologies, we need to better understand the algorithm’s process for achieving a certain conclusion,” said Georgia-Raye Moore, Advanced Research Program Manager. “This need has spawned into its own field within the industry and is referred to as ‘explainability’ or Explainable AI (XAI).”

AIS’s history of research in software analysis and inspection, which dates back to our founding, lends itself to application in AI/ML-based technologies.

“Decision makers need methods to deeply understand the risks and benefits of the AI/ML systems proposed for deployment within critical environments,” said Andrés Colón, Principal Investigator for the project. “The question our team seeks to address is: How can we better understand the underlying behaviors of often opaque AI/ML systems and build confidence prior to deployment?”

As part of this effort, AIS is contributing to XAI by building a platform that offers unprecedented insight into AI/ML models.

“We are currently probing neural networks at the lowest levels during execution and surfacing this information to powerful analysis and visualization routines, clearing the path from opaque black-box neural network to unprecedented insight,” said Logan Boyd, Research Scientist and Technical Lead for the project. “The system is designed around extensibility so that researchers can build upon it and continue to push the state of the art in novel and interesting ways.”
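The idea of probing a network during execution and passing what is observed to analysis routines can be illustrated with a minimal sketch. This is not AIS's tool or its API: the `Dense`, `ProbedNetwork` and `dead_units` names, the weights and the input are all hypothetical, chosen only to show how per-layer activations might be captured during a forward pass and then inspected.

```python
from typing import Callable, Dict, List

Vector = List[float]
Probe = Callable[[str, Vector], None]


def relu(values: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def identity(values: Vector) -> Vector:
    """Pass values through unchanged (linear output layer)."""
    return list(values)


class Dense:
    """A tiny fully connected layer (hypothetical, for illustration only)."""

    def __init__(self, name: str, weights: List[Vector], bias: Vector,
                 activation: Callable[[Vector], Vector]) -> None:
        self.name = name
        self.weights = weights
        self.bias = bias
        self.activation = activation

    def forward(self, x: Vector) -> Vector:
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.weights, self.bias)]
        return self.activation(pre)


class ProbedNetwork:
    """Runs layers in order and reports each layer's output to every probe."""

    def __init__(self, layers: List[Dense]) -> None:
        self.layers = layers
        self.probes: List[Probe] = []

    def add_probe(self, probe: Probe) -> None:
        self.probes.append(probe)

    def forward(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer.forward(x)
            for probe in self.probes:
                probe(layer.name, list(x))  # pass a copy so probes cannot mutate state
        return x


def dead_units(trace: Dict[str, Vector]) -> Dict[str, List[int]]:
    """Example analysis routine: indices of units whose activation is exactly zero."""
    return {name: [i for i, a in enumerate(acts) if a == 0.0]
            for name, acts in trace.items()}


# Assumed toy weights and input; a real model would be loaded from a framework.
network = ProbedNetwork([
    Dense("hidden", [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),
    Dense("output", [[1.0, 1.0]], [0.0], identity),
])

trace: Dict[str, Vector] = {}
network.add_probe(lambda name, acts: trace.update({name: acts}))

output = network.forward([2.0, 0.5])
inactive = dead_units(trace)
```

Because probes are plain callables registered on the network, new analyses can be added without changing the model code, which is one simple way to get the kind of extensibility the quote describes.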

The team’s preliminary assessment of existing AI/ML models has shown promise in the tool’s ability to identify, inspect and visualize a model’s inner workings. The research suggests contributions to AI/ML analysis and assessment in areas such as quality assurance pipelines, runtime information extraction and visualization. AIS is currently using this tool for research in the fields of adversarial input detection, large language models and XAI.

The team is especially excited about this work because it lends itself to the testing and decomposition of complex AI/ML applications that are currently difficult to understand. Most importantly, this research supports making revolutionary AI/ML systems more trustworthy so they can be deployed with confidence.

“It’s important for those adopting new AI/ML technologies to understand that models can have flaws, inherent limitations or blind spots,” said Gary Hamilton, Strategic Account Manager. “Our research aims to make these technologies safer by better understanding the system and enabling rapid adoption to ensure continued superiority.”

The team is eager to continue this research and the development of their tool in 2024. Learn more about advanced research at AIS:
