Artificial intelligence (AI) and generative artificial intelligence (GenAI) are developing at an incredible pace, yet even their creators often cannot explain how these systems arrive at their decisions, which is why they are called "black boxes". Scientists now say they can look under the hood of artificial intelligence. Will this make AI models safer?
Why are artificial intelligence algorithms called “black boxes”?
Artificial intelligence algorithms, especially complex machine learning (ML) models such as deep neural networks, are loosely modeled on the human brain: they pass input through successive layers of artificial neurons until a final output emerges. However, their internal decision-making processes are complex, opaque and difficult to interpret, which is why they are called "black boxes". For example, if an autonomous car hits a pedestrian instead of applying the brakes, it can be very hard to trace the system's reasoning and determine why it made that decision. This has significant implications for trust, transparency, accountability, bias and error correction in such models.
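To make the "layer by layer" idea concrete, here is a minimal, hypothetical sketch in Python of how a deep network turns an input into an output through stacked layers of weights. It is not any real production model; the point is that the learned weights are just grids of numbers with no human-readable labels, which is what makes the resulting decision so hard to explain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 input features -> two hidden layers -> 1 output score.
layer_sizes = [4, 16, 16, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Pass an input vector through each layer in turn (linear step + nonlinearity)."""
    activation = x
    for w in weights[:-1]:
        activation = np.maximum(0.0, activation @ w)  # hidden layers apply a ReLU nonlinearity
    return activation @ weights[-1]                   # final layer produces the output score

# The output is a number, but nothing in the weights of a real model (billions of them)
# explains *why* that number came out.
print(forward(np.array([0.2, -1.0, 0.5, 0.1])))
```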
How is this issue currently being addressed?
Efforts to address it include improving model transparency, auditing model decisions and introducing regulatory measures to enforce explainability, as well as ongoing research and community collaboration in the field of explainable artificial intelligence (XAI). XAI focuses on developing methods that make AI models easier to interpret, drawing on the work of researchers, ethicists, legal experts and domain specialists. Google, Microsoft, IBM, OpenAI and credit-scoring firm Fair Isaac Corp are developing XAI techniques, while governments in the EU, US and elsewhere are actively promoting and regulating the ethical and transparent use of AI.
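As one illustration of the kind of method XAI research produces, the sketch below uses permutation feature importance, a generic technique available in scikit-learn (not any particular company's proprietary tool): shuffle one input feature at a time and measure how much the model's accuracy drops, which gives a rough ranking of which inputs the model actually relies on.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# A public toy dataset stands in for real decision data.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features by how much shuffling them hurts the model's accuracy.
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda t: -t[1])[:5]:
    print(f"{name}: {score:.3f}")
```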
What’s the latest development?
Last October, Anthropic, an artificial intelligence start-up, said it had managed to break a neural network into pieces humans could understand by applying a technique called "dictionary learning" to a very small "toy" language model, decomposing groups of neurons into interpretable features. In May this year, the technique was scaled up to a much larger model, and the extracted features were used to influence the model's outputs and behavior.
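Dictionary learning itself is a long-standing technique: it represents each dense activation vector as a sparse combination of learned "feature" directions, which are often easier to inspect than individual neurons. The sketch below is a generic scikit-learn illustration of that idea on synthetic data, a stand-in under stated assumptions rather than Anthropic's actual method or code.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Stand-in for recorded neuron activations: 500 examples from a 32-neuron layer.
activations = rng.normal(size=(500, 32))

# Learn an overcomplete dictionary: more candidate "features" than raw neurons,
# with sparse codes so only a few features are active for any one example.
dl = DictionaryLearning(n_components=64, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit_transform(activations)

print(dl.components_.shape)   # (64, 32): one learned direction per candidate feature
print((codes != 0).mean())    # fraction of features active per example stays small
```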
Will this make artificial intelligence safer and less scary?
Today, patients and doctors often cannot tell how an artificial intelligence algorithm reached its reading of an X-ray. Anthropic's breakthrough will make such processes more transparent. However, the features Anthropic identified are only a small subset of the concepts the model has learned, and finding the full set with current techniques would require more computing power and money than was used to train the model in the first place. Moreover, understanding the model's internal representations does not tell us how it uses them.
What more can big tech companies do?
As OpenAI, Microsoft, Meta, Amazon, Apple, Anthropic, Nvidia and others develop ever smarter language models, they must also empower the teams that align AI models with human values. However, over the last two to three years some companies have shrunk their "Ethical AI" teams. For example, members of OpenAI's "superalignment" team, including co-founder and chief scientist Ilya Sutskever, left over differences with CEO Sam Altman. Microsoft, by contrast, expanded its responsible AI team from 350 to 400 people last year.