Testing Artificial Intelligence: How Low Can You Go?
Artificial intelligence (AI) has been around for a long time, with periods of great interest but also "AI winters" when public interest waned. Today, with the increasing power of hardware and infrastructure and the interest of big players in multiple industries, AI is moving to the forefront once more. With the growing importance of AI comes the question: How do I test it? AI systems do not necessarily behave predictably, which means that traditional test cases of the form "do this, expect that" are not always sufficient.
Kenneth Boulding wrote an article in 1956 on a hierarchical classification of systems that I have found very useful in thinking about systems in general. For him, the lowest level consisted of static systems, like a table or a chair. The next level up was a clock that moves. Then came the thermostat that actively responded to input. Next up were living cells that maintained themselves, then plants that were loosely connected colonies of cells, then animals in which the cell colonies form a being with coordinated behavior. Humans have the properties of animals but can also think and are self-conscious. In the words of Boulding, "a human not only knows, but knows that it knows." His final two levels were social organizations and a possible "transcendental" level that supersedes all others, which he considered unknowable and therefore did not specify further in his article.
Simplified, I believe we can consider most computer systems at the "thermostat" level, in that they can perform actions based on input. Traditional test cases are a level lower. When they are executed, they provide input and compare outcomes in a predictable way, like the clock in Boulding's hierarchy. I believe this is a structural property in testing systems: they can be a level lower than their system under test but not lower than that.
In placing AI systems in the Boulding hierarchy, it seems reasonable to assume that they are at least one level above the thermostat, especially if they can learn and improve themselves. Traditional test cases will not suffice to test them. Some other kind of model will be needed to drive the testing system, raising it to the thermostat level. The model can observe what the AI application is doing and check whether its outcomes stay within what is expected. Take an AI-powered traffic light control system. If an intersection is empty while traffic is waiting, the system is not intelligent enough. If all lights are green, a boundary condition has been overstepped. The model is itself a system, but one comparable to a thermostat, a level lower than the AI.
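As a minimal sketch of such a model for the traffic light example, the Python below checks a recorded run against the two conditions mentioned above instead of against fixed expected outputs. Everything in it is an assumption made for illustration: the Snapshot fields, the check_run function, and the max_idle_ticks tolerance are hypothetical names, not part of any real traffic control system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    """One observation of the intersection (hypothetical data model)."""
    lights: Dict[str, str]   # approach name -> "green" or "red"
    waiting: Dict[str, int]  # approach name -> vehicles waiting at the light
    crossing: int            # vehicles currently inside the intersection


def check_run(snapshots: List[Snapshot],
              max_idle_ticks: int = 3) -> List[Tuple[int, str]]:
    """Return (tick, message) pairs for every violated expectation.

    The checks do not say what the controller should do at each tick.
    They only describe outcomes that are never acceptable.
    """
    problems: List[Tuple[int, str]] = []
    idle_ticks = 0
    for tick, snap in enumerate(snapshots):
        # Boundary condition: all lights green at the same time.
        if snap.lights and all(c == "green" for c in snap.lights.values()):
            problems.append((tick, "boundary: all lights are green"))

        # Quality condition: intersection empty while traffic is waiting.
        # A short pause is tolerated (assumed threshold: max_idle_ticks).
        if snap.crossing == 0 and sum(snap.waiting.values()) > 0:
            idle_ticks += 1
            if idle_ticks == max_idle_ticks:
                problems.append(
                    (tick, "quality: intersection empty while traffic waits"))
        else:
            idle_ticks = 0
    return problems
```

A checker like this reacts to input the way a thermostat does: it has no idea which signal plan is best, but it can tell when the controller has drifted outside acceptable behavior, whatever strategy the AI has learned.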
The notion that a testing system can be a level lower than the system under test does not mean it necessarily has to stay there. Testing systems themselves can also incorporate AI, even when testing non-AI systems. This is particularly promising for testing very complex systems, and even for making AI systems self-testing. You can look for examples at aitesting.org.
I realize that I'm simplifying a complex reality with a constant stream of new ideas and technologies, but hopefully this article gives a starting point for a fascinating journey in testing AI.