Wednesday, April 18, 2012

Gu Test: A Measurement of Generic Intelligence (v3)

Abstract
Could computers understand and represent irrational numbers without knowing their exact values, an ability that may be necessary to build sciences? Humans can. How about uncountable sets? Is there something in human intelligence that exceeds the power of Turing Machine? The measurement of generic intelligence is critical to the further development of artificial intelligence (AI). However, the existing methods, Turing Test and its variants, have various bottlenecks and issues, and cannot really measure intrinsic intelligence capabilities. Based on studies of knowledge development, several essential design goals for intelligence measurement are identified to address these issues. A new method, Gu Test, is proposed to meet some of these design goals, distinguish strong AI from regular machines, and provide insights for future directions of AI. Further improvements could be made in the future.


1. The Measurement of Generic Intelligence

Could computers understand the concept of irrational numbers, and represent these numbers and the theories based on them, without knowing their exact values? Such concepts and theories are necessary to build sciences and advanced human intelligence. How about uncountable sets? Such intrinsic intelligence capabilities are important milestones for machine intelligence levels.

The measurement of generic intelligence capabilities is critical to AI, both to estimate the current status and to look for future improvement. However, the existing measuring methods, such as Turing Test and its variants, are mainly behavior-based, knowledge-based, or task-based. There are various bottlenecks and issues in these solutions, and they cannot really measure intrinsic intelligence capabilities.

People can design algorithms on Turing Machine or its improved models. These are mathematical models limited by Gödel's incompleteness theorems. Even worse, there are still problems in implementing these models physically.

Current computers use rational numbers to approximate irrational numbers. Because of the sensitivity to initial conditions and the exponential divergence in nonlinear chaotic phenomena, such approximations are problematic. In reality, nonlinearity is the norm rather than the exception. How does human intelligence guide behavior in real physical situations?
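The sensitivity mentioned above can be illustrated with a minimal sketch. It is an illustration only, under assumptions that are not in the original text: the logistic map with r = 4 is chosen as a standard example of a chaotic system, the starting point sqrt(2) - 1 and the 10-digit truncation are arbitrary, and the name `logistic_orbit` is hypothetical. Both starting values are floating-point numbers, so even the "full" value is itself a rational approximation.

```python
import math

def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) and return the whole orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Starting point derived from an irrational number (already a float
# approximation) and a coarser rational truncation to 10 decimal places.
x_full = math.sqrt(2) - 1
x_trunc = round(x_full, 10)

orbit_full = logistic_orbit(x_full, 60)
orbit_trunc = logistic_orbit(x_trunc, 60)

# Distance between the two orbits at every step.
gaps = [abs(a - b) for a, b in zip(orbit_full, orbit_trunc)]

for n in (0, 10, 20, 30, 40, 50, 60):
    print(n, gaps[n])
```

The two orbits start less than one billionth apart, yet the gap roughly doubles at each step until it is as large as the values themselves. After that point the truncated orbit says nothing about the other one.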

Is there something in human intelligence that exceeds the power of Turing Machine and its current improvements? A good measurement should point to possible bridges between mathematical models, physical implementations, and human intelligence. Gu Test is a new measurement that aims to address these issues, distinguish strong AI from regular machines, and provide insights for future directions.
 
The following sections first discuss Turing Test and its variants, with their bottlenecks and issues. Several design goals are then identified to address these issues and better measure generic intelligence. Gu Test is proposed to achieve these design goals. Finally, some directions for future work are discussed.


2. Turing Test and the Chinese Room Concern

Alan Turing described an imitation game in his paper Computing Machinery and Intelligence [1], which tests whether a human could distinguish a computer from another human only via communication without seeing each other.

It is a blackbox test, purely based on behavior. Computers could pass this kind of test by imitating humans.

So John Searle raised the Chinese Room issue [2], i.e., computers could pass this test by symbolic processing without really understanding the meanings of these symbols.

More importantly, there are bottlenecks of communication or storage, in expression or in capacity, and the issues of blackbox test and understanding, as described below, which make the current ways of symbolic processing inadequate as generic intelligence.

Turing Test uses interrogation, so it can only test those human characteristics which are already well understood by humans and can be expressed in communication. Humans still have a very limited understanding of life, psychology, and intelligence. Some people manage to understand each other through face-to-face contact, analogy, metaphor, implication, suggestion, etc., on matters that cannot be handled purely by symbolic processing. Some people may not. Humans do not yet know why these methods work or fail. So the intrinsic intelligence abilities that are not yet well understood cannot be expressed or tested via interrogation behind veils. Turing Test does not work in these cases. This is the bottleneck in expression.

Even if the bottleneck in expression could be resolved for some problems, the capacity of communication or storage could still be an issue if relying purely on symbolic processing. For example, how should the value of an irrational number be represented, and how many irrational numbers could be represented: finite or infinite, countable or uncountable? The current von Neumann architectures have only finite memory units. Turing Machine has infinite but countable memory units. Could Turing Machine be enhanced with uncountable memory units?

Since the methods of face-to-face contact, analogy, metaphor, implication, suggestion, etc., do not work in Turing Test or other blackbox tests, is it still possible for computers to be programmed to understand things like irrational numbers or uncountable sets? Verifying this raises a blackbox test issue.

Assume infinite but countable storage as in Turing Machine, or interrogators with infinite testing time, and a computer that is able to compute the value of an irrational number digit by digit. In a blackbox test, how could these interrogators know whether the computer is only going to display a huge rational number that shares its digits with a portion of an irrational number, or is going to display a true irrational number? This issue could be resolved by whitebox tests, which review the program in the computer to verify whether it really understands.
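The blackbox issue can be made concrete with a small sketch. It is only an illustration under stated assumptions: the square root of 2, the cutoff of 20 digits, and the names `sqrt2_prefix`, `rational_prefix` and `IMPOSTOR` are hypothetical choices, not part of the original argument. One function produces digits of an irrational number by exact integer arithmetic, and the other produces digits of a rational number built to agree with it on the first 20 decimal places.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_prefix(n):
    """First n decimal digits of sqrt(2) after the point.

    isqrt(2 * 10**(2n)) equals floor(sqrt(2) * 10**n), computed exactly
    with integers; the leading '1' before the point is dropped.
    """
    return str(isqrt(2 * 10 ** (2 * n)))[1:]

def rational_prefix(q, n):
    """First n decimal digits after the point of a rational q, 1 <= q < 10."""
    return str(q.numerator * 10 ** n // q.denominator)[1:]

# A rational number that copies the first 20 decimal digits of sqrt(2)
# and is followed by zeros (hypothetical example value).
IMPOSTOR = Fraction(isqrt(2 * 10 ** 40), 10 ** 20)

for n in (10, 20, 30):
    same = sqrt2_prefix(n) == rational_prefix(IMPOSTOR, n)
    print(n, "digits observed, outputs identical:", same)
```

An interrogator who stops within the first 20 digits sees identical output from both sources. The difference appears only later, and a longer rational impostor can always be built for any finite observation length. Reading the two programs, on the other hand, shows the difference immediately, which is the point of the whitebox approach.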

Turing Test cannot resolve these bottlenecks and issues.


3. Variants of Turing Test

There are several variants of Turing Test which aim at improving on it.

One is Feigenbaum test. According to Edward Feigenbaum, "Human intelligence is very multidimensional", "computational linguists have developed superb models for the processing of human language grammars. Where they have lagged is in the 'understand' part", "For an artifact, a computational intelligence, to be able to behave with high levels of performance on complex intellectual tasks, perhaps surpassing human level, it must have extensive knowledge of the domain." [3].

Feigenbaum test is actually a good method to test the knowledge in expert systems. The test tries to produce generic intelligence by averaging over many expert systems. That is why it needs to test extensive knowledge.

Here, the bottlenecks of communication or storage, in expression or in capacity, and the issue of blackbox test and understanding, still remain in Feigenbaum test as well as in other variants of Turing Test. There are differences between knowledge, concepts and data. The "understanding part" is still to be resolved.

One more issue with Feigenbaum test is that individual humans may not have very extensive knowledge in many domains, but they have potential. So extensive knowledge may not be necessary, but tests for potential are.

Another variant is Shane Legg and Marcus Hutter's solution [4], which is agent-based and a good test for tasks. Their solution tries to test generic intelligence by averaging over many tasks. It still uses behavior imitation and comparison, so it is a blackbox test and a variant of Turing Test.

In their framework, an agent sends its actions to the environment and receives observations and rewards from it. If the framework is used to test strong AI, it assumes that all interactions between humans and their environment can be modeled by actions, observations, and rewards. This assumption has not been tested yet. The bottlenecks of communication or storage, in expression or in capacity, and the issue of blackbox test and understanding, still remain.
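The interaction loop described above can be sketched as follows. This is a toy illustration, not Legg and Hutter's formal definition: the classes `AlternatingEnv`, `FlipAgent` and `ConstantAgent` and the function `run` are hypothetical, and the environment is deliberately trivial. It shows only that everything the test sees passes through three channels: actions, observations, and rewards.

```python
class AlternatingEnv:
    """Toy environment: the rewarded action follows the pattern 0, 1, 0, 1, ..."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        target = self.t % 2
        reward = 1 if action == target else 0
        self.t += 1
        # The observation reveals which action would have been rewarded.
        return target, reward


class FlipAgent:
    """Toy agent: always plays the opposite of the last observation."""

    def __init__(self):
        self.last_obs = 1

    def act(self):
        return 1 - self.last_obs

    def observe(self, observation, reward):
        self.last_obs = observation


class ConstantAgent:
    """Toy agent: always plays 0 and ignores all feedback."""

    def act(self):
        return 0

    def observe(self, observation, reward):
        pass


def run(agent, env, steps):
    """Run the action/observation/reward loop and return the total reward."""
    total = 0
    for _ in range(steps):
        action = agent.act()
        observation, reward = env.step(action)
        agent.observe(observation, reward)
        total += reward
    return total


print(run(FlipAgent(), AlternatingEnv(), 10))
print(run(ConstantAgent(), AlternatingEnv(), 10))
```

The score ranks the two agents by accumulated reward, but it is computed from behavior alone. Nothing in the loop inspects how either agent arrives at its actions, which is why this style of measurement remains a blackbox test.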

Furthermore, there are differences between humans and the definitions of agents. Humans can play some roles of agents, but they are not just agents. They can make paradigm evolutions or shifts, which usually means gaining deeper observations, taking better actions, and gaining more rewards than what is already in any definition.

Even if Turing Test is enhanced with vision and manipulation ability, or with methods like statistical inference, etc., it still does not resolve the bottlenecks of communication or storage, in expression or in capacity, and the issue of blackbox test and understanding.

These issues could be solved by concept understanding, whitebox tests, etc. The measurement of generic intelligence is not just producing the digits of one or a few irrational numbers.


4. The Design Goals for Generic Intelligence Measurement

Based on the analysis in the previous sections, some design goals are proposed here:
1) Resolve the Chinese Room issue, i.e., test real understanding, not just behavior imitation or symbolic processing.
2) Resolve the bottleneck in expression by not relying purely on interrogation. Find ways to test those intrinsic intelligence abilities which have not been well understood and expressed.
3) Resolve the bottleneck in capacity by leveraging the differences between concepts, knowledge and data.
4) Use whitebox tests to resolve the blackbox test issue.
5) Involve as little domain knowledge as possible, since regular humans may not have much knowledge in specific domains. But include those essential intrinsic capabilities commonly necessary in many domains, with which humans have the potential to develop intelligence in many domains.
6) Include sequential test levels, since humans are able to make continuous progress in intelligence.
7) Include a framework to test structured intelligence and the ability to make paradigm evolutions or shifts, since humans have such abilities.


5. Gu Test

Based on these design goals, Gu Test is proposed. It should include sequential test levels, and be able to test structured intelligence and make paradigm evolution or shift gradually.

The current efforts aim to achieve design goals 1) to 6). The work to meet goal 7) is left to future research.

The first test step of Gu Test is to test whether testees can understand irrational numbers without knowing their exact values. It is a whitebox test. Average humans with a certain basic education can. Current computers most likely cannot. An advanced step could be to test the understanding of uncountable sets.

These steps test real understanding. They do not rely on interrogation, but test an intrinsic ability. Humans have this ability without the bottlenecks in expression or capacity, though they probably do not yet know why they have it. The steps test concepts and knowledge which cannot be represented as data. The irrational number is a primitive concept developed in the age of Pythagoras, a pioneer in philosophy and mathematics. The concept is necessary in many domains, but involves very little domain-specific knowledge. The uncountable set is an advanced concept.

Because of these characteristics, Gu Test is very different from Turing Test and its variants: it tests the understanding part.
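To make the first test step more tangible, here is a minimal sketch of one way a program can carry an irrational number without ever storing its digits. It is an assumption-laden illustration, not the proposed test itself: the class `QSqrt2` and the constants `ROOT2` and `TWO` are hypothetical, and only numbers of the form a + b√2 with rational a and b are covered. Whether such symbol manipulation amounts to understanding is exactly what a whitebox reviewer would have to judge.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class QSqrt2:
    """Exact number a + b*sqrt(2) with rational a and b (hypothetical helper).

    sqrt(2) is never expanded into digits; it is known only through the
    rule sqrt(2) * sqrt(2) = 2.
    """
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def is_rational(self):
        """The value is rational exactly when the sqrt(2) coefficient is zero."""
        return self.b == 0

ROOT2 = QSqrt2(Fraction(0), Fraction(1))
TWO = QSqrt2(Fraction(2), Fraction(0))

print(ROOT2 * ROOT2 == TWO)            # exact identity, no rounding involved
print(ROOT2.is_rational())             # sqrt(2) itself is not rational
print((ROOT2 * ROOT2).is_rational())   # but its square is
```

The representation answers questions such as "is this number rational?" exactly, which no finite list of decimal digits can do. It is still confined to one narrow family of numbers chosen in advance by the programmer, so it leaves open the broader question raised by the test.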

Irrational numbers and uncountable sets are just mathematical concepts. Physical concepts are in completely different dimensions. Making generic intelligence understand and represent physical concepts would be a very different challenge.


6. Comparison with Other Test Methods

As noted above, Gu Test is very different from behavior-based tests, knowledge-based tests, task-based tests, etc. It is a whitebox test, requiring humans to analyze whether a system achieves certain intelligence levels with certain internal capabilities.

So Gu Test represents a complete paradigm shift from previous test methods, and it is not comparable with them.


7. Future Research

Much more work needs to be done to extend Gu Test to include various test levels and to meet design goal 7).

The analysis of the bottlenecks and issues of Turing Test naturally leads to questions about the power and limitations of Turing Machine, and about what better models there are for artificial intelligence. Does it need uncountable memory units? If yes, how could they be implemented? If not, how could the power be enhanced? People probably have to revisit the Church-Turing thesis. It is possible to build models exceeding the power of Turing Machine mathematically. However, it would be a challenge to develop such a model that matches physical reality.

To really understand the essentials of intelligence, people have to study the history of knowledge development, philosophy, mathematics, sciences, etc.


References

[1] Turing, A. M., 1950, "Computing machinery and intelligence". Mind 59, 433–460.
[2] Searle, John R., 1980, "Minds, brains, and programs". Behavioral and Brain Sciences 3 (3): 417–457.
[3] Feigenbaum, Edward A., 2003, "Some challenges and grand challenges for computational intelligence". Journal of the ACM 50 (1): 32–40.
[4] Legg, S. & Hutter, M., 2006, "A Formal Measure of Machine Intelligence", Proc. 15th Annual Machine Learning Conference of Belgium and The Netherlands, pp. 73–80.