Tuesday, January 31, 2012

Gu Test: A Measurement of Generic Intelligence (v3)

Abstract
Could computers understand and represent irrational numbers without knowing their exact values, and uncountable sets? Humans can. Is there something in human intelligence that exceeds the power of the Turing machine? The measurement of generic intelligence is critical to the further development of artificial intelligence (AI). However, the existing methods, the Turing Test and its variants, suffer from various bottlenecks and issues and cannot really measure intrinsic intelligence capabilities. Based on studies of knowledge development, several essential design goals for intelligence measurement are identified to resolve these issues. A new method, the Gu Test, is proposed to distinguish strong AI from regular machines, meet some of these design goals, and provide insights into future directions for AI. Further improvements are left for future work.


1. The Measurement of Generic Intelligence

Measurement is essential in science and technology: clocks were necessary for advanced studies of motion and speed, and centrifugal governors were critical to making steam engines usable.

The measurement of generic intelligence is likewise critical to AI, both to estimate its current status and to guide future improvement. However, the existing measurement methods, such as the Turing Test and its variants, are mainly behavior-based, knowledge-based, or task-based. These solutions suffer from various bottlenecks and issues, so they cannot really measure intrinsic intelligence capabilities.

People can design algorithms on the Turing machine or its extended models. These are mathematical models limited by Gödel's incompleteness theorems, and implementing them physically raises further problems. Is there something in human intelligence that exceeds the power of the Turing machine and its current extensions? A good measurement should point to possible bridges between mathematical models, physical implementations, and human intelligence. The Gu Test is a new measurement proposed to resolve these issues and to distinguish strong AI from regular machines.
 
The following sections discuss the Turing Test and its bottlenecks and issues; the variants of the Turing Test, including the Feigenbaum test and Shane Legg and Marcus Hutter's solution; the design goals for a better measurement of generic intelligence; the Gu Test itself; and some directions for future work.


2. The Turing Test and the Chinese Room Concern

Alan Turing described an imitation game in his paper "Computing Machinery and Intelligence" [1]: it tests whether a human interrogator can distinguish a computer from another human purely through communication, without seeing either party.

It is a blackbox test, based purely on behavior; a computer could pass this kind of test simply by imitating humans.

So John Searle raised the Chinese Room objection [2]: a computer could pass this test by symbolic processing without really understanding the meanings of those symbols.

More importantly, there are bottlenecks of communication and storage, in expression and in capacity, as well as the issues of blackbox testing and understanding described below, which make the current forms of symbolic processing inadequate as generic intelligence.

The Turing Test relies on interrogation, so it can only test those human characteristics that are already well understood by humans and can be expressed in communication. Humans still have a very limited understanding of life, psychology, and intelligence. Some people manage to understand each other face to face, through analogy, metaphor, implication, suggestion, and so on, about things that cannot be handled purely by symbolic processing; others may not, and humans do not yet know why these methods work or fail. Intrinsic intelligence abilities that are not yet well understood therefore cannot be expressed or tested through interrogation behind a veil, and the Turing Test does not work in these cases. This is the bottleneck in expression.

Even if the bottleneck in expression could be resolved for some problems, the capacity of communication and storage could still be an issue for purely symbolic processing: how can the value of an irrational number be represented, and how many irrational numbers can be represented, finitely or infinitely many, countably or uncountably many? Current von Neumann architectures have only finitely many memory units; a Turing machine has infinitely many, but countably many. Could a Turing machine be enhanced with uncountably many memory units?
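
As a hedged illustration of this capacity point (my own sketch, not part of the original text; the Sqrt class and its method names are hypothetical), the fragment below contrasts a constant-size symbolic handle for sqrt(2), whose defining property can be used exactly, with the finite rational approximations that are all a digit-based representation can ever hold.

```python
import math
from fractions import Fraction

class Sqrt:
    """Exact symbolic square root of a non-negative integer (illustrative only)."""
    def __init__(self, radicand):
        self.radicand = radicand

    def squared(self):
        # Exact: (sqrt(r)) ** 2 == r; no digits are ever computed or stored.
        return self.radicand

    def approx(self, digits):
        # Anything actually written out is only a finite rational approximation.
        scale = 10 ** digits
        return Fraction(math.isqrt(self.radicand * scale * scale), scale)

root2 = Sqrt(2)
print(root2.squared())           # 2 (exact)
print(float(root2.approx(12)))   # 1.414213562373 (approximation only)
```

No finite number of stored digits equals the number itself; the symbolic handle, by contrast, captures the concept in constant space.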

Since face-to-face interaction, analogy, metaphor, implication, suggestion, and the like do not work in the Turing Test or other blackbox tests, is it still possible to program computers to understand things like irrational numbers or uncountable sets? Verifying this runs into a blackbox-test issue.

Assume countably infinite storage as in a Turing machine, or interrogators with unlimited testing time, and a computer able to compute the value of an irrational number digit by digit. In a blackbox test, how could the interrogators know whether the computer is merely going to display a huge rational number whose digits coincide with a prefix of an irrational number, or is going to produce a true irrational number? This issue can be resolved by whitebox tests: reviewing the program inside the computer to verify whether it really understands.
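
To make this indistinguishability concrete, here is a small hedged sketch of my own (the function names are assumptions for this example): one generator really computes the digits of sqrt(2) forever, while the other only replays a stored rational prefix. Any interrogation that inspects finitely many digits sees identical behaviour; only reading the two programs reveals the difference.

```python
import math
from itertools import islice

def sqrt2_digits():
    """Yield the decimal digits of sqrt(2): 1, 4, 1, 4, 2, 1, 3, ..."""
    i = 0
    while True:
        yield math.isqrt(2 * 10 ** (2 * i)) % 10
        i += 1

def replayed_prefix_digits(n=50):
    """Yield the digits of a rational number that merely copies the first n digits of sqrt(2)."""
    yield from islice(sqrt2_digits(), n)   # a stored, finite, rational prefix
    while True:
        yield 0                            # ...then zeros forever: a rational number

# Any finite interrogation sees identical behaviour from both programs:
print(list(islice(sqrt2_digits(), 20)))
print(list(islice(replayed_prefix_digits(), 20)))
# Only reviewing the source (a whitebox test) shows which program encodes
# sqrt(2) itself and which one only replays a stored rational.
```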

The Turing Test cannot resolve these bottlenecks and issues.


3. Variants of the Turing Test

Several variants of the Turing Test aim to improve on it.

One is the Feigenbaum test. According to Edward Feigenbaum, "Human intelligence is very multidimensional", "computational linguists have developed superb models for the processing of human language grammars. Where they have lagged is in the 'understand' part", and "For an artifact, a computational intelligence, to be able to behave with high levels of performance on complex intellectual tasks, perhaps surpassing human level, it must have extensive knowledge of the domain" [3].

The Feigenbaum test is actually a good method for testing the knowledge in expert systems. It tries to approximate generic intelligence by averaging over many expert systems, which is why it needs to test extensive knowledge.

The bottlenecks of communication and storage, in expression and in capacity, and the issues of blackbox testing and understanding remain in the Feigenbaum test as well as in other variants of the Turing Test. There are differences between knowledge, concepts, and data, and the "understand" part is still unresolved.

A further issue with the Feigenbaum test is that individual humans may not have extensive knowledge in many domains, yet they have the potential to acquire it. So extensive knowledge may not be necessary, but testing for that potential is.

Another variant is Shane Legg and Marcus Hutter's solution [4], which is agent-based and works well as a test of task performance. It tries to measure generic intelligence by averaging over many tasks. Since it still relies on comparing and imitating behavior, it is a blackbox test and a variant of the Turing Test.

In their framework, an agent sends actions to the environment and receives observations and rewards from it. Using this framework to test strong AI assumes that all interactions between humans and their environment can be modeled as actions, observations, and rewards, an assumption that has not yet been tested. The bottlenecks of communication and storage, in expression and in capacity, and the issues of blackbox testing and understanding still remain.
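
As an illustration of the interaction protocol being discussed, here is a minimal hedged sketch of an action/observation/reward loop. The toy ToyEnvironment and EchoAgent classes are placeholders of my own; they show the shape of the framework, not Legg and Hutter's formal measure in [4].

```python
import random

class ToyEnvironment:
    """A toy environment (placeholder): it rewards echoing the previous observation."""
    def __init__(self):
        self.last_obs = 0

    def step(self, action):
        reward = 1.0 if action == self.last_obs else 0.0
        self.last_obs = random.randint(0, 1)      # emit the next observation
        return self.last_obs, reward

class EchoAgent:
    """A trivial agent (placeholder) that repeats its most recent observation."""
    def __init__(self):
        self.obs = 0

    def act(self):
        return self.obs

    def observe(self, obs, reward):
        self.obs = obs

# The interaction protocol: action out, (observation, reward) back, repeated over time.
env, agent = ToyEnvironment(), EchoAgent()
total_reward = 0.0
for _ in range(100):
    obs, reward = env.step(agent.act())
    agent.observe(obs, reward)
    total_reward += reward
print("accumulated reward:", total_reward)
```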

Furthermore, there are differences between humans and the definition of an agent. Humans can play the role of agents, but they are not just agents: they can drive paradigm evolution or paradigm shifts, which usually means gaining deeper observations, taking better actions, and earning greater rewards than anything already captured in a fixed definition.

Even if the Turing Test is enhanced with vision and manipulation abilities, or with methods such as statistical inference, it still does not resolve the bottlenecks of communication and storage, in expression and in capacity, or the issues of blackbox testing and understanding.

These issues could be addressed by testing concept understanding, by whitebox testing, and so on. Measuring generic intelligence is not just a matter of producing the digits of one or a few irrational numbers.


4. The Design Goals for Generic Intelligence Measurement

Based on the analysis in the previous sections, the following design goals are proposed:
1) Resolve the Chinese Room issue, i.e., test real understanding, not just behavior imitation or symbolic processing.
2) Resolve the bottleneck in expression by not relying purely on interrogation; find ways to test those intrinsic intelligence abilities that have not yet been well understood or expressed.
3) Resolve the bottleneck in capacity by leveraging the differences between concepts, knowledge, and data.
4) Use whitebox testing to resolve the blackbox-test issue.
5) Involve as little domain knowledge as possible, since regular humans may not have much knowledge of specific domains; but include those essential intrinsic capabilities commonly needed across many domains, with which humans have the potential to develop intelligence in many domains.
6) Include sequential test levels, since humans are able to make continuous progress in intelligence.
7) Include a framework to test structured intelligence and the ability to make paradigm evolution or shifts, since humans have such abilities.


5. Gu Test

Based on these design goals, the Gu Test is proposed. It should include sequential test levels and be able to test structured intelligence and, eventually, the ability to make paradigm evolution or shifts.

The current effort is to achieve design goals 1) to 6); the work to meet goal 7) is left to future research.

The first step of the Gu Test is to test whether testees can understand irrational numbers without knowing their exact values. It is a whitebox test. Average humans with basic education can; current computers most likely cannot. An advanced step could be to test the understanding of uncountable sets.
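
As a hedged illustration of what a whitebox reviewer might accept as evidence of concept-level representation (my own sketch, assuming the sympy library; not part of the original proposal), the program below handles sqrt(2) as an exact symbolic object with provable properties rather than as a list of digits.

```python
import sympy

x = sympy.sqrt(2)
print(x ** 2 == 2)       # True: the defining property holds exactly, with no digits involved
print(x.is_irrational)   # True: established by reasoning about the object, not by inspecting digits
print(sympy.N(x, 30))    # 1.414213562373... digits are produced only when explicitly requested
```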

These steps test real understanding. They do not rely on interrogation but on an intrinsic ability: humans have this ability free of the bottlenecks in expression and capacity, though they probably do not yet know why they have it. The test covers concepts and knowledge that cannot be represented as data. The irrational number is a primitive concept developed in the age of Pythagoras, a pioneer of philosophy and mathematics; it is needed in many domains yet involves very little domain-specific knowledge. The uncountable set is a more advanced concept.
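
For the advanced step, here is a hedged sketch of my own construction showing how the idea behind uncountability can be expressed without enumerating anything: Cantor's diagonal argument written as a higher-order function. Given any purported enumeration of infinite 0/1 sequences, it returns a sequence that differs from the n-th one at position n, so no enumeration can be complete.

```python
def diagonal(enumeration):
    """Given enumeration(n) -> (index -> 0/1), return a sequence absent from the enumeration."""
    def missing(i):
        return 1 - enumeration(i)(i)   # flip the i-th bit of the i-th sequence
    return missing

# Example: the n-th enumerated sequence is 1 at position n and 0 elsewhere.
enum = lambda n: (lambda i: 1 if i == n else 0)
d = diagonal(enum)
print([d(i) for i in range(8)])   # [0, 0, ...]: differs from sequence n exactly at index n
```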

Because of these characteristics, the Gu Test differs fundamentally from the Turing Test and its variants: it tests the understanding part.

Irrational numbers and uncountable sets are only mathematical concepts; physical concepts lie in completely different dimensions. Making a generic intelligence understand and represent physical concepts would pose very different challenges.


6. Future Research

Much more work needs to be done to extend the Gu Test to include further test levels and to meet design goal 7).

The analysis of the bottlenecks and issues of the Turing Test naturally leads to questions about the power and limitations of the Turing machine, and about what better models for artificial intelligence might look like. Does such a model need uncountably many memory units? If so, how can they be implemented? If not, how can the power be enhanced? People will probably have to revisit the Church-Turing thesis. It is possible to build models that mathematically exceed the power of the Turing machine; the challenge is to develop such a model that matches physical reality.

To really understand the essentials of intelligence, people have to study the history of knowledge development, philosophy, mathematics, the sciences, and more.


References

[1] Turing, A. M., 1950, "Computing machinery and intelligence". Mind 59, 433–460.
[2] Searle, John R., 1980, "Minds, brains, and programs". Behavioral and Brain Sciences 3 (3): 417–457.
[3] Feigenbaum, Edward A., 2003, "Some challenges and grand challenges for computational intelligence". Journal of the ACM 50 (1): 32–40.
[4] Legg, S. & Hutter, M., 2006, "A Formal Measure of Machine Intelligence". Proc. 15th Annual Machine Learning Conference of Belgium and The Netherlands, pp. 73–80.

Imperial Selection, Socratic Method, and Sophism

There is an issue related to knowledge development: how to evaluate the knowledge people possess.

In ancient China, imperial selection (科举) was used to select people with certain talents. It began in the Han Dynasty (206 BCE – 220 CE), initially based on recommendation, and later relied more and more on examinations. Ideally, examinations are blackbox tests in which grades are based only on the answers.

Examinations are also used in university admission. In the 1970s, recommendation was briefly reinstated for university admission in China (the 工农兵学员 system); since then, admission has switched back mainly to examinations.

Neither recommendation nor examination is good enough to evaluate scientific research. The Socratic method promotes open discussion conducted fairly and honestly: when you want to question someone's point, raise the question face to face so the other party has a chance to answer. It should be fair play. A thesis defense is similar to the Socratic method.

However, the Socratic method is vulnerable to sophism, which can make fair and honest discussion impossible. Socrates was even voted to death.

Very few people can resist sophism and keep away from it. Those who truly can may benefit from the sciences.

Monday, January 23, 2012


The Issues and Mistakes in Knowledge Development

TBD