Scott Lifan Gu's Blog: Gu Test: A Measurement of Generic Intelligence

Abstract
Could computers understand and represent irrational numbers without knowing their exact values? Humans can. The measurement of generic intelligence is critical to the further development of artificial intelligence (AI). However, the existing methods, the Turing Test and its variants, cannot really measure intrinsic intelligence capabilities. Based on studies of knowledge development, several essential design goals for intelligence measurement are identified. A new method, the Gu Test, is proposed to distinguish strong AI from regular machines and to meet some of these design goals. Further improvements are left to future work.

1. The Measurement of Generic Intelligence

Measurement is central to science and technology. Clocks were necessary for advanced studies of motion and speed, just as centrifugal governors were critical to making steam engines usable.

The measurement of generic intelligence is equally important to artificial intelligence (AI). However, the existing methods, such as the Turing Test and its variants, are mainly behavior-based, knowledge-based, or task-based, and so cannot really measure intrinsic intelligence capabilities.

A new measurement method, the Gu Test, is proposed to distinguish strong AI from regular machines.

2. The Turing Test and the Chinese Room Concern

Alan Turing described an imitation game in his paper "Computing Machinery and Intelligence" [1], which tests whether a human can distinguish a computer from another human through communication alone, without seeing either of them.

It is a black box test, purely based on behavior. To pass this kind of test, a computer only needs to imitate humans.

John Searle therefore raised the Chinese Room issue [2]: a computer could pass this test by symbolic processing without really understanding the meanings of the symbols.

Also, the Turing Test relies on interrogation, so it can only test those human characteristics that are already well understood by humans and can be expressed in communication. Humans still have a very limited understanding of life, psychology, and intelligence, so intrinsic intelligence abilities that humans do not yet understand well cannot be tested through interrogation alone.

3. Variants of Turing Test

There are several variants of Turing Test which aim at improving on it.

One is the Feigenbaum test. According to Edward Feigenbaum, "Human intelligence is very multidimensional", "computational linguists have developed superb models for the processing of human language grammars. Where they have lagged is in the 'understand' part", "For an artifact, a computational intelligence, to be able to behave with high levels of performance on complex intellectual tasks, perhaps surpassing human level, it must have extensive knowledge of the domain." [3].

There are two issues with this test. The first is that current computers can only store and process knowledge in the form of data. If some knowledge cannot be represented as data, then the "understanding part" of that knowledge remains unsolved. The second is whether extensive knowledge is necessary to test strong AI at all, since individual humans may not have very extensive knowledge in a given domain.

The Feigenbaum test is actually a very good method for measuring expert systems. However, to measure generic AI or to test strong AI, the essentials of the "understanding part" need to be identified, and whether they can be transformed into data and data processing should be part of the test. The Gu Test tries to address these issues.

Another variant is the approach of Shane Legg and Marcus Hutter [4], which is agent-based. In their framework, an agent sends actions to the environment and receives observations and rewards from it. If this framework is used to test strong AI, it assumes that all interactions between humans and their environment can be modeled as actions, observations, and rewards. This assumption has not been tested.
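
The action/observation/reward cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not Legg and Hutter's formal measure: the class names CoinEnvironment and RepeatLastAgent, the guessing task, and the reward rule are all hypothetical choices made for the example.

```python
import random


class CoinEnvironment:
    """Hypothetical toy environment: reward 1 when the action matches a hidden bit."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._hidden = self._rng.randint(0, 1)

    def step(self, action):
        # Score the action, reveal the old bit as the observation, draw a new bit.
        reward = 1 if action == self._hidden else 0
        observation = self._hidden
        self._hidden = self._rng.randint(0, 1)
        return observation, reward


class RepeatLastAgent:
    """Hypothetical agent: its next action is the last observation it received."""

    def __init__(self):
        self._last = 0

    def act(self):
        return self._last

    def observe(self, observation, reward):
        self._last = observation


def run_episode(agent, env, steps):
    """Run the action -> (observation, reward) loop and return the total reward."""
    total = 0
    for _ in range(steps):
        action = agent.act()
        observation, reward = env.step(action)
        agent.observe(observation, reward)
        total += reward
    return total


total = run_episode(RepeatLastAgent(), CoinEnvironment(seed=0), steps=100)
print("total reward over 100 steps:", total)
```

Everything the agent can do or perceive here is fixed in advance by the signatures of act, step, and observe, which is the modeling assumption questioned in this section.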

Humans can play the role of agents, but they are not just agents. Humans can make paradigm evolution, which usually means gaining deeper observations, taking better actions, and gaining more rewards than any existing definition allows.

Actions, observations, and rewards can be defined for specific tasks and agents. But how could they be defined for paradigm evolution, or for the whole life of a human? Humans can make exceptions and innovations. Without these, there would be no Euclid, Galileo, or Columbus.

There are some essential parts of intrinsic intelligence which are not in their framework. Much more research could be done to further study the difference between humans and agents.

Even if the Turing Test is enhanced with vision and manipulation abilities, or with methods such as statistical inference, it still does not test the difference between knowledge and data, or between simulated behavior and intrinsic intelligence.

4. The Design Goals for Generic Intelligence Measurement

Based on the analysis in the previous sections, the following design goals are proposed:
1) Resolve the Chinese Room issue, i.e., test real understanding, not just behavior imitation or symbolic processing.
2) Do not rely on interrogation alone. Find ways to test those intrinsic intelligence abilities that have not yet been well understood or expressed.
3) Test those concepts, knowledge, and intelligence that cannot yet be represented as data.
4) Involve as little domain knowledge as possible, since regular humans may not have much knowledge in specific domains.
5) Include those intrinsic capabilities that are commonly necessary across many domains, and with which humans can develop intelligence in many domains.
6) Include a sequence of leveled tests, since humans are able to make continuous progress in intelligence.
7) Include a framework to test structured intelligence and the ability to make paradigm evolution, since humans can develop sophisticated knowledge structures and make paradigm evolution.

5. Gu Test

Based on these design goals, the Gu Test is proposed. It should include a sequence of test levels and be able to test structured intelligence and the ability to make paradigm shifts.

However, only a first test step is suggested at present, to meet goals 1) to 5). The work to meet design goals 6) and 7) is left to future research.

The first test step of the Gu Test is to test whether testees can understand irrational numbers without knowing their exact values. It is a white box test. Average humans with a certain basic education can pass it. Current computers most likely cannot.

This step tests real understanding. It does not rely on interrogation but tests an intrinsic ability. Humans can pass it, although they probably do not yet know why they have this ability. It tests concepts and knowledge that cannot be represented as data. The irrational number is a primitive concept developed in the age of Pythagoras, a pioneer in philosophy and mathematics. The concept is necessary in many domains but involves very little domain-specific knowledge.
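
The gap between an irrational number's exact value and a representation of it can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not part of the Gu Test: the class name QSqrt2 is hypothetical, and it handles only numbers of the form a + b√2 with rational a and b. A floating-point value stores a finite approximation of √2 as data, while the class stores the rule that √2 squared is 2. Whether manipulating such a rule amounts to understanding is exactly the question the test raises.

```python
import math
from fractions import Fraction


class QSqrt2:
    """Hypothetical exact number a + b*sqrt(2), with rational a and b."""

    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, using only r*r == 2
        return QSqrt2(
            self.a * other.a + 2 * self.b * other.b,
            self.a * other.b + self.b * other.a,
        )

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"


# Data view: a finite binary approximation of sqrt(2).
float_square = math.sqrt(2) * math.sqrt(2)

# Rule view: sqrt(2) is "the number whose square is 2"; no digits are stored.
root2 = QSqrt2(0, 1)
exact_square = root2 * root2

print(float_square == 2.0)            # False: rounding error in the approximation
print(exact_square == QSqrt2(2, 0))   # True: the defining rule gives exactly 2
```

The sketch never computes a digit of √2, yet the relation it encodes was supplied by a human who already understood the concept.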

6. Future Research

Much more work needs to be done to extend the Gu Test to meet design goals 6) and 7). To really understand the essentials of intelligence, people have to study the history of knowledge development, philosophy, mathematics, and the sciences.

References

[1] Turing, A. M., 1950, "Computing machinery and intelligence". Mind 59, 433–460.
[2] Searle, John R., 1980, "Minds, brains, and programs". Behavioral and Brain Sciences 3 (3): 417–457.
[3] Feigenbaum, Edward A., 2003, "Some challenges and grand challenges for computational intelligence". Journal of the ACM 50 (1): 32–40.
[4] Legg, S. & Hutter, M., 2006, "A Formal Measure of Machine Intelligence". Proc. 15th Annual Machine Learning Conference of Belgium and The Netherlands, pp. 73–80.

Wednesday, December 28, 2011