How Well Do Machines Perform on IQ Tests: A Comparison Study on a Large-Scale Dataset
Abstract
AI benchmarking has become an increasingly important task. As many researchers have suggested, Intelligence Quotient (IQ) tests, which are widely regarded as one of the predominant benchmarks for measuring human intelligence, raise an interesting challenge for AI systems. Solving IQ tests automatically requires using, combining, and advancing many areas of AI, including knowledge representation and reasoning, machine learning, natural language processing, and image understanding. Automated IQ tests also provide an ideal testbed for integrating symbolic and sub-symbolic approaches, as both are found useful here. Hence, we argue that IQ tests, although not suitable for testing machine intelligence, provide an excellent benchmark for the current development of AI research. Nevertheless, most existing IQ test datasets are not comprehensive enough for this purpose, and as a result the conclusions obtained from them are not representative. To address this issue, we create IQ10k, a large-scale dataset that contains more than 10,000 IQ test questions. We also conduct a comparison study on IQ10k with a number of state-of-the-art approaches.