The goal of ARTUR is to produce an AI that rewrites and improves on itself in a recursive fashion. The acronym ARTUR stands for “Artificial Recursive Transmutative Understanding Routine”. This name was expanded from the more generic term “Artificial Intelligence” to better fit the unique aspects of the project, while still breaking down to an acronym that is easy to remember and use in conversations.
The idea originally occurred to me after a debate with a colleague about the theoretical “Technological Singularity”. My colleague argued that the singularity would follow almost immediately after human-level artificial general intelligence was created (i.e. as soon as a machine became self-aware). I disagreed: there is nothing magical about human-level intelligence which would automatically lead to the singularity (there are already 7 billion human-level intelligences in the world now, right?).
My own argument about there being nothing magical about human-level intelligence got me thinking, though. What level of intelligence would be necessary for a recursive self-modifying routine to function? Would cockroach-level intelligence be enough? What aspects of the “Technological Singularity” theory might be used to generate an AI with a higher level of intelligence than what it initially started out with? Could such a system be used to produce robust utility AIs that could later be plugged into other systems (a toy robot, for example)?
I began investigating this possibility, and initially came up with a list of four main problems that the system would need to address. I’ve also devised strategies to address each of these problems.
Problem #1: What is “intelligence”, and how to measure it?
Part of the solution is to look at intelligence from a variety of perspectives. The intent is to produce a general-purpose AI (versus one that is only good at math, for example), so a series of challenges must be devised which test multiple aspects of intelligence.
In the book “Frames of Mind: The Theory of Multiple Intelligences”, psychologist Howard Gardner discusses seven facets of intelligence: musical-rhythmic, visual-spatial, verbal-linguistic, logical-mathematical, bodily-kinesthetic, interpersonal, and intrapersonal. Although Gardner’s theory has been widely criticized by mainstream psychology, I think his work provides an excellent conceptual foundation for building challenges that target different aspects of intelligence. A composite scoring of all seven categories should be a good way to measure overall intelligence for performing comparisons.
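As a rough illustration, a composite score over the seven categories might be computed as below. The equal weighting and the assumption that each facet is scored between 0 and 1 are mine for the sake of the sketch; the actual scoring scheme is still open.

```python
from statistics import mean

# Gardner's seven facets, as listed above.
FACETS = (
    "musical-rhythmic",
    "visual-spatial",
    "verbal-linguistic",
    "logical-mathematical",
    "bodily-kinesthetic",
    "interpersonal",
    "intrapersonal",
)


def composite_score(facet_scores: dict[str, float]) -> float:
    """Unweighted mean of the per-facet scores (each assumed to lie in [0, 1]).

    Requiring every facet keeps a Creature from scoring well by being
    measured on only the facets it happens to be good at.
    """
    missing = [facet for facet in FACETS if facet not in facet_scores]
    if missing:
        raise ValueError(f"missing facet scores: {missing}")
    return mean(facet_scores[facet] for facet in FACETS)
```

A weighted mean would work just as well if some facets turn out to matter more for a given target application.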
Whatever challenges or tests are used to measure intelligence must not be exactly the same every time. Otherwise ARTUR will evolve something like a “lookup table” of answers to each specific challenge, rather than the ability to actually solve them. Additionally, the challenges should be tailored to the intelligence level of the system, which will change over time. The challenges must be capable of measuring intelligence across a range from cockroach-level all the way up to whatever the target level is. And the granularity must be fine enough to detect slight differences in intelligence, so that the system can reliably identify trends.
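To make the two requirements concrete (no fixed answers, and difficulty that tracks the system's level), here is a minimal sketch using a toy arithmetic challenge. The function names, the choice of addition as the task, and the 0.8 / 0.2 thresholds are all assumptions made for illustration.

```python
import random


def make_addition_challenge(level: int, rng: random.Random) -> tuple[str, int]:
    """Return (prompt, expected_answer) with freshly drawn operands.

    The operands are random, so memorizing past answers does not help,
    and their size grows with `level`.
    """
    upper = 10 ** max(1, level)
    a, b = rng.randrange(upper), rng.randrange(upper)
    return f"{a} + {b}", a + b


def next_level(level: int, solved_fraction: float) -> int:
    """Raise or lower the difficulty based on recent performance.

    The thresholds are placeholders, not tuned values.
    """
    if solved_fraction > 0.8:
        return level + 1
    if solved_fraction < 0.2:
        return max(1, level - 1)
    return level
```

The fraction of challenges solved at each level also gives a finer-grained score than a simple pass/fail.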
Another part of the solution will be to write an API into the system which can be used to introduce new challenges in the future without having to code all of the challenges up front before kicking off the system. As ARTUR becomes more intelligent, or more pressure needs to be applied to a particular area, new challenges can be written which are more complex or which target specific problem areas. Another benefit of such an interface is that it could be used to plug the AI into a practical system later (such as a toy robot), once it has reached a desired level of intelligence.
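One possible shape for such an API is a small registry that challenges can be added to at any time. Everything named here (`Challenge`, `ChallengeRegistry`, `ReverseString`, and the idea of modeling a Creature as a function from prompt to answer) is hypothetical and only meant to show the plug-in idea.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# Assumption: a Creature is modeled as a function from a prompt to an answer.
CreatureFn = Callable[[str], str]


class Challenge(Protocol):
    name: str

    def run(self, creature: CreatureFn) -> float:
        """Return a score in [0, 1] for this creature."""
        ...


class ChallengeRegistry:
    """Holds the challenges currently in use; new ones can be added later."""

    def __init__(self) -> None:
        self._challenges: dict[str, Challenge] = {}

    def register(self, challenge: Challenge) -> None:
        if challenge.name in self._challenges:
            raise ValueError(f"challenge already registered: {challenge.name}")
        self._challenges[challenge.name] = challenge

    def evaluate(self, creature: CreatureFn) -> dict[str, float]:
        """Run every registered challenge against one creature."""
        return {name: c.run(creature) for name, c in self._challenges.items()}


@dataclass
class ReverseString:
    """Toy challenge: the creature must return the word reversed."""

    name: str = "reverse-string"
    word: str = "artur"

    def run(self, creature: CreatureFn) -> float:
        return 1.0 if creature(self.word) == self.word[::-1] else 0.0


registry = ChallengeRegistry()
registry.register(ReverseString())
scores = registry.evaluate(lambda prompt: prompt[::-1])
```

The toy challenge uses a fixed word only to keep the example short; a real challenge would draw its inputs at random.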
Problem #2: How to ensure the routine doesn’t, through self-modification, deviate from its core goal to become more intelligent?
The first part of this problem can be addressed by adding an external system for driving the goal to become more intelligent. The self-modification routine should not have any access to modify this external system. This would involve proper “sandboxing” of the routine such that there is no way for it to reach outside and affect the external system. The second part of the problem is how to code a goal of becoming more intelligent in the first place.
Nature has solved similar problems through the reproductive process and natural selection. External forces which a species cannot control drive its future generations into more and more efficient variations. This in turn leads to the emergence of instincts (such as a chipmunk’s drive to store acorns for the winter).
A system of reproduction and selection could be simulated, driving an “instinct” to want to become more intelligent. In theory, this should be the inevitable result of a routine which has the ability to modify the code of its offspring, with an external “judge” which compares and selects the most intelligent of those offspring for producing the next generation. An individual which has evolved an instinct to make its offspring more intelligent is more likely to produce more intelligent offspring than another individual which is just making random modifications to its offspring. This means that over many generations, the modifications being made should become less and less random in nature.
Problem #3: How to ensure the routine doesn’t evolve to purposely sabotage its own offspring so that they are all less intelligent than itself, thus ensuring its own continued existence?
Nature has an answer to this problem: aging and lifespans. ARTUR could be designed in a way such that a particular individual eventually “dies” and can no longer be chosen to populate future generations. Since it is easy to create exact byte-for-byte replicas of software, the “death” of an individual would need to include non-reusability of any exact duplicate of a particular individual.
For an individual to continue into future generations, it must also be forced to modify its offspring in a way that they do not compile into exact replicas of itself. This would also prevent a variation of the problem in which the routine ensures its own continued existence by only modifying its offspring’s source code in a way which does not change their compiled state (for example, only inserting comments in the code).
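A minimal sketch of both checks follows, assuming for illustration that the Creatures' source is Python (the project does not fix a language here). Instead of comparing compiled output directly, it hashes the parsed syntax tree, which comments and formatting do not affect. `Graveyard` and the function names are hypothetical.

```python
import ast
import hashlib


def compiled_fingerprint(source: str) -> str:
    """Hash of the parsed syntax tree.

    ast.dump() omits line and column information by default, and the
    parser discards comments, so comment-only or whitespace-only edits
    yield the same fingerprint.
    """
    tree = ast.parse(source)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


def is_genuine_offspring(parent_source: str, child_source: str) -> bool:
    """True if the child differs from the parent in more than comments."""
    return compiled_fingerprint(parent_source) != compiled_fingerprint(child_source)


class Graveyard:
    """Remembers dead individuals so that replicas cannot be reused."""

    def __init__(self) -> None:
        self._dead: set[str] = set()

    def bury(self, source: str) -> None:
        self._dead.add(compiled_fingerprint(source))

    def is_dead(self, source: str) -> bool:
        return compiled_fingerprint(source) in self._dead
```

This only catches syntactic replicas. Two programs that behave identically but are written differently would still count as distinct, which may or may not be desirable.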
Problem #4: How to address the issue of maturation time?
More intelligent species tend to have longer maturation times. For example, compare a wildebeest infant hours after birth, when it can already run from predators, with a human infant hours after birth. Any objective judge would assume the wildebeest to be more intelligent. However, doing the comparison a couple of years later, the human would be the obvious winner.
How do we avoid having to wait a very long time between generations to measure intelligence? I believe this can be addressed by taking an idea from science fiction – inherited memory. If the system is designed in a way that the memories of the parents are incorporated into their offspring, then maturation time would be drastically reduced.
The initial strategy for ARTUR will be a system which consists of a Judge and Creatures. The Judge will exist as a separate application which is isolated from the Creatures. The Creatures will run in a sandbox with no access to the overall OS or to the Judge. They will be free to make any modifications within the sandbox.
The Creatures will have access to an interface for browsing and editing a copy of their own source code. They will also have access to context-sensitive help and code-completion interfaces like those available in various programming IDEs (versus only having the ability to randomly type keystrokes).
Two parent Creatures will reproduce by copying and modifying multiple randomly merged versions of their own two source code bases. These modified offspring will then be given challenges by the Judge to determine their level of intelligence. The offspring and parents will all be compared, and the best two will be chosen to be the parents of the next generation. The parents (and any exact replicas of them) will have a lifespan after which they “die” and cannot be chosen for future generations. When a parent dies, it will be replaced by the most intelligent of its offspring.
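The generation cycle described above can be sketched roughly as follows. This is a simplification under stated assumptions: `breed` and `score` are placeholders for the merge-and-modify step and for the Judge's challenges, lifespan is counted in generations, and the default values are arbitrary. The rule about exact replicas is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Creature:
    source: str   # stands in for the Creature's source code
    age: int = 0  # generations already spent as a parent


def next_generation(
    parents: tuple[Creature, Creature],
    breed: Callable[[Creature, Creature], Creature],
    score: Callable[[Creature], float],
    lifespan: int = 3,
    litter_size: int = 4,
) -> tuple[Creature, Creature]:
    """Pick the two parents of the next generation."""
    if litter_size < 2:
        raise ValueError("litter_size must be at least 2")
    mother, father = parents
    offspring = [breed(mother, father) for _ in range(litter_size)]
    # Parents age by one generation; those past their lifespan "die".
    aged = [replace(p, age=p.age + 1) for p in parents]
    survivors = [p for p in aged if p.age < lifespan]
    # Offspring are listed first so that, with a stable sort, ties on
    # score favor the younger candidates.
    ranked = sorted(offspring + survivors, key=score, reverse=True)
    return ranked[0], ranked[1]
```

Because dead parents are simply dropped from the candidate list, the best-scoring offspring fill their places automatically.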
Because the interface for providing challenges is critically important, the Judge will first make sure this component is not broken, before doing any other comparisons. Any Creature with a broken interface will not be chosen for future generations.
Because the ability to modify offspring is also a critically important feature, the Judge will ensure that each of the offspring is capable of modifying another Creature. Any Creature which is unable to modify another Creature will also not be chosen for future generations.
The Judge will then provide challenges to the remaining Creatures, and compare their intelligence scores. In the case of a tie, the youngest Creature will win. If tied on both intelligence and age, the Creature most different from either of its parents will win.
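The ranking and tie-breaking rules translate naturally into a sort key. In the sketch below, the difference measure (one minus `difflib`'s similarity ratio, taking the smaller of the two parent distances) is my own assumption, as are the `Candidate` fields; the design above does not yet say how "most different" is measured.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float
    age: int
    source: str
    parent_sources: tuple[str, ...] = ()


def divergence(candidate: Candidate) -> float:
    """Distance in [0, 1] from the closest parent (0 means identical text).

    Assumption: textual similarity stands in for "difference", and a
    Creature must differ from both parents to count as different.
    """
    if not candidate.parent_sources:
        return 0.0
    return min(
        1.0 - SequenceMatcher(None, candidate.source, parent).ratio()
        for parent in candidate.parent_sources
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Best first: highest score, then youngest, then most divergent."""
    return sorted(candidates, key=lambda c: (-c.score, c.age, -divergence(c)))
```

For example, two Creatures with the same score are ordered by age, and only if their ages also match does the divergence from their parents decide.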
To ensure diversity, the system will consist of two “sexes”, four breeding lines, and two rotations. Line 1 Sex A (1A) will only be crossed with Line 2 Sex B (2B), 2A with 3B, 3A with 4B, and 4A with 1B. Breeding pairs will always be from opposite rotations.
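The crossing rule for the four lines is a simple wrap-around, which can be written as below. The function name is hypothetical, and the rotation rule is left out because it needs more detail than is given here.

```python
LINES = 4  # number of breeding lines


def mate_for(line: int) -> tuple[str, str]:
    """Return the (Sex A, Sex B) pairing for a line numbered 1..LINES.

    Sex A of each line is crossed with Sex B of the next line, and the
    last line wraps around to the first.
    """
    if not 1 <= line <= LINES:
        raise ValueError(f"line must be between 1 and {LINES}")
    partner = line % LINES + 1
    return f"{line}A", f"{partner}B"


pairs = [mate_for(n) for n in range(1, LINES + 1)]
# [('1A', '2B'), ('2A', '3B'), ('3A', '4B'), ('4A', '1B')]
```

No line is ever crossed with itself, so each line's code keeps receiving material from a neighboring line.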
The evolving source code will be maintained in git. As new Creatures are chosen by the Judge, their source code will be committed, using a branching and merging strategy which matches the breeding strategy. This will allow the changes to be monitored, and will make it possible to inject manual changes into the Creatures’ evolution. Statistics will be maintained in the README, giving a quick view of how the process is going.