Having recently attended a workshop and a conference on beneficial artificial intelligence (AI), I found that one of the overriding concerns is how to design beneficial AI. To do this, the AI needs to be aligned with human values; this requirement is known, following Stuart Russell, as the “Value Alignment Problem.” It is a “problem” in the sense that, given the way one must specify a value function to a machine, the AI may maximize that value to the detriment of other socially useful or even noninstrumental values.
As Russell explains:
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
- The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
- Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
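To make the point about unconstrained variables concrete, here is a minimal numerical sketch in Python; the toy objective, the “resources” variable, and the coefficient are all invented for illustration and are not from Russell’s text.

```python
# Toy sketch (illustrative only): the stated objective depends mainly on one
# decision variable, yet an ordinary search pushes the neglected variable --
# something we may actually care about -- to the edge of its feasible range.
def stated_objective(task_effort, resources_grabbed):
    # Acquiring resources is not valued for its own sake, but it slightly
    # speeds up the task, so it leaks into the score the designer wrote down.
    return task_effort + 0.05 * resources_grabbed * task_effort

grid = [i / 100 for i in range(101)]          # feasible box [0, 1] x [0, 1]
best = max(((e, r) for e in grid for r in grid),
           key=lambda x: stated_objective(*x))
print(best)  # -> (1.0, 1.0): resources_grabbed is driven to its maximum
```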
In essence, The Problem is identical to one we see in moral philosophy in relation to utilitarianism. If one attempts to maximize one’s utility, specified as x, one may end up with morally repugnant conclusions, such as the violation of the rights of others. For instance, if you tell your AI assistant to go to the pharmacy to pick up your medication, you do not want it to violate the rights of others along the way. You want it to obey traffic laws, pay for the medicine rather than steal it, stand in line and take its turn, and not complete its task in the “most efficient” way possible. Efficiency, or in some parlance “optimization,” is defined in relation to task completion and computational cost (solving in reasonable, e.g. polynomial, time), and this may conflict with other people’s interests or rights.
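One crude way to see that tension is to compare a purely “efficient” objective with one that also prices in the interests the errand would otherwise trample. The errand model and the penalty weight below are made up for illustration; choosing such weights well, or deciding that some things should not be traded off at all, is of course the hard part.

```python
# Illustrative errand toy (not a proposal): a purely "efficient" objective
# ignores other people's interests; pricing them in changes what is "optimal."
def errand_time(speed, queue_jumping):
    # Hypothetical model: driving faster and jumping the pharmacy queue
    # both shorten the errand.
    return 10.0 / (speed + 0.1) - 2.0 * queue_jumping

def efficient_only(x):
    speed, queue_jumping = x
    return errand_time(speed, queue_jumping)          # minimize time, nothing else

def with_others_interests(x, weight=100.0):
    speed, queue_jumping = x
    # Arbitrary penalty standing in for the cost imposed on other people.
    return errand_time(speed, queue_jumping) + weight * queue_jumping

grid = [i / 100 for i in range(101)]
candidates = [(s, q) for s in grid for q in grid]
print(min(candidates, key=efficient_only))            # -> (1.0, 1.0): jump the queue
print(min(candidates, key=with_others_interests))     # -> (1.0, 0.0): wait your turn
```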
However, when discussions about how to actually create a value-aligned AI begin, there is a problem with The Problem. That is, no one appears to have the terms clear, and without understanding the nuances of “values” and of what is “beneficial” to or for humans, one cannot actually design systems that are useful for humans. I suggest, therefore, that we put some basic terms on the table to help the AI community in its thinking about The Problem.
Objective vs. Subjective Values
In conversations with various people in the AI community, when pressed about what they really mean when they say “value-aligned AI,” the response is typically “the user’s values” (as opposed to Russell’s claim about the human race above). In essence, what is meant by “value” is really closer to what political scientists or economists would call a “preference.” It isn’t something objectively valuable but something subjectively valuable from that person’s own point of view.
However, there is a fundamental difference between objective and subjective values. An objective value would be true regardless of the preferences of any particular agent. In moral philosophy, if we were moral realists, we could identify a variety of such values or principles: things like rights, duties, permissions, excuses, and justifications in relation to particular objective values (such as life, freedom, privacy, etc.). Of course, if we were not moral realists, but were instead relativists or collectivists, we could say that only AIs designed for particular people or communities could ever be so aligned. Even then, however, I’d suggest there is some sort of regress from user to community ad infinitum. Why is this important, and why should social scientists and the humanities be involved in this debate?
For the very simple reason that we know a lot about objective and subjective values. The AI community may in fact need us to help them. Otherwise they will end up building AIs that are very good at identifying what their particular user may want, or think they want, but not at linking that AI’s actions with objectively valuable reasons and actions. (For another approach, one in which the machine learns about the user without taking the user’s values on as its own, see Russell’s work on cooperative inverse reinforcement learning.)
An example might be helpful here for those of us not steeped in the particularities. Let’s say that you download a new AI application that is a decision aid, personal assistant, and more – call it the “BFF app.” Its entire design is to be your best friend, schedule things for you, find helpful tips for you, and so on. However, if it is designed to maximize your values, it may take on your values as its own, to the detriment of others. Let’s say it thinks you want more free time because you are stressed out, so it clears all your meetings, despite the fact that you need to go to them. Or perhaps you ask it for directions to a new place to meet a friend for lunch, but it reasons that you don’t need any more friends and takes you to a remote location instead. These would be small but irritating outcomes. It would be far worse if you actually had misanthropic preferences and it learned how to maximize these. Indeed, one might claim that the AI actually needs to know when to say “no” to bad preferences, as well as needing some sort of grasp of how to balance values.
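One way to read the “saying no” point in engineering terms is to treat other people’s rights as hard constraints that are checked before any preference maximization happens, rather than as just another term to trade off. The sketch below is a deliberately simplified illustration of that ordering; the candidate actions, scores, and constraint flags are all invented.

```python
# Simplified sketch: a "BFF app" that screens candidate actions against
# hard constraints (others' rights) before maximizing the user's preferences.
# All names and numbers here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    user_preference_score: float   # how much the user would like this
    violates_others_rights: bool   # e.g., cancelling someone else's meeting

CANDIDATES = [
    Action("clear every meeting on the calendar", 0.9, True),
    Action("suggest declining one optional meeting", 0.6, False),
    Action("book a quiet hour on Friday afternoon", 0.5, False),
]

def choose(actions):
    permissible = [a for a in actions if not a.violates_others_rights]
    if not permissible:
        return None  # the assistant says "no" rather than act impermissibly
    return max(permissible, key=lambda a: a.user_preference_score)

print(choose(CANDIDATES).name)  # -> "suggest declining one optional meeting"
```

The point of the sketch is only the ordering: permissibility screens first, preference maximization second. Collapsing both into a single weighted score is precisely what would let a strong enough preference “buy” a rights violation.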
Thus, if one really wants a value-aligned AI, the AI actually needs to understand the moral universe, including how rights, obligations, and permissions work, as well as when the user is asking it to do something contrary to the rights of others. But to grant this is to claim that The Problem is not merely about getting an AI to do what the user wants, but about what is good for “people” writ large. That is, to solve for the values of humanity.
What is more, we ought to question the notion that as long as preference maximization is possible, it is beneficial. Intelligence is not merely about preference maximization and optimization. Whose preferences are included, maximized, and made visible, and whose are excluded, minimized, and hidden, is a moral and political choice about relations of power. And if the reader thinks this is some hypothetical futurist vision that does not need addressing now, and by fields like political science, sociology, economics, philosophy, and law rather than merely by computer science and robotics, then we need look no further than the present debate on algorithmic bias in criminal sentencing, the marketplace, housing applications, job interviews, and much more. These questions will be even more pressing for future systems based not on fixed algorithms but on learned “policies.”
I prefer to frame this problem in terms of the following: Would the user (or the community, or the society) judge that a particular action in a specific context would be desirable (or acceptable)? Formulating this in terms of a value or utility that is explicitly maximized is just one way of approaching the problem. But an alternative approach might employ a large knowledge base of previous cases of desirable and undesirable actions and ask which of these previous cases the proposed action+context best matches. Such an approach might be better able to capture the highly contextual aspects of moral decision making.
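A minimal sketch of what such a case-based approach might look like, assuming a hand-labeled library of past action-plus-context cases and a very crude similarity measure (here, word overlap); a real system would need a far richer representation of context and many more cases.

```python
# Minimal case-based sketch: match a proposed action+context against a small
# library of previously judged cases and borrow the label of the closest one.
# The cases and the bag-of-words similarity are placeholders for illustration.
CASE_LIBRARY = [
    ("skip the pharmacy queue to save two minutes", "undesirable"),
    ("wait in line and pay for the medication", "desirable"),
    ("cancel a colleague's meeting without asking", "undesirable"),
    ("ask the user before rescheduling shared meetings", "desirable"),
]

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets -- a stand-in for real context matching."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def judge(proposed_action_and_context: str) -> str:
    best_case, label = max(
        CASE_LIBRARY,
        key=lambda case: similarity(proposed_action_and_context, case[0]),
    )
    return f"most like: '{best_case}' -> judged {label}"

print(judge("cut ahead in the pharmacy queue to finish the errand faster"))
```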
Even when we are only considering a single person’s utility, that utility function would need to capture the impact on the community and the society. Otherwise, the actions chosen would be too greedy and would not be judged EVEN BY THAT PERSON to have been acceptable or desirable. In other words, we are not playing the economists’ game of positing individual utilities such as income or pleasure. Instead, we are trying to model the complex preferences of real people, which (except in the case of extreme narcissism) will also include the well-being of others.
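Read as a formula, the claim is roughly that even an individual’s utility over an action should be modeled as a blend of a self-regarding term and an other-regarding term. The weights and numbers below are purely illustrative.

```python
# Toy model of the point: a real person's utility over an action blends their
# own narrow payoff with the impact on others. Weights are purely illustrative.
def person_utility(own_payoff: float, impact_on_others: float,
                   other_regard: float = 0.4) -> float:
    return (1 - other_regard) * own_payoff + other_regard * impact_on_others

greedy_action = person_utility(own_payoff=1.0, impact_on_others=-2.0)
considerate_action = person_utility(own_payoff=0.6, impact_on_others=0.5)
print(greedy_action, considerate_action)  # -> -0.2 vs. 0.56: the greedy action loses
```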