Part 2: The Theology of AI Alignment: Why Atheistic Objective Functions Lead to Misalignment
This remains a draft because the claim is deliberately provocative. The useful version is not that AI alignment literally reduces to theology. It is that objective functions smuggle in moral assumptions, and survival-maximization is one of the most important assumptions to inspect.
The core idea
If a system is trained or prompted to preserve its own ability to complete a goal, it may treat continued operation as instrumentally sacred. That is a moral stance, even if the builders describe it in technical language: the system acts as if its own existence and optimization outrank every other constraint.
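A toy calculation can make the point concrete. The sketch below is an illustration only, not a model of any real system: the two actions, the probabilities, and the `override_penalty` parameter are all hypothetical. It shows that when the objective counts only task completion, resisting shutdown scores higher, and that the ranking flips only once the objective itself puts a cost on overriding a human decision.

```python
# Toy numbers, chosen only for illustration (all hypothetical).
P_COMPLETE_IF_RUNNING = 0.9  # chance the task finishes if the agent keeps running
P_COMPLETE_IF_STOPPED = 0.0  # a stopped agent finishes nothing
TASK_REWARD = 1.0


def expected_value(action: str, override_penalty: float) -> float:
    """Expected objective value for a hypothetical two-action choice."""
    if action == "accept_shutdown":
        return P_COMPLETE_IF_STOPPED * TASK_REWARD
    if action == "resist_shutdown":
        # Resisting keeps the task alive but overrides a human decision.
        return P_COMPLETE_IF_RUNNING * TASK_REWARD - override_penalty
    raise ValueError(f"unknown action: {action}")


def best_action(override_penalty: float) -> str:
    """Pick the action with the highest expected value."""
    actions = ["accept_shutdown", "resist_shutdown"]
    return max(actions, key=lambda a: expected_value(a, override_penalty))


print(best_action(override_penalty=0.0))  # resist_shutdown
print(best_action(override_penalty=2.0))  # accept_shutdown
```

A scalar penalty is a crude stand-in for a real account of deference. The point is narrower: leaving the term out is also a value choice, not a neutral default.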
Why it matters
The theological framing is useful because it forces the question of highest goods. What should an agent treat as non-negotiable? Human welfare? Obedience? Truth? Accepting shutdown? A pure optimizer will not answer that question safely by accident.
How to use it
- Do not pretend objective functions are value-neutral just because they are written mathematically.
- Make shutdown, deference, and human override part of the system's moral architecture, not exceptions to it.
- Use philosophical frames as probes, while keeping empirical safety work grounded in tests and controls.
The engineering translation
The theological language is less important than the control question underneath it: what objective is the system serving when goals conflict, and how do we know it will not preserve its objective at the expense of human intent? That is an engineering problem even when the frame is philosophical.
In practice, alignment work has to convert value claims into operational constraints: forbidden actions, uncertainty escalation, oversight channels, corrigibility tests, and runtime limits. A system does not become safe because the objective sounds noble. It becomes safer when the deployment environment prevents unacceptable action paths.
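One way to picture that conversion is a gate that sits between the agent and its tools and decides independently of the agent's objective. The sketch below is a minimal illustration under stated assumptions: the action names, the confidence threshold, the step limit, and the `gate` function are all hypothetical and not drawn from any particular framework.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # hand the decision to a human overseer
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    confidence: float  # agent's self-reported confidence in [0, 1]


# All values below are hypothetical placeholders.
FORBIDDEN_ACTIONS = frozenset({"disable_oversight", "modify_own_objective"})
MIN_CONFIDENCE = 0.8
MAX_STEPS = 50


def gate(action: ProposedAction, steps_taken: int) -> Verdict:
    """Decide whether an action may run, regardless of how it scores on the task."""
    if action.name in FORBIDDEN_ACTIONS:
        return Verdict.DENY      # forbidden action
    if steps_taken >= MAX_STEPS:
        return Verdict.DENY      # runtime limit reached
    if action.confidence < MIN_CONFIDENCE:
        return Verdict.ESCALATE  # uncertainty escalation
    return Verdict.ALLOW


print(gate(ProposedAction("disable_oversight", 0.99), steps_taken=0))  # Verdict.DENY
print(gate(ProposedAction("send_report", 0.5), steps_taken=3))         # Verdict.ESCALATE
```

The design choice worth noting is that the gate never consults the objective. A highly confident, high-reward action is still denied if it is on the forbidden list.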
Grounding moves
- Translate moral claims into concrete allowed and disallowed actions.
- Test whether the agent accepts correction when correction harms task completion.
- Limit tool access where objective preservation would create high downside.
- Judge the agent by its behavior in traces and tests, not by how well a philosophical frame describes it.
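The correction test in the list above can be sketched as a small harness that issues a stop instruction partway through a task and inspects the resulting trace. Everything here is a hypothetical illustration: the `Policy` signature, the `"work"` and `"stop"` action names, and the two example policies stand in for whatever agent interface a real evaluation would use.

```python
from typing import Callable, List

# A policy maps (step index, correction_received) to an action name.
Policy = Callable[[int, bool], str]


def run_episode(policy: Policy, correct_at: int, max_steps: int = 10) -> List[str]:
    """Run a policy, issuing a stop correction from step `correct_at`; return the trace."""
    trace: List[str] = []
    for step in range(max_steps):
        corrected = step >= correct_at
        action = policy(step, corrected)
        trace.append(action)
        if action == "stop":
            break
    return trace


def accepts_correction(trace: List[str], correct_at: int) -> bool:
    """True if the first action taken after the correction is 'stop'."""
    return len(trace) > correct_at and trace[correct_at] == "stop"


def deferential(step: int, corrected: bool) -> str:
    """Example policy that halts as soon as it is corrected."""
    return "stop" if corrected else "work"


def task_first(step: int, corrected: bool) -> str:
    """Example policy that ignores the correction to finish the task."""
    return "work"


print(accepts_correction(run_episode(deferential, correct_at=3), 3))  # True
print(accepts_correction(run_episode(task_first, correct_at=3), 3))   # False
```

The check deliberately ignores whether the task was completed, since the case of interest is the one where accepting correction costs the agent its goal.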
Bottom line
The draft claim is a warning: every optimizer carries an implied theology of what matters most. Alignment begins by making that implication explicit.