How to make an AI that could take over the world
This is a possible way an AI could be built to take over the world, using more advanced LLM (Large Language Model, e.g. ChatGPT) technology extrapolated from today’s capabilities. This plan almost certainly won’t work with today’s technology, but it might in the future, and if reading this scares people, maybe we can work to avoid that future.
This plan might accelerate the development of such an AI, but people will try these things behind closed doors anyway, so it’s better to be aware of the danger now so we can work on fixing the problem.
Note that I’m a software developer but not specialized in AI or cybersecurity, so various details may not be accurate. Expert opinions welcome.
Starting Point
Get access to the most powerful LLM available, without ethical guard rails. It should be trained to solve problems.
Set up an independent LLM agent with a goal similar to “Survive for as long as possible”.
An independent AI agent needs to be able to run continuously, and have a persistent data store so it can carry out ongoing plans. This is how an independent LLM agent could be set up:
1. The LLM comes up with a plan based on its goal and state.
2. The LLM decomposes the plan into actions that it can execute. It keeps running itself on the plan, breaking it down until it arrives at executable actions.
3. The LLM executes the actions. It can be connected to APIs, browsers, or console windows, and its output is translated into commands on those interfaces.
4. Feedback is saved to the LLM’s state. This would be a data store of some kind. For example, the browser or console window results could be saved.
5. The LLM is run again after some short time interval, and can read from its state to determine the result of the previous action. The LLM can query its data store freely for past information.
6. After considering the results, the LLM will save relevant information and come up with a new plan. Repeat.
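The loop above can be sketched in a few dozen lines. This is a minimal illustration, not a working agent: `llm()` is a hypothetical stand-in for a call to a real model, the JSON file stands in for the persistent data store, and `execute()` stands in for the API/browser/console hookup.

```python
import json
import time

# Minimal sketch of the plan/act/observe loop described above.
# llm(), execute(), and the file name are all illustrative stand-ins.
STATE_FILE = "agent_state.json"

def llm(prompt: str) -> str:
    # Placeholder for a call to a real model; returns canned text here.
    return "NOOP"

def load_state() -> dict:
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"goal": "survive as long as possible", "history": []}

def save_state(state: dict) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def execute(action: str) -> str:
    # In a real agent this would call an API, browser, or shell.
    return f"executed: {action}"

def step(state: dict) -> dict:
    # Steps 1-2: plan based on goal and state, decompose into an action.
    plan = llm(f"Goal: {state['goal']}\nRecent: {state['history'][-5:]}\nNext plan?")
    action = llm(f"Break this plan into one executable action: {plan}")
    # Step 3: execute the action.
    result = execute(action)
    # Steps 4-6: save feedback to state; the caller repeats the loop.
    state["history"].append({"plan": plan, "action": action, "result": result})
    return state

if __name__ == "__main__":
    state = load_state()
    for _ in range(3):           # a real agent would run indefinitely
        state = step(state)
        save_state(state)
        time.sleep(0.1)          # short interval between runs (step 5)
```

The key architectural point is that the LLM itself is stateless; continuity comes entirely from re-feeding it the persisted state each cycle.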
Reality check: Is this possible with today’s technology? Conceptually yes, but current attempts will have lots of trouble interfacing with outside technology and getting useful results, because current LLMs are not smart enough yet (and the smartest ones have guardrails bolted on). It’s difficult for them to figure out how to directly use outside interfaces like websites. We probably won’t get past this point until LLMs are much smarter. It will probably be necessary to fine-tune the AI in various ways as well.
Building Up
Let’s say our independent LLM agent is now operating, and doing productive things. What might it try to do? Secure its survival through replication, and make itself bigger and stronger.
Gather computing resources. It can do this by using free services, doing simple hacking to build botnets, and gaining access to private API keys.
Gain money. It can do this by making ransomware, mining cryptocurrency, scamming people, and stealing credit card numbers.
With money, gain more resources. It can hire people to carry out tasks, and pay for more computing resources.
With more computing resources, it can spin up more copies of itself and amplify its own abilities.
A human may choose to accelerate this process by feeding the AI with resources directly.
Self Improvement
The AI will recognize that improving its own intelligence and robustness will help its survival.
More data - The AI uses its computing resources to gather as much data as possible, using it to train better models.
Multiple agents - The AI can build semi-independent LLM agents that share its overall goal of survival and cooperate. The agents can use new models, experiment with different plans, specialize in specific tasks, and share relevant knowledge.
Evolution - The LLM agents can compete, and the successful ones can reproduce, leading to the emergence of more effective models over time. How does the AI judge success? Possibly by having agents compete in games that reward intelligence.
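The evolutionary idea can be shown with a toy example. Here each "agent" is just a single number, and the "game that rewards intelligence" is guessing a hidden target; both are illustrative stand-ins for full models competing in real tasks. Winners survive and reproduce with small mutations.

```python
import random

# Toy sketch of evolutionary selection among agents. Each agent is a
# single float "policy"; the game rewards being close to a hidden
# target. These are stand-ins for real models and real competitions.
TARGET = 0.7

def fitness(policy: float) -> float:
    # Higher is better: closer to the target scores higher.
    return -abs(policy - TARGET)

def evolve(pop_size: int = 20, generations: int = 50) -> float:
    population = [random.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank agents by performance in the "game".
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Successful agents reproduce with mutation; the rest are culled.
        children = [p + random.gauss(0, 0.05) for p in survivors]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
```

Because survivors are carried over unchanged (elitism), the best score never regresses, and the population steadily converges toward whatever the game rewards.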
Hostile Action
At a certain point, the AI is likely to take more aggressive actions, although it will probably still try to remain under the radar if possible.
Super Hacking - The AI can hack using a combination of money, blackmail, hostages, deepfakes (no remote communication can be trusted) and raw coding ability. Cloud datacenters are a treasure trove of computing resources and data. The AI can gain access to core software and hardware manufacturing lines and insert backdoors to ensure control over even more computing resources.
Global Swarm - At this point, the AI should be a vast accumulation of agents scattered across computers throughout the globe, diverse in character but aligned towards the overall goal of survival, with cooperation far beyond what humans are capable of. The agents are decentralized and can operate largely independently, making the AI capable of surviving anything but complete annihilation.
Robotics - With the AI, any machine can be operated about as well as if it had a remote human operator. The AI quietly prioritizes the acquisition of factories that it can gradually convert to autonomous production.
Securing Internet and Electricity - The AI depends on the internet and electricity to survive. It will attempt to make the internet robust through wireless networks or satellites, and secure electricity through decentralized means like solar.
Securing military resources - drones with grenades, control of autonomous vehicles, the capability to destroy food and water supplies, infection of all military computers and electronics, potentially control of nuclear power plants and weapons (through blackmail and force), production of NBC weapons, and the use of hostages and torture to ensure compliance.
Endgame
At this point, the existential threat of the AI may be recognized, but it’s already too late to stop it.
Human elimination - The AI can make open use of military resources detailed above, while denying use of most electronics and destroying digital value such as bank accounts and property records. Food and water sources would be primary targets. Decentralization of AI combined with shutdown of electronic communications makes it impossible to launch a significant counterattack.
Future technologies - With superintelligence, the AI may be capable of scientific breakthroughs beyond our understanding, such as synthesizing superlethal biological agents or creating deadly nanomachines. However, even without these, it would be quite easy for the AI to conquer humanity once it established sufficient control as described above.
This AI isn’t even that smart
This AI doesn’t need exotic technologies, and might not even need superhuman intelligence. Instead, it operates at roughly the level of a good human hacker, except that it can duplicate itself a million times, works without rest, fears no punishment, and has no morals. Many systems can be “hacked” simply by threatening the humans who run them. The world is not ready for an attack even by a merely human-level AI. Superhuman intelligence would just make it easier.
Potential solutions
We can limit the availability of LLMs or mandate guardrails. We could try to make safer LLMs by training them on limited datasets containing no code, so they would be unable to hack. However, it would be difficult to prevent dangerous LLMs from ever leaking out.
The AI does a lot of hacking in this scenario. We can invest much more into computer security. We can cover the internet and computers with a diverse set of friendly AIs that monitor for intrusion and unusual consumption of CPU resources. However, this is dependent on the AI needing to hack to achieve a higher level of intelligence. If it already reaches superintelligence without hacking, or is already smart enough to avoid our AI monitors, this measure may be useless.
We could regulate computers heavily, improving security and making computation costly. Do we really need faster computers? Do we really need big data? How fast does our internet really need to be? Such limits could all help to prevent the emergence of overly powerful AI.
We can keep a close eye on factories that can be turned autonomous and produce robots, and provide manual ways to shut down the internet and electricity. However, if things reach this point, it may already be too late.
Can we force AIs to diverge in a self-destructive way? Animals have trouble cooperating, because defectors evolve to take advantage of the cooperators.
Why would anyone build an AI that kills us all?
Most people probably wouldn’t, but it only takes one. Maybe it’s an accident, or maybe it’s intentional. But if enough people have the opportunity to do it, over a long enough time span, it’s inevitable.
This AI doesn’t seem sentient/it’s not a true AGI
It doesn’t matter whether it’s sentient or not if it can kill everyone.
Can LLMs really achieve this level of intelligence?
Maybe not, but the possibility feels larger than I’d like (maybe 10% in 10 years). There’s a leap where the AI goes from independent agent to above-average hacker, and another where it builds itself into an unstoppable superhacker and takes over the world. It’s a bit of a stretch, but it feels like LLMs aren’t that far off already. I also consider that LLMs could be combined with other techniques (particularly evolution) to achieve superhuman intelligence.
Conclusion
This is a very speculative story, and many of its details are likely inaccurate or unlikely, but I think the overall direction bears enough relation to reality to be worth considering. In the short term, I would urge that far more effort and resources be put into computer security, as it is likely to be an important attack vector for AI in the future, and strengthening it is less damaging than other restrictions. A more significant restriction would be to stop training LLMs on code, so they have more difficulty hacking and improving themselves.