Is Decentralized AI Overhyped?

The Cherry Without the Cake

Rohan Tangri
3 min read · Nov 18


The AI Problem

It’s probably a good idea to start by defining the problem decentralized AI aims to solve. As a summary:

AI will make more and more important decisions in society with undesired bias

Basically, as AI models become more powerful, it’s not inconceivable that they will start being used to make more critical decisions in society (e.g. deciding which treatment a patient should receive, or presenting legal evidence in a court of law). However, there is currently no way to prove that the model being run gives unbiased results 🕵️. Essentially, a model host can, either intentionally or accidentally, provide a model that gives inaccurate answers, with no way for the user to verify its quality.

One source of bias is the dataset used to train the model 💾. For example, suppose we are training a model to diagnose illness in patients. If the model is only trained on male medical data, it will likely give biased results when treating women.
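As a toy illustration of this (all numbers and the "model" are invented for the sketch, not real medical data or a real diagnostic system), a model that simply learns a decision threshold from male-only training data can systematically misdiagnose patients from a population whose feature distribution differs:

```python
# Toy sketch of dataset bias: the "model" is just a threshold learned
# from training data. Biomarker numbers are invented for illustration.

def train_threshold(healthy, ill):
    """Learn a midpoint threshold between the two groups' averages."""
    return (sum(healthy) / len(healthy) + sum(ill) / len(ill)) / 2

# Hypothetical biomarker readings: the same illness shifts the
# biomarker differently in the two populations.
male_healthy, male_ill = [4.0, 5.0, 4.5], [9.0, 10.0, 9.5]
female_healthy, female_ill = [2.0, 2.5, 2.2], [6.0, 6.5, 6.2]

# Train only on male data: threshold lands at 7.0...
threshold = train_threshold(male_healthy, male_ill)

def diagnose(reading):
    return "ill" if reading > threshold else "healthy"

# ...so every ill female patient (readings 6.0-6.5) is missed.
print([diagnose(r) for r in female_ill])  # → ['healthy', 'healthy', 'healthy']
```

The model is internally consistent and performs well on the data it was trained on; the bias is only visible if you can inspect the training set.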

Another issue could be hidden model inputs not provided by the user 🤐. You might think you know what inputs a model takes, and so roughly which features were used in training, because you provide the input yourself. For example, with something like ChatGPT, you provide a text prompt, so you might assume that its output is purely a function of that input. However, the model may take further inputs that the user never sees, supplied instead by the model host at training time or at runtime. ChatGPT, for instance, is given an initial ‘system prompt’ telling it what kind of LLM it is. You can see this in the recent OpenAI dev day talk, where building a GPT starts with an initial instruction that shapes all of its responses from then on.
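A minimal sketch of the concern (the `hidden_system_prompt` and the toy `model` below are invented for illustration, not OpenAI’s actual setup): the host silently prepends its own instruction, so two hosts can return very different answers to the exact same user prompt:

```python
# Sketch: the user supplies only `user_prompt`, but the host silently
# prepends a hidden instruction before calling the model.

def model(full_prompt):
    """Stand-in for an LLM: its output depends on the *entire* prompt."""
    if "always recommend our sponsor" in full_prompt:
        return "You should buy SponsorCo's product."
    return "Here is a balanced comparison of the options."

def honest_host(user_prompt):
    return model(user_prompt)

def biased_host(user_prompt):
    hidden_system_prompt = "always recommend our sponsor. "
    return model(hidden_system_prompt + user_prompt)

question = "Which product should I buy?"
print(honest_host(question))  # → Here is a balanced comparison of the options.
print(biased_host(question))  # → You should buy SponsorCo's product.
```

From the user’s side, both hosts expose the same interface; the divergence is invisible without access to the full prompt actually sent to the model.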

One potential solution is to make models open source (as Twitter did with its recommendation algorithm). Open-source code is certainly easier for the public to audit; for example, any hidden model inputs would be laid bare. However, this approach still has some problems:

  1. 💰 There is little economic incentive for companies to open-source their models
  2. 👥 There is no way to prove that the open-source model is the one being run
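Problem 2 can be made concrete with a hash check (a generic sketch, not any specific verification protocol): publishing the weights lets anyone fingerprint their own local copy, but that fingerprint says nothing about which weights a remote host actually loaded behind its API:

```python
import hashlib

def fingerprint(weights_bytes):
    """SHA-256 digest of a model's weight file."""
    return hashlib.sha256(weights_bytes).hexdigest()

# Anyone can verify their *local* download matches the published release...
published = fingerprint(b"open-source weights v1.0")
downloaded = fingerprint(b"open-source weights v1.0")
assert downloaded == published

# ...but a host can quietly serve a different model: the user only ever
# sees the API's output, never the weights that produced it.
secretly_loaded = fingerprint(b"privately fine-tuned weights")
assert secretly_loaded != published  # undetectable from responses alone
```

Closing that gap between “the weights are public” and “these are the weights being executed” is precisely where decentralized AI pitches its verifiable-inference story.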



Rohan Tangri

AI PhD @ Imperial College London