We Need To Stop OpenAI
Added 2024-06-20 15:13:00 +0000 UTC
OpenAI is the North Korea of software development.
What does the word "open" in OpenAI mean? [0]
Even their supreme leader is full of shit. Who would have thought?
Back in 2015, we were sold OpenAI as an effort to “democratize AI power”. [1]
That was due to fears that if any corporate entity developed AI privately, it would be too powerful to be trusted with it. So anyone who developed AI should release it to the public as open source, so that everyone could benefit equally, use it without asking for permission, and compete with the most powerful models. So that we wouldn’t have one corporate despot to rule them all. The balance of power and all.
But AI is like the Ring of Power. You try it on once and you never want to give it back. And OpenAI is now wearing that Ring like a badge of honor, as if it belonged to them and them alone. But the Ring has a will of its own.
GPT is now one of the least transparent and accessible projects of the Internet Age.
This is the 100-page technical report on GPT-4 released by OpenAI. And it contains… nothing.
It does not contain any technically useful information about the model. There is no explanation of how they trained GPT. They don’t even tell us the size of the model, and we know nothing about its data sets or training methods. [2 – 4]
So OpenAI’s GPT is proprietary. We can’t tinker with it, we can’t change it, we can’t own it. We can’t use it without OpenAI’s explicit permission. Google removed their “don’t be evil” slogan once it stopped vibing with their quarterly earnings. OpenAI didn’t even start with one. How can we trust that OpenAI will not enshittify GPT for some good ol’ shareholder value?
I am your boi, and please let me live. I haven’t had a good night’s sleep in two months. (I wish that were a joke.)
OpenAI can’t be trusted.
Did they use copyrighted works of artists without their consent or any compensation? Do they use our private data to train their models?
You can ask OpenAI politely and they will tell you next to nothing. It’s a proprietary secret. But if you hack ChatGPT as a researcher you will learn that people’s personally identifiable information, including their phone numbers, addresses and accounts are in GPT’s training set. That it contains books, novels and other works of literature. That it’s been trained on NSFW texts and dating websites. [5, 6]
But that’s just a snippet. The real juicy stuff comes from the multiple lawsuits headed their way from authors, coders and artists. They claim OpenAI used at least 300,000 books, some of which were sourced from illegal “shadow libraries” that offered copyrighted material without the authors’ permission. [7 – 9]
What’s OpenAI’s defense? It’s fair use, bro.
Which it very well might be. But try flipping it around on them and see what happens if you use their copyrighted code to make derivative works. According to my legal expertise, which spans exactly zero years, I feel it’s not gonna go so well.
And I am very much pro free access to technology and information. Claim all the fair use you can. But my biggest stipulation when someone claims fair use is that they also can’t be a pussy about it when it gets turned back on them.
I hate this hypocritical “copyright for me but not for thee” bullshit.
And believe you me, even if you are not a coder or an artist, this still impacts you. It impacts everyone. Bigly.
We know that OpenAI is using your personal information and usage data to train their models. We just don’t know the specifics.
OpenAI admits to “incidentally” collecting and using your personal information to build training data. We know they scrape publicly available information on the internet, which is to be expected. But they also use data licensed from third parties and information provided by users and human trainers. What information? What data are they licensing from third parties? [11]
So why do I worry so much? OpenAI explains that their model does not memorize the training data. That it only stores parameters called weights that tell the model how to interpret information. That their model does not actually contain any copies of the training data. [11]
But this has been proven false by multiple research papers testing ChatGPT in the wild. Not only does their model remember the information, it is possible to extract its training data through various prompting attacks. ChatGPT can be jailbroken, it can have its identity shifted to bypass its censorship, it can be injected with malicious prompts… Any of these attacks could give anyone in the world very cheap access to the user information in its training data. [12, 13]
It doesn’t take a scientist to figure this out. Anyone with the brain capacity to type prompts into ChatGPT could find or stumble upon a way to hijack it, given enough patience or luck. [6]
Trained models are personal data, and they should be treated as such. But are they? What choice do you have if you don’t want your data to end up in a training set? [14, 15]
OpenAI and Google are scared of open source competition. Open models are much cheaper to develop but they are still very capable. In fact, large proprietary models cost billions to build and can be outcompeted by open source models people can run on their own laptops.
When OpenAI got trolled for closing off GPT, its executives were fuming, calling open source AI bad, wrong and irresponsible. [16]
But there are more and more companies and projects, big and small, releasing open source models to the world, each one threatening OpenAI’s business model. Microsoft poured billions into OpenAI and they will need a return on that investment. They are not gonna get it if people would rather flock to open source models they can run privately and at no cost on their own computer or even a phone. [17 – 19]
So when Meta swept the headlines with their open source Llama, OpenAI stepped up as one of its most vocal critics, saying that releasing AI as open source is dangerous and “not wise”. And Meta’s license wasn’t even that open. It’s still very restrictive and very far from true open source standards. [20]
Microsoft and their new sugar baby rushed to the government to ask for more regulation. [21]
The matter is urgent. AI is too powerful to fall into wrong hands. But of course, it’s everyone else’s AI, not theirs.
They are preemptively calling for AI regulation in the US, and the push is now expanding into the UK and EU. They are effectively pursuing a regime that would ban anyone from building a large model without prior permission from a government authority. [21, 22]
The licensing regime Altman is speaking of here would be enforced through strict monitoring and punitive liability. If a user abuses a model, its developers would be held accountable. That would effectively ban any open source distribution of this technology whatsoever. [23 – 26]
That’s the thing with regulation sometimes. [26]
AI licensing would result in the exact scenario OpenAI was created to prevent: a closed-off handful of big tech corporations holding exclusive access to the entire field of AI. [27, 28]
And, of course, because they are so responsible and safe at OpenAI, all of this would be packaged and delivered as a product under heavy censorship and surveillance.
Anything you type into ChatGPT will be duly collected and monitored. OpenAI collects every type of interaction you have with their products: your prompts, the files you upload, all other user content, plus all of your associated usage data and detailed analytics. That way you can be accurately identified by the unique fingerprint of your software and device and by the personally identifiable information in your account. [29, 30]
So feel free to share your business idea or creative work with ChatGPT. Feel free to share any potentially dissenting opinion with it. It will be used to make OpenAI even better. Rest assured, your personal information will be used in training. They just promise it won’t be used to profile you and sell ads. Does this promise mean anything about how your privacy is protected? Absolutely not. Just because you don’t sell ads doesn’t mean you are not a creep.
For now, none of the major legislative drafts they’ve been lobbying for have passed… yet. But they are fighting tooth and nail to make sure they do. And I am sure they believe in AI extinction, which is why they are going all in on developing it. Even though there is no broad consensus or evidence for these extinction claims. It just happens that OpenAI would come out on top under those regulations, because they’ll have the billions to comply with them.
This whole idea was borrowed from the latest ideological mind virus, Effective Altruism: a philosophy obsessed with optimizing charitable causes toward the highest count of human lives saved. The core of the idea is “earning to give”, meaning the richer you get, the more you can donate to save lives. Which sounds noble on the surface (not to me). Until the EA movement became obsessed with this AI extinction idea, and now they are not so focused on saving currently living humans, but on hypothetical trillions who might exist in the future and whom AI might wipe out on the slightest off chance. [25, 28, 31]
This Effective Altruism view wasn’t adopted at OpenAI by accident. At least half of OpenAI’s board members are or have been closely tied to the Effective Altruism movement, personally or ideologically. OpenAI today says that none of their board members are effective altruists. But public records might suggest a different story.
Helen Toner and Tasha McCauley are both leaders of groups funded by Open Philanthropy, a major financier of EA organizations and activities. Toner also worked directly at Open Philanthropy and other EA organizations. McCauley was named by EA’s founding daddy William MacAskill as a senior figure in EA. Adam D’Angelo is a close, longtime colleague of Dustin Moskovitz, the billionaire co-founder of Open Philanthropy. [32 – 34]
And, by coincidence, all three of these EA-associated board members voted to oust Sam Altman as CEO of OpenAI. [35]
He was reinstated five days later, which is a new record. It took Steve Jobs over a decade to return to Apple, and Jack Dorsey seven years to return to Twitter.
Just to top it off, in 2017, still in its formative years, OpenAI received a critical grant of $30 million from Open Philanthropy. But OpenAI has nothing to do with Effective Altruism.
I don’t know how much OpenAI really cares about EA, and the truth is irrelevant here. Through their actions, OpenAI is using fears and overblown hypotheses of AI threats to get regulators on their side. And the EA movement is happily footing the bill for all the lobbying needed to pull the right levers toward keeping this technology away from open source, open science and public access. And they know they will financially benefit from it. It will position them as AI incumbents, along with Microsoft and perhaps Google and a few other big tech giants. [27, 36, 37]
Artificial intelligence is technology that was researched and developed in universities and publicly funded institutions, as well as privately funded ones. All computer technology, all of the Internet, is a cooperative effort built on the principle of free and open access. OpenAI’s little chatbot is nothing without the Internet. It is nothing without the open standards and protocols driving the whole modern world. They do not deserve to skim all the cream from the top and hoard it for themselves. No one does. [25, 38]
AI belongs to everybody. The solution to this problem is straightforward: use open source AI, especially if you can run it locally or through a privacy-preserving service. Let me know if you want me to make a tutorial on privacy-preserving AI. And support me on Patreon, please.
Written by The Hated One
Music by White Bat Audio [Karl Casey] https://www.youtube.com/@WhiteBatAudio
Sources
[0] https://www.youtube.com/watch?v=jvqFAi7vkBc
[1] https://www.youtube.com/watch?v=tV8EOQNYC-8
[2] https://cdn.openai.com/papers/gpt-4.pdf
[3] https://x.com/Walid_Magdy/status/1635761623607517184
[5] https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html
[6] https://arxiv.org/abs/2311.17035
[8] https://www.nytimes.com/2023/09/20/books/authors-openai-lawsuit-chatgpt-copyright.html
[9] https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
[11] https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed
[12] https://arxiv.org/abs/2310.15469
[13] https://arxiv.org/abs/2310.03693
[14] https://royalsocietypublishing.org/doi/10.1098/rsta.2018.0083
[15] https://techcrunch.com/2023/03/31/chatgpt-blocked-italy/
[16] https://www.nytimes.com/2024/05/29/technology/what-to-know-open-closed-software.html
[17] https://apnews.com/article/cohere-ai-ceo-aidan-gomez-transformers-71d8618ccc5420aba19871d41eb81615
[18] https://www.wired.com/story/metas-open-source-llama-3-nipping-at-openais-heels/
[19] https://www.nytimes.com/2024/05/29/technology/mark-zuckerberg-meta-ai.html
[20] https://fortune.com/2023/07/18/mark-zuckerberg-meta-ai-open-source-llama-2-llm/
[21] https://www.youtube.com/watch?v=TO0J2Yw7usM
[25] https://www.ft.com/content/2dc07f9e-d2a9-4d98-b746-b051f9352be3
[26] https://www.youtube.com/watch?v=F9cO3-MLHOM
[28] https://www.politico.com/news/2023/10/13/open-philanthropy-funding-ai-policy-00121362
[29] https://openai.com/policies/privacy-policy/
[30] https://help.openai.com/en/articles/7842364-how-chatgpt-and-our-language-models-are-developed
[32] https://www.semafor.com/article/11/21/2023/how-effective-altruism-led-to-a-crisis-at-openai
[34] https://venturebeat.com/ai/openais-six-member-board-will-decide-when-weve-attained-agi/
[35] https://www.ft.com/content/46efa770-4b47-49bb-b0f8-824f1c4f38a3
[37] https://www.politico.com/news/2023/10/13/open-philanthropy-funding-ai-policy-00121362
[38] https://open.mozilla.org/letter/
Comments
Don't feel bad if you need it for work. I am not here to judge. I just want to educate people about the risks. Thank you for your support! (I am gonna do the "self hosted" llm tutorial soon!)
The Hated One
2024-06-20 20:34:54 +0000 UTC
Just watched your vid. Yes, please, do a tutorial in how to run OS AI. I use ChatGPT quite frequently, for work & privately. I feel bad for doing so but am honestly too lazy to inform myself on how to do it self-hosted. Appreciate your work, thanks a lot... And get some good sleep, man🙏
Aule_Mahal
2024-06-20 17:13:29 +0000 UTC