Timnit Gebru, an accomplished and respected figure in computer science, is particularly known for her revolutionary work in artificial intelligence (AI) and its ethics. A strong advocate for expanding diversity in technology, she is also the co-founder of Black in AI, an organization of Black researchers working on AI. At Google, she helped build one of the company's most diverse AI teams, staffed with leading experts, and often challenged mainstream AI practices, such as the commercialization of face recognition to police departments given its inaccuracy at recognizing women and people of color, which raises the possibility of misuse and discrimination against people of color.
Recently, however, Gebru left the multinational technology company over a complicated dispute. A series of tweets, leaked emails, and online articles have revealed that Gebru's exit was the result of a conflict over a paper she co-authored.
Jeff Dean, the head of Google AI, stated that the paper “didn’t meet our bar” and “ignored too much relevant research” on efforts to minimize the environmental impact and bias of large language processing models; Gebru’s access to her work email was quickly cut off after a series of internal email conversations (MIT Technology Review). Gebru, however, along with many others, claims that she was wrongfully fired and forced out of the company. In fact, more than 1,400 Google staff members and 1,900 external petitioners have signed a letter in her support.
According to the MIT Technology Review, which had access to early drafts of the paper (not yet available to the public), it covered a series of ethical issues and risks around the use of large language processing models, which Google has been building and using over time. For some context, natural language processing models are essentially machine learning systems built to sort through sample text and draw conclusions, or “predictions,” from it. So, the larger the system, the more text it needs.
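To make that idea concrete, here is a minimal sketch of what “drawing predictions from text” means in its simplest form. This is just a toy word-frequency counter, nothing like the neural architecture behind BERT, and the sample text and variable names are invented for illustration; the point is only that the model can predict nothing beyond the patterns present in the text it was fed.

```python
# Toy sketch of next-word "prediction": count which word tends to follow which.
# Illustrative only -- real language models use neural networks over huge corpora.
from collections import Counter, defaultdict

sample_text = (
    "the model learns patterns from text and the model predicts "
    "the next word from the patterns it has already seen"
)

# Build a simple bigram table: for each word, count the words that follow it.
follow_counts = defaultdict(Counter)
words = sample_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the training text."""
    candidates = follow_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # prints "model" -- the word the tiny corpus used most after "the"
```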
The four main issues discussed in the paper are environmental and financial costs, enigmatic or inscrutable datasets, misdirected research efforts, and the potential to deceive users and spread misinformation.
Firstly, not only can training machine learning models be extremely costly, but they also produce a great deal of CO2. According to the MIT Technology Review, training a variation of Google’s language model BERT, which supports the company’s search engine, produced 1,438 pounds of CO2. Worse, this number should be viewed as a minimum: it represents the emissions from a single run of the model.
Secondly, and this is where everything begins to intertwine, there is the problem of using massive amounts of text to train the machine. Larger language models need more text samples to sort through and learn from. Today, much of that data comes from the Internet and all the websites that come with it; even I used a sample from Amazon for my own RNN sequence. Still, the problem isn’t really the Internet itself, but the content the machine might be sorting through. It’s important to keep in mind that an AI model learns whatever it is given; it cannot distinguish racist, sexist, or abusive language in the dataset.
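To illustrate why, here is a hedged sketch of the kind of preprocessing step such a training pipeline relies on. The function names and example lines are mine, not from any real Google system, but they show the key point: the text is converted into numbers with no step that judges its content.

```python
# Hedged sketch of how raw web text typically becomes training data. Nothing in
# this pipeline "understands" the content: every scraped line, including biased
# or abusive language, is converted into numbers in exactly the same way.

def build_vocabulary(lines):
    """Assign every distinct word in the corpus an integer id."""
    vocab = {}
    for line in lines:
        for word in line.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(lines, vocab):
    """Turn each line into the list of integer ids a model would train on."""
    return [[vocab[word] for word in line.lower().split()] for line in lines]

# Imagine these were scraped from the web; the pipeline treats them all alike.
scraped_lines = [
    "a helpful product review",
    "a well sourced news article",
    "an abusive comment copied from a forum thread",  # ingested just the same
]

vocab = build_vocabulary(scraped_lines)
print(encode(scraped_lines, vocab))
```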
Obviously, given the expanse of the internet and the size of these datasets, there is no guarantee such language will be kept out of the training set. Simply put, the model effectively learns that such language is “ok,” which is bad. Beyond normalizing that language, these models will also fail to catch the nuances of the new anti-sexist and anti-racist vocabulary that has grown out of recent movements such as MeToo and Black Lives Matter. Furthermore, they will miss the complexities of the languages, cultures, and norms of marginalized groups, the people with less access to the Internet and smaller linguistic footprints online. The language generated by such models will be homogenized, disproportionately reflecting richer countries and communities and failing to properly mirror the diversity we find today.
This disproportion is the limitation of AI: these language models cannot understand language; they merely analyze the patterns in how it is used. The real problem is that Google is willing to ignore these potential consequences and profit from them. By commercializing this skill of analyzing data and language, the focus for many companies, likely including Google, becomes solely improving the accuracy of these models rather than building models that aim to understand language and overcome these limitations for people of color. Not only will such companies collect massive profits from these models, but there are also no guidelines for how the models can be used. These models can also learn to mimic human beings and spread misinformation or other malicious content.
As another person of color and an aspiring computer scientist, I find this both disheartening and concerning. AI ethics is already a touchy subject, and with the increasing use of and research into AI, it’s crucial that we get the ethical foundations right and fair. Technological innovations designed to benefit only one group cannot be considered innovations; they simply become a disadvantage for another group. Technology should benefit the large majority and constantly aim for the improvement and consideration of all groups, regardless of background, race, ethnicity, or gender. As the world gravitates toward increased use of tech, it’s only fair that it be built to include everyone, not a select few.
– Janet