Title: How Open is Open? Transparency and Accountability in Open-Source LLMs

Speaker: Frank Coyle, SMU

Beyond Marketing Claims

The world of Natural Language Processing (NLP) has seen a significant transformation with the advent of Large Language Models (LLMs) such as ChatGPT. Companies use ChatGPT for tasks including customer service, sentiment analysis, and marketing, while researchers are exploring its use in areas such as natural language processing, psychology, and linguistics.

Despite its widespread use, ChatGPT has some major drawbacks. For example, it is known to generate factually incorrect responses, often referred to as hallucinations. In addition, such models often exhibit a variety of biases that stem from the data used to build them.

The presentation will emphasize the need for genuine openness in open-source AI software and examine the implications of models that may not fully disclose their data sources and algorithms.

A recent study of AI software (Nolan, 2023) found numerous instances of software that claimed to be open source but failed to provide clear details about the source of its training data and the underlying algorithms. The audience will gain insights into why knowing the origin of training data is vital. Hidden data sources can introduce biases and reinforce inequalities, which can have real-world consequences.

When open-source LLMs keep their algorithms proprietary, it becomes challenging to evaluate and scrutinize their operations, leading to a lack of accountability. Attendees will learn how proprietary algorithms can hinder the identification and correction of algorithmic biases. Issues of fairness, accountability, and responsible AI are central themes. Attendees will gain a deeper understanding of the risks posed by models that are not as open as they claim to be.

The presentation concludes by advocating for genuine transparency and accountability in open-source LLMs. Attendees will leave with a call to action, encouraging them to support projects that adhere to open-source principles in both word and spirit.

 

Nolan, Michael. Llama and ChatGPT Are Not Open-Source. IEEE Spectrum, July 2023.