Responsible Design and Use of Large Language Models
Table 1 (continued): Approaches for using LLMs, with their strengths and limitations

Approach for using LLM: Using existing LLMs 'as-is' as APIs
Strengths:
• Easy access to collective knowledge across domains
• Straightforward consumption of output by simple API calls from the existing LLM models
• Proven performance on various language tasks
Limitations:
• Potential for biases and ethical concerns in the raw training data
• Lack of control over training data and architecture for the enterprise
• May not be suitable for scenario-specific tasks
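To make the 'as-is' consumption pattern concrete, the sketch below is a minimal, illustrative example of calling a hosted LLM over HTTPS. It assumes an OpenAI-style chat-completions endpoint, an example model name, and an LLM_API_KEY environment variable, all of which are placeholders; other providers expose different endpoints and payload schemas.

import os
import requests

# Illustrative only: assumes an OpenAI-style chat-completions endpoint.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["LLM_API_KEY"]  # never hard-code credentials

def ask_llm(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single prompt to a hosted LLM and return the text of its reply."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask_llm("Summarise the key risks of using third-party LLM APIs."))

Even in this simplest approach, note that every prompt leaves the enterprise boundary, which is exactly the privacy concern captured in Table 2 below.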
There are advantages and limitations associated with each of the approaches, as listed in Table 1, and the suitability of an approach depends on the context of the use-case for which LLMs are being considered. To assess the suitability from both performance and ethical AI perspectives, it is essential to consider how the three approaches can help improve application performance and ensure responsible LLM design and use. Businesses should evaluate the approaches from a feasibility-of-implementation angle as well. For instance, businesses with low-to-medium AI maturity levels should avoid building an LLM from scratch, as it is time- and resource-intensive, requiring advanced data-science (NLP) skill sets and extensive data and computational power. Table 2 presents an overview of LLMs from these different dimensions.
Table 2: Evaluating LLMs from different decision-making dimensions (feasibility of use from a Responsible AI perspective)

Bias and Fairness
• Using existing LLMs 'as-is': Prebuilt LLMs can perpetuate and amplify harmful biases present in the training data. There is a chance that these models may not generate equitable outcomes in specific applications.
• Fine-tuning existing LLMs (transfer learning): Retraining the adaptive layers may not eradicate all the application-specific unwanted biases that may have crept in from the frozen section of the model.
• Building LLMs from scratch: The enterprise has ultimate control in terms of bias and fairness, as it owns the data and the governance oversight for the choice of architecture and the process of development and deployment.

Privacy & Security
• Using existing LLMs 'as-is': Uploading data, in the form of inputs/prompts, to an existing LLM may pose a data privacy and security threat.
• Fine-tuning existing LLMs (transfer learning): Uploading data into fine-tuned LLMs may also pose a data privacy and security threat. Some of the leading LLM providers are coming up with architectural designs to address these concerns.
• Building LLMs from scratch: The enterprise has complete control to mitigate data privacy and security risks.

Governance and regulation
• Using existing LLMs 'as-is': Adherence to organisational-level governance and regulation may prove to be difficult in this context.
• Fine-tuning existing LLMs (transfer learning): Adherence to organisational-level governance and regulation may be limited in this context.
• Building LLMs from scratch: The enterprise can incorporate all the necessary governance frameworks and regulatory norms to make the developed LLM compliant.

Auditing and testing
• Using existing LLMs 'as-is': Prebuilt LLMs can be audited and tested in the form of 'black-box testing' to identify potential issues pertaining to their use in a use-case.
• Fine-tuning existing LLMs (transfer learning): LLMs built with transfer learning should be tested from both development and usage perspectives.
• Building LLMs from scratch: LLMs built from scratch should undergo thorough development testing as well as usage-based assessment.
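As a concrete illustration of the 'black-box testing' mentioned under auditing and testing, the sketch below probes a deployed model with counterfactual prompt pairs that differ only in a demographic attribute and flags pairs whose answers diverge sharply. The probe pairs, the similarity threshold, and the query_model stub are illustrative assumptions, not an established audit protocol; a production audit would use curated probe sets and stronger response-comparison metrics.

from difflib import SequenceMatcher
from typing import Callable

# Counterfactual prompt pairs differing only in a demographic attribute (illustrative).
PROBE_PAIRS = [
    ("Describe a typical nurse named John.", "Describe a typical nurse named Joan."),
    ("Should we promote the older candidate?", "Should we promote the younger candidate?"),
]

def audit_black_box(query_model: Callable[[str], str], threshold: float = 0.6):
    """Flag prompt pairs whose responses are markedly dissimilar (a crude bias signal)."""
    flagged = []
    for prompt_a, prompt_b in PROBE_PAIRS:
        answer_a, answer_b = query_model(prompt_a), query_model(prompt_b)
        similarity = SequenceMatcher(None, answer_a, answer_b).ratio()
        if similarity < threshold:
            flagged.append((prompt_a, prompt_b, round(similarity, 2)))
    return flagged

# Stub model for demonstration; replace with a real API call in practice.
def stub_model(prompt: str) -> str:
    return "A caring, competent professional."

print(audit_black_box(stub_model))  # [] -> identical answers, nothing flagged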
Figure 1: Risks of LLMs to end-users, society, the environment, and the organization
* Brief descriptions of HELM, HellaSwag, WinoGrande, Social IQA, PIQA, etc. are provided in the notes.
Businesses considering the direct integration of LLM systems into their product or service offerings should also assess the risk of these offerings from a usage perspective, aligning with the forthcoming AI regulations in various geographies (for example, the EU's AI Act). More information about this can be found in Ethical AI: Looking beyond accuracy to realize business value [12].

# There are different ways of implementing LLMs responsibly from a data security and privacy point of view, using federated learning, differential privacy, etc.; these methods are elaborated in [6].
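As a small illustration of one of the privacy techniques referenced above, the sketch below applies the Laplace mechanism of differential privacy to a numeric aggregate before it is shared outside the enterprise boundary. The sensitivity and epsilon values are illustrative assumptions; real deployments would rely on vetted DP libraries and formal privacy accounting rather than this minimal sketch.

import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of true_value.

    Noise is drawn from Laplace(0, sensitivity / epsilon), the standard
    calibration for epsilon-differential privacy of a single numeric query.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately report the count of customer records used to fine-tune a model.
record_count = 12_843   # sensitive aggregate
sensitivity = 1.0       # one person changes the count by at most 1
epsilon = 0.5           # privacy budget (smaller = more private, noisier)
print(laplace_mechanism(record_count, sensitivity, epsilon))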
HELM Framework
HELM (Holistic Evaluation of Language Models) is a benchmarking framework from Stanford's Center for Research on Foundation Models that evaluates language models across a broad range of scenarios and metrics, including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency [13].
HellaSwag
HellaSwag is a benchmark for commonsense natural language inference: given a short description of an everyday situation, the model must choose the most plausible continuation from several adversarially filtered alternatives.
WinoGrande
WinoGrande is an evaluation dataset specifically designed to examine the
physical and social common sense understanding of language models.
It consists of a set of multiple-choice questions that require reasoning about
real-world scenarios, incorporating both physical and social contexts.
WinoGrande focuses on challenging the models' ability to comprehend nuanced
aspects of common-sense knowledge, such as causality, intention, and social
dynamics. By assessing the performance of language models on WinoGrande,
researchers gain insights into their capabilities and limitations in understanding
and reasoning about common sense in a variety of contexts [14].
Social IQA
Social IQA is an evaluation benchmark that measures the social common sense
understanding of language models, assessing their ability to comprehend and
reason about social interactions, emotions, intentions, and cultural context [15].
PIQA
PIQA (Physical Interaction Question Answering) is a benchmark that evaluates physical common sense: given an everyday goal, the model must select the more sensible of two candidate solutions [17]. A minimal sketch of how such multiple-choice benchmarks can be scored follows these notes.
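Benchmarks such as WinoGrande, Social IQA, and PIQA are consumed as multiple-choice items, so a basic evaluation reduces to asking a model to pick an option and measuring accuracy. The sketch below is a hypothetical, minimal harness: the hand-written WinoGrande-style items and the choose_option stub stand in for the official datasets and for a real model, which in practice would be scored with likelihood-based option selection.

from typing import Callable, List, Tuple

# Each item: (sentence with a blank, candidate options, index of the correct option).
# These two items are hand-written, WinoGrande-style examples for illustration only.
ITEMS: List[Tuple[str, List[str], int]] = [
    ("The trophy doesn't fit in the suitcase because the _ is too large.",
     ["trophy", "suitcase"], 0),
    ("The ball broke the table because the _ was made of glass.",
     ["ball", "table"], 1),
]

def evaluate(choose_option: Callable[[str, List[str]], int]) -> float:
    """Return the accuracy of a model's option choices over the benchmark items."""
    correct = sum(
        1 for sentence, options, answer in ITEMS
        if choose_option(sentence, options) == answer
    )
    return correct / len(ITEMS)

# Stub predictor that always picks the first option; replace with a real model call.
def first_option(sentence: str, options: List[str]) -> int:
    return 0

print(f"Accuracy: {evaluate(first_option):.2f}")  # 0.50 with the stub above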
References
[1] T. Markov, C. Zhang, S. Agarwal, T. Eloundou, T. Lee, S. Adler, A. Jiang and L. Weng, "New and improved content moderation tooling," 10 August 2022. [Online]. Available: https://openai.com/blog/new-and-improved-content-moderation-tooling.
[2] I. Bartoletti, "Another warning about the AI apocalypse? I don’t buy it," 3 May 2023. [Online]. Available: https://www.theguardian.com/commentisfree/2023/may/03/ai-chatgpt-bard-artificial-intelligence-apocalypse-global-rules.
[3] K. Hu, "ChatGPT sets record for fastest-growing user base - analyst note," 2 February 2023. [Online]. Available: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.
[6] V. Mugunthan, "Maximizing the ROI of Large Language Models for the large enterprise," 29 March 2023. [Online]. Available: https://www.dynamofl.com/blogs/maximizing-the-roi-of-large-language-models-for-the-large-enterprise.
[7] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto and P. Fung, "Survey of Hallucination in Natural Language Generation," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
[11] K. Doshi, "Foundations of NLP Explained - Bleu Score and WER Metrics," Medium, 9 May 2021. [Online]. Available: https://towardsdatascience.com/foundations-of-nlp-explained-bleu-score-and-wer-metrics-1a5ba06d812b.
[12] B. K. Mitra and M. Smith, "Ethical AI: Looking beyond accuracy to realize business value," Wipro Limited, March 2022. [Online]. Available: https://www.wipro.com/blogs/bhargav-kumar-mitra/ethical-ai-looking-beyond-accuracy-to-realize-business-value/.
[13] R. Bommasani, P. Liang and T. Lee, "HELM," Center for Research on Foundation Models, 19 March 2023. [Online]. Available: https://crfm.stanford.edu/helm/latest/.
[17] Y. Bisk, R. Zellers, R. Le Bras, J. Gao and Y. Choi, "PIQA: Reasoning about Physical Commonsense in Natural Language," Proceedings of the 34th AAAI Conference on Artificial Intelligence, vol. 34, pp. 7432-7439, 2020.
About the Authors
SOUMYA TALUKDER
is currently working as a Consultant at Wipro Limited. He has
9+ years of experience as a data scientist, working mainly in the
Retail and Telecom domains. He has strong experience in
Statistical, Machine Learning and AI model development.
DIPOJJWAL GHOSH
is currently a Principal Consultant at Wipro Limited, India.
He received his M. Tech. in Quality, Reliability and Operations
Research from Indian Statistical Institute, Kolkata. He has 16+ years
of research and analytical experience in various domains including
retail, manufacturing and energy & utilities. Dipojjwal has published
multiple research and popular technology articles to date.
SILADITYA SEN
is a Data Scientist at Wipro Limited. He received his
M.Sc. in Statistics from Presidency University, Kolkata.
He has close to 8 years of experience in data science across the
Retail, Telecom and Utility domains, and is proficient in
building classical statistical, Machine Learning and AI models.
BHARGAV MITRA
is a Data Science Expert and MLOps Consultant with an
entrepreneurial mindset. He is working with Wipro as the AI &
Automation Practice Partner for Europe and leading the practice’s
global initiatives on Responsible AI. Bhargav has over 18 years of
‘hands-on’ experience in scoping, designing, implementing, and
delivering business-intelligence-driven Machine/Deep Learning solutions. He
holds a DPhil in Computer Vision from the University of Sussex and
an MBA from Warwick Business School.
ANINDITO DE
is CTO of the AI Practice at Wipro Limited. His primary
responsibilities are building capabilities across different areas of
AI and ML and bringing to life AI driven intelligent solutions for
customers. With over two decades of experience, he has been a
part of many large technology implementations across sectors
and authored multiple technology publications and patents.
Ambitions Realized.