PWC AI Engineer Interview Assignment Guidelines
Lambda, Poetry, Langchain, OpenAI, Pinecone and Gradio

This post demonstrates building a basic Document Q&A Generative AI application using AWS serverless technologies, Langchain, AWS Cloud Development Kit (CDK), Poetry, OpenAI, Pinecone and Gradio.
Large Language Models (LLMs) perform quite well on general knowledge retrieval tasks because they have been trained on a large corpus of openly available internet data. However, in an enterprise environment, or in use cases where we need to ask questions about more specialized topics, a general-purpose LLM will struggle to come up with precise responses. LLMs can be used for more complex and knowledge-intensive tasks through an approach called Retrieval-Augmented Generation (RAG). RAG combines retrieval-based methods with generative language models to improve the quality of generated text, especially in question answering and text generation tasks.
For the purposes of this example, we will build a simple system to do question answering on openly available clinical trial protocol documents (https://clinicaltrials.gov/). To set up the knowledge base for the RAG approach, we will build a data pipeline that ingests a clinical protocol document (PDF format), converts the PDF to text, chunks the document into smaller pieces, converts the chunks into embeddings using an embedding model, and stores the embeddings in a vector database.
Now, when we ask the LLM a question about the clinical trial, we can provide more context by including relevant chunks of the document in the prompt. This can be done with a framework such as Langchain, which takes the user question, converts it into a vector representation, performs a semantic search against the knowledge base, retrieves the relevant chunks, ranks them by relevance, and sends the chunks as context in the prompt to the LLM.
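To make this flow concrete, the following is a minimal sketch of the retrieval step on its own, assuming the 2023-era openai and pinecone-client APIs used throughout this post; the index name, the "text" metadata key and the prompt wording are illustrative assumptions, not the application code we build below.
Python
import openai
import pinecone

# Connect to the populated Pinecone index (the index name is an assumption)
pinecone.init(api_key="<PINECONE_API_KEY>", environment="<PINECONE_API_ENV>")
index = pinecone.Index("genai-demo-index")
openai.api_key = "<OPENAI_API_KEY>"

question = "What is the primary endpoint of the trial?"

# 1. Convert the question into a vector with the same embedding model used for the chunks
q_embedding = openai.Embedding.create(
    model="text-embedding-ada-002", input=[question]
)["data"][0]["embedding"]

# 2. Semantic search against the knowledge base to retrieve the most relevant chunks
matches = index.query(vector=q_embedding, top_k=4, include_metadata=True)
context = "\n\n".join(m["metadata"]["text"] for m in matches["matches"])

# 3. Send the retrieved chunks as context in the prompt to the LLM
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)["choices"][0]["message"]["content"]
print(answer)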
Architecture
The following diagram shows a high level architecture of the system:
Install the AWS CDK command line tool globally using npm.
Unset
npm install -g aws-cdk
Run the following command to verify you installed the tool correctly and print the version installed.
Unset
cdk --version
Run the following command to create a sample CDK project in Python language.
Unset
cdk init app --language python
You should see an output similar to this after you run the poetry install command.
Unset
(genai-blog-py3.9) [ssm-user@redacted genai_blog]$ poetry install
Updating dependencies
Resolving dependencies... (2.4s)

Package operations: 52 installs, 0 updates, 0 removals

  • Installing attrs (23.1.0)
  ...........................
  • Installing the current project: genai_blog (0.1.0)
• Create an index in Pinecone (https://www.pinecone.io). You could use the Starter free tier option that allows you to create one index and one project. Give the index a name, enter 1536 for the dimension size and select euclidean as the metric.
The reason we use 1536 as the dimension is that we will be using OpenAI embeddings for this application. Please refer to this link for details: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
Next, create an API key within your Pinecone account. Note the API key value, Environment and Pinecone index name.
• We will use the GPT-3.5 model as the LLM for this application. Go to https://platform.openai.com/account/api-keys and create an API key.
• We will save all the keys into a secret in AWS Secrets Manager and retrieve them wherever required in our application using the boto3 library. Create an api_keys.json file, enter your keys and create a secret using the AWS CLI, as shown in the sketch below.
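The following is a minimal sketch of this step. The key names in api_keys.json are illustrative assumptions (they mirror the environment variables the frontend reads later), while the secret name demo/gb is the one the Lambda function reads.
Unset
# api_keys.json -- key names are illustrative placeholders
{
  "OPENAI_API_KEY": "sk-...",
  "PINECONE_API_KEY": "...",
  "PINECONE_API_ENV": "...",
  "PINECONE_INDEX_NAME": "..."
}
Unset
aws secretsmanager create-secret --name demo/gb --secret-string file://api_keys.json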
We will use CDK to define the infrastructure: an S3 bucket that receives the protocol documents, an SQS queue that collects the S3 event notifications, and a Lambda function that processes each uploaded document. All this in less than 50 lines of code! Following is the code snippet:
Python
from aws_cdk import RemovalPolicy, Stack, Duration
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_sqs as sqs
from aws_cdk import aws_s3_notifications as s3n
from aws_cdk import aws_lambda as lambda_
from aws_cdk import aws_lambda_event_sources as event_sources
from constructs import Construct
from aws_cdk import aws_secretsmanager as secretsmanager


class GenaiBlogStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create a SQS queue to receive events when documents land in the
        # "genai-demo" bucket under the 'raw' prefix
        queue = sqs.Queue(self, "genai-demo-queue",
                          visibility_timeout=Duration.minutes(15))

        # S3 bucket for the protocol documents; object-created notifications
        # under the 'raw/' prefix are sent to the queue (construct id is illustrative)
        bucket = s3.Bucket(self, "genai-demo",
                           removal_policy=RemovalPolicy.DESTROY)
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED,
            s3n.SqsDestination(queue),
            s3.NotificationKeyFilter(prefix="raw/"))

        # Lambda layer with shared Python dependencies (the asset directory name
        # is an assumption -- the original snippet references the layer without
        # showing its definition)
        common_lambda_layer = lambda_.LayerVersion(
            self, "CommonLayer",
            code=lambda_.Code.from_asset("lambda_layer"),
            compatible_runtimes=[lambda_.Runtime.PYTHON_3_9])

        env_queue_url = queue.queue_url

        # Add a lambda function to poll the queue, read the pdf, extract the
        # text and convert to embeddings
        pdf_extraction_lambda = lambda_.Function(
            self, "GenaiPDFHandler",
            runtime=lambda_.Runtime.PYTHON_3_9,
            timeout=Duration.seconds(300),
            memory_size=2048,
            code=lambda_.Code.from_asset("lambda"),
            handler="genaiblog_pdf_extraction.handler",
            layers=[common_lambda_layer],
            environment={"QUEUE_URL": env_queue_url})

        # Grant necessary permissions to the Lambda function
        queue.grant_send_messages(pdf_extraction_lambda)
        bucket.grant_read(pdf_extraction_lambda)

        # Configure the Lambda function to be triggered by messages from the SQS queue
        queue_trigger = event_sources.SqsEventSource(queue, batch_size=1)
        pdf_extraction_lambda.add_event_source(queue_trigger)
• Write the Lambda function to read the PDF file uploaded to the S3 bucket, extract the text from the PDF, convert it to embeddings and store them in a vector database.
Python
import json
import boto3

s3 = boto3.client('s3')


def get_secret():
    secret_name = "demo/gb"
    region_name = "us-east-1"
    # Retrieve the API keys (OpenAI, Pinecone) stored in AWS Secrets Manager
    client = boto3.client('secretsmanager', region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])


def handler(event, context):
    print("request: {}".format(json.dumps(event)))
    secrets = get_secret()

    # The Lambda is triggered by the SQS queue; each SQS message body carries
    # the S3 event notification for the uploaded document
    body_json = json.loads(event["Records"][0]["body"])
    bucket_name = body_json["Records"][0]["s3"]["bucket"]["name"]
    print(bucket_name)
    key = body_json["Records"][0]["s3"]["object"]["key"]
    print(key)

    # ... read the PDF from S3, split it into chunks, convert the chunks to
    # embeddings with the OpenAI embeddings model and upsert them into the
    # Pinecone index (elided here; see the sketch below) ...

    print(f'{key} loaded to pinecone vector store...')
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/plain"},
        "body": "Document successfully loaded to vector store",
    }
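The chunking and embedding portion is elided in the snippet above. The following is a minimal sketch of what that step could look like, assuming the 2023-era Langchain APIs (PyPDFLoader, RecursiveCharacterTextSplitter and the Pinecone vector store wrapper); the helper name, the temporary file path, the chunk sizes and the secret key names are assumptions.
Python
import boto3
import pinecone
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

s3 = boto3.client('s3')


def load_pdf_to_pinecone(bucket_name, key, secrets):
    # Download the uploaded protocol document from S3 to the Lambda's /tmp space
    local_path = "/tmp/protocol.pdf"
    s3.download_file(bucket_name, key, local_path)

    # Convert the PDF to text and split the document into smaller chunks
    pages = PyPDFLoader(local_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(pages)

    # Convert the chunks to embeddings with the OpenAI embeddings model
    # and store them in the Pinecone index
    pinecone.init(api_key=secrets["PINECONE_API_KEY"],
                  environment=secrets["PINECONE_API_ENV"])
    embeddings = OpenAIEmbeddings(openai_api_key=secrets["OPENAI_API_KEY"])
    Pinecone.from_documents(chunks, embeddings,
                            index_name=secrets["PINECONE_INDEX_NAME"])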
• Run the following command to synthesize the AWS CloudFormation templates based on the CDK code.
Unset
cdk synth
• Run the following command to deploy the stack to your AWS account.
Unset
cdk deploy
• Upload a sample clinical trial protocol document to the S3 bucket under the 'raw' prefix.
Unset
aws s3 cp NCT03078478.pdf s3://genai-blog-pipeline-genaidemoaxxx-cxxxxcxx/raw/
• Uploading the PDF document to S3 generates an event whose details are stored in the SQS queue. Our 'genaiblog_pdf_extraction' Lambda function receives the details of the document, reads it from S3, converts the PDF to text, splits the document into chunks, calls the OpenAI embeddings model to convert the chunks to embeddings and stores them in the Pinecone vector database.
• After the Lambda has completed execution, check your Pinecone dashboard to confirm the documents were stored in
the vector database. In the enclosed screenshot, we see 249 vectors were stored after the execution of the Lambda.
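Besides the dashboard, you can also confirm the upload programmatically. A quick check, assuming the pinecone-client 2.x API and an illustrative index name:
Python
import pinecone

pinecone.init(api_key="<PINECONE_API_KEY>", environment="<PINECONE_API_ENV>")
index = pinecone.Index("genai-demo-index")  # index name is an assumption

# total_vector_count should match the number of chunks loaded by the Lambda
print(index.describe_index_stats())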
• Next, we will build a chatbot frontend with Gradio. Add the gradio package to the project using Poetry.
Unset
(genai-blog-py3.9) [ssm-user@redacted genai_blog]$ poetry add gradio
Using version ^3.40.1 for gradio

Updating dependencies
Resolving dependencies...
Downloading https://files.pythonhosted.org/packages/72/7d/2ad1b94106f8b1971d1eff0ebb97a81d980c448732a3e624bba281bd274d/matplotlib-3.7.2-cp310-cp31
Resolving dependencies...
Downloading https://files.pythonhosted.org/packages/c3/a0/5dba8ed157b0136607c7f2151db695885606968d1fae123dc3391e0cfdbf/sniffio-1.3.0-py3-none-any.
Resolving dependencies... (15.3s)

Package operations: 43 installs, 0 updates, 0 removals

  • Installing rpds-py (0.9.2)
  • Installing sniffio (1.3.0)
  • Installing anyio (3.7.1)
  • Installing h11 (0.14.0)
  • Installing referencing (0.30.2)
  • Installing filelock (3.12.2)
  • Installing fsspec (2023.6.0)
  • Installing httpcore (0.17.3)
  ...........................
• Our Gradio frontend will take user questions about the protocol documents, use Langchain to retrieve the relevant chunks of information from the Pinecone vector store, and pass the query and the retrieved chunks as context to the LLM. Therefore, we will need to provide the relevant API keys to the Gradio application. In this example, we are running the frontend application locally. However, this application could be containerized, deployed on an AWS Elastic Kubernetes Service cluster and scaled using Application Load Balancers.
• For our local deployment, we will use the python-dotenv package to provide the API keys to the application. Create a
folder 'frontend' in your project directory and add the app.py and .env files.
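A minimal .env file with the variable names that app.py reads below (the values are placeholders):
Unset
PINECONE_API_KEY=...
PINECONE_API_ENV=...
PINECONE_INDEX_NAME=...
OPENAI_API_KEY=...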
Python
import os

import gradio as gr
import pinecone
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone

# Load the API keys from the .env file
load_dotenv()
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_env = os.getenv("PINECONE_API_ENV")
pinecone_index_name = os.getenv("PINECONE_INDEX_NAME")
openai_api_key = os.getenv("OPENAI_API_KEY")

pinecone.init(api_key=pinecone_api_key, environment=pinecone_env)
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)


def predict(message, history):
    # Initialize the OpenAI module, load and run the Retrieval Q&A chain
    vectordb = Pinecone.from_existing_index(index_name=pinecone_index_name,
                                            embedding=embeddings)
    retriever = vectordb.as_retriever()
    llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
    qa = RetrievalQA.from_chain_type(llm, chain_type="stuff",
                                     retriever=retriever)
    response = qa.run(message)
    return response


gr.ChatInterface(predict,
                 title="Clinical Trials Q&A Bot",
                 description="Ask questions about Clinical Trial protocol documents...",
                 ).launch()
TEST
Run the following command to launch the Gradio application locally.
Unset
gradio app.py
• This should bring up a chatbot interface at the following local URL: http://127.0.0.1:7860/. Now you can ask very specific questions and have the LLM respond with information from the protocol documents using the Retrieval-Augmented Generation (RAG) approach.