Lec1 24th Nov
WILP
AIML CLZG516
ML System Optimization
Murali Parameswaran
[email protected]
BITS Pilani
Pilani Campus
Distributed Computing
AIML CLZG516
ML System Optimization
Session 1
Course Introduction
Sec-1: Prof. Madhusudhanan B {[email protected]}
Sec-2: Prof. Murali P {[email protected]} (I-C)
Lifecycle of New Technologies
[lifecycle diagram with stages: Scientist, Research, Hype, Marketing, Engineering, Manufacturing, Commerce, Commodity]
Lifecycle of AI/ML – Where are we ?
ChatGPT?
[same lifecycle diagram: where does AI/ML sit today?]
Machine Learning – Enterprise Practice
• AI and Machine Learning are becoming central to organizations:
• No longer a one-off activity
• Multiple problems / perspectives addressed through ML
• Multiple ML solutions deployed
• ML is becoming a continual activity:
• Data change; Context changes
• Drift in the solution
• Problems change; Requirements change;
• New model(s) required
• World changes; Expectations change
• Performance and Standardization critical ==>
• Packaging vs. Pricing
Operationalizing AI/ML
Deployment
Develop, Deploy, and Infer
Internal Compliance
Regulatory Compliance
ML is part of an application
Hardware/Platform to use
Will our model fit in the pipeline?
Operationalizing AI/ML
Deployment
Software pipelining:
Test whether the model is working within the pipeline:
Whether inputs come in the same form
Whether inputs come in the order needed for your model to respond
Whether responses are being consumed appropriately
Whether responses are to be consumed one at a time, or a whole sequence of responses has to be consumed
May need to return to training after validation/testing.
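These checks can be sketched as simple pre-deployment assertions. The field names (`user_id`, `timestamp`, `features`) and the two validation rules below are illustrative assumptions, not from any specific framework:

```python
# Sketch: pre-deployment checks that a model fits its pipeline.
# The schema and the ordering rule (sorted by timestamp) are assumptions.
EXPECTED_FIELDS = ["user_id", "timestamp", "features"]   # assumed input schema

def validate_request(request: dict) -> None:
    # Do inputs come in the same form the model was trained on?
    missing = [f for f in EXPECTED_FIELDS if f not in request]
    if missing:
        raise ValueError(f"missing fields: {missing}")

def validate_order(requests: list) -> None:
    # Do inputs arrive in the order the model needs (here: by timestamp)?
    stamps = [r["timestamp"] for r in requests]
    if stamps != sorted(stamps):
        raise ValueError("requests are out of order")

batch = [
    {"user_id": 1, "timestamp": 10, "features": [0.1]},
    {"user_id": 2, "timestamp": 20, "features": [0.4]},
]
for r in batch:
    validate_request(r)
validate_order(batch)
print("pipeline checks passed")
```

In practice such checks run in integration tests before deployment; a failure here is what sends the team back to training or to reworking the pipeline interface.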
Machine Learning – Enterprise Practice-Recap
• AI and Machine Learning are becoming central to organizations:
• No longer a one-off activity
• Multiple problems / perspectives addressed through ML
• Multiple ML solutions deployed
• ML is becoming a continual activity:
• Data change; Context changes
• Drift in the solution
• Problems change; Requirements change;
• New model(s) required
• World changes; Expectations change
• Performance and Standardization critical ==>
• Packaging vs. Pricing
• Depends on competition
• Cost for model inference
Operationalizing AI/ML
• Example
• E.g. SVM has a time complexity between O(d·N²) and O(d·N³), where
• d is the number of dimensions (of the data points) and
• N is the number of data points
• For a large dataset, say N = 10⁹ and d = 5, this could be costly:
• Assuming 2 simple arithmetic operations per data point:
• this amounts to at least 10¹⁹ (= 5 · 2 · 10⁹ · 10⁹) operations
• Given a 2.5 GHz processor, i.e. a 0.4 ns clock cycle,
• and 1 CPI (i.e. cycles per instruction), a measure of processor throughput
• [simplistic but close to reality!]
• 10¹⁹ operations will take close to 127 years on a single core
• Reducing running time during training is a big focus in this course!
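The back-of-envelope arithmetic above is easy to verify in a few lines, assuming the stated single-core, 1-CPI, 2.5 GHz figures:

```python
# Back-of-envelope estimate of SVM training time at O(d * N^2).
d = 5                  # dimensions per data point
N = 10**9              # number of data points
ops_per_point = 2      # simple arithmetic operations per point (assumed)
clock_hz = 2.5e9       # 2.5 GHz processor, CPI = 1, single core

total_ops = ops_per_point * d * N * N       # 10^19 operations
seconds = total_ops / clock_hz              # ~4e9 seconds
years = seconds / (365 * 24 * 3600)
print(f"{total_ops:.1e} ops -> about {years:.0f} years on one core")
```

The point is not the exact number but the order of magnitude: single-core execution is hopeless at this scale, which motivates the parallel and distributed techniques below.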
Reducing running time
• Typical methods:
• Parallelize or distribute computation:
• Multi-threaded programming on multi-core processors
• Massively multi-threaded programming on many-core GPGPUs
• Distributed Programming on Scale-out Clusters of CPUs or GPUs
• Hand-tuning or compiler-performed code optimization
• Rewritten for parallelism or generated by compilers
• Process = Program + Address Space (at run time)
• Threads share address space:
• Each thread gets its own call stack
• Heap and global area are shared by all threads
• Threads run on a shared-memory model (e.g., multi-core and many-core processors)
• Distributed programming runs on distributed memory, i.e., the memory of multiple computers (each a processor + memory + disk + OS)
Cost during training
• Megatron-Turing NLG:
• 530 billion parameters
• Microsoft and Nvidia claim to have used hundreds of DGX A100 servers
• Each server costs ~$200,000
• Adding networking costs, the infrastructure cost is ~$100M
• Each server consumes 6.5 kW of power
• Add a comparable cooling cost!
• We will NOT do much about power consumption in this course!
• But we will look at reducing model size as an important aspect!
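Under the stated figures, the cost arithmetic can be sketched in a few lines; the server count of 500 is an assumption standing in for "hundreds":

```python
# Rough infrastructure estimate for Megatron-Turing NLG-scale training.
n_servers = 500                      # assumed; the claim is only "hundreds"
server_cost_usd = 200_000            # ~$200k per DGX A100 server
power_per_server_kw = 6.5            # per-server draw

hardware_usd = n_servers * server_cost_usd   # $100M before networking
power_kw = n_servers * power_per_server_kw   # servers alone; cooling is extra

print(f"hardware ~${hardware_usd/1e6:.0f}M, power ~{power_kw:.0f} kW")
```

Even with generous rounding, hardware alone reaches the quoted ~$100M, and the multi-megawatt power draw (before cooling) shows why model-size reduction matters.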
Sizes of NLP models over the years
Model Size
• LLMs (Large Language Models) like GPT-4 and Bard are notoriously
large.
• But there are systematic approaches to reduce model size
• Without compromising the accuracy too much.
• We will look at model compression in this course!
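As a taste of model compression, one common technique is post-training 8-bit quantization of the weights. The sketch below is a minimal, framework-free version; the matrix shape and the symmetric scaling scheme are illustrative assumptions, not any particular library's API:

```python
# Minimal sketch: post-training symmetric int8 quantization of a weight matrix.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(256, 256)).astype(np.float32)

# Map [-max|w|, +max|w|] linearly onto the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)   # 4x smaller than float32
dequant = q.astype(np.float32) * scale          # approximate reconstruction

err = np.abs(weights - dequant).max()
print(f"size: {weights.nbytes} -> {q.nbytes} bytes, max error {err:.4f}")
```

Storage drops 4x (int8 vs. float32) while the worst-case reconstruction error stays below half a quantization step, which is the sense in which compression need not compromise accuracy too much.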
Cost during Deployment
• Ans.
• Part (A): If the model server is parallel, multiple threads or processes can respond in parallel, thereby improving response time and throughput.
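A minimal sketch of that idea, using a thread pool as the parallel model server; the `infer` stub and its 50 ms latency are placeholders, not a real model:

```python
# Sketch: a pool of workers serving inference requests concurrently.
from concurrent.futures import ThreadPoolExecutor
import time

def infer(request_id):
    time.sleep(0.05)          # stand-in for model inference latency
    return request_id * 2     # stand-in for a prediction

requests = list(range(8))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, requests))
elapsed = time.perf_counter() - start

# 8 requests x 50 ms each would take ~0.4 s serially;
# with 4 workers the wall-clock time is roughly a quarter of that.
print(results, f"{elapsed:.2f}s")
```

Because the stub sleeps rather than computes, threads suffice here; a CPU-bound model in Python would typically use processes (or a server that releases the GIL) to get the same effect.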