Why Machine Learning Must Be Automated
Digital transformation trends have resulted in a data explosion across industries. Enterprises are increasingly
leveraging Artificial Intelligence (AI) and Machine Learning (ML) to identify trends, harness insights based on data, and
make critical business decisions to gain a competitive advantage in the market.

Even though the data deluge is starting to transform business and everyday life, data science today is extremely
labor-intensive. Building and deploying scalable ML models requires expert data scientists, who are scarce and
expensive. Most often, these data scientists spend the majority of their time on repetitive tasks. As a result, ML-driven
business transformation projects often get delayed and value realization becomes a challenge.

[Figure 1 labels]
Coding; Statistics; Visualization Techniques; Industry-specific Knowledge
Deep Learning Frameworks; Machine Learning; Programming Languages; Analytical Tools
Repetitive Tasks; Data Transformations; Feature Engineering; Hyperparameter Tuning

Figure 1. Challenges in Implementing Machine Learning at Scale

Data Science Automation with Infosys Nia™ Advanced ML Platform

The Infosys Nia Advanced ML platform applies automation to the data science workflow, increasing data scientists’
productivity by orders of magnitude.

[Figure 2 diagram: Infosys Nia Advanced Machine Learning Platform]
Pipeline stages (left to right): Data Sources; Exploratory Data Analytics; Data Transform & Feature Engineering; Machine Learning Model Building; Machine Learning Output; External Interfaces.
Exploratory Data Analytics: Integrate Data, Analyze Data, Visualize Data.
Data Transform & Feature Engineering: Impute & Transform, Engineer Features, Select Features.
Machine Learning Model Building: Select ML Algorithm, Tune Hyperparameters, Train - Tune - Test.
Machine Learning Output: Predict & Evaluate, Run Model Diagnostic, Deploy Model.
Effort labels (left to right): 25-40%, 20-30%, and 5-15% of overall effort.
Automation labels (left to right): Partially Automatic, Fully Automatic, Can be Automated.
Data Sources and External Interfaces labels: Systems & Warehouses, Applications, Down Stream, Third-Party Data, Discovery & Visualization Tools.
Platform services: Infosys Nia Data/Model Management, Experiment Management, Automatic Audit Trail, Flexible Delivery (On-Cloud/On-Premise).

Figure 2. Data Science Automation and Optimization with Infosys Nia

The Advanced ML workbench in the Infosys platform optimizes the most effort-intensive and critical tasks in a data
science project. It partially automates the process of identification, selection, and preparation of statistically viable
datasets (feature engineering and feature selection), and provides interactive and extensible capabilities that allow
experts as well as novices to build big data transformation pipelines. The platform automatically selects the best-fit
model for a given dataset and optimally tunes hyperparameters specific to an ML algorithm.
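
To make the idea of automatic model selection with hyperparameter search concrete, here is a minimal sketch in plain Python. It is not the Infosys Nia API or its search strategy, neither of which is described here. The two candidate methods (k-nearest neighbours and nearest centroid), the random search, the synthetic data, and every function name are assumptions chosen purely for illustration.

```python
import random
from collections import Counter

# Illustrative only: all names below are hypothetical, not platform APIs.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(train, key=lambda row: sq_dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, point):
    """Assign `point` to the class whose mean feature vector is closest."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))

def holdout_accuracy(predict, train, valid):
    """Fraction of validation rows that `predict` labels correctly."""
    return sum(predict(train, x) == y for x, y in valid) / len(valid)

def auto_select(train, valid, n_trials=8, seed=0):
    """Random search over (method, hyperparameter) pairs; keep the best."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        if rng.random() < 0.5:
            k = rng.choice([1, 3, 5, 7, 9])  # odd k avoids tied votes
            name = f"knn(k={k})"
            predict = lambda tr, x, k=k: knn_predict(tr, x, k)
        else:
            name, predict = "nearest_centroid", centroid_predict
        history.append((name, holdout_accuracy(predict, train, valid)))
    best = max(history, key=lambda item: item[1])
    return best, history

def make_data(n, seed):
    """Synthetic two-class data: one cluster near (0, 0), one near (5, 5)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 5.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows

train, valid = make_data(80, seed=1), make_data(40, seed=2)
best, history = auto_select(train, valid)
print("best candidate:", best[0], "| holdout accuracy:", best[1])
```

The `history` list keeps every candidate tried together with its score, which is the kind of record an experiment audit trail would retain. A production system would typically use cross-validation and a smarter search than uniform random sampling.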

Figure 3. Data Transform Snippets

Figure 4. Auto Model Tuning Results

Enables Multiple User Profiles to Solve a Variety of Business Problems

Enterprises typically have skilled information and data analysts. By simplifying several activities, this
platform allows data analysts, developers, and even business users with limited knowledge of data science to build
accurate and high-performing ML models. Given the scarcity of data scientists, the platform enables all such user
profiles to discover deep analytical insights, predict trends, make recommendations, and identify new business

Automation empowers data scientists to avoid repetitive tasks and spend quality time on critical tasks such as
understanding domain and business pain points, data enrichment, formulating hypotheses, and analyzing results.

Reduces variability in the data science process
Significantly reduces turnaround time from data to business insights
Rationalizes costs and boosts workforce productivity and skill
Augments business efficiency and increases ROI
Optimizes business processes and enables better decision making
Enhances customer satisfaction

Figure 5. Business Benefits

Speed and Scale of Predictive Accuracy with Auditability and Reproducibility

Critical ML algorithms in the Infosys platform are developed from the ground up to support distributed,
in-memory, complex mathematical optimizations. The platform has demonstrated superior performance vis-à-vis
several peer systems. Moreover, it addresses speed as well as scalability requirements of distributed systems for ML
training datasets with trillions of elements. Overall, it provides an end-to-end framework for data science projects
and experiments, complete with audit trail and reports.

Figure 6. Projects Dashboard

Figure 7. Data Science Experiment Audit Trail

Success Stories

Real-time Fraud Detection

A global financial services company wanted to block fraudulent transactions and notify customers immediately.
The Infosys Nia Advanced ML platform ingested historical data to build and deploy a fully non-linear, supervised ML
model that detected fraud in real time. The model assigned a fraud probability to each incoming transaction and identified
the variables that were significant in predicting fraud.
With Infosys Nia Advanced ML platform, the client experienced:

US$50Mn+ annual cost-savings
Improved customer satisfaction with 10% increase in accuracy of predictions
Continuous improvement of the fraud detection model
Root-cause analysis of fraud with full audit trail

Monetizing Content Delivery with Accurate Recommendations

A leading handset manufacturer wanted to launch a mobile video streaming service and generate additional revenue
via sale/rental of movies. The sub-optimal speed and accuracy of the legacy recommendation engine prevented
the large team of in-house modellers from realizing business goals. Infosys Nia Advanced ML model enhanced the
recommendation system by understanding and interpreting user preferences accurately.
With Infosys Nia Advanced ML platform, the client experienced:
20% improvement in relevance of recommendations
1500x improvement in speed of execution
More accurate market preference understanding
Diversity in content recommendation

Key Differentiators

Data Science Automation
Auto Model – Automatic method selection combined with smart search through the algorithm hyperparameter space.
Automatic feature engineering – Automatic feature generation and feature selection.
Automatic audit trail – End-to-end data science workflow experiment history with interactive graph visualization for repeatability and transparency.
Automatic reports – Model and result insights and interpretability.
Automatic distributed computing – Low-level resource estimation and job management.

Ease of Use
End-to-end framework for data science – Integrated enterprise framework for data preparation, modelling, deployment, and reports.
Supports all user roles – GUI is designed for data scientists, data analysts, developers, and business users.
Multiple interfaces – GUI supports total automation without having to write any code, while the SDK (Python Notebook), API, and CLI are meant for experts or technical users.
Model diagnostics – Variable importance charts and partial dependence plots to better interpret predictive models.
Model performance monitoring – Supports multiple metric options to measure model accuracy.

Flexibility and Performance
Supports a broad set of ML methods – Supervised regression and classification (gradient boosted trees, ensemble gradient boosted trees, random decision forests, generalized linear models, and support vector machines), recommendation (collaborative filtering), unsupervised learning (k-means, kernel density estimation, and singular value decomposition), as well as deep learning neural networks.
Speed and scalability – Proprietary ML algorithms are optimized for speed, scale, and predictive accuracy. Proven performance in a number of use cases across business verticals.
Real-time data transforms – Transform snippets based on Python and Spark/PySpark support in-memory distributed data transforms.
Flexible model operationalization – Infosys Nia Prediction Server for streaming or batch predictions, Infosys Nia Evaluator JAR for predictions in Java environments, and PMML export for third-party applications.
Flexible delivery – Supports all major Hadoop distributions and can be deployed on-premise or on-cloud. Self-service provisioning with elastic scaling for cloud deployments.
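
The automatic audit trail named among the differentiators can be pictured as an append-only experiment history. The sketch below is one possible way to keep such a history tamper-evident, assuming a simple hash chain. The platform's actual storage format is not described here, so the `AuditTrail` class, its fields, and the example steps are all hypothetical.

```python
import hashlib
import json

# Illustrative only: AuditTrail and its fields are hypothetical, not platform APIs.

def _digest(body):
    """Stable SHA-256 digest of a JSON-serializable dictionary."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditTrail:
    """Append-only log; each entry stores the hash of the entry before it."""

    def __init__(self):
        self.entries = []

    def record(self, step, params, metrics):
        """Append one workflow step and return its hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {"step": step, "params": params, "metrics": metrics, "prev": prev_hash}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; False if any entry was altered or reordered."""
        prev_hash = ""
        for entry in self.entries:
            body = {key: entry[key] for key in ("step", "params", "metrics", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True

trail = AuditTrail()
trail.record("impute", {"strategy": "median"}, {})
trail.record("select_features", {"top_n": 20}, {})
trail.record("train", {"method": "gradient_boosted_trees", "depth": 6}, {"auc": 0.91})
print("steps recorded:", len(trail.entries), "| chain intact:", trail.verify())
```

Because each entry depends on the previous hash, replaying the recorded parameters in order reproduces the experiment, and any later edit to a stored step makes `verify()` fail. The metric value in the usage lines is made-up sample data.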

