Anaconda's Guide To Open-Source: Tools and Libraries For Enterprise Data Science and Machine Learning


What’s Inside

Introduction
Fundamental Data Science Tools and Libraries
Machine Learning
Data Visualization
Image Processing
Scalable Computing
Data Preparation / ETL
Natural Language Processing (NLP)
Looking Ahead: AI Frontiers
How Can I Manage Open Source in the Enterprise?
About Anaconda

Introduction

Open-source collaboration has led to some of the most innovative and advanced technologies of our time, including the data science and machine learning tools and libraries that equip data scientists in every industry, from engineering, manufacturing, and cybersecurity to medicine, genetics, and astronomy. Open-source technologies empower organizations to do breakthrough data science and create differentiating AI and machine learning technologies.

Python is the most commonly used and most recommended language for data science and machine learning, which is why so many open-source tools and libraries are built for Python. It is also growing in popularity among developers: it is currently the second most popular language on GitHub. As Python becomes a common language between developers and data scientists, getting machine learning models and applications into production becomes more efficient. All of the tools listed in this guide are compatible with Python.

There are thousands of open-source data science and machine learning packages. This guide focuses on a common set of tools that cover the most fundamental tasks in data science and machine learning. We also touch on a few tools that take ML and data science to the next level, as well as cutting-edge tools at the forefront of solving the next great challenges in AI.

Fundamental Data Science Tools and Libraries

This collection of open-source Python tools and libraries consists of very popular packages that are frequently used together to do data science. These fundamental tools are essential and powerful not only for individual practitioners but also for doing enterprise data science with Python. Many other tools and libraries in the Python data science and ML ecosystem depend on these fundamental packages.

Jupyter

WHAT IT IS:
Jupyter is an open-source project created to support interactive data science and scientific computing across programming languages. Jupyter offers a web-based environment for working with notebooks containing code, data, and text. Jupyter notebooks are the standard workspace for most Python data scientists.

WHAT IT'S USED FOR:
Jupyter notebooks are used to create and share live code, equations, visualizations, and text. They have become the tool of choice for presenting data science projects.

PROJECTS:
Jupyter is used by Google, Microsoft, IBM, Bloomberg, NASA, and many other companies and universities. It is safe to say that if an organization has data scientists working in Python, they use Jupyter notebooks.

LEARN MORE:
jupyter.org

pandas

WHAT IT IS:
A library for tabular data structures, data analysis, and data modeling tools, including built-in plotting using Matplotlib.

WHAT IT'S USED FOR:
Data manipulation and indexing, reshaping and pivoting of data sets, label-based slicing and alignment, high-performance merging and joining of data sets, and time series data analysis. Pandas includes efficient methods for reading and writing a wide variety of data, including CSV files, Excel sheets, and SQL queries.

PROJECTS:
Many companies have found that pandas is easy to use across teams and boosts productivity for data analysis. For example, AppNexus uses pandas across its engineering, mathematics, and analyst teams. Datadog uses pandas to process time series data on its production servers. It is safe to say that if a company is doing data science, it is using pandas.

LEARN MORE:
https://pandas.pydata.org/
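
To make the pandas description above concrete, here is a minimal sketch of reading, reshaping, and plotting tabular data. The sales.csv file and its date, region, and revenue columns are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Read a CSV file into a DataFrame (sales.csv is a hypothetical example file)
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Label-based filtering
west = df[df["region"] == "west"]

# Reshaping / pivoting: total revenue by month and region
monthly = df.pivot_table(
    values="revenue",
    index=pd.Grouper(key="date", freq="M"),
    columns="region",
    aggfunc="sum",
)

# Time series analysis plus the built-in Matplotlib plotting
monthly.rolling(3).mean().plot(title="3-month rolling revenue")
```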

SciPy

WHAT IT IS:
The SciPy library consists of a specific set of fundamental scientific and numerical tools for Python that data scientists use to build their own tools and programs. It is not to be confused with the SciPy community and the SciPy conference, which include anyone working on scientific computing with Python.

WHAT IT'S USED FOR:
Routines for numerical integration, interpolation, linear algebra, and statistics.

PROJECTS:
SciPy is used by Instacart, Walmart, and Vital Labs, among others. Vital Labs uses SciPy to power its analytics tools.

LEARN MORE:
https://www.scipy.org/about.html

NumPy

WHAT IT IS:
A core package for scientific computing with Python. NumPy enables the creation of arrays and basic operations on them.

WHAT IT'S USED FOR:
NumPy is used for indexing and sorting but can also be used for linear algebra and other operations. SciPy is more fully featured when it comes to algebra modules and numerical algorithms. Many other data science libraries for Python are built on NumPy internally, including pandas and SciPy.

PROJECTS:
NumPy is used by Instacart, Walmart, and Vital Labs for data analysis. It is also used as a foundation in most other Python data science packages.

LEARN MORE:
https://numpy.org/
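
A small sketch of the NumPy-plus-SciPy workflow described above; the arrays and functions are illustrative only:

```python
import numpy as np
from scipy import integrate, stats

# NumPy: array creation, vectorized math, indexing, and sorting
x = np.linspace(0.0, np.pi, 101)
y = np.sin(x)
largest = np.sort(y)[-5:]            # the five largest values

# SciPy: numerical integration of sin over [0, pi] (the exact answer is 2)
area, err = integrate.quad(np.sin, 0.0, np.pi)

# SciPy: basic statistics computed on NumPy arrays
slope, intercept, r, p, stderr = stats.linregress(x, y)
print(area, largest, r)
```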

Machine Learning
Machine learning (ML) is a discipline within AI that involves developing and studying the algorithms and models that machines use to learn and perform tasks without being explicitly programmed to do so. Deep learning is a subfield of ML built on neural networks and high-performance computing. The following are three of the most popular open-source machine learning technologies.

TensorFlow & Keras

WHAT IT IS:
TensorFlow is an open-source deep learning platform from Google that includes an ecosystem of tools and libraries for building and deploying AI and deep learning applications. Keras is a high-level API used to build and train deep learning models; originally a separate library, it is now included with TensorFlow.

WHAT IT'S USED FOR:
TensorFlow and Keras are used together to efficiently build, train, and deploy deep learning models, such as convolutional neural networks (CNNs) and generative adversarial networks (GANs). If you download the latest version of TensorFlow, Keras is included.

PROJECTS:
Airbnb uses TensorFlow to classify images and detect objects at scale. Airbus uses TensorFlow to extract information from satellite images, and Twitter used TensorFlow to create its ranked timeline, which shows users the most important tweets first.

LEARN MORE:
https://www.tensorflow.org/

PyTorch

WHAT IT IS:
An open-source deep learning framework that consists of fundamental tools and libraries for Python AI and machine learning development.

WHAT IT'S USED FOR:
Building and training deep learning models, such as CNNs and GANs. A rich ecosystem of libraries extends the capabilities of PyTorch for natural language processing and computer vision.

PROJECTS:
Salesforce, among many others, uses PyTorch for natural language processing and multi-task learning.

LEARN MORE:
https://pytorch.org/
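
To make these descriptions concrete, here is a minimal sketch that defines and trains a tiny classifier with the Keras API bundled in TensorFlow. The random data and layer sizes are illustrative, not from the guide, and an equivalent model could be written in PyTorch:

```python
import numpy as np
import tensorflow as tf

# Toy data: 1,000 samples with 20 features and a binary label
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("int32")

# A small fully connected network built with tf.keras
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train, then evaluate on the same toy data
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```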

Scikit-learn

WHAT IT IS:
A powerful and versatile library covering machine learning basics like classification, regression, and clustering. It includes both supervised and unsupervised ML algorithms, along with important functions like cross-validation and feature extraction. Scikit-learn is the most frequently downloaded machine learning library.

WHAT IT'S USED FOR:
Efficient predictive analytics and building machine learning models with Python. It also includes tools that make it easy to include deep learning models in a scikit-learn pipeline.

PROJECTS:
Booking.com and Spotify use scikit-learn for their recommendation engines. Spotify has said scikit-learn is the "most well-designed ML package we've seen so far." J.P. Morgan uses it for predictive analytics, and MARS for supply chain management.

LEARN MORE:
https://scikit-learn.org/stable/
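
A minimal scikit-learn sketch of the pipeline-plus-cross-validation pattern described above, using the library's built-in iris data in place of a real dataset:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Built-in sample data stands in for a real problem
X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and a classifier into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation, one of the utilities called out above
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```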

Data Visualization
Data visualization is essential to data exploration, analysis, and communication, allowing data scientists to understand their data and share that understanding with others. Python has a great many visualization tools available (see pyviz.org/tools.html for a complete list), but we will highlight a few here.

Matplotlib

WHAT IT IS:
Matplotlib is the most well-established Python data visualization tool, focusing primarily on two-dimensional plots (line charts, bar charts, scatter plots, histograms, and many others). It works with many GUI interfaces and file formats, but has relatively limited interactive support in web browsers.

WHAT IT'S USED FOR:
Matplotlib is used to analyze, explore, and show relationships between data.

PROJECTS:
Nearly every company with data scientists is using Matplotlib somewhere, whether directly or via pandas or the high-level interfaces made for data scientists like Seaborn, HoloViews, or plotnine. Matplotlib and other open-source Python tools were used to create the first image of a black hole in the Event Horizon Telescope project.

LEARN MORE:
https://matplotlib.org

Bokeh & Plotly

WHAT THEY ARE:
Popular and powerful browser-based visualization libraries that let you create interactive, JavaScript-based plots from Python.

WHAT THEY ARE USED FOR:
Bokeh and Plotly create not just static plots but interactive visualizations with panning, zooming, linking between plots, and other features that let you work in Python while using the power of modern web technologies to share your results widely.

PROJECTS:
Thousands of web sites are built on these tools, either directly or using the higher-level interfaces hvPlot, HoloViews, or Chartify (for Bokeh) or Cufflinks and plotly_express (for Plotly).

LEARN MORE:
https://bokeh.org and https://plot.ly/python
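
To make these descriptions concrete, here is a minimal Matplotlib sketch of the static two-dimensional plotting described above; Bokeh and Plotly offer similar APIs that render interactive versions in the browser. The data is generated on the fly for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# A simple two-dimensional figure: a line chart and a scatter plot together
x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.scatter(x[::20], np.cos(x[::20]), label="cos(x) samples")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Matplotlib writes to many file formats (PNG, SVG, PDF, and more)
fig.savefig("example.png")
```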

Panel / Voila / Streamlit / Dash

WHAT THEY ARE:
Python frameworks for building custom visualization-rich apps and dashboards for the web.

WHAT THEY ARE USED FOR:
Using Python to create custom applications with live plots, widgets, and other controls, and to share those running applications on the web, backed by the power of Python. Each toolkit has its own focus and strengths: Panel (simple, Pythonic code, easily transitioning from Jupyter to standalone servers), Voila (directly serving Jupyter notebooks), Streamlit (apps from Python scripts), and Dash (direct control over HTML/CSS styling, stateless deployment).

PROJECTS:
The best way to see what projects are possible with these tools is to browse the examples at awesome-panel.org, voila-gallery.org, awesome-streamlit.org, and dash-gallery.plotly.host.

LEARN MORE:
panel.holoviz.org, voila.readthedocs.io, www.streamlit.io, and plot.ly/dash

HoloViz

WHAT IT IS:
HoloViz is an Anaconda project to simplify and improve Python-based visualization by adding high-performance server-side rendering (Datashader), simple plug-in replacement of static visualizations with interactive Bokeh-based plots (hvPlot), and declarative high-level interfaces for building large and complex systems (HoloViews and Param).

WHAT IT'S USED FOR:
The HoloViz project provides extensive free tutorials showing how to use these tools for working with billions of data points interactively, for constructing plots and dashboards from a few lines of Python code, and for working with streaming, geographic, network, or other more complex types of data.

PROJECTS:
See demos and tutorials for the many types of visualizations possible with HoloViz at http://holoviews.org/gallery/index.html.

LEARN MORE:
holoviz.org
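
A small sketch combining hvPlot and Panel in the way described above; the DataFrame contents and column names are illustrative:

```python
import numpy as np
import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on pandas objects
import panel as pn

# Illustrative data
df = pd.DataFrame({
    "day": pd.date_range("2020-01-01", periods=100),
    "value": np.random.randn(100).cumsum(),
})

# One line swaps a static pandas plot for an interactive Bokeh-based plot
plot = df.hvplot.line(x="day", y="value", title="Interactive line plot")

# Panel wraps plots and widgets into an app that can be served with
#   panel serve this_script.py
pn.Column("## Example dashboard", plot).servable()
```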

Image Processing
Advances in computing and data-storage hardware have made it practical to move beyond simple text and numeric data types into images, sounds, movies, and live sensor feeds. Image processing tools enable data scientists and engineers to build and train models for AI applications such as robotics, to process sounds and images for predictive maintenance in factories, and to support many other applications that require image data from cameras or image files.

PIL/Pillow

WHAT IT IS:
Pillow (a "friendly fork" of the older PIL library) is a Python imaging library and a general image processing tool with support for opening, manipulating, and saving images in many different file formats.

WHAT IT'S USED FOR:
Data preparation for image training and basic image manipulation.

PROJECTS:
Data scientists, analysts, and others in the banking, finance, and health care industries have used Pillow for image manipulation.

LEARN MORE:
https://pillow.readthedocs.io/

Scikit-image

WHAT IT IS:
Scikit-image is an open-source Python package containing a collection of image-processing algorithms, including segmentation, geometric transformations, color space manipulation, and feature detection. It uses NumPy arrays as image objects.

WHAT IT'S USED FOR:
Scikit-image is used for processing large volumes of images, and it is commonly used for scientific applications ranging from biomedical imaging to astronomy.

PROJECTS:
INRIA has used scikit-image for neuroimaging and computer vision to support leading-edge research.

LEARN MORE:
https://scikit-image.org
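
A brief sketch showing how Pillow and scikit-image can work together, since scikit-image treats images as NumPy arrays. The photo.jpg file name is hypothetical:

```python
import numpy as np
from PIL import Image
from skimage import filters

# Pillow: open an image, convert it to grayscale, resize it, and save another format
img = Image.open("photo.jpg").convert("L")
small = img.resize((256, 256))
small.save("photo_small.png")

# scikit-image: operate on the same image as a NumPy array
arr = np.asarray(small, dtype=float) / 255.0
edges = filters.sobel(arr)          # simple edge detection
print(edges.shape, edges.max())
```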

OpenCV

WHAT IT IS:
An open-source library of programming functions for real-time computer vision, with C++, Java, Python, and MATLAB interfaces.

WHAT IT'S USED FOR:
OpenCV is the most commonly used library for robotics. It is also used for face tracking and detection and for image processing and recognition. OpenCV has been used to build intrusion detection and monitoring tools and to help robots navigate and identify objects.

PROJECTS:
OpenCV is used by Google, Yahoo, Microsoft, Intel, Honda, and Toyota for computer vision. Famous projects based on OpenCV include the Robot Operating System and the Integrating Vision Toolkit.

LEARN MORE:
https://opencv.org/
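
A minimal OpenCV sketch of the face detection and image processing described above. The input file name is hypothetical, and the Haar cascade shown is one of the detection models bundled with the opencv-python package:

```python
import cv2

# Read an image from disk (frame.jpg is a hypothetical file or camera frame)
img = cv2.imread("frame.jpg")

# Convert to grayscale and detect edges, two common preprocessing steps
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# Detect faces with a Haar cascade model shipped with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around each detected face and save the result
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("frame_annotated.jpg", img)
```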

Scalable Computing
Scalable computing, including distributed and parallel computing, speeds up analysis, model training, and overall performance by enabling multiple tasks and calculations to be performed simultaneously across computers or processors. These packages can be used as boosters for many Python data science and machine learning tasks.

Numba

WHAT IT IS:
Numba is a high-performance Python compiler. It makes Python faster and optimizes the performance of NumPy array code, reaching speeds comparable to FORTRAN and C without requiring a separate compiled language.

WHAT IT'S USED FOR:
Accelerating Python functions and parallelizing algorithms for GPUs and CPUs, such as in Datashader.

PROJECTS:
Datashader, a data visualization tool, uses Numba for acceleration. Fortune 100 finance firms have used it for financial modeling, and it is also commonly used for building simulations. Numba was also used, among other tools, in the XENON1T experiment to detect dark matter.

LEARN MORE:
http://numba.pydata.org/

Dask

WHAT IT IS:
Dask is a Python package used to scale NumPy workflows with parallel processing to enable multi-dimensional data analysis, letting users store and process data larger than their computer's RAM. Dask can scale out to clusters or scale down to a single computer. Dask mimics the pandas and NumPy APIs, making it more intuitive for Python data scientists than Apache Spark.

WHAT IT'S USED FOR:
Dask is used to accelerate processing in a variety of fields, including research in Earth science, satellite imagery, and genomics. It is also used in business and engineering; for example, it is used to increase efficiency in cashflow model management systems and civic modeling.

PROJECTS:
With implementations of Dask, Capital One reduced model training times by 91%. Other organizations have used Dask for genome sequencing, cashflow modeling systems, and satellite imagery processing.

LEARN MORE:
https://stories.dask.org/
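
Two small sketches of the acceleration patterns described above: a Numba-compiled loop and a chunked Dask array computation. The functions and array sizes are illustrative:

```python
import numpy as np
from numba import njit
import dask.array as da

# Numba: a plain Python loop compiled to fast machine code by a decorator
@njit
def sum_of_squares(values):
    total = 0.0
    for v in values:
        total += v * v
    return total

print(sum_of_squares(np.random.rand(1_000_000)))

# Dask: a NumPy-like API, but chunked so data can exceed RAM and work
# can be spread across cores or a cluster
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean(axis=0)
print(result[:5].compute())   # nothing runs until .compute() is called
```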

RAPIDS

WHAT IT IS:
RAPIDS is essentially a tool for running pandas, scikit-learn, and NetworkX (graph analytics) workloads on GPUs. It also integrates with some deep learning libraries.

WHAT IT'S USED FOR:
Accelerating data science and analytics pipelines by utilizing GPUs.

PROJECTS:
Capital One uses RAPIDS in conjunction with Dask to speed up their data science workflows and scale on GPUs. They also find it valuable that former SAS users and other data scientists do not have to learn Spark or Java to be effective.

LEARN MORE:
https://rapids.ai/about.html

Apache Spark (PySpark)

WHAT IT IS:
A fault-tolerant cluster computing framework and interface for programming clusters, originally launched at UC Berkeley. It was developed for the Java/Hadoop ecosystem but has support for Python. PySpark is the Python API for Spark.

WHAT IT'S USED FOR:
Spark is a multi-purpose tool that can be used for data preparation and processing as well as for training ML algorithms. Spark is great for managing data streams in real time and for interactive analytics through interactive queries.

PROJECTS:
Spark is used by a wide variety of companies. eBay uses Apache Spark for log transaction aggregation and analytics. MyFitnessPal uses Spark to clean up users' data and to build recommendation engines for foods and recipes.

LEARN MORE:
https://spark.apache.org/
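
A minimal PySpark sketch of the data preparation and aggregation use case described above. The events.csv file and its columns are hypothetical, and the session runs locally rather than on a cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session; in production this would point at a cluster
spark = SparkSession.builder.appName("example").getOrCreate()

# Read a CSV file and run a simple grouped aggregation
df = spark.read.csv("events.csv", header=True, inferSchema=True)
summary = (
    df.groupBy("user_id")
      .agg(F.count("*").alias("events"),
           F.avg("duration").alias("avg_duration"))
)
summary.show(5)

spark.stop()
```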

Data Preparation / ETL
Data preparation is a prerequisite to data analysis, data science, and machine learning, and it can also be the most laborious and time-consuming part of the whole process. Most data science workflows initially rely on custom pandas and other data-manipulation code, but these data preparation / ETL (extract, transform, and load) tools help automate the process and make data preparation more efficient in production for companies and large organizations.

Apache Airflow

WHAT IT IS:
An open-source workflow automation tool from Apache for creating data workflows, scheduling tasks, and monitoring results. It integrates with multiple cloud providers, including AWS, Azure, and Google Cloud.

WHAT IT'S USED FOR:
Airflow is used to manage and automate data pipelines for use in data analysis and machine learning models.

PROJECTS:
Airflow was created by developers at Airbnb for managing big data pipelines from multiple sources. It is currently used for data pipeline management by Airbnb, Slack, Walmart, Lyft, and HelloFresh, among others.

LEARN MORE:
https://airflow.apache.org/

Intake

WHAT IT IS:
A data ingest/loading library for a wide variety of file formats and data services, with hierarchical cataloguing, searching, and interactivity with remote storage platforms under a single interface.

WHAT IT'S USED FOR:
Intake lets an organization catalog data of all types, including fitted model descriptions, images, and unstructured log entries, so Python data scientists can focus on their analyses rather than boilerplate I/O code. Catalogs are text files that can easily be shared with others and reused between projects.

PROJECTS:
Intake is currently used by Zillow, NASA, and USGS to catalog data of many types for use in Python.

LEARN MORE:
https://intake.readthedocs.io
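
A small sketch of each tool. The Airflow DAG assumes Airflow 2.x (older releases use a different import path for PythonOperator, and newer ones rename schedule_interval to schedule), and the Intake catalog file name is hypothetical:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from a source system")

def transform():
    print("clean and reshape the data")

# A two-task pipeline that Airflow schedules and monitors
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task

# Intake, by contrast, is used directly from analysis code:
#   import intake
#   cat = intake.open_catalog("catalog.yml")   # a shared, versioned text file
#   df = cat.some_dataset.read()               # loads the entry as a DataFrame
```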

Natural Language Processing (NLP)
Natural Language Processing (NLP) involves programming
machines to parse and understand human language
and to interact with humans through both written and
spoken language. The field of NLP includes speech
recognition, language generation, document analysis,
and information retrieval.

NLTK

WHAT IT IS:
An open-source Python natural language toolkit for symbolic and statistical NLP. It includes a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning in multiple languages.

WHAT IT'S USED FOR:
NLTK is used to process human language via tokenization, parsing, classification, and semantic reasoning.

PROJECTS:
NLTK has been used to analyze large bodies of text in academic research projects in a variety of fields, including software engineering studies, cinemetrics, communications, and sociology.

LEARN MORE:
https://www.nltk.org/

Gensim

WHAT IT IS:
A Python library for topic modeling, document indexing, and similarity retrieval for large bodies of text, with efficient multicore implementations of NLP algorithms.

WHAT IT'S USED FOR:
Gensim is great for the efficient analysis of large bodies of text and the extraction of semantic topics.

PROJECTS:
Companies use Gensim to search for relevant information and themes in large bodies of text. For example, DynAdmic, an online video advertising company, uses Gensim to curate digital video ads. Tailwind, an app for scheduling Pinterest and Instagram posts, uses it to help customers post relevant content. Sports Authority uses the tool to analyze text fields from customer surveys and social media commentary.

LEARN MORE:
https://pypi.org/project/gensim/
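
A short sketch combining the two libraries described above: NLTK tokenizes a handful of illustrative documents, and Gensim fits a tiny topic model on them. It assumes the NLTK "punkt" tokenizer data can be downloaded (very recent NLTK releases may also require "punkt_tab"):

```python
import nltk
from gensim import corpora, models

nltk.download("punkt", quiet=True)   # tokenizer data used by word_tokenize

docs = [
    "Open-source tools power modern data science.",
    "Topic models find themes in large bodies of text.",
    "Data science teams share notebooks and models.",
]

# NLTK: tokenize and lightly normalize each document
tokenized = [
    [w.lower() for w in nltk.word_tokenize(d) if w.isalpha()]
    for d in docs
]

# Gensim: build a dictionary and bag-of-words corpus, then fit a small LDA model
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(doc) for doc in tokenized]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=5)
print(lda.print_topics())
```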

spaCy

WHAT IT IS:
spaCy is an open-source Python library for NLP and one of the fastest, if not the fastest, syntactic parsers available.

WHAT IT'S USED FOR:
spaCy is used for a wide variety of NLP tasks, especially large-scale information extraction. It is also used to prepare text for deep learning and is interoperable with TensorFlow, PyTorch, and scikit-learn.

PROJECTS:
spaCy is used by Airbnb, Uber, Stitch Fix, Quora, and many other organizations. Quill used spaCy to develop a free online tool that helps students improve their grammar and writing. It has also been used for quote extraction and attribution.

LEARN MORE:
https://spacy.io/
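
A minimal spaCy sketch of the information-extraction use case described above. It assumes the small English model has been installed separately (python -m spacy download en_core_web_sm), and the sentence is illustrative:

```python
import spacy

# Load the small English pipeline (installed separately from spaCy itself)
nlp = spacy.load("en_core_web_sm")

doc = nlp("Anaconda builds open-source tools for Python data science in Austin.")

# Named entities, a typical information-extraction output
for ent in doc.ents:
    print(ent.text, ent.label_)

# Part-of-speech tags and syntactic dependencies from the parser
for token in doc:
    print(token.text, token.pos_, token.dep_)
```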

Looking Ahead: AI Frontiers
As machine learning technologies advance, AI solutions will become more and more sophisticated. At the core of this evolution are questions about fairness and interpretability. As AI uses data to make more impactful decisions that change people's lives (such as hiring, recidivism, and credit approval), humans must ensure that these decisions are as fair as possible and that they are explainable to those who are affected. Portability and interoperability are also obstacles to AI's advancement: those who work on AI models need to be able to move them between platforms with ease instead of having to rebuild and re-code. Here are a few tools on the cutting edge of solving these problems.

ONNX

WHAT IT IS:
An open neural network exchange that makes machine learning models portable between frameworks and platforms. Microsoft and Facebook started this community in 2017 to create an open ecosystem for interchangeable models.

WHAT IT'S USED FOR:
Interoperability and portability. The exchange enables data scientists and developers to move AI models between tools and platforms, which saves a significant amount of time and headaches in the process of operationalizing models. It is also commonly used for serving models.

PROJECTS:
ONNX is used and supported by AMD, AWS, HP, IBM, Intel, NVIDIA, and other companies on the cutting edge of AI/ML.

LEARN MORE:
https://onnx.ai/

FairLearn

WHAT IT IS:
A burgeoning project by open-source developers at Microsoft. FairLearn is a Python package for assessing fairness and mitigating unfairness in ML models and AI systems.

WHAT IT'S USED FOR:
Evaluating the fairness of AI/ML models and training data, and mitigating bias in models determined to be unfair.

PROJECTS:
Because this project is fairly new, not many companies have published case studies or overviews of their use of the tool. One example project provided is the mitigation of racial disparities in the ranking of law school applicants.

LEARN MORE:
https://github.com/fairlearn/fairlearn/blob/master/README.md
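
A small sketch of the fairness-assessment workflow described for FairLearn, assuming a recent fairlearn release that provides the MetricFrame API. The labels, predictions, and group values are toy data for illustration:

```python
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

# Toy labels, model predictions, and a sensitive attribute (illustrative values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
group = ["a", "a", "a", "b", "b", "b", "b", "a"]

# Break metrics down by group to see where the model behaves differently
mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)
print(mf.by_group)
```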

AI Fairness 360 (AIF360)

WHAT IT IS:
A comprehensive open-source Python toolkit of metrics that checks for and measures bias in datasets and ML models. It also includes algorithms to mitigate bias. This toolkit was developed by IBM's open-source team.

WHAT IT'S USED FOR:
Similar to FairLearn, it is used for evaluating the fairness of AI/ML models and training data and for mitigating bias in current models.

PROJECTS:
AI Fairness 360 has been used to detect bias in credit scoring algorithms and to mitigate racial bias in healthcare utilization scoring.

LEARN MORE:
https://aif360.mybluemix.net/

InterpretML

WHAT IT IS:
An open-source Python package that makes it easy to compare algorithms for interpretability. It provides a "scikit-learn style uniform API" and includes an interactive visualization platform and dashboard so data scientists can compare algorithms with ease.

WHAT IT'S USED FOR:
InterpretML is used to explain any existing "black box" model (a model whose means of making decisions are incomprehensible to humans), and it can also be used to train new models that are designed to be interpretable, "glass box" models (models explainable to humans).

PROJECTS:
InterpretML was started by open-source developers at Microsoft, and it has been used to make credit fraud, churn, and medical prediction models more interpretable.

LEARN MORE:
https://github.com/interpretml/interpret
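
A short sketch of training one of InterpretML's "glass box" models, using scikit-learn's built-in breast cancer data in place of a real problem:

```python
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Built-in sample data stands in for a real prediction problem
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A "glass box" model whose per-feature contributions can be inspected directly
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)
print(ebm.score(X_test, y_test))

# Global explanation of what the model learned; InterpretML's dashboard can
# render it interactively with: from interpret import show; show(explanation)
explanation = ebm.explain_global()
```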

LIME

WHAT IT IS:
LIME is a PyPI package and a model-agnostic interpretability tool. LIME explains individual predictions for text classifiers and for classifiers that act on tables or images. Support for scikit-learn classifiers is built into the tool.

WHAT IT'S USED FOR:
LIME is used to help data scientists understand and explain the decisions of black box models with two or more classes. It works by perturbing data samples to understand how this changes the model's predictions, narrowing down the logic that was used to make a particular decision.

PROJECTS:
LIME has been used by both academic and corporate data scientists to understand model decision-making. The main contributor to LIME is a researcher at Microsoft.

LEARN MORE:
https://github.com/marcotcr/lime

HOW DO I START USING ALL THESE TOOLS?

All of these libraries and packages can be downloaded individually with pip, but more than 250 of the most commonly used open-source data science and machine learning packages are automatically installed when you download the Anaconda Distribution, and many others can be installed by simply typing conda install [package-name]. Anaconda Distribution is an installer and package management system and the easiest and most efficient way to perform Python/R data science and machine learning on Linux, Windows, and Mac OS X. It updates packages and their dependencies and also creates, saves, loads, and switches between environments on your computer.

Learn more:
https://www.anaconda.com/distribution/
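
Returning to LIME, here is a minimal sketch of explaining a single tabular prediction, using scikit-learn's built-in iris data and a random forest as the "black box" model (both chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Train an ordinary "black box" model on built-in sample data
data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# LIME perturbs a sample and fits a simple local surrogate model around it
explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=3
)
print(explanation.as_list())
```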

How Can I Manage Open Source in the Enterprise?

While Anaconda Distribution is perfect for individual practitioners, it is not well-equipped for package management or collaboration at the enterprise level. With Anaconda Team Edition, companies can mirror Anaconda's powerful repository onto corporate infrastructure for control over availability, reporting on common vulnerabilities and exposures (CVEs), user access control, license type control, and private and shared channels for package management. Know who is using which packages in which models, and blacklist or whitelist packages as needed.

Another option is to manage open-source packages with our end-to-end machine learning platform. Anaconda Enterprise combines package management, collaboration on projects via Jupyter notebooks, governance, and one-click deployment into a full-featured data science and machine learning platform that meets enterprise requirements.

About Anaconda
With more than 20 million users, Anaconda is the world’s most popular data
science platform and the foundation of modern machine learning. We
pioneered the use of Python for data science, champion its vibrant community,
and continue to steward open-source projects that make tomorrow’s
innovations possible. Our enterprise-grade solutions enable corporate,
research, and academic institutions around the world to harness the power of
open-source for competitive advantage, groundbreaking research, and a better
world.

Visit https://www.anaconda.com to learn more.
