Data Science Toc Srinivas
Pre-requisites
1. Intermediate level expertise with Python
2. Basic idea of the Python data ecosystem
3. Some background on file formats and Relational Databases
Lab requirements
1. 1:1 or 2:1 participant-machine ratio for hands-on sessions and exercises
Agenda
Python refresher
• The Python interpreter
• Python Data Types
• Data and type introspection basics
• Control structures
• Functions
• Classes
• Errors and exceptions
• Regular expressions
Class basics
• __init__
• self
• private vs public convention
• magic (dunder) methods
• object creation
• type of objects
• inheritance, multiple inheritance
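As an illustration of the class topics above, a minimal sketch might look like the following. The class names (Sensor, Describable, LoggedSensor) and their fields are hypothetical and are not taken from the course material.

```python
class Sensor:
    """Toy class; the name and fields are hypothetical."""

    def __init__(self, name, readings):
        # __init__ runs on object creation; self is the new instance.
        self.name = name                  # public attribute
        self._readings = list(readings)   # leading underscore: private by convention only

    def __len__(self):
        # Magic method used by len().
        return len(self._readings)

    def __repr__(self):
        # Magic method used by repr() and the interactive interpreter.
        return f"Sensor({self.name!r}, n={len(self)})"


class Describable:
    """Hypothetical mixin that adds one method."""

    def describe(self):
        return f"{type(self).__name__} object"


class LoggedSensor(Sensor, Describable):
    """Multiple inheritance: gets behaviour from both base classes."""


s = LoggedSensor("thermo", [20.5, 21.0, 19.8])   # object creation
print(repr(s))
print(type(s).__name__, isinstance(s, Sensor))
# The method resolution order shows how Python searches the base classes.
mro_names = [cls.__name__ for cls in LoggedSensor.__mro__]
print(mro_names)
```

Note that the underscore prefix does not enforce privacy; s._readings is still reachable, which is why the outline calls it a convention.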
Numpy
• Why numpy?
• Comparison on memory and run-time with native lists
• Numpy arrays
• Multi-dimensional arrays
• Mapped (element-wise) operations on numpy arrays
• Filtering
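The numpy topics above could be tried with a short sketch like this one. The array size and variable names are arbitrary choices for illustration, and the memory figure for the list is an approximation based on sys.getsizeof.

```python
import sys

import numpy as np

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Memory: a list holds pointers to separate int objects,
# while the array stores raw 8-byte values in one buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("list bytes (approx):", list_bytes, "array bytes:", arr.nbytes)

# Mapped operation: applied element-wise without a Python loop.
squares = arr ** 2

# Filtering: a boolean mask selects the matching elements.
evens = arr[arr % 2 == 0]

# Multi-dimensional array: reshape the first 12 values into 3 rows x 4 columns.
grid = arr[:12].reshape(3, 4)
column_sums = grid.sum(axis=0)
print(grid.shape, column_sums)
```

For the run-time half of the comparison, the standard-library timeit module can time the list comprehension [x * x for x in py_list] against arr ** 2; the exact numbers depend on the machine.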
Pandas
• DataFrames
• Series
• Indexes
• Inherited operations from numpy arrays
• read_* functions for reading file formats
• Selecting columns with [] and .
• Filtering
• value_counts()
• groupby() and aggregation functions
• sort_index() and sort_values() to speed up lookups
• pivoting/unstacking
• Merging dataframes
• Appending with pd.concat()
• .loc[] and .iloc[] based lookup
• Working with dates
• Timeseries
• Real examples to try all these operations
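A small worked example touching several of the pandas operations listed above is sketched below. The two tables (orders, customers) and all of their values are made up for illustration; in a lab they would normally come from a file via a read_* function such as pd.read_csv.

```python
import pandas as pd

# Hypothetical data, built in memory so the example is self-contained.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer": ["ann", "bob", "ann", "cid", "bob"],
    "amount": [20.0, 35.5, 12.0, 50.0, 8.5],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10", "2024-02-11"]
    ),
})
customers = pd.DataFrame({
    "customer": ["ann", "bob", "cid"],
    "city": ["Pune", "Delhi", "Pune"],
})

amounts = orders["amount"]                  # selecting a column gives a Series
large = orders[orders["amount"] > 15]       # filtering with a boolean mask
counts = orders["customer"].value_counts()  # rows per customer
totals = orders.groupby("customer")["amount"].sum()  # groupby + aggregation

# Merging: attach each customer's city to their orders.
merged = orders.merge(customers, on="customer", how="left")

# Pivoting: total amount per city and calendar month.
merged["month"] = merged["date"].dt.month
by_city_month = merged.pivot_table(
    index="city", columns="month", values="amount", aggfunc="sum"
)

# Label-based (.loc) and position-based (.iloc) lookup on a sorted index.
indexed = orders.set_index("order_id").sort_index()
amount_of_order_3 = indexed.loc[3, "amount"]
first_customer = indexed.iloc[0]["customer"]

# Timeseries: monthly totals, with bins labelled by month start ("MS").
monthly = orders.set_index("date")["amount"].resample("MS").sum()

# Appending rows with pd.concat.
extra = pd.DataFrame({"order_id": [6], "customer": ["ann"], "amount": [5.0],
                      "date": pd.to_datetime(["2024-03-01"])})
all_orders = pd.concat([orders, extra], ignore_index=True)

print(totals)
print(by_city_month)
print(monthly)
```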
Text Processing
• Lemmatization
• Part-of-Speech Tagging
• Named Entity Recognition
• Word Embeddings
• N-grams
• TF-IDF
• Text Classification
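Two of the text-processing topics above, n-grams and TF-IDF, can be sketched with the standard library alone. The three sentences are made up, tokenization is a plain whitespace split, and the weighting shown (term frequency times log(N / document frequency)) is only one common variant; libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokens = [doc.split() for doc in docs]   # naive whitespace tokenization


def ngrams(words, n):
    """Return all contiguous n-word tuples from a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def tf_idf(tokenized_docs):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    scores = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


bigrams = ngrams(tokens[0], 2)
scores = tf_idf(tokens)

print(bigrams)
# Terms ranked by weight for the first document.
print(sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True))
```

In the first document, "cat" outranks "the" even though "the" occurs twice, because "the" also appears in another document and so carries a lower inverse document frequency.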