@DataScience - Ir - 111 Essential Concepts For Data Scientists
1- Data Loading:
Using pandas to load data from formats such as CSV, Excel, and SQL
databases.
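A minimal loading sketch with pandas; `io.StringIO` stands in for a real file, and the column names are invented for illustration:

```python
import io

import pandas as pd

# Self-contained stand-in for a file on disk; in practice you would pass
# a path, e.g. pd.read_csv("data.csv") or pd.read_excel("data.xlsx").
csv_data = io.StringIO("name,age\nAda,36\nGrace,45\n")

df = pd.read_csv(csv_data)
print(df.shape)          # (2, 2)
print(df["age"].mean())  # 40.5
```

`read_sql` works the same way, taking a query and a database connection instead of a path.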
2- Data Cleaning:
3- Data Wrangling:
4- Data Visualization:
Using libraries like Matplotlib and Seaborn to create graphs and charts to
understand data patterns and relationships.
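A small Matplotlib sketch (Seaborn builds on the same figure and axes objects); the `Agg` backend and the output filename are assumptions so it runs headlessly:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("squares.png")  # hypothetical output path
```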
5- Descriptive Statistics:
6- Inferential Statistics:
7- Probability:
9- Bayesian Thinking:
A powerful tool for string manipulation, data cleaning, and text analysis.
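This entry appears to describe regular expressions; a minimal sketch with Python's built-in `re` module (the log line is a made-up example):

```python
import re

log_line = "2024-01-15 ERROR disk full on /dev/sda1"

# Extraction: pull out the date and log level with named groups.
match = re.search(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+)", log_line)
print(match.group("date"))   # 2024-01-15
print(match.group("level"))  # ERROR

# Cleaning: collapse runs of whitespace in messy text.
print(re.sub(r"\s+", " ", "too   many    spaces"))  # too many spaces
```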
Using libraries like BeautifulSoup and Scrapy for extracting data from web
pages.
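A minimal BeautifulSoup sketch; a literal HTML string stands in for a fetched page (in a real scraper it would come from an HTTP response, e.g. `requests.get(url).text`), and the tag names and class are invented:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item">Keyboard</li>
    <li class="item">Mouse</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Select all list items with the class "item" and extract their text.
items = [li.get_text(strip=True) for li in soup.find_all("li", class_="item")]
print(items)  # ['Keyboard', 'Mouse']
```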
Understanding of data interchange formats like JSON and XML for handling data
from APIs.
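Both formats can be parsed with the standard library alone; the payloads below are made-up examples:

```python
import json
import xml.etree.ElementTree as ET

# JSON, as commonly returned by REST APIs.
payload = '{"user": {"name": "Ada", "id": 7}}'
data = json.loads(payload)
print(data["user"]["name"])  # Ada

# XML, still common in older APIs and feeds.
doc = ET.fromstring("<users><user id='7'>Ada</user></users>")
print(doc.find("user").text)       # Ada
print(doc.find("user").get("id"))  # 7
```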
Understanding of big data concepts and platforms like Hadoop and Spark for
processing large-scale datasets.
Familiarity with cloud platforms like AWS, GCP, or Azure for data storage,
processing, and machine learning tasks.
Techniques for creating and managing Docker containers for reproducible data
science environments.
24- Calculus:
42- Autoencoders:
Techniques for cleaning and processing text data for machine learning tasks.
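One minimal cleaning pipeline, assuming lowercasing, punctuation removal, and whitespace tokenization are the desired steps (real pipelines often add stop-word removal and stemming or lemmatization):

```python
import re
import string

def preprocess(text):
    """Lowercase, strip punctuation, and tokenize on whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.split(r"\s+", text.strip())

tokens = preprocess("The quick, brown FOX!")
print(tokens)  # ['the', 'quick', 'brown', 'fox']
```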
47- TF-IDF:
A type of statistical model for discovering the abstract "topics" that occur
in a collection of documents.
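A hedged sketch of topic discovery using scikit-learn's `LatentDirichletAllocation` (the four toy documents are invented; real corpora need far more data for meaningful topics):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks and bonds are investments",
    "the market rallied and stocks rose",
]

# Bag-of-words counts, then a 2-topic LDA model.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each document gets a probability distribution over the discovered topics.
doc_topics = lda.transform(counts)
print(doc_topics.shape)  # (4, 2)
```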
A model architecture that uses self-attention mechanisms and has been used in
various tasks like translation, summarization, etc.
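The self-attention mechanism at the heart of this architecture can be sketched in NumPy (single head, toy sizes, random weights; nothing here is pretrained):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8                    # toy sizes
X = rng.normal(size=(seq_len, d_model))    # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
# Scaled dot-product attention: every position attends to every position.
weights = softmax(Q @ K.T / np.sqrt(d_model))
out = weights @ V

print(weights.shape)  # (4, 4) -- each row is a distribution summing to 1
print(out.shape)      # (4, 8)
```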
52- BERT:
54- Q-Learning:
An algorithm that combines Q-Learning with deep neural networks, using a
network to approximate the action-value function.
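The tabular Q-Learning update that this algorithm builds on can be shown on a tiny invented chain environment (pure random exploration, no neural network):

```python
import numpy as np

n_states, n_actions = 5, 2       # chain 0..4; action 0 = left, 1 = right
goal = 4                         # reward 1 for reaching the last state
alpha, gamma, episodes = 0.5, 0.9, 500

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

for _ in range(episodes):
    s = 0
    while s != goal:
        a = rng.integers(n_actions)  # explore randomly for simplicity
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # The Q-Learning update rule: move Q[s, a] toward the TD target.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# The greedy policy learned should be "always move right".
print(Q.argmax(axis=1)[:goal])  # [1 1 1 1]
```

A DQN replaces the table `Q` with a neural network and trains it from replayed transitions.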
58- ARIMA:
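A full ARIMA fit would normally use a library such as statsmodels; the autoregressive "AR" building block alone can be illustrated by estimating an AR(1) coefficient with least squares on a simulated series:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) series: x_t = 0.7 * x_{t-1} + noise.
phi_true, n = 0.7, 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=0.5)

# Least-squares estimate of phi from the lagged pairs (x_{t-1}, x_t).
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(round(phi_hat, 2))  # close to the true 0.7

# One-step-ahead forecast from the last observation.
forecast = phi_hat * x[-1]
```

The I (differencing) and MA (moving-average) components extend this same idea to non-stationary series and correlated noise.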
Using Long Short-Term Memory models, a type of recurrent neural network, for
forecasting time series data.
A property of learning algorithms whereby the expected error on any dataset
can be decomposed into bias, variance, and irreducible noise.
Problems in machine learning where the model performs well on the training
data but not on unseen data (overfitting), or performs poorly on both
(underfitting).
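The two failure modes can be demonstrated with polynomial fits of different degrees on invented quadratic data: a line underfits, while a degree-15 polynomial drives training error down yet generalizes worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic; separate train and test sets.
def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(20)

def mse(deg):
    """Fit a degree-`deg` polynomial on train; return (train, test) MSE."""
    coeffs = np.polyfit(x_train, y_train, deg)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

# Degree 1 underfits (high error everywhere); degree 15 overfits
# (near-zero training error, often a worse test error than degree 2).
for deg in (1, 2, 15):
    print(deg, mse(deg))
```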
66- Cross-Validation:
79- MLOps:
Practices that combine Machine Learning, DevOps, and Data Engineering,
aiming to standardize and streamline the continuous delivery of ML systems.
80- AutoML:
84- Meta-Learning:
Learning from data that cannot fit into the main memory of a computing
device.
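One common out-of-core pattern is streaming a file in fixed-size chunks and aggregating incrementally, e.g. with pandas' `chunksize` (here `io.StringIO` stands in for a file too large for memory):

```python
import io

import pandas as pd

# Stand-in for a huge CSV on disk.
csv = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
# Each iteration loads only `chunksize` rows into memory at a time.
for chunk in pd.read_csv(csv, chunksize=3):
    total += chunk["value"].sum()

print(total)  # 45
```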
When information from outside the training dataset is used to create the
model.
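A classic source of leakage is computing normalization statistics on the full dataset before splitting; the sketch below shows the safe pattern, with statistics taken from the training split only:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)

train, test = data[:80], data[80:]

# Correct: fit the scaler on the training split only, then apply it to
# the test split. Computing mu and sigma on all of `data` would leak
# information about the test set into the preprocessing step.
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma

print(train_scaled.mean())  # ~0 by construction; test_scaled.mean() is not
```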
Statistical methods for analyzing the time until the occurrence of an event.
The process of predicting future values based on historical time series data.
Algorithms and data structures used to find the closest points in a dataset
to a given query point, such as KD-trees or Locality-Sensitive Hashing (LSH).
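The brute-force baseline that KD-trees and LSH accelerate is a linear scan over all points; a NumPy sketch on invented data (for the tree version, `scipy.spatial.cKDTree` is the usual choice):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))     # dataset of 3-D points
query = np.array([0.0, 0.0, 0.0])

# Brute force: distance from the query to every point, then argmin.
dists = np.linalg.norm(points - query, axis=1)
nearest_idx = int(np.argmin(dists))
print(nearest_idx, dists[nearest_idx])
```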
A group of algorithms for pattern analysis whose best-known member is the
support vector machine (SVM). The general task of pattern analysis is to
find and study general types of relations in datasets.
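The common thread is measuring similarity through a kernel function rather than explicit feature maps; a minimal RBF (Gaussian) kernel matrix on invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)

print(K.shape)              # (5, 5)
print(np.allclose(K, K.T))  # True: kernel matrices are symmetric
```

An SVM with an RBF kernel operates on exactly such a matrix instead of the raw features.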
A type of machine learning where the data itself provides the supervision.
It is a subfield of unsupervised learning in which auxiliary (pretext)
tasks are constructed so the model can generate its own training signal.