DATA SCIENCE

  1. What is Data Science?
  2. Statistics
    • Central tendency
    • Variability
    • Hypothesis testing
    • ANOVA
    • Correlation
    • Regression
    • Probability
    • Joint probabilities
    • Bayes' theorem
  3. Mathematics
    • Linear Algebra
    • Calculus
    • Integral transformations
    • Vector algebra
    • Vector calculus
    • Matrices and vector spaces
    • Information theory
  4. Databases
    • Database Types and Concepts
    • SQL vs NoSQL
    • Data Modeling and Database Design
    • Data Cleaning and Transformation
    • Database Optimization and Performance
    • Data Integrity and Quality Checks
    • Integration with Tools and Languages
  5. Types of Data
    • Structured Data
    • Unstructured Data
    • Semi-structured Data
  6. Data Manipulation & Analysis
    • Data Extraction
      • Data Extraction vs Data Mining
      • Role of Extract, Transform, Load (ETL)
    • Data Wrangling / Data Cleaning / Data Munging
    • Data Visualization
      • Tools
        • Tableau
        • Google Charts
        • Dundas BI
        • Power BI
        • Jupyter
        • Infogram
        • ChartBlocks
        • D3.js
        • FusionCharts
        • Grafana
    • Data Modeling
    • Exploratory Data Analysis (EDA)
  7. Big Data
    • Overview
    • Engineering with Hadoop
      • Overview
      • Ecosystem
      • HDFS Architecture
      • MapReduce
      • YARN
      • Hive
      • HBase
      • Pig
    • Engineering with Spark
      • Introduction to Spark
      • Working with RDDs in Spark
      • Aggregating Data with Pair RDDs
      • Writing and Deploying Spark Applications
      • Parallel Processing
      • Spark RDD Persistence
      • Spark MLlib
      • Integrating Apache Flume and Apache Kafka
      • Spark Streaming
      • Improving Spark Performance
      • Spark SQL and DataFrames
      • Scheduling/Partitioning in Spark
    • Data Processing & Analysis
      • Stream vs Batch Processing
      • Apache Flink
      • Apache Storm
    • Distributed Storage Systems
      • HDFS
    • Data Warehousing
      • Amazon Redshift
    • Data Lakes
      • Delta Lake
    • Data Science with R
      • Overview
      • R packages
      • Sorting data frames
      • Matrices and vectors
      • Reading data from external files
      • Generating plots
      • Analysis of Variance (ANOVA)
      • K-means clustering
      • Association rule mining
      • Regression in R
      • Analyzing relationships with regression
      • Advanced regression
      • Logistic Regression
      • Advanced Logistic Regression
      • Receiver Operating Characteristic (ROC)
      • Kolmogorov-Smirnov chart
      • Database connectivity with R
      • Integrating R with Hadoop
    • Data Science with Python
      • Overview
      • Python packages
      • Pandas
        • Introduction
        • Creating Objects
        • Viewing Data
        • Selection
        • Manipulating Data
        • Grouping Data
        • Merging, Joining and Concatenating
        • Working with Date and Time
        • Working with Text Data
        • Working with CSV and Excel files
        • Operations
        • Visualization
      • NumPy
        • Introduction
        • ndarray
        • Datatypes
        • Arrays
      • Matplotlib
        • Introduction
      • Seaborn
        • Introduction
      • Scikit-learn
        • Introduction
      • Statsmodels
        • Introduction
      • SciPy
        • Introduction
      • TensorFlow
        • Introduction
      • PyTorch
        • Introduction
      • Keras
        • Introduction
      • NLTK (Natural Language Toolkit)
        • Introduction
    • Miscellaneous
      • Data Engineer vs Data Analyst vs Data Scientist vs Machine Learning Engineer
      • Data Science Resources