Introduction to Data Science
Module 1
Build a solid foundation in data science thinking and get an overview of the three core data pillars (Structured Data, Computer Vision, NLP). This course focuses on problem fundamentals, standard data processing workflows, and practical applications to give you a hands-on perspective of the field.
Course Overview
Target Audience
High school students (Grades 10-11) interested in Data Science who want an overview of the field to assess their fit and plan their next steps.
Approach
Build a solid foundation in data thinking and practical skills for beginners, with a focus on problem fundamentals, standard workflows, and real-world applications.
Organizers
The Noders PTNK × PRISEE
Duration
4 sessions × 1 hour 30 minutes
Scheduled for January 2026
Course Curriculum
4 comprehensive sessions, each 1 hour 30 minutes
Data Science Thinking & Standard Workflows
Objective: Understand the big picture of Data Science and how professionals work through a project
Materials will be available after the session
Topics Covered
Overview of Data Science
- Definition and role in the digital era
- Distinguishing Data Analytics vs Data Science
The Three Core Data Pillars
- Structured Data
- Computer Vision
- Natural Language Processing (NLP)
Standard Data Processing Workflow (End-to-End)
- Data Collection
- Data Pre-processing: Cleaning and standardization
- Modeling & Analysis
- Visualization & Reporting
Case Study
- IELTS score data analysis
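To make the four workflow steps concrete, here is a minimal sketch in plain Python. The scores below are made up for illustration and are not the dataset used in the session's case study; the function name `clean` and the summary fields are also assumptions.

```python
from statistics import mean

# Step 1 - Data Collection: made-up overall band scores, some of them unusable.
raw_scores = ["6.5", "7.0", None, " 5.5", "8.0", "n/a", "6.0"]


def clean(values):
    """Step 2 - Pre-processing: drop missing entries and convert text to floats."""
    cleaned = []
    for value in values:
        if value is None:
            continue  # missing entry
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            continue  # entries such as "n/a" are not numbers
    return cleaned


scores = clean(raw_scores)

# Step 3 - Modeling & Analysis: here only simple summary statistics.
summary = {
    "count": len(scores),
    "mean": round(mean(scores), 2),
    "highest": max(scores),
    "lowest": min(scores),
}

# Step 4 - Visualization & Reporting: a plain-text report instead of a chart.
for name, value in summary.items():
    print(f"{name}: {value}")
```

In a real project each step is usually much larger, but the order stays the same: collect, clean, analyze, report.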
Data Processing & Visualization Techniques
Objective: Master tools for working with tabular data
Materials will be available after the session
Topics Covered
Analysis Tool Ecosystem
- SQL: Query and extract data from relational databases
- Pandas (Python): Powerful data processing and analysis library
- Matplotlib/Tableau: Data visualization tools
Hands-on Practice
- Data querying techniques with SQL (SELECT, WHERE...)
- Data cleaning and transformation with Pandas DataFrame
- Building visualization charts (Bar Chart, Line Plot) to find insights
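As a taste of the SQL part of the practice, the sketch below runs a SELECT ... WHERE query from Python using the built-in `sqlite3` module. The `students` table and its rows are hypothetical and only serve the example; the session notebook may use a different database and dataset.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("An", 10, 7.5), ("Binh", 11, 6.0), ("Chi", 10, 8.5), ("Dung", 11, 9.0)],
)

# SELECT picks the columns, WHERE filters the rows, ORDER BY sorts the result.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC",
    (7.0,),
).fetchall()
conn.close()

for name, score in rows:
    print(name, score)
```

The `?` placeholder passes the threshold as a parameter instead of pasting it into the query text, which is the safer habit when values come from users.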
Computer Vision & Basic Machine Learning
Objective: Understand how computers process images and how classification algorithms work
Materials will be available after the session
Topics Covered
Digital Image Fundamentals
- Representing images as numerical matrices (Pixel Matrix)
- Image preprocessing techniques: Resize, Grayscale, Normalization
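The ideas above can be sketched without any image library by treating an image as nested lists of numbers. The 2x2 image and the function names are illustrative assumptions; the grayscale weights are the common ITU-R BT.601 luma weights, and the resize uses simple nearest-neighbor sampling. In practice a library such as NumPy or OpenCV would do this work.

```python
# A hypothetical 2x2 RGB image: each pixel is (red, green, blue), values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_grayscale(rgb_image):
    """Collapse each RGB pixel to one brightness value (BT.601 luma weights)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def normalize(gray_image):
    """Scale pixel values from the 0-255 range to the 0-1 range."""
    return [[value / 255.0 for value in row] for row in gray_image]


def resize_nearest(gray_image, new_height, new_width):
    """Resize by copying the nearest source pixel for each target pixel."""
    old_height, old_width = len(gray_image), len(gray_image[0])
    return [
        [
            gray_image[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


gray = to_grayscale(image)
normalized = normalize(gray)
resized = resize_nearest(normalized, 4, 4)

for row in resized:
    print([round(value, 3) for value in row])
```

After these steps every image has the same shape and value range, which is what most classification algorithms expect as input.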
K-Nearest Neighbors (KNN) Algorithm
- Operating principles and applications in classification problems
- Practice: Build models to predict shirt sizes and recognize handwritten digits
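For intuition, KNN can be written in a few lines of plain Python: measure the distance from a new sample to every training sample, keep the k closest, and let them vote. The height and weight values below are invented for illustration and are not the practice dataset; the function name `knn_predict` is also an assumption. The session notebook may use Scikit-learn instead.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (height in cm, weight in kg) -> shirt size.
training_data = [
    ((158, 50), "S"), ((160, 54), "S"), ((163, 57), "S"),
    ((168, 62), "M"), ((170, 65), "M"), ((172, 68), "M"),
    ((178, 75), "L"), ((180, 80), "L"), ((183, 84), "L"),
]


def knn_predict(sample, data, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort all training points by Euclidean distance to the sample.
    neighbors = sorted(data, key=lambda item: dist(item[0], sample))[:k]
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_predict((169, 64), training_data))
```

Because the two features use different units, real applications usually rescale them first so that one feature does not dominate the distance.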
Introduction to Deep Learning
- Convolutional Neural Networks (CNN) and their advantages over traditional algorithms
Natural Language Processing & Model Evaluation
Objective: Learn how to work with text data and how AI models are evaluated
Materials will be available after the session
Topics Covered
Basic NLP Techniques
- Text cleaning process
- Tokenization and normalization techniques (Stemming/Lemmatization)
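A minimal sketch of cleaning and tokenizing English text with only the standard library is shown below. The function names are assumptions, and `naive_stem` is a deliberately crude suffix-stripping rule written for this example, not a real stemming algorithm; in practice a library stemmer or lemmatizer (for example NLTK's `PorterStemmer`) would be used.

```python
import re


def clean_text(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into word tokens."""
    return clean_text(text).split()


def naive_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Data Science is FUN!!  Students loved learning.")
stems = [naive_stem(token) for token in tokens]

print(tokens)
print(stems)
```

Note how the toy rule turns "loved" into "lov": stemming only chops endings, while lemmatization maps a word to its dictionary form ("love").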
Text Representation Methods
- Bag of Words & TF-IDF
- Word Embedding (Word2Vec): Vectorizing word semantics
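Bag of Words and TF-IDF can both be computed by hand on a tiny corpus, as in the sketch below. The three sentences and the function names are made up for illustration, and the formula is the basic unsmoothed version (term frequency times the log of total documents over documents containing the word); library implementations often use smoothed variants, so their numbers may differ. Word2Vec is not covered here because it requires training a model.

```python
from collections import Counter
from math import log

# A tiny hypothetical corpus, already cleaned and lowercased.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [doc.split() for doc in documents]


def bag_of_words(tokens):
    """Bag of Words: count how often each word appears, ignoring order."""
    return Counter(tokens)


def tf_idf(tokens, corpus):
    """TF-IDF per word: frequent in this document, rare across the corpus."""
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / len(tokens)                      # term frequency
        df = sum(1 for doc in corpus if word in doc)  # document frequency
        idf = log(len(corpus) / df)                   # inverse document frequency
        scores[word] = tf * idf
    return scores


bow = bag_of_words(tokenized[0])
scores = tf_idf(tokenized[0], tokenized)

print(bow)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

In the first sentence "the" appears twice but scores lower than "cat", because "the" also appears in another document while "cat" is unique to this one.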
Model Evaluation Metrics
- Accuracy, Precision, Recall, F1-Score
- Interpreting what each metric means and choosing the appropriate one for each problem (Examples: Healthcare, Finance)
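The four metrics can be computed directly from counts of correct and incorrect predictions. The sketch below uses invented labels for a hypothetical screening test (1 = has the condition, 0 = does not); the function name `evaluate` is an assumption, and Scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example: 3 of 10 people have the condition, the model finds only 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Here accuracy looks good at 0.80, yet recall is only about 0.33 because two of the three positive cases were missed. In a healthcare setting that gap usually matters more than the headline accuracy.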
Tools & Practice Environment
Python
Primary programming language
Jupyter Notebook
Interactive development environment
Pandas & SQL
Data manipulation and querying
Matplotlib
Data visualization
Scikit-learn
Machine learning algorithms
Practice Notebooks Included
Lecture_2_Demo.ipynb
SQL, Pandas & Visualization
Hands-on practice with data querying, manipulation, and creating insightful visualizations
Lecture_3_Demo.ipynb
KNN Algorithm & Computer Vision
Build classification models and explore image processing fundamentals
What You'll Achieve
Develop sound data analysis thinking
Understand the standard workflow of a Data Scientist
Gain foundational knowledge to continue developing at more advanced levels
Gain practical experience with Python, Jupyter Notebook, and industry-standard tools
Get an overview of all three data pillars (Structured, Vision, NLP)
Be able to evaluate and choose appropriate models for different problem types