Omdena Academy

Data Science Fundamentals: Empowering Communities with AI Technologies

Course Details
By the end of this course, learners will have developed a comprehensive hands-on artifact: a Data Science Portfolio Web Application using Streamlit and GitHub integration. This project serves as a showcase of their acquired skills in Python, data collection, exploratory data analysis (EDA), natural language processing (NLP), deep learning fundamentals, and web application development. The Data Science Portfolio Web Application will allow users to interactively explore various data science projects and analyses conducted by the learner.
The course is tailored for young talent and junior AI employees in the participating country and is relevant to a range of job roles in the region. It offers this audience a pathway to enhance their skills, advance their careers, and achieve their professional goals in the rapidly growing field of data science and artificial intelligence.
Familiarity with a programming language and basic programming logic.
- Quizzes before and after every Module: Assess understanding and track progress through pre-module and post-module quizzes, ensuring learners grasp key concepts before advancing.
- Assignments:
  - Python Programming Skills Assessment: Evaluate proficiency in Python programming to solidify foundational skills crucial for data science and AI tasks.
  - Data Visualization Application: Apply knowledge by creating a data visualization application, showcasing the ability to present insights effectively.
- Project: Undertake an end-to-end Data Science Project encompassing:
  - Data Collection: Gather data from various sources, applying skills in data acquisition and preprocessing.
  - EDA (Exploratory Data Analysis): Analyze and explore data to uncover patterns and insights, setting the stage for informed decision-making.
  - Visualization: Create visual representations to communicate findings intuitively.
  - ML Models: Implement machine learning models to extract predictive insights from data.
  - Web Deployment: Develop and deploy a web application with Streamlit to showcase the project's results, enhancing accessibility and interactivity.
  - Maintaining GitHub Repository: Manage the project codebase on GitHub, facilitating collaboration and version control.
- 2.5 hours each week
- 40-minute lectures followed by 15-minute breakout room activities.
- Students are expected to work 3 to 5 hours per week.
Learning Objectives
Acquire proficiency in Python programming, including syntax, data structures, and object-oriented programming principles, to effectively manipulate data and implement machine learning algorithms.
Gain a solid understanding of key concepts in data science, including data collection, exploratory data analysis (EDA), data visualization, and machine learning fundamentals.
Develop practical skills through hands-on projects, quizzes, and assignments, reinforcing theoretical knowledge with real-world applications and scenarios.
Understand the importance of collaboration and version control in data science projects, and learn how to use tools like GitHub to manage a project codebase and facilitate teamwork.
Build a comprehensive data science portfolio showcasing proficiency in Python programming, data analysis, machine learning, and web development, demonstrating the ability to solve complex problems and communicate findings effectively.
Course Modules
In this module, learners will be introduced to the fundamentals of Python programming. They will start by understanding basic concepts such as variables, data types, and operators. Then, they will delve into control structures such as loops and conditional statements. Next, they will learn about functions and how to define and call them. Finally, the module will cover object-oriented programming (OOP) principles, including classes, objects, inheritance, and polymorphism. Through hands-on exercises and coding challenges, learners will gain proficiency in writing Python code to solve basic programming problems and manipulate data structures.
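As a small illustration of the OOP ideas named above (classes, objects, inheritance, and polymorphism), here is a minimal sketch. The `Shape`, `Rectangle`, and `Circle` classes are hypothetical names invented for this example and are not taken from the course materials.

```python
import math


class Shape:
    """Base class: defines the interface every shape shares."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("Subclasses must implement area()")

    def describe(self):
        # Calls whichever area() the concrete subclass provides (polymorphism).
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):  # inheritance: a Rectangle is a Shape
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# Objects of different classes are handled through the same method call.
shapes = [Rectangle(2, 3), Circle(1)]
descriptions = [shape.describe() for shape in shapes]
for line in descriptions:
    print(line)
```

The loop never checks which class it is dealing with; each object supplies its own `area()`, which is the point of polymorphism.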
Assessments:
- Pre-Post Test
- Python Programming Skills Assignment
Learning Objective: Understand Data Science Fundamentals
This module introduces learners to basic Python libraries commonly used in data science applications. They will explore libraries such as NumPy for numerical computing, Pandas for data manipulation and analysis, and Matplotlib and Seaborn for data visualization. Through practical examples and guided exercises, learners will understand how to use these libraries to perform data manipulation, exploration, and visualization tasks essential for data science projects.
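To make the division of labor between NumPy and Pandas concrete, the following sketch shows one vectorized array computation and one table-style filter and aggregation. The learner names, tracks, and scores are made-up values used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration.
scores = np.array([72.0, 85.0, 90.0, 64.0])

# NumPy: aggregate and element-wise arithmetic without explicit loops.
mean_score = scores.mean()
centered = scores - mean_score  # the scalar is broadcast across the array

# Pandas: labeled, tabular data built on top of arrays.
df = pd.DataFrame(
    {
        "learner": ["A", "B", "C", "D"],
        "track": ["nlp", "ml", "nlp", "ml"],
        "score": scores,
    }
)

# Filter rows with a boolean mask, then aggregate per group.
passed = df[df["score"] >= 70]
mean_by_track = df.groupby("track")["score"].mean()

print(mean_score)
print(passed["learner"].tolist())
print(mean_by_track.to_dict())
```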
Assessments:
- Pre-Post Test
- Choosing Project Topic/Area of Interest
Learning Objective: Apply Data Collection Techniques
This module focuses on web scraping techniques using the Beautiful Soup library in Python. Learners will understand how to extract data from websites by inspecting HTML structure and using Beautiful Soup to parse and extract relevant information. They will explore common web scraping challenges and learn best practices for ethical and efficient data collection. By the end of the module, learners will be able to compile publicly accessible data for analysis and further processing.
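The inspect-then-extract workflow described above might look like the sketch below. It parses an inline HTML string rather than a live site so it runs offline; the markup, class names, and values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for a downloaded page (hypothetical markup).
html = """
<html><body>
  <ul id="courses">
    <li class="course"><a href="/python">Python Basics</a><span class="weeks">2</span></li>
    <li class="course"><a href="/eda">Exploratory Data Analysis</a><span class="weeks">1</span></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser bundled with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

records = []
for item in soup.find_all("li", class_="course"):
    link = item.find("a")
    records.append(
        {
            "title": link.get_text(strip=True),
            "url": link["href"],
            "weeks": int(item.find("span", class_="weeks").get_text()),
        }
    )

print(records)
```

For a real page, the HTML would first need to be downloaded (for example with an HTTP client), and the site's terms of use and robots.txt should be checked before collecting anything.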
Assessments:
- Pre-Post Test
- Data Collection and Cleaning for Project
Learning Objective: Advanced Data Collection Techniques
Building upon the previous module, this module introduces learners to more advanced data collection techniques using Splash and Scrapy. They will learn how to scrape dynamic web pages and handle JavaScript-rendered content using Splash, a lightweight JavaScript rendering service, and how to structure and automate web scraping tasks using Scrapy, a web crawling framework. Through hands-on projects, learners will gain experience in collecting structured data from various websites efficiently and effectively.
Assessments:
- Pre-Post Test
- Data Collection and Cleaning for Project
Learning Objective: Analyze and Understand Data
This module focuses on exploratory data analysis (EDA) techniques and feature engineering. Learners will see why EDA matters for uncovering the underlying patterns and relationships within the data. They will explore various statistical and visualization techniques to summarize and visualize data distributions, relationships, and anomalies. Additionally, they will learn how to engineer new features from existing data to improve model performance and predictive accuracy.
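A minimal sketch of these EDA steps with Pandas follows: summary statistics, correlations, a simple anomaly check, and one engineered feature. The listings data is invented, and the 1.5 × IQR rule is just one common convention for flagging outliers, not necessarily the one taught in the course.

```python
import pandas as pd

# Hypothetical listings data, invented for illustration.
df = pd.DataFrame(
    {
        "area_m2": [50, 80, 120, 65, 200],
        "rooms": [2, 3, 4, 2, 5],
        "price": [100_000, 150_000, 230_000, 120_000, 900_000],
    }
)

# Summarize distributions and pairwise linear relationships.
summary = df.describe()
correlations = df.corr()

# Flag anomalies in price with the 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Engineer a new feature from two existing columns.
df["price_per_m2"] = df["price"] / df["area_m2"]

print(summary.loc[["mean", "std"]])
print(outliers)
print(df[["area_m2", "price", "price_per_m2"]])
```

Here the engineered `price_per_m2` column makes the flagged listing easier to interpret: its price is high even relative to its size.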
Assessments:
- Pre-Post Test
- Applying EDA on Data
Learning Objective: Visualize Data Effectively
In this module, learners will focus on data visualization techniques using Python libraries such as Matplotlib, Seaborn, and Plotly. They will learn how to create various types of plots, including scatter plots, line plots, bar charts, histograms, and heatmaps, to effectively communicate insights from data. Through hands-on exercises and projects, learners will gain proficiency in creating informative and visually appealing data visualizations for exploratory analysis and presentation purposes.
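As an example of two of the plot types listed above, this sketch draws a bar chart and a histogram side by side with Matplotlib. The module names and hour values are made up, and the non-interactive `Agg` backend is an assumption chosen so the code runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration.
modules = ["Python", "EDA", "ML", "NLP"]
hours = [6, 4, 8, 5]

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: compare a quantity across categories.
bars = ax_bar.bar(modules, hours)
ax_bar.set_title("Study hours per module")
ax_bar.set_ylabel("Hours")

# Histogram: show how the values are distributed.
counts, bin_edges, _ = ax_hist.hist(hours, bins=4)
ax_hist.set_title("Distribution of study hours")
ax_hist.set_xlabel("Hours")

fig.tight_layout()

# Render to an in-memory PNG; fig.savefig("chart.png") would write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```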
Assessments:
- Pre-Post Test
- Data Visualization Application Assignment
Learning Objective: Apply Machine Learning Techniques
This module focuses on practical applications of machine learning techniques in data science projects. Learners will understand how to select appropriate machine learning algorithms based on the characteristics of the data and the problem domain. They will explore supervised learning techniques such as classification and regression, unsupervised learning techniques such as clustering and dimensionality reduction, and ensemble learning techniques such as random forests and gradient boosting. Through hands-on projects and case studies, learners will gain experience in building, training, and evaluating machine learning models for various real-world applications, including predictive modeling, recommendation systems, and anomaly detection. Additionally, learners will understand how to interpret model outputs and make data-driven decisions based on model predictions. By the end of the module, learners will be equipped with the skills and knowledge necessary to leverage machine learning techniques effectively in data science projects.
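The build, train, and evaluate cycle described above can be sketched with scikit-learn as follows. The iris dataset and the random forest classifier are stand-ins chosen because they ship with the library; the course project would use the learner's own data and whichever model suits it.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Bundled example dataset, used only so the sketch is self-contained.
X, y = load_iris(return_X_y=True)

# Hold out a test set so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# An ensemble of decision trees; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Accuracy is only one possible metric; for imbalanced classes or regression tasks a different measure would be more appropriate.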
Assessments:
- Pre-Post Test
- Applying relevant ML models to the project dataset
Learning Objective: Apply NLP Techniques
This module introduces learners to natural language processing (NLP) techniques for text data analysis. They will learn how to preprocess text data, including tasks such as tokenization, stemming, and lemmatization. Additionally, learners will explore techniques for text classification, sentiment analysis, and topic modeling using libraries such as NLTK and spaCy. Through practical examples and projects, learners will understand how to apply NLP techniques to extract insights from textual data.
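To show what tokenization and stemming do to raw text, here is a deliberately simplified sketch that uses only the Python standard library. It is not NLTK or spaCy: `naive_stem` is a crude stand-in for a real stemmer, and the stop-word list and review sentences are invented for illustration.

```python
import re
from collections import Counter

# Tiny hand-written stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "was", "were", "but", "to"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_stem(token):
    """Strip a few common suffixes; far cruder than a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem what remains."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


reviews = [
    "The lectures were engaging and the projects helped.",
    "Engaging projects, but the pace was fast.",
]

# Bag-of-words counts across all reviews.
term_counts = Counter(token for review in reviews for token in preprocess(review))
print(term_counts.most_common(3))
```

Note that the naive stemmer produces non-words such as `engag`; lemmatization, which libraries like spaCy provide, maps tokens to dictionary forms instead.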
Assessments:
- Pre-Post Test
- Applying relevant NLP models to the project dataset
Learning Objective: Collaborate and Version Control
In this module, learners will learn to use Git and GitHub effectively for version control and collaboration in data science projects. They will understand the basics of Git, including creating repositories, branching, committing changes, and merging branches. Additionally, learners will explore best practices for collaborating with teammates, managing project workflows, and resolving conflicts. By the end of the module, learners will be proficient in using Git and GitHub to manage and share their data science projects efficiently.
Assessments:
- Pre-Post Test
- Creating Project Repository on GitHub
Learning Objective: Develop Interactive Web Applications
This module focuses on developing interactive web applications using Streamlit, a Python library for building data-driven web apps. Learners will understand how to create user-friendly interfaces for data science projects, allowing users to interactively explore data visualizations, machine learning models, and insights. They will learn how to deploy Streamlit applications and share them with others, enabling seamless collaboration and knowledge sharing within the data science community.
Assessments:
- Pre-Post Test
- Deploying Streamlit Application
Additional information
Duration: 12 weeks
Skill Level: Beginner
Certificates upon completion: Yes
Live coding sessions
Recordings available after the classes
Network with peers
Engaging sessions with quizzes and tasks
Peer-to-peer mentoring