Who is a data scientist?

Someone whose job is to create value for the business by developing data analysis solutions and implementing them in a production environment.

A data scientist should:
  • Understand the business and discover ways to add value through data analysis.
  • Develop analysis methods based on the state of the art in statistics, machine learning, data mining, information retrieval, etc.
  • Prototype such analysis methods using platforms like R or MATLAB. While these platforms may not be fit for production, they are a great playground for trying out ideas and exploring data.
  • Write production-quality code and transfer the analysis to a production environment. This usually means implementing the method in a language such as Python or Java.
  • Discover ways to scale the method using recent distributed computing technologies, for example NoSQL databases, stream processing frameworks, messaging systems, MapReduce, etc.
  • Build monitoring for the system to keep it running reliably in production.

Curriculum

This course is intended to provide participants with the knowledge and skills to identify, model, analyze, measure, and document business processes in ways that are practical and comply with recognized best practices. The course focuses on understanding business processes and analyzing them to identify areas for improvement, developing alternative recommendations with relevant justification, documenting and presenting findings and recommendations, and establishing a discipline of continuous process improvement.

  • Introduction to Data Science and Analytics
    At the end of this section, students will be able to understand the key concepts of data science and the current trends and technologies behind big data.
  • Introduction to the Command Line and Version Control
    At the end of this section, students will be able to use the command line for common operations such as navigating directories and organizing files, and to use version control with Git and GitHub.
  • Data Cleaning
    At the end of this section, students will be able to apply various techniques to clean data and prepare it for analysis.
  • Python for Data Science
    At the end of this section, students will be able to use Python to perform basic exploratory analysis to better understand the data.
  • Introduction to Databases and SQL for Data Analysis
    At the end of this section, students will be able to interact with and manipulate real data in relational databases.
  • Introduction to R and Practical Statistics
    At the end of this section, students will be able to manipulate data with R and build regression models.
  • Business Intelligence
    At the end of this section, students will be able to design effective visualizations that account for visual perception.
  • Supervised Machine Learning
    At the end of this section, students will be able to apply different machine learning algorithms to real problems and identify an appropriate set of algorithms for a given problem statement.
  • Unsupervised Machine Learning
    At the end of this section, students will be able to explain the fundamental principles of clustering and dimensionality reduction.
  • Text and Web Analytics
    At the end of this section, students will be able to scrape data from the web and build data pipelines.
  • Communicating Results to Stakeholders
    At the end of this section, students will be able to explain analysis results to technical and non-technical audiences and deploy machine learning models to production.