What is data science?

Data science is an area of study that focuses on organizing, analyzing, and extracting insights from data, often from large data sets (big data), using scientific methods, algorithms, and processes. The field grew out of applying mathematical and statistical analysis to discover hidden patterns in raw data.

The lifecycle of data science generally consists of:

  1. Entering, capturing, and extracting data.
  2. Maintaining data in a data warehouse, including cleaning and staging data.
  3. Processing data through data mining, classification, clustering, and modeling.
  4. Communicating the results to others through reporting and visualization.
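
The four lifecycle stages above can be sketched as a tiny pipeline. This is a minimal illustration under several assumptions: the input is a small hypothetical CSV of regional sales held in memory, the function names `capture`, `clean`, `process`, and `report` are made up for this example, an in-memory list stands in for a data warehouse, and a per-region mean stands in for real modeling.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw input; in practice this would come from files, APIs, or sensors.
RAW_CSV = """region,sales
north,120
south,
north,80
east,200
south,95
east,not_available
"""


def capture(text):
    """Step 1: capture/extract raw records from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(records):
    """Step 2: clean and stage the data, dropping rows whose sales value is not numeric."""
    cleaned = []
    for row in records:
        try:
            cleaned.append({"region": row["region"], "sales": float(row["sales"])})
        except ValueError:
            continue  # skip missing or malformed values such as "" or "not_available"
    return cleaned


def process(rows):
    """Step 3: a deliberately simple 'model': mean sales per region."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row["sales"])
    return {region: statistics.mean(values) for region, values in by_region.items()}


def report(summary):
    """Step 4: communicate the results as a plain-text table, one region per line."""
    lines = [f"{region:<8}{mean:>8.1f}" for region, mean in sorted(summary.items())]
    return "\n".join(lines)


summary = process(clean(capture(RAW_CSV)))
print(report(summary))
```

Keeping each stage in its own function mirrors the lifecycle: any stage can be swapped (for example, reading from a database instead of a string) without touching the others.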

What does a data scientist do?

A data scientist identifies questions, collects data from various sources, organizes and analyzes the data, and then communicates the findings to inform meaningful business decisions.

A data scientist may use some or all of the following technologies: Apache Hadoop and its MapReduce programming model, Apache Spark, ETL (extract, transform, load) tools, Amazon Redshift, Jupyter, SQL and NoSQL databases, Python, R, Tableau, and GitHub.
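
MapReduce is a programming model in which a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The sketch below runs the classic word count in a single Python process, assuming a few hypothetical sample documents; it shows the shape of the model only and does not use the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input documents.
documents = [
    "Spark and Hadoop process big data",
    "Hadoop uses MapReduce",
    "big data needs big tools",
]

# On a real cluster each map call could run on a different machine; here they run sequentially.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))

# Top three words by count, ties broken alphabetically.
print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
# → [('big', 3), ('data', 2), ('hadoop', 2)]
```

Because each map call only sees its own document and each reduce call only sees one key's values, both phases can be spread across many machines, which is what lets frameworks like Hadoop scale this pattern.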

Using data science

Data science helps solve problems that would otherwise be difficult or impractical to solve, especially at scale. The following are areas where data science is currently used to innovate and provide solutions:

  • Automotive manufacturers such as Tesla use data science in driver-assistance and self-driving systems, for example so that a car can adjust its speed based on surrounding vehicles.
  • Fashion and retail companies use data science to recommend products based on customers' previous purchases.
  • Financial institutions use data science to detect fraudulent behavior through anomaly detection.
  • Healthcare organizations use data science to develop new ways to diagnose and treat diseases.
  • Technology companies use data science to mark emails as spam through classification.
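
Anomaly detection, as in the fraud example above, can be as simple as a z-score test: learn the mean and standard deviation of past transaction amounts, then flag new amounts that lie too many standard deviations away. This is a minimal sketch assuming hypothetical amounts and an illustrative threshold of 3; production fraud systems use far richer features and models.

```python
import statistics


def fit_baseline(history):
    """Learn the mean and sample standard deviation of past, presumably normal, amounts."""
    return statistics.mean(history), statistics.stdev(history)


def is_anomalous(amount, baseline, threshold=3.0):
    """Flag an amount whose z-score against the baseline exceeds `threshold`."""
    mean, stdev = baseline
    if stdev == 0:
        return amount != mean  # all past amounts identical: anything different stands out
    return abs(amount - mean) / stdev > threshold


# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 51.2, 45.0, 39.9, 47.3, 44.1, 40.8]
baseline = fit_baseline(history)

new_transactions = [43.0, 950.0, 36.0]
flags = [is_anomalous(x, baseline) for x in new_transactions]
print(flags)  # → [False, True, False]
```

Fitting the baseline on historical data, rather than on the batch being scored, keeps a single large outlier from inflating the standard deviation and hiding itself.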
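
Spam filtering through classification is often introduced with a multinomial naive Bayes classifier. The sketch below is one such classifier with add-one (Laplace) smoothing, trained on a handful of hypothetical messages; the class name `NaiveBayesSpamFilter` is made up for illustration, and this is not a claim about how any particular email provider filters spam.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real filters use richer tokenization."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = set(labels)
        self.doc_counts = Counter(labels)  # number of training messages per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """log P(c) + sum of log P(word | c); logs avoid underflow from multiplying small numbers."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for word in tokenize(text):
            score += math.log((self.word_counts[c][word] + 1) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Hypothetical labeled training messages.
messages = [
    "win a free prize now",
    "claim your free money",
    "meeting moved to friday",
    "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(messages, labels)
print(model.predict("free prize waiting"))        # → spam
print(model.predict("review the report friday"))  # → ham
```

Smoothing matters here: without the `+ 1`, a single word never seen in one class (such as "waiting") would drive that class's probability to zero.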
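
Purchase-based recommendations, as in the fashion and retail example above, can be illustrated with item co-occurrence counting: items often bought in the same basket are recommended together. This minimal sketch assumes hypothetical shopping baskets and made-up helper names (`build_co_purchases`, `recommend`); real recommender systems are considerably more sophisticated.

```python
from collections import Counter
from itertools import permutations


def build_co_purchases(baskets):
    """Count how often each ordered pair of distinct items appears in the same basket."""
    co_counts = Counter()
    for basket in baskets:
        for a, b in permutations(set(basket), 2):  # set() ignores duplicate items in a basket
            co_counts[(a, b)] += 1
    return co_counts


def recommend(item, co_counts, k=2):
    """Return up to k items most often bought together with `item`."""
    scores = Counter({b: n for (a, b), n in co_counts.items() if a == item})
    return [b for b, _ in scores.most_common(k)]


# Hypothetical purchase history: one list of items per order.
baskets = [
    ["jeans", "belt", "t-shirt"],
    ["jeans", "belt"],
    ["jeans", "sneakers"],
    ["dress", "sandals"],
]

co_counts = build_co_purchases(baskets)
print(recommend("jeans", co_counts, k=1))  # → ['belt']
```

Items with equal counts (here "t-shirt" and "sneakers" for "jeans") have no guaranteed order, so a real system would add a tie-breaking rule or a more informative score.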

