The fields of data science, big data, machine learning, and artificial intelligence are exciting and complex at the same time. Data science is also growing rapidly, with new tools, technologies, algorithms, datasets, and use cases. For a beginner in this field, the learning curve can be fairly daunting. This is where this book helps.
The data science solutions book provides a repeatable, robust, and reliable framework to apply the right-fit workflows, strategies, tools, APIs, and domain for your data science projects.
This book takes a solutions-focused approach to data science. Each chapter meets an end-to-end objective of solving for data science workflow or technology requirements. At the end of each chapter you will have either completed a data science tools pipeline or written a fully functional coding project that meets your data science workflow requirements.
SEVEN STAGES OF DATA SCIENCE SOLUTIONS WORKFLOW
Every chapter in this book goes through one or more of these seven stages of the data science solutions workflow.
STAGE 1: Question. Problem. Solution.
Before starting a data science project we must ask relevant questions specific to our project domain and datasets. We may answer or solve these during the course of our project. Think of these questions-solutions as the key requirements for our data science project. Here are some templates that can be used to frame questions for our data science projects.
Can we classify an entity based on given features if our data science model is trained on a certain number of samples with similar features related to specific classes?
Do the samples in a given dataset cluster into specific classes based on similar or correlated features?
Can our machine learning model recognise and classify new inputs based on prior training on a sample of similar inputs?
STAGE 2: Acquire. Search. Create. Catalog.
This stage involves data acquisition strategies including searching for datasets on popular data sources or internally within your organisation. We may also create a dataset based on external or internal data sources.
The acquire stage may feed back into the question stage, refining our problem and solution definition based on the constraints and characteristics of the acquired datasets.
STAGE 3: Wrangle. Prepare. Cleanse.
The data wrangle stage prepares and cleanses our datasets for our project goals. This workflow stage starts by importing a dataset, then explores the dataset for its features and available samples, prepares it using appropriate data types and data structures, and optionally cleanses it to create model training and solution testing samples.
The wrangle stage may circle back to the acquire stage to identify complementary datasets to combine and complete the existing dataset.
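The import, prepare, and cleanse steps described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the column names (height_cm, weight_kg, label), the sample rows, and the helper functions wrangle and split are assumptions made up for this example, not part of any dataset used in the book.

```python
import csv
import io

# Hypothetical raw data; in practice this would come from a file or an API.
RAW_CSV = """height_cm,weight_kg,label
170,65,adult
120,,child
165,60,adult
110,20,child
abc,70,adult
130,28,child
"""


def wrangle(raw_text):
    """Import, type-convert, and cleanse rows; return (features, labels)."""
    reader = csv.DictReader(io.StringIO(raw_text))
    features, labels = [], []
    for row in reader:
        try:
            # Prepare: convert text fields to numeric types.
            height = float(row["height_cm"])
            weight = float(row["weight_kg"])
        except ValueError:
            # Cleanse: skip rows with missing or malformed values.
            continue
        features.append((height, weight))
        labels.append(row["label"])
    return features, labels


def split(features, labels, train_fraction=0.75):
    """Split samples, in their original order, into training and testing sets."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], labels[:cut]), (features[cut:], labels[cut:])


features, labels = wrangle(RAW_CSV)
train, test = split(features, labels)
print(len(features), "clean samples;", len(train[0]), "train,", len(test[0]), "test")
```

Dropping incomplete rows is the simplest cleansing choice; depending on the dataset, filling in missing values or circling back to the acquire stage for complementary data may be a better fit.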
STAGE 4: Analyse. Patterns. Explore.
The analyse stage explores the given datasets to determine their patterns, correlations, classifications, and general nature. This helps determine the choice of model algorithms and strategies that may work best on the dataset.
The analyse stage may also visualize the dataset to determine such patterns.
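As one possible way to check for a correlation between two features, we can compute the Pearson correlation coefficient. The sketch below uses only the standard library; the pearson helper and the heights and weights values are assumptions invented for illustration.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical feature columns.
heights = [110.0, 130.0, 165.0, 170.0]
weights = [20.0, 28.0, 60.0, 65.0]

r = pearson(heights, weights)
print(f"correlation between height and weight: {r:.3f}")
```

A coefficient close to 1 or -1 suggests a strong linear relationship between the two features, which may inform the choice of model in the next stage. Note that this helper divides by zero if either feature is constant.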
STAGE 5: Model. Predict. Solve.
The model stage uses prediction and solution algorithms to train on a given dataset and then applies this training to solve a given problem.
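To make the train-then-predict idea concrete, here is a minimal sketch of a one-nearest-neighbour classifier. This algorithm is chosen here only for brevity and is not presented as the book's method; the function name nearest_neighbour_predict and the training samples are hypothetical.

```python
import math


def nearest_neighbour_predict(train_features, train_labels, sample):
    """Return the label of the training sample closest to `sample`."""
    # "Training" for this algorithm is simply remembering the samples.
    distances = [math.dist(feature, sample) for feature in train_features]
    closest = distances.index(min(distances))
    return train_labels[closest]


# Hypothetical training data: (height_cm, weight_kg) with a class label.
train_features = [(170.0, 65.0), (165.0, 60.0), (110.0, 20.0), (130.0, 28.0)]
train_labels = ["adult", "adult", "child", "child"]

# Apply the training to a new, unseen input.
prediction = nearest_neighbour_predict(train_features, train_labels, (125.0, 25.0))
print(prediction)
```

Nearest-neighbour prediction is sensitive to the scale of each feature, so a real project would usually normalise features during the wrangle stage first.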
STAGE 6: Visualize. Report. Present.
The visualization stage can support the data wrangling, analysis, and modeling stages. Data can be visualized using charts and plots suited to the characteristics of the dataset and the desired results.
The visualization stage may also provide the inputs for the supply stage.
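Even without a plotting library, a quick text chart can reveal the distribution of classes in a dataset. The following sketch counts labels and prints one bar per class; the text_bar_chart helper and the sample labels are assumptions for illustration, and a real project would more likely use a dedicated charting tool.

```python
from collections import Counter


def text_bar_chart(values, width=20):
    """Return a text bar chart of how often each value occurs."""
    counts = Counter(values)
    peak = max(counts.values())
    lines = []
    for key in sorted(counts):
        # Scale each bar relative to the most frequent value.
        bar = "#" * max(1, round(counts[key] / peak * width))
        lines.append(f"{key:<8}{bar} {counts[key]}")
    return "\n".join(lines)


# Hypothetical class labels from a dataset.
chart = text_bar_chart(["adult", "child", "child", "adult", "child"])
print(chart)
```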
STAGE 7: Supply. Products. Services.
Once we are ready to monetize our data science solution or derive further return on investment from our projects, we need to think about distribution and the data supply chain. This stage circles back to the acquisition stage: when we acquire data, we are in fact drawing on someone else's data supply chain.