ABSTRACT: Over the last decade, there has been an explosion in the amount of data gathered by both private companies and publicly funded research institutions, and the computational power available to analyze these datasets has grown as well. However, to draw useful conclusions from this huge amount of information, the data must be cleaned, organized, analyzed, and presented in a way that is easily understood. As a result, demand for people with the skills to accomplish these tasks has soared, and data science is one of the fastest-growing professions in the world. This comprehensive data science tutorial introduces the reader to the foundations of the data science pipeline, teaching statistical and computational techniques for analyzing and visualizing data. It will also serve as a model for data science analysis of a science-based topic: medical appointment no-shows. I will begin with background on the importance of the topic, then follow with a multi-section tutorial walking through each step of the data science pipeline, including set-up, data collection and processing, exploratory data analysis, and machine learning analysis. Each section will open with a short introduction describing the section and any relevant definitions or concepts, followed by code, graphic visualizations, and prose explanation. I will conclude by using the results of the analysis to provide relevant insights and policy suggestions on medical appointment no-shows.
(WC:220)
READER’S PROFILE: I can imagine a reader being overwhelmed by the many data science concepts, especially the Python-specific methods used in the coding sections.
READER’S RESPONSE: I'm excited to learn a skill that employers will likely value, but I'm not sure I can follow the wide range of coding concepts covered. There are surely resources available, but I don't know where to look. If only this tutorial provided more curated links to documentation and practical examples for the common methods used in this analysis...