FINAL PROJECT PROPOSAL: Magnum Opus and Exigence > An Introduction to the Data Science Pipeline through the Coffee Price Crisis

Audience: This document is aimed at readers with basic programming knowledge (specifically in Python) who want to develop familiarity with data science methods. It may also interest the coffee community, although the analysis is not designed for a layperson.

Context: The data analysis pipeline is a five-step process for gaining application-specific insight from unstructured and disparate data sources. Researchers favor specific tools, procedures, and vocabulary; knowing these makes it possible to produce widely understandable and reproducible results.

Purpose: This document has two purposes: to understand the coffee price crisis through data on commodity prices and production, and to introduce the audience to the data analysis pipeline.

Document Type: The final document will be a Jupyter notebook, as it is one of the most elegant ways to display a mix of code and prose. It will present both the results of the research and analysis and a way to replicate them from scratch.

Design/format: The document will be organized into sections loosely following the steps of the data analysis pipeline (data collection, data processing, exploratory data analysis & visualization, analysis & hypothesis testing, insight & policy decision).

Citation style: Citations will be referral links, as those will be easy to click on and skim while reading through the Jupyter notebook.

November 27, 2019 | Unregistered Commenter BD

B,
I like this plan and applaud use of JN as a "bilingual guide." I wonder how you will think about the audience, as in a real one. Would this be placed on a GitHub profile (yours?) or could this become a tutorial in DS on campus?

If you plan to share this in another class, touch base with me about this soon so we can develop a plan of permissions.

Is the method you note used in other food supply chains? Corn? Rice? What about cocoa and chocolate?

General comment: you will write more annotations, and longer ones, than most data scientists typically need. However, this is good practice for professional life, where you must think of an audience of non-coding supervisors and colleagues.


Mb

December 1, 2019 | Registered Commenter Marybeth Shea