The Data Science Workflow

master
Matt Huntington 9 years ago
parent 65a585b7b7
commit 2e4e6a1640

@ -60,13 +60,41 @@ Let's talk through the following questions in groups:
## Self Assessment on Data Science Skills
For a given class size:
- how many people will rate themselves strongest in Programming Skills?
- how many people will rate themselves strongest in Math and Statistics Knowledge?
- how many people will rate themselves strongest in Business Sense?
- how many people will rate themselves strongest in Domain Knowledge?
- how many people will rate themselves strongest in Communication Skills?
1. Create a table for the qualities of a data scientist and then rate yourself on each of these skills on a scale from 1-10.
1. We will then use the data to show simple statistics in action as part of the data science workflow.
| Skill | Value |
| --- | --- |
| Programming Skills | |
| Math and Statistics Knowledge | |
| Business Sense | |
| Domain Knowledge | |
| Communication Skills | |
## The Data Science Workflow
1. Identify the problem
    - what are we trying to do?
1. Acquire the data
    - get data in its raw form
        - scraping the data from a website
        - downloading a file
        - reading a book/article
1. Parse the data
    - format the data so that it's all the same
1. Mine the data
    - collect information from the data
1. Refine the data
    - clean the data up
    - discard outliers, etc.
1. Build a data model
    - figure out a formula that represents what we are trying to learn
1. Present the results
    - visualize the results
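
The steps above can be sketched end to end on a toy version of the self-assessment data. This is only an illustration under stated assumptions: the `RAW` sample is invented, the function names are hypothetical, "outlier" is taken to mean any value outside the 1-10 scale, and the "model" is nothing more than a mean per skill.

```python
import csv
import io
from statistics import mean

# 1. Identify the problem: what is the typical self-rating for each skill?

# 2. Acquire: raw data as it arrived (hypothetical sample with inconsistent formatting).
RAW = """skill,value
Programming Skills, 8
programming skills,6
Communication Skills,7
Communication Skills,70
"""


def parse(raw):
    """3. Parse: give every row the same shape (lowercase skill, integer value)."""
    rows = csv.DictReader(io.StringIO(raw))
    return [(row["skill"].strip().lower(), int(row["value"].strip())) for row in rows]


def mine(rows):
    """4. Mine: collect information, here the list of values seen per skill."""
    by_skill = {}
    for skill, value in rows:
        by_skill.setdefault(skill, []).append(value)
    return by_skill


def refine(by_skill, low=1, high=10):
    """5. Refine: discard values outside the 1-10 scale (treated as outliers)."""
    return {
        skill: [v for v in values if low <= v <= high]
        for skill, values in by_skill.items()
    }


def model(by_skill):
    """6. Model: a deliberately simple formula, the mean rating per skill."""
    return {skill: mean(values) for skill, values in by_skill.items() if values}


def present(averages):
    """7. Present: a plain-text bar chart, one '#' per rating point."""
    return "\n".join(
        f"{skill:<22} {'#' * round(avg)} {avg:.1f}"
        for skill, avg in sorted(averages.items())
    )


averages = model(refine(mine(parse(RAW))))
print(present(averages))
```

Each function maps to one workflow step, so any stage can be swapped out (for example, reading a downloaded file instead of the inline string) without touching the others.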
