# What Is Data Science?

## Lesson Objectives

1. Talk About Each Other
1. Describe What Data Science Is
1. Describe the Qualities Of A Data Scientist
1. Describe the Data Science Workflow

## Talk About Each Other

1. Here's a bit about me.
1. This class can be about networking, too! Tell us about yourself!
    - What is your name?
    - What brings you to GA?
    - What are your current activities?

## Describe What Data Science Is

What is it, exactly?

- A set of tools and techniques used to extract useful information from data.
- An interdisciplinary, problem-solving-oriented subject.

What does it consist of?

- Programming skills
- Math and statistics knowledge
- Business sense
- Domain knowledge
- Communication skills

## Describe the Qualities Of A Data Scientist

### Exercise

Let's talk through the following questions in groups:

1. What do you think are the most important qualities for a data scientist?
2. Can you think of any other qualities or skills we have not mentioned?
3. What is your field of expertise?
4. Do you use tools such as Excel, Stata, R, or Python?
5. Where do you fall in the intersection of these skills?

### Possible Answers

- Ask good questions:
    - What is required?
    - How are results evaluated? (measures of success)
    - What do we currently know? (existing data)
    - What has happened? (descriptive analytics)
    - What will happen (if)? (predictive analytics)
    - What should we do to achieve what we require? (insight)
- Define and test a hypothesis; run experiments.
- Scrape and sample business-relevant data.
- Manipulate, sanitize, and wrangle data.
- Visualize data.
- Understand data relationships.
- Tell the machine how to learn from data.
- Create data products that deliver actionable insight.
- Tell relevant business stories from data.

## Describe the Data Science Workflow

### Self Assessment on Data Science Skills

For a given class size:

- How many people will rate themselves strongest in Programming Skills?
- How many people will rate themselves strongest in Math and Statistics Knowledge?
- How many people will rate themselves strongest in Business Sense?
- How many people will rate themselves strongest in Domain Knowledge?
- How many people will rate themselves strongest in Communication Skills?

What to do:

1. Create a table for the qualities of a data scientist, then rate yourself on each of these skills on a scale from 1 to 10.
1. We will then use the data to show simple statistics in action as part of the data science workflow.

| Skill | Value |
| --- | --- |
| Programming Skills | |
| Math and Statistics Knowledge | |
| Business Sense | |
| Domain Knowledge | |
| Communication Skills | |

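
As a rough illustration of the "simple statistics" step, the sketch below summarizes a class's self-ratings using only Python's standard library. The function names (`strongest_skill`, `summarize`) and the three example students are made up for illustration; they are not real class data, and the tie-breaking rule (first listed skill wins) is an assumption.

```python
from statistics import mean, median

SKILLS = [
    "Programming Skills",
    "Math and Statistics Knowledge",
    "Business Sense",
    "Domain Knowledge",
    "Communication Skills",
]


def strongest_skill(ratings):
    """Return the highest-rated skill (assumption: first listed skill wins ties)."""
    return max(SKILLS, key=lambda skill: ratings[skill])


def summarize(class_ratings):
    """Per skill: mean, median, and how many students rate it as their strongest."""
    strongest_counts = {skill: 0 for skill in SKILLS}
    for ratings in class_ratings:
        strongest_counts[strongest_skill(ratings)] += 1
    return {
        skill: {
            "mean": mean(r[skill] for r in class_ratings),
            "median": median(r[skill] for r in class_ratings),
            "strongest_for": strongest_counts[skill],
        }
        for skill in SKILLS
    }


# Made-up ratings (1 to 10) for three hypothetical students.
example_class = [
    dict(zip(SKILLS, [8, 6, 4, 5, 7])),
    dict(zip(SKILLS, [3, 9, 5, 6, 4])),
    dict(zip(SKILLS, [7, 5, 6, 8, 6])),
]

summary = summarize(example_class)
for skill, stats in summary.items():
    print(
        f"{skill}: mean={stats['mean']:.1f}, median={stats['median']}, "
        f"strongest for {stats['strongest_for']} student(s)"
    )
```

With real class data, the `strongest_for` counts answer the "how many people will rate themselves strongest in..." questions directly, while the mean and median show how the class as a whole is distributed across the five skills.
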
### The Data Science Workflow

1. Identify the problem
    - what are we trying to do?
    - ask questions
    - form a hypothesis
1. Acquire the data
    - get data in its raw form
        - scraping the data from a website
        - downloading a file
        - reading a book/article
1. Parse the data
    - format the data so that it's all the same
1. Mine the data
    - collect information from the data
1. Refine the data
    - clean the data up
    - discard outliers, etc.
1. Build a data model
    - figure out a formula that represents what we are trying to learn
1. Present the results
    - visualize the results
1. Deploy and validate
    - create a site
    - publish findings

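
The middle steps of the workflow can be sketched on a toy data set. This is a minimal illustration under stated assumptions, not a prescribed implementation: the raw records are made-up `(x, y)` pairs, the function names are hypothetical, "refine" is reduced to dropping rows with a missing value, and the "formula" is a straight line fitted by ordinary least squares.

```python
from statistics import mean

# Acquire: made-up raw records with inconsistent spacing and one missing value.
RAW_RECORDS = ["1, 2.1", "2,3.9", " 3 , 6.2", "4,8.1", "5,n/a", "6,12.0"]


def parse(raw_records):
    """Parse: turn each raw string into an (x, y) pair; y is None if not numeric."""
    parsed = []
    for record in raw_records:
        x_text, y_text = (part.strip() for part in record.split(","))
        try:
            y = float(y_text)
        except ValueError:
            y = None
        parsed.append((float(x_text), y))
    return parsed


def mine(rows):
    """Mine: collect basic information about the data."""
    return {"rows": len(rows), "missing_y": sum(1 for _, y in rows if y is None)}


def refine(rows):
    """Refine: clean up by dropping rows with a missing value."""
    return [(x, y) for x, y in rows if y is not None]


def fit_line(rows):
    """Model: least-squares fit of y = slope * x + intercept."""
    x_mean = mean(x for x, _ in rows)
    y_mean = mean(y for _, y in rows)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    variance = sum((x - x_mean) ** 2 for x, _ in rows)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


rows = parse(RAW_RECORDS)
info = mine(rows)
clean_rows = refine(rows)
slope, intercept = fit_line(clean_rows)

# Present: report the result in a readable form.
print(f"{info['rows']} rows, {info['missing_y']} missing")
print(f"fitted model: y = {slope:.2f} * x + {intercept:.2f}")
```

Identifying the problem and deploying the result are left out on purpose: they depend on the business question and the audience rather than on code.
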
### Your Turn: The Data Science Workflow

You are a junior data scientist at Amazon. Your boss asks you about the leading indicators that a user will make a new online purchase. How would you go about answering this question?

1. Identify the problem: What do you think are the indicators?
1. Acquire Data: What could we do first here? What should we take into consideration?
1. Parse Data: How do you format the data so it is all the same?
1. Mine and Refine: What calculations or transformations do you recommend doing? How do you determine the presence of outliers?
1. Data Model: What attributes would you include in the modeling stage? How do you know if the model is performing well?
1. Present Results: Who is your audience? What is the best way to present your results?
1. Deploy and Validate: How would this be shared with the community? How will it be validated?

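
For the "Mine and Refine" question about outliers, one common starting point is the interquartile range (IQR) rule: flag any value more than 1.5 × IQR below the first quartile or above the third. The sketch below is only one possible answer. The function name `find_outliers`, the 1.5 multiplier default, and the order values are illustrative assumptions, not real purchase data.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up order values for a hypothetical user; 95 is far from the rest.
order_values = [12, 15, 14, 13, 16, 15, 14, 95]
print(find_outliers(order_values))
```

Whether a flagged value should be discarded is a judgment call: an unusually large order may be a data-entry error, or it may be exactly the behavior the analysis is meant to explain.
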