# Introduction to Data Science
## Lesson Objectives
1. Intros
1. What is Data Science?
## Intros
1. Here's a bit about me
1. This class can be about networking, too! Tell us about yourself:
   - What is your name?
   - What brings you to GA?
   - What are your current activities?
## What is Data Science?
What is it, exactly?
- A set of tools and techniques used to extract useful information from data.
- An interdisciplinary, problem-solving-oriented field.

What does it consist of?
- Programming Skills
- Math and Statistics Knowledge
- Business Sense
- Domain Knowledge
- Communication Skills

![Data science Venn diagram](https://static1.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png)
## Your Turn: Qualities Of A Data Scientist And You
Let's talk through the following questions in groups:
1. What do you think are the most important qualities for a data scientist?
2. Can you think of any other quality or skill we have not mentioned?
3. What is your field of expertise?
4. Do you use tools such as Excel, Stata, R, or Python?
5. Where are you in the intersection of these skills?
## Possible Answers: Qualities Of A Data Scientist And You
- Ask good questions:
  - What is required?
  - How are results evaluated? (measures of success)
  - What do we currently know? (existing data)
  - What has happened? (descriptive analytics)
  - What will happen (if)? (predictive analytics)
  - What should we do to achieve what is required? (insight)
- Define and test hypotheses; run experiments.
- Scrape and sample business-relevant data.
- Manipulate, sanitize, and wrangle data.
- Visualize data.
- Understand data relationships.
- Tell the machine how to learn from data.
- Create data products that deliver actionable insight.
- Tell relevant business stories from data.
## Self Assessment on Data Science Skills
For a given class size:
- How many people will rate themselves strongest in Programming Skills?
- How many people will rate themselves strongest in Math and Statistics Knowledge?
- How many people will rate themselves strongest in Business Sense?
- How many people will rate themselves strongest in Domain Knowledge?
- How many people will rate themselves strongest in Communication Skills?
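Once everyone has filled in the ratings table described below, these questions reduce to a simple tally. The Python sketch below shows one possible way to do it. The `SKILLS` list mirrors the five skills named in this lesson; the function name `strongest_skill_counts` and the sample ratings are hypothetical, made up purely for illustration.

```python
# The five skills from this lesson's self-assessment table.
SKILLS = [
    "Programming Skills",
    "Math and Statistics Knowledge",
    "Business Sense",
    "Domain Knowledge",
    "Communication Skills",
]

# Hypothetical 1-10 ratings for three people (illustration only, not real class data).
EXAMPLE_RATINGS = [
    {"Programming Skills": 8, "Math and Statistics Knowledge": 6,
     "Business Sense": 4, "Domain Knowledge": 5, "Communication Skills": 7},
    {"Programming Skills": 3, "Math and Statistics Knowledge": 9,
     "Business Sense": 5, "Domain Knowledge": 6, "Communication Skills": 6},
    {"Programming Skills": 5, "Math and Statistics Knowledge": 5,
     "Business Sense": 8, "Domain Knowledge": 8, "Communication Skills": 4},
]


def strongest_skill_counts(ratings):
    """Count how many people rate themselves highest in each skill.

    A person whose top score is shared by several skills counts toward
    each of them, so the totals can exceed the number of people.
    """
    counts = {skill: 0 for skill in SKILLS}
    for person in ratings:
        top = max(person[skill] for skill in SKILLS)
        for skill in SKILLS:
            if person[skill] == top:
                counts[skill] += 1
    return counts


if __name__ == "__main__":
    for skill, count in strongest_skill_counts(EXAMPLE_RATINGS).items():
        print(f"{skill}: {count}")
```

Tie handling is a design choice: here the third (hypothetical) person scores 8 in both Business Sense and Domain Knowledge, so they count toward both, and the totals add up to 4 for 3 people. Counting only one skill per person would require an explicit tie-breaking rule.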
1. Create a table for the qualities of a data scientist, then rate yourself on each skill on a scale from 1 to 10.
1. We will then use the data to show simple statistics in action as part of the data science workflow.

| Skill | Rating (1-10) |
| --- | --- |
| Programming Skills | |
| Math and Statistics Knowledge | |
| Business Sense | |
| Domain Knowledge | |
| Communication Skills | |
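With everyone's table collected, the "simple statistics" step could look like the Python sketch below, which computes per-skill summary statistics with the standard library `statistics` module. The function name `summarize_ratings` and the ratings are hypothetical placeholders, not real class data.

```python
import statistics

# The five skills from the self-assessment table.
SKILLS = [
    "Programming Skills",
    "Math and Statistics Knowledge",
    "Business Sense",
    "Domain Knowledge",
    "Communication Skills",
]

# Hypothetical 1-10 ratings for three people (illustration only).
EXAMPLE_RATINGS = [
    {"Programming Skills": 8, "Math and Statistics Knowledge": 6,
     "Business Sense": 4, "Domain Knowledge": 5, "Communication Skills": 7},
    {"Programming Skills": 3, "Math and Statistics Knowledge": 9,
     "Business Sense": 5, "Domain Knowledge": 6, "Communication Skills": 6},
    {"Programming Skills": 5, "Math and Statistics Knowledge": 5,
     "Business Sense": 8, "Domain Knowledge": 8, "Communication Skills": 4},
]


def summarize_ratings(ratings):
    """Return the mean, median, minimum, and maximum rating for each skill."""
    summary = {}
    for skill in SKILLS:
        values = [person[skill] for person in ratings]
        summary[skill] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    for skill, stats in summarize_ratings(EXAMPLE_RATINGS).items():
        print(f"{skill}: mean={stats['mean']:.1f}, median={stats['median']}, "
              f"range={stats['min']}-{stats['max']}")
```

Comparing the mean with the median is a quick check for skew: a few unusually high or low self-ratings pull the mean away from the median, while the median stays put.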
## The Data Science Workflow
1. Identify the problem
   - what are we trying to do?
1. Acquire the data
   - get the data in its raw form, for example by:
     - scraping the data from a website
     - downloading a file
     - reading a book or article
1. Parse the data
   - format the data into a consistent structure
1. Mine the data
   - collect information from the data
1. Refine the data
   - clean the data up
   - discard outliers, etc.
1. Build a data model
   - figure out a formula that represents what we are trying to learn
1. Present the results
   - visualize the results
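To make the seven steps concrete, here is a minimal end-to-end sketch in Python using only the standard library. Everything in it is an illustrative assumption: the question, the CSV text standing in for an acquired file, the function names, a 1-10 range check as the "refine" rule, and a least-squares line as the "model". Mining runs on the unrefined data, following the order listed above, which shows how one bad value can distort a summary.

```python
import csv
import io
import statistics

# 1. Identify the problem (hypothetical question): is self-rated programming
#    skill associated with self-rated math and statistics knowledge?

# 2. Acquire: hypothetical CSV text standing in for a downloaded file.
RAW_CSV = """name,programming,math
Ana,8,6
Ben,3,9
Cy,5,5
Dee,seven,7
Eli,7,42
Fay,9,8
"""


def parse(raw):
    """3. Parse: read CSV rows into (programming, math) integer pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw)):
        try:
            rows.append((int(record["programming"]), int(record["math"])))
        except (ValueError, TypeError):
            continue  # skip malformed rows, e.g. "seven" or a missing field
    return rows


def mine(rows):
    """4. Mine: collect basic descriptive information from the parsed data."""
    return {
        "n": len(rows),
        "mean_programming": statistics.mean(x for x, _ in rows),
        "mean_math": statistics.mean(y for _, y in rows),
    }


def refine(rows, low=1, high=10):
    """5. Refine: discard values outside the 1-10 rating scale."""
    return [(x, y) for x, y in rows if low <= x <= high and low <= y <= high]


def fit_line(rows):
    """6. Model: ordinary least-squares line math = intercept + slope * programming."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all programming ratings are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def present(summary, intercept, slope):
    """7. Present: report results as text (a chart is the usual choice)."""
    print(f"Parsed rows: {summary['n']}")
    print(f"Mean math rating before refining: {summary['mean_math']:.1f}")
    print(f"Fitted line: intercept = {intercept:.2f}, slope = {slope:.2f}")


if __name__ == "__main__":
    parsed = parse(RAW_CSV)
    summary = mine(parsed)
    cleaned = refine(parsed)
    intercept, slope = fit_line(cleaned)
    present(summary, intercept, slope)
```

In practice each step would typically use richer tools (for example pandas for parsing and a plotting library for presentation); this sketch sticks to the standard library so it runs anywhere.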