Challenge 8: Roller Coaster Analysis
Those who submit an entry will be eligible to win a free copy of any book related to data analysis by Packt! The more weeks you participate, the more chances you get to win!
There is no deadline for submission.
Scenario
You are a data analyst and you have been tasked with uncovering insights about roller coasters. The data is very messy and will require some cleaning. Your manager would like the following questions answered.
Get the Data
Download the dataset here.
Read in the data
url = 'https://raw.githubusercontent.com/kedeisha1/Challenges/main/coaster_db.csv'
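One way to read the file is with pandas, which the describe question below seems to assume. This is only a minimal sketch: `load_coasters` is a hypothetical helper name, not part of the challenge, and the default argument is the URL given above.

```python
import pandas as pd

url = 'https://raw.githubusercontent.com/kedeisha1/Challenges/main/coaster_db.csv'


def load_coasters(source=url):
    """Read the coaster CSV from a URL, a local path, or a file-like object."""
    # pd.read_csv accepts all three kinds of source.
    return pd.read_csv(source)
```

Calling `df = load_coasters()` downloads the file, so it needs a network connection. Pass a local path instead if you saved the CSV first.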
Questions
- How many columns and rows are in the dataset?
- Is there any missing data?
- Display the summary statistics of the numeric columns using the describe method.
- Rename the following columns:
  - coaster_name ➡️ Coaster_Name
  - year_introduced ➡️ Year_Introduced
  - opening_date_clean ➡️ Opening_Date
  - speed_mph ➡️ Speed_mph
  - height_ft ➡️ Height_ft
  - Inversions_clean ➡️ Inversions
  - Gforce_clean ➡️ Gforce
- Are there any duplicated rows?
- What are the top 3 years with the most roller coasters introduced?
- What is the average speed? Also display a plot to show its distribution.
- Explore the feature relationships. Are there any positively or negatively correlated relationships?
- Create your own question and answer it.
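If you are unsure where to start, the sketch below shows one possible pandas approach to the questions above. It is a starting point under stated assumptions, not a reference solution: `RENAMES` and `explore` are hypothetical names, the column names come from the rename list, and it assumes the data is already loaded into a DataFrame.

```python
import pandas as pd

# Mapping copied from the rename question above.
RENAMES = {
    'coaster_name': 'Coaster_Name',
    'year_introduced': 'Year_Introduced',
    'opening_date_clean': 'Opening_Date',
    'speed_mph': 'Speed_mph',
    'height_ft': 'Height_ft',
    'Inversions_clean': 'Inversions',
    'Gforce_clean': 'Gforce',
}


def explore(df):
    """Return quick answers to the challenge questions as a dict.

    `explore` is a hypothetical helper, not part of the challenge.
    """
    df = df.rename(columns=RENAMES)
    return {
        # (rows, columns)
        'shape': df.shape,
        # Count of missing values in each column
        'missing_per_column': df.isna().sum(),
        # Summary statistics from the describe method
        'summary': df.describe(),
        # Number of rows that repeat an earlier row exactly
        'duplicated_rows': int(df.duplicated().sum()),
        # Years with the most coasters introduced
        'top_3_years': df['Year_Introduced'].value_counts().head(3),
        # Average speed in mph (missing values are skipped)
        'mean_speed': df['Speed_mph'].mean(),
        # Pairwise Pearson correlations between numeric columns
        'correlations': df.select_dtypes('number').corr(),
    }
```

For the distribution plot, something like `df['Speed_mph'].plot(kind='hist')` should work if matplotlib is installed. The correlation matrix only hints at linear relationships, so it is worth checking any strong value against a scatter plot.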
Submission Instructions
Place a comment below the post with your submission, then make a LinkedIn or Twitter post with a screenshot or PDF of your answers. You can explain your thought process if you’d like. Just make sure to tag the Data in Motion LLC LinkedIn page or Twitter page.