Your grade is determined by (1) an individual component and (2) a team project component.
For the individual component, you will demonstrate your proficiency in the presented topics through multiple-choice quizzes in E-Learning.
For the team project component, you will work on a semester-long
data science project using
R. The goal of
the project is to go through the complete data science process to answer
questions you have about some topic of your own choice.
You will acquire and preprocess the data, design your visualizations,
run machine learning algorithms, and communicate your results.
You will work in a team of 3 to 5 students. In general, all group members will most likely receive the same grade. However, if one team member evidently (a) did not contribute a fair share of the team’s work, (b) delivered poor or incomplete work, (c) missed deadlines, (d) did not assist teammates and/or (e) threatened to quit if the work became difficult, this team member will receive a lowered grade.
There are a few milestones for your final project; see the table
below. Please note that no extensions will be given for
any of the project due dates for any reason. Projects submitted after
the final due date will not be graded. Mandatory deliverables submitted
after the due date will be assessed as not submitted.
If you anticipate any issues, e.g. due to travel or health reasons, you need to send an email at least one week in advance. There are several deliverables for your project that will be graded individually to make up your final project score:
|Date|Milestone|
|---|---|
|20.05|team formation & project proposal submission; registration|
|22.05|project proposal feedback|
|06.07|final project submission due|
|09.07|final presentations (exact date/time to be updated)|
Any changes that you make to your GitHub repositories and webpages after the due date will be ignored. Please have all your work submitted and tested (websites, screencasts, etc.) before the deadline.
You have to complete this form to register for this course officially. After 20.05, registration is no longer possible.
You start your project by forming your groups. Submit a project proposal as an R Markdown document in which you describe the topic you are interested in exploring. Within your proposal, you must provide the following information:
Each team will only need to submit one proposal. You will schedule a project review meeting with the course instructor shortly after submission (exact dates will be announced). Make sure all of your team members are present at the meeting.
In your project, you will work on a self-chosen dataset. Please
consider that you have to use a dataset that hasn’t been studied
extensively. For example, you shouldn’t use a dataset from the UCI ML
repository or Kaggle.
Also, don’t use very small, trivial or toy datasets like iris or play golf. A list of websites with interesting datasets is provided on the FAQ site.
An important part of your project is your R Markdown process notebook. Your notebook details all the steps you took in developing your solution, including how you collected the data, which alternative solutions you tried, which machine learning algorithms/techniques you used, and what insights you gained. It is strongly recommended to include many visualizations. Your process notebook should include the following topics:
Make sure that your process notebook is a standalone document that fully describes your process and results.
You are expected to write high-quality and readable
R code, considering aspects such as reusability, error
handling and documentation.
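As an illustration only, the following minimal sketch shows what these three aspects could look like in a small R helper. The function names `missing_share` and `read_csv_safely` and the example data are hypothetical and are not prescribed by the course; they only assume base R.

```r
#' Share of missing values per column (hypothetical example helper).
#'
#' @param data A data frame with at least one row.
#' @return A named numeric vector with one value between 0 and 1 per column.
missing_share <- function(data) {
  # Error handling: fail early with a clear message on invalid input.
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.", call. = FALSE)
  }
  if (nrow(data) == 0) {
    stop("`data` must contain at least one row.", call. = FALSE)
  }
  # Reusability: works for any data frame, independent of column names.
  vapply(data, function(column) mean(is.na(column)), numeric(1))
}

#' Read a CSV file, returning NULL with a warning instead of failing
#' (hypothetical example helper).
#'
#' @param path Path to a CSV file.
#' @return A data frame, or NULL if the file is missing or cannot be read.
read_csv_safely <- function(path) {
  if (!file.exists(path)) {
    warning("File not found: ", path, call. = FALSE)
    return(NULL)
  }
  tryCatch(
    read.csv(path, stringsAsFactors = FALSE),
    error = function(e) {
      warning("Could not read '", path, "': ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
}

# Usage with made-up example data
example_data <- data.frame(
  height = c(1.7, NA, 1.8, 1.6),
  city = c("A", "B", NA, NA)
)
missing_share(example_data)
# height 0.25, city 0.50
```

Small, documented functions like these can be reused across the notebook, and explicit input checks make failures easier to trace than errors that surface several steps later.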
You will create a public website for your project using Google Sites, GitHub Pages, Netlify (using blogdown) or any other web hosting service of your choice. The website should effectively summarize the main results of your project and tell a story. Consider your audience (the site is public) and keep the discussion at an appropriate level. Your R Markdown process notebook and data should be linked from the website as well, either as a zip file or via GitHub, Bitbucket, or another code hosting site. Also embed your main visualizations and your screencast in your website.
Each team will create a two-minute screencast with
narration showing a demo of your R Markdown notebook and/or
some slides. There are various screencast software packages available,
including one with a 30-day trial for Windows & Mac and Bandicam (non-registered version
with watermarks) for Windows. Please ensure a sufficient sound quality.
Upload the video to an online video platform such as YouTube or Vimeo and embed it into your project web page. The video will be shown as a teaser before your final presentation in class. Focus the majority of your screencast on your main contributions rather than on technical details.
You will prepare a 20-minute presentation summarizing your project for your fellow students. The presentations will take place in the last two course weeks; exact dates and times will be announced. You should distribute the speaking parts fairly among all team members, i.e., there should not be a presentation where one team member does most or all of the talking.
Your overall course grade will be determined by the following components:
1. 30 pt: Weekly quizzes
2. 70 pt: Project:
→ Max. points total: 100
Each team must use a single shared GitHub repository. If your work cannot be accessed because these directions are not followed correctly, this part will be considered as not submitted. You will need to specify your project GitHub URL in the project proposal form. Store the following in your GitHub repository:
The project assignment description was inspired by the Harvard course Introduction to Data Science: BST 260.