Your grade is determined by (1) an individual component and (2) a team project component.

For the individual component, you will demonstrate your proficiency in the presented topics through multiple-choice quizzes in E-Learning.

For the team project component, you will work on a semester-long data science project using R. The goal of the project is to go through the complete data science process to answer questions you have about some topic of your own choice. You will acquire and preprocess the data, design your visualizations, run machine learning algorithms, and communicate your results.

Project Team

You work in a team of 3 to 5 students. In general, all members of a team will most likely receive the same grade. However, if one team member evidently (a) did not contribute a fair share of the team’s work, (b) delivered poor or incomplete work, (c) missed deadlines, (d) did not assist teammates and/or (e) threatened to quit if the work became difficult, this team member will receive a lowered grade.

Project Milestones

There are a few milestones for your final project; see the table below. Please note that no extensions will be given for any of the project due dates for any reason. Projects submitted after the final due date will not be graded. Mandatory deliverables submitted after the due date will be assessed as not submitted.
If you anticipate any issues, e.g., due to travel or health-related reasons, you need to send an email at least one week in advance. There are several deliverables for your project that will be graded individually to make up your final project score:

Date    Description
20.05   team formation & project proposal submission; registration
22.05   project proposal feedback
06.07   final project submission due
09.07   final presentations (exact date/time to be updated)

Any changes that you make to your GitHub repositories and webpages after the due date will be ignored. Please have all your work submitted and tested (websites, screencasts, etc.) before the deadline.

You have to complete this form to register for this course officially. After 20.05, registration is no longer possible.

Team formation and project proposal

You start your project by forming your groups. Submit a project proposal as an R Markdown document in which you describe the topic you are interested in exploring. Within your proposal, you must provide the following information:

  • Project title
  • Name(s) of team member(s)
  • Background and motivation
  • Project objectives
  • Name(s) of dataset(s) you use
  • Design overview (algorithms and methods you plan to use)
  • Time plan including distribution of responsibilities and workload among team members written as weekly deadlines

Each team needs to submit only one proposal. You will schedule a project review meeting with the course instructor shortly after submission (exact dates will be announced). Make sure all of your team members are present at the meeting.

Topic

In your project, you will work on a self-chosen dataset. Please consider that you have to use a dataset that hasn’t been studied extensively. For example, you shouldn’t use a dataset from the UCI ML repository or Kaggle.
Also, don’t use very small, trivial or toy datasets like iris or play golf. A list of websites with interesting datasets is provided on the FAQ site.

R Markdown process notebook

An important part of your project is your R Markdown process notebook. Your notebook details all your steps in developing your solution, including how you collected the data, the alternative solutions you tried, the machine learning algorithms and techniques you used, and the insights you gained. It is strongly recommended to include many visualizations. Your process notebook should include the following topics:

  • Overview and motivation: overview of the project goals and the motivation for it
  • Related work: anything related, such as a paper, a website, a newspaper article or something else
  • Initial questions:
    • What questions are you trying to answer?
    • How did these questions evolve over the course of the project?
    • What new questions did you consider in the course of your analysis?
  • Data: source, scraping method, cleanup, storage, etc.
  • Exploratory data analysis:
    • What visualizations did you use to look at your data in different ways?
    • What are the different machine learning methods you considered?
    • Justify the decisions you made, and show any major changes to your ideas.
    • How did you reach these conclusions?
  • Final analysis:
    • What did you learn about the data?
    • How did you answer the questions?
    • How can you justify your answers?

Make sure that your process notebook is a standalone document that fully describes your process and results.

Code

You are expected to write high-quality and readable R code, considering aspects such as reusability, error handling and documentation.
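As a rough illustration of what reusable, documented R code with error handling might look like, here is a minimal sketch. The function name `load_project_data`, its arguments and the file path in the usage comment are hypothetical and chosen only for this example; they are not part of the course requirements.

```r
#' Load a CSV file for the project and fail early with a clear message.
#'
#' Hypothetical helper for illustration only.
#'
#' @param path Path to a CSV file.
#' @param required_cols Character vector of column names that must be present.
#' @return A data frame with the contents of the file.
load_project_data <- function(path, required_cols = character()) {
  # Check the input before doing any work.
  if (!file.exists(path)) {
    stop("Data file not found: ", path, call. = FALSE)
  }

  # Turn low-level parsing errors into a message that names the file.
  data <- tryCatch(
    read.csv(path, stringsAsFactors = FALSE),
    error = function(e) {
      stop("Could not parse '", path, "': ", conditionMessage(e), call. = FALSE)
    }
  )

  # Verify that the columns the analysis relies on are actually there.
  missing_cols <- setdiff(required_cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }

  data
}

# Example call (the path and column names are placeholders):
# measurements <- load_project_data("data/measurements.csv",
#                                   required_cols = c("id", "value"))
```

Wrapping a step like this in one function keeps the notebook short, lets every team member reuse the same loading logic, and produces error messages that point to the actual problem.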

Project website

You will create a public website for your project using Google Sites, GitHub Pages, Netlify (using blogdown) or any other web hosting service of your choice. The website should effectively summarize the main results of your project and tell a story. Consider your audience (the site is public) and keep the discussion at an appropriate level. Your R Markdown process notebook and data should be linked from the website as well, either as a zip file or via GitHub, Bitbucket, or another code hosting site. Also embed your main visualizations and your screencast in your website.

Project screencast

Each team will create a two-minute screencast with narration showing a demo of your R Markdown notebook and/or some slides. There are various screencast software packages available, including Camtasia (30-day trial) for Windows & Mac and Bandicam (non-registered version with watermarks) for Windows. Please ensure sufficient sound quality.
Upload the video to an online video platform such as YouTube or Vimeo and embed it into your project web page. The video is shown as a teaser before your final presentation in class. Focus the majority of your screencast on your main contributions rather than on technical details.

Final presentation

You will prepare a 20-minute presentation summarizing your project for your fellow students. The presentations will take place in the last two course weeks. Exact dates and times will be announced. You should distribute the speaking parts fairly among all team members, i.e., there should not be a presentation where one team member does most or all of the talking.

Grading

Your overall course grade will be determined by the following components:

1. 30 pt: Weekly quizzes
2. 70 pt: Project:

  • 8 pt: Project proposal
  • 25 pt: Quality of R Markdown notebook with respect to correctness, comprehensibility and reproducibility
  • 10 pt: Complexity and level of difficulty of the project
  • 5 pt: Completeness and overall functionality of the repository and website
  • 8 pt: Screencast
  • 14 pt: Final presentation

→ Max. points total: 100

Mapping of points to grades

Grade    5.0    4.0    3.7    3.3    3.0    2.7    2.3    2.0    1.7    1.3    1.0
Points   0-50   51-55  56-60  61-65  66-70  71-75  76-80  81-85  86-90  91-95  96-100
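If you want to check which grade a given point total corresponds to, the mapping above can be sketched in R as a simple lookup. The function name `points_to_grade` is hypothetical, and the handling of fractional points (assigning them to the lower band, e.g. 50.5 stays in 0-50) is an assumption, since the table only lists whole points.

```r
# Map point totals (0-100) to grades using the bands from the table above.
# Assumption: fractional points fall into the band whose lower bound
# they have reached, so 50.5 is still treated as part of the 0-50 band.
points_to_grade <- function(points) {
  stopifnot(is.numeric(points), all(points >= 0 & points <= 100))

  lower_bounds <- c(0, 51, 56, 61, 66, 71, 76, 81, 86, 91, 96)
  grades       <- c(5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)

  # findInterval() returns, for each value, the index of the last
  # lower bound that is less than or equal to it.
  grades[findInterval(points, lower_bounds)]
}

# Example: 50, 51, 83 and 100 points give the grades 5.0, 4.0, 2.0 and 1.0.
points_to_grade(c(50, 51, 83, 100))
```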

Project Submission Instructions

Each team must use a single shared GitHub repository [1]. If your work cannot be accessed because these directions are not followed correctly, this part will be considered as not submitted. You will need to specify your project GitHub URL in the project proposal form. Store the following in your GitHub repository:

  • R Markdown Notebook: your project process notebook
  • Data: include all the data that you used in your project. If the data is too large for GitHub, store it with an external cloud storage provider, such as Dropbox, Google Drive or OneDrive.
  • README: the README.md file must give an overview of what you are handing in: your project notebook, data, and URLs to your project websites and screencast videos.

  1. If you are unfamiliar with Git, you may have a look at the ebook Happy Git with R.