This is the website for the Data Science with R (DataSciR) course offered in 2021. Click here if you are looking for the 2020 course website.

News

27 July 2022: I am proud to announce that the paper “Data-Driven Prediction of Athletes’ Performance based on their Social Media Presence” by Frank Dreyer, Jannik Greif, Kolja Günther, Myra Spiliopoulou and Uli Niemann has been accepted for presentation as a full paper at the 25th International Conference on Discovery Science 2022 in Montpellier, France. This paper is a follow-up to the project “The Impact of NBA Player-related Social Media Posts on Their On-court Performance - An Analysis”, which Frank, Jannik and Kolja carried out as part of the Data Science with R course. Congratulations especially to the three students on this great achievement! 😀

Administrative Information

  • Course day/time: Fridays, 9:15-10:45 (via Zoom)
  • Instructor: Uli Niemann
  • Course type: seminar
  • ECTS credits: 6
  • Audience: all FIN Master degree programs
  • Course language: English
  • Application: see section Application & registration
  • Prerequisites: see section Prerequisites
  • Grading: based on several deliverables in the context of a semester-long data science project → see Project page

Application & registration

The course is limited to a maximum of 30 students. Please apply for DataSciR by completing the following tasks by 03.04.2021:

  1. Enroll in the course in the LSF.
  2. Send a short motivation email (no more than 300 words) to the course instructor with the subject [DataSciR] Course application. Please send this email from your OVGU email address and include your matriculation number.

On 05.04.2021, you will be notified whether you can attend the course.

After admission, please complete this registration form by 20.05.2021.

Course description

Ten years ago, who would have thought that R, the “environment for statistical computing and graphics”, would become one of the most popular programming languages for data scientists?

The impressive growth of R is not a coincidence. As a free & open-source alternative to expensive & proprietary software like SPSS, MATLAB and Excel, R’s strengths have always been its capabilities for statistical data analysis and its functionality for creating powerful, aesthetically appealing graphics and charts.

While R attracted an almost exclusively academic audience in the 1990s & 2000s, the R community has since grown not only in size but also in diversity, as people from different industries and backgrounds discover R’s usefulness for a wide range of applications. As of February 2020, more than 15,000 (!) packages have been published to CRAN, about half of them since 2015.

Especially in the last decade, the functionality and versatility of R have gained momentum. Among the most popular R packages are:

In Data Science with R (DataSciR), you will learn fundamentals of R and how to use the following packages for Data Science:

  • the “tidyverse” which includes packages like dplyr and tidyr for data manipulation and ggplot2 for data visualization,
  • rmarkdown and knitr for reproducible & automated reporting,
  • shiny for creating interactive web applications, and
  • tidymodels for inferential and predictive modeling.
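
As a small illustration of the first item, the following sketch summarises a data frame with dplyr and plots the result with ggplot2. It assumes both packages are installed and uses the built-in mtcars data set, which is not part of the course material; the object names mpg_by_cyl and p are chosen here for illustration only.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; compute the mean fuel economy per cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Bar chart of the summary table
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(mpg_by_cyl)
print(p)
```

The %>% pipe is used instead of the native |> pipe because the latter requires R 4.1 or newer.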

You will demonstrate your proficiency in these packages in a semester-long graded data science project.

(Tentative) Schedule

Wk | Date | Topic | Slides | Videos | Exercises | Quizzes | Additional Materials
1 | 05.-11.04 | Welcome & Introduction | Welcome, First tour of R & RStudio | Welcome & Course Intro, First tour of R & RStudio | Ex 1 | - | -
2 | 12.-18.04 | Visualizing data with ggplot2 | Introduction to ggplot2, Effective visualizations | Intro to ggplot2, Visualizing numerical and categorical data, Tips for effective visualizations | Ex 2, Code-along | Q 1 | R4DS chapters: Data Visualization, Exploratory Data Analysis, Graphics for communication. ggplot2 cheat sheet. Official ggplot2 book. Curated list of ggplot2 resources and extensions. Keynote talk: Alberto Cairo - How Charts Lie: Getting Smarter About Visual Information
3 | 19.-25.04 | The Tidyverse | The Tidyverse | Tidyverse intro & dplyr, Join functions, tidyr, readr & tibble | Ex 3, Code-along | Q 2 | R4DS chapters: Data transformation, Tibbles, Data import, Tidy data, Relational data, Pipes. Cheat sheets: dplyr, Data Import (readr & tidyr).
4 | 26.04-02.05 | R Markdown | R Markdown | R Markdown | Ex 4, no code-along this week | Q 3 | R4DS chapters: R Markdown, R Markdown formats, R Markdown workflow. Quick Markdown tutorial. R Markdown cheat sheet. RStudio’s R Markdown reference guide. RStudio’s R Markdown Get Started Tutorial. List of all available Knitr chunk options. Book R Markdown: The Definitive Guide
5 | 03.-09.05 | The R language: vectors, classes, functions, iteration | Vectors, classes, functions, iteration | Vectors, Classes, Functions, Iteration | Ex 5, Code-along | Q 4 | R4DS chapters: Factors, Dates and times, Functions, Iteration
6 | 10.-16.05 | Linear models | Linear models | Correlation, Linear regression | Ex 6, Code-along | Q 5 | Tidy Modeling with R chapter: Fitting models with parsnip. R4DS chapters: Model basics, Model building, Many models
- | 20.05 | Deadline: project proposal submission & registration | - | - | - | - | -
7 | 17.-23.05 | Data modeling with tidymodels | tidymodels | tidymodels | Ex 7, Code-along | Q 6 | Tidymodels tutorial. Book Tidy Modeling with R.
8 | 24.-31.05 | Creating web applications with shiny | shiny | shiny | Ex 8 | Q 7 | RStudio’s “Learn Shiny” tutorial. Mastering Shiny. Shiny cheat sheet. “Awesome” Shiny Extensions.
9 | 01.06-07.06 | Misc. topics | Misc. topics | Misc. topics | - | - | -
- | 06.07 | Deadline: project submission | - | - | - | - | -
- | 09.07 | Deadline: final presentation | - | - | - | - | -

Student Projects

# | Team Members | Title | Code Repo | Website | Presentation Date & Time
1 | Diana Guzman, Philipp Blüml | Analysis of Finance-related Reddit Communities | URL | URL | Fri, 09.07, 09:00
2 | Anish Kumar Singh, Priyanka Singh, Ramanpreet Kaur, Venkata Srinath Mannam | Classifying Whether Twitter Authors Spread Hate Using Supervised Learning | URL | URL | Fri, 09.07, 09:30
3 | Kiran Babu Thatha, Marcel Schulte, Obinna Patrick Nkwocha, Shweta Pandey, Thorben Hebbelmann | Multi-Perspective and Predictive Analysis of Forests: Issues, Challenges and Global Trends | URL | URL | Fri, 09.07, 10:00
4 | Ammar Ateeq, Muhammad Hashim Naveed, Sidra Aziz | Effect of COVID-19 Induced Lockdown on Mental Wellbeing of Students Attending Online Classes | URL | URL | Fri, 09.07, 10:30
5 | Indrani Sarkar, Indranil Maji, Michael Thane, Sharanya Hunasamaranahalli Thotadarya | Trends in Programming Language Popularity | URL | URL | Fri, 09.07, 11:00
6 | Madhuri Sajith, Usama Ashfaq, Vishnu Jayanand, Sujith Nyarakkad Sudhakaran, Ranjiraj Rajendran Nair | Behavioral and Psychological Distress of COVID-19 and Infodemics | URL | URL | Mon, 12.07, 09:00
7 | Jannik Greif, Kolja Günther, Frank Dreyer | The Impact of NBA Player-related Social Media Posts on Their On-court Performance - An Analysis | URL | URL | Fri, 16.07, 09:00

Prerequisites

There are no mandatory prerequisites for DataSciR. However, you are expected to have a solid knowledge of fundamental data mining techniques, such as classification, regression and clustering. Hence, it is recommended that you have attended at least one of the following lectures (or a comparable one):

Also, you should have basic programming and statistics knowledge. For example, you will learn the most important vector types and classes in R, but you will not learn what a vector or a class is in general. Likewise, you should already know what terms such as mean, standard deviation and probability refer to.
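
To give a rough idea of the expected level, here is a minimal base R sketch, not taken from the course material, that creates a few common vector types and computes a mean and a standard deviation. The variable names are arbitrary. If you can follow what each line does conceptually, you meet the programming and statistics prerequisites.

```r
# A few of R's basic vector types
x <- c(2.5, 4.0, 5.5)                  # double (numeric) vector
n <- c(1L, 2L, 3L)                     # integer vector
s <- c("a", "b")                       # character vector
b <- c(TRUE, FALSE)                    # logical vector
f <- factor(c("low", "high", "low"))   # factor: integer codes plus labels

typeof(x)   # "double"
typeof(f)   # "integer"
class(f)    # "factor"

# Descriptive statistics you should already know conceptually
mean(x)     # 4
sd(x)       # 1.5
```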

Software

By the end of the first week, you should have installed the following software on your own laptop:

  1. R (>=4.0.0)
  2. RStudio (>=1.4)
  3. on Windows: Rtools

Also, please check whether you can successfully install packages. To do so, click on the Packages tab in the bottom-right pane in RStudio. Then, click on the Install button and specify an arbitrary package, e.g. dplyr. Finally, click on Install. Alternatively, you can install a package from the console with install.packages("dplyr"). If everything is set up correctly, no error messages should be displayed when you load the installed package with library(dplyr).
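
The same check can be scripted from the console. The sketch below is only a suggestion: check_setup is a hypothetical helper written for this example (it is not part of R or any package), and it uses dplyr as the test package as in the text above.

```r
# Hypothetical helper: report whether the R version and a package are in place
check_setup <- function(pkg = "dplyr", min_r = "4.0.0") {
  list(
    r_version_ok  = getRversion() >= min_r,                 # R (>= 4.0.0)?
    pkg_installed = requireNamespace(pkg, quietly = TRUE)   # package available?
  )
}

status <- check_setup()
print(status)

if (status$pkg_installed) {
  library(dplyr)   # should load without error messages
} else {
  message('dplyr is missing; run install.packages("dplyr") first.')
}
```

Messages about masked objects when loading dplyr are expected and are not errors.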