This is the website for the Data Science with R (DataSciR) course offered in 2021. Click here if you are looking for the 2020 course website.
27 July 2022: I am proud to announce that the paper “Data-Driven Prediction of Athletes’ Performance based on their Social Media Presence” by Frank Dreyer, Jannik Greif, Kolja Günther, Myra Spiliopoulou and Uli Niemann has been accepted for presentation as a full paper at the 25th International Conference on Discovery Science 2022 in Montpellier, France. This paper is a follow-up to the project “The Impact of NBA Player-related Social Media Posts on Their On-court Performance - An Analysis”, which Frank, Jannik and Kolja carried out as part of the Data Science with R course. Congratulations especially to the three students on this great achievement! 😀
The course is limited to a maximum of 30 students. Please apply for DataSciR by completing the following tasks by 03 April 2021:
On 05.04.2021, you will be notified if you can attend the course.
After admission, please complete this registration form by 20.05.2021.
Ten years ago, who would have thought that R, the “environment for statistical computing and graphics”, would become one of the most popular programming languages for data scientists?
The impressive growth of R is not a coincidence. R is a free & open-source alternative to expensive & proprietary software like SPSS, Matlab and Excel, and its strengths have always been its capabilities for statistical data analysis as well as its functionality for creating powerful, aesthetically appealing graphics and charts.
While R attracted an almost exclusively academic audience in the 1990s & 2000s, the R community has since grown not only in sheer numbers but also in diversity, as people from different industries and backgrounds discover R’s usefulness for a wide range of applications. As of February 2020, more than 15,000 (!) packages had been published to CRAN, about half of them since 2015.
Especially in the last decade, the functionality and versatility of R have gained momentum. Among the most popular R packages are:
In Data Science with R (DataSciR), you will learn fundamentals of R and how to use the following packages for Data Science:
- the “tidyverse”, which includes packages like dplyr and tidyr for data manipulation and ggplot2 for data visualization,
- rmarkdown and knitr for reproducible & automated reporting,
- shiny for creating interactive web applications, and
- tidymodels for inferential and predictive modeling.

You will demonstrate your proficiency in these packages on a semester-long graded data science project.
# | Team Members | Title | Code Repo | Website | Presentation Date & Time |
---|---|---|---|---|---|
1 | Diana Guzman, Philipp Blüml | Analysis of Finance-related Reddit Communities | URL | URL | Fri, 09.07, 09:00 |
2 | Anish Kumar Singh, Priyanka Singh, Ramanpreet Kaur, Venkata Srinath Mannam | Classifying Whether Twitter Authors Spread Hate Using Supervised Learning | URL | URL | Fri, 09.07, 09:30 |
3 | Kiran Babu Thatha, Marcel Schulte, Obinna Patrick Nkwocha, Shweta Pandey, Thorben Hebbelmann | Multi-Perspective and Predictive Analysis of Forests: Issues, Challenges and Global Trends | URL | URL | Fri, 09.07, 10:00 |
4 | Ammar Ateeq, Muhammad Hashim Naveed, Sidra Aziz | Effect of COVID-19 Induced Lockdown on Mental Wellbeing of Students Attending Online Classes | URL | URL | Fri, 09.07, 10:30 |
5 | Indrani Sarkar, Indranil Maji, Michael Thane, Sharanya Hunasamaranahalli Thotadarya | Trends in Programming Language Popularity | URL | URL | Fri, 09.07, 11:00 |
6 | Madhuri Sajith, Usama Ashfaq, Vishnu Jayanand, Sujith Nyarakkad Sudhakaran, Ranjiraj Rajendran Nair | Behavioral and Psychological Distress of COVID-19 and Infodemics | URL | URL | Mon, 12.07, 09:00 |
7 | Jannik Greif, Kolja Günther, Frank Dreyer | The Impact of NBA Player-related Social Media Posts on Their On-court Performance - An Analysis | URL | URL | Fri, 16.07, 09:00 |
There are no mandatory prerequisites for DataSciR. However, you are expected to have solid knowledge of fundamental data mining techniques, such as classification, regression and clustering. Hence, it is recommended that you have attended at least one of the following lectures (or a comparable one):
Also, you should have basic knowledge of programming and statistics. For example, you will learn the most important vector types and classes in R, but you will not learn what a vector or a class is in general. Likewise, you should already know what terms such as mean, standard deviation and probability refer to.
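To give a rough idea of the expected level, here is a minimal sketch in base R. The example values and variable names are made up for illustration and are not taken from the course material. It touches a few atomic vector types and the statistical terms mentioned above.

```r
# Illustrative values only (not from the course material).
x_num <- c(2, 4, 4, 4, 5, 5, 7, 9)   # numeric (double) vector
x_int <- 1:3                          # integer vector
x_chr <- c("a", "b")                  # character vector
x_lgl <- c(TRUE, FALSE, TRUE)         # logical vector

# Inspect the underlying type of each vector.
types <- c(typeof(x_num), typeof(x_int), typeof(x_chr), typeof(x_lgl))
print(types)

m <- mean(x_num)   # arithmetic mean
s <- sd(x_num)     # sample standard deviation (divides by n - 1)
p <- mean(x_lgl)   # share of TRUE values, i.e. an empirical probability
```

If you can explain why `sd()` divides by n - 1 rather than n, and why the mean of a logical vector is a proportion, your statistics background is probably sufficient.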
Data Mining / Statistical Analysis:
R-specific:
caret package. Online documentation.
Other:
By the end of the first week, you should have installed the following software on your own laptop:
Also, please check whether you can successfully install packages. To do so, click on the Packages tab in the bottom-right pane in RStudio. Then, click on the Install button and specify an arbitrary package, e.g. dplyr. Finally, click on Install. Alternatively, you can install a package from the console with install.packages("dplyr"). If everything is set up correctly, no error messages should be displayed when you load the installed package with library(dplyr).
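The console route can also be wrapped in a small helper, shown here as a minimal sketch. The function name `ensure_package` is hypothetical and not part of the course material. It relies on base R's `requireNamespace()` to check whether a package is available and calls `install.packages()` only if it is missing.

```r
# Hypothetical helper (not part of the course material): install a package
# only if it is not available yet, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumes a CRAN mirror is configured (RStudio sets one by default).
    install.packages(pkg)
  }
  # TRUE if the package namespace can be loaded without an error.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
ok <- ensure_package("stats")
print(ok)
```

Calling `ensure_package("dplyr")` and then `library(dplyr)` would run the same check for the package used in the example above.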