**Select a categorical variable from the 'Studentsbiodata'. Write a brief summary which compares 'Height' with respect to the variable of your choice. Make sure to include appropriate summary statistics and graphs (be sure to label axes, and give the plot an appropriate and descriptive title that includes the name of the graph you created).**

Here's the 'Studentsbiodata' file (To download, right click and select 'save link as').

**Here's the Solution to this Question**

## Solution Using R Programming

This question was solved using RStudio.

We need two packages: xlsx (which lets R read Excel files) and dplyr (which makes summarizing data easy).

First, we will check if the two packages are already installed.

The code below checks whether "xlsx" is installed. The output is `TRUE` if it is installed and `FALSE` otherwise.

```
# Exact match on package names, so a similarly named package such as "openxlsx" does not count
"xlsx" %in% rownames(installed.packages())
```

The code below checks whether "dplyr" is installed. The output is `TRUE` if it is installed and `FALSE` otherwise.

```
# Exact match on package names, as in the "xlsx" check
"dplyr" %in% rownames(installed.packages())
```

If your output is `TRUE` for both, you can skip this part. If either is `FALSE`, run the code below to install the missing package.

To install "xlsx":

```
install.packages("xlsx")
```

To install "dplyr":

```
install.packages("dplyr")
```
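The check and install steps above can also be combined. The following is a minimal sketch, not part of the original solution; `missing_packages` is a hypothetical helper name introduced here for illustration, and the install call is left commented out so that nothing is installed by accident.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# Uses an exact match on package names.
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

to_install <- missing_packages(c("xlsx", "dplyr"))
print(to_install)  # empty when both packages are already installed

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```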

After the installation, to use both packages, you need to **load** them into the current runtime:

Use the code below to load the "xlsx" package:

```
library("xlsx")
```

Use the code below to load the "dplyr" package:

```
library("dplyr")
```
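The summary code further down expects a data frame named `studentsbiodata`, which none of the steps so far creates. A minimal sketch of that step follows. It assumes the downloaded file was saved as `Studentsbiodata.xlsx` in the working directory and that the data sits on the first sheet; both are assumptions, so adjust them to match your download.

```r
# Assumed file name and location; change this to wherever you saved the download.
data_file <- "Studentsbiodata.xlsx"

if (file.exists(data_file)) {
  # read.xlsx() comes from the xlsx package; sheetIndex = 1 assumes the first sheet.
  studentsbiodata <- xlsx::read.xlsx(data_file, sheetIndex = 1)
  str(studentsbiodata)  # confirm that the Drink and Height columns are present
} else {
  message("File not found: ", data_file)
}
```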

Having handled that, we can proceed to answer the question.

In the first part, we will select a categorical variable from the 'Studentsbiodata'. For this solution, we will use "Drink".

Next, we are required to summarize the table comparing the quantitative variable, **Height,** to the categorical variable we chose, **Drink**.

In practice, this means computing the mean, median, minimum, maximum, etc. of students' heights for each drink category.

The code below solves that:

```
# Summarise Height within each Drink category
temp_table <- studentsbiodata %>%
  select(Drink, Height) %>%
  group_by(Drink) %>%
  summarise(
    Total = n(),                                      # number of students per drink
    min = min(Height, na.rm = TRUE),
    max = max(Height, na.rm = TRUE),
    mean = mean(Height, na.rm = TRUE),
    median = median(Height, na.rm = TRUE),
    std = sd(Height, na.rm = TRUE),                   # standard deviation
    unique_height = n_distinct(Height, na.rm = TRUE)  # count of distinct heights
  )
```
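To see what a grouped summary like this produces, here is a small sketch in base R on made-up data. The `toy` data frame and its values are invented for illustration and are not taken from 'Studentsbiodata'; `tapply` stands in for the `group_by` and `summarise` pair used above.

```r
# Made-up heights for illustration only; not taken from Studentsbiodata.
toy <- data.frame(
  Drink  = c("Milk", "Milk", "Pop", "Pop", "Water", "Water"),
  Height = c(60, 64, 66, 70, 62, NA)
)

# Mean and median Height within each Drink group, ignoring missing values.
toy_means   <- tapply(toy$Height, toy$Drink, mean,   na.rm = TRUE)
toy_medians <- tapply(toy$Height, toy$Drink, median, na.rm = TRUE)

# Number of rows per group (the counterpart of n() in the dplyr version).
toy_counts <- table(toy$Drink)

print(toy_means)
print(toy_medians)
print(toy_counts)
```

Running the same calls on `studentsbiodata` is one way to cross-check individual columns of `temp_table`.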

Below is the output of the resultant table:

Next, we need to graph the resulting table from the summary.

To do that, we cannot use `temp_table` as it is. The non-numeric column (`Drink`) must be removed; otherwise the graph will not plot, and R will produce the error below:

Error in -0.01 * height : non-numeric argument to binary operator

To filter `temp_table`, run the code below:

```
filtered_temp_table <- temp_table %>% select(Total, min, max, mean, median, std, unique_height)
```

The output will be:

Now we can use the new table, `filtered_temp_table`, to plot the bar chart in R:

```
barplot(as.matrix(filtered_temp_table),
        main = "Bar Chart of Height Summary Statistics by Drink",
        xlab = "Summary Statistic", ylab = "Value",
        col = c("yellow", "purple", "green"),
        beside = TRUE, width = .3)
```

The output will be:

To add a legend that identifies each drink, use the code below. The labels are listed in the same order as the rows of `temp_table`:

```
legend("topright", c("Milk","Pop", "Water"), fill=c("yellow", "purple", "green"))
```

The final output of the bar chart will be:
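Separately from the bar chart, a boxplot is a common way to compare a quantitative variable across the levels of a categorical one, because it shows the spread of the raw heights rather than a table of summaries. The sketch below is an addition, not part of the original solution; the `toy` data frame is made up for illustration, and the three colours assume three drink categories.

```r
# Made-up heights for illustration only; not taken from Studentsbiodata.
toy <- data.frame(
  Drink  = c("Milk", "Milk", "Milk", "Pop", "Pop", "Pop", "Water", "Water", "Water"),
  Height = c(60, 63, 66, 64, 68, 72, 59, 62, 65)
)

# One box per Drink category; the title names the type of graph.
bp <- boxplot(Height ~ Drink, data = toy,
              main = "Boxplot of Height by Drink",
              xlab = "Drink", ylab = "Height",
              col = c("yellow", "purple", "green"))
```

To draw it for the real data, replace `toy` with `studentsbiodata`.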