Interactive Visualization of Euclidean Distance with R
Introduction
In this post, we will explore the concept of Euclidean distance and its visualization using interactive tools in R. We will delve into the world of clustering algorithms and examine how to visualize the results in an interactive manner. Our journey begins by understanding what Euclidean distance is and how it’s used in data analysis.
What is Euclidean Distance?
Euclidean distance, also known as straight-line distance or L2 distance, measures the distance between two points in a multi-dimensional space. It’s calculated using the Pythagorean theorem:
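\[d(P, Q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \dots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}\]
where \(d(P, Q)\) is the distance between two points \(P = (p_1, p_2, \dots, p_n)\) and \(Q = (q_1, q_2, \dots, q_n)\) in n-dimensional space. In two dimensions, with \(P = (x_1, y_1)\) and \(Q = (x_2, y_2)\), this reduces to \(\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\), the length of the hypotenuse of a right triangle.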
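To connect the formula to code, here is a minimal base-R sketch. euclidean() is a hypothetical helper written only for this illustration, not a function from any package; the last line checks it against dist() from the stats package.

```r
# euclidean() is a hypothetical helper that translates the formula directly
euclidean <- function(p, q) {
  stopifnot(length(p) == length(q))  # both points need the same dimension
  sqrt(sum((q - p)^2))
}

p <- c(1, 2)
q <- c(4, 6)
euclidean(p, q)                     # 5, since sqrt(3^2 + 4^2) = 5

# The same formula works in any number of dimensions
euclidean(c(0, 0, 0), c(1, 2, 2))   # 3, since sqrt(1 + 4 + 4) = 3

# dist() gives the same value for the pair of rows
dist(rbind(p, q))
```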
Euclidean distance is a fundamental concept in data analysis, machine learning, and computer science. It’s used to measure the similarity or dissimilarity between data points.
Euclidean Distance in R
To work with Euclidean distance in R, we can use the dist() function from the stats package:
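```r
# dist() comes from the stats package, which R attaches by default,
# so this library() call is optional
library(stats)

# Create a sample dataset (the seed makes the random values reproducible)
set.seed(42)
df <- data.frame(x = runif(10), y = runif(10))

# Calculate pairwise Euclidean distances
distance <- dist(df[, c("x", "y")])
```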
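In this example, we create a sample dataset df with two columns, x and y, and compute the Euclidean distance between every pair of points with dist(). Euclidean is the default method of dist(), and the result is a "dist" object that stores only the lower triangle of the distance matrix.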
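To see what dist() returns, this sketch repeats the same setup (the seed value is arbitrary) and inspects the object: its length equals the number of unique pairs, as.matrix() expands it into a full symmetric matrix, and one entry is cross-checked against the formula.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(x = runif(10), y = runif(10))
distance <- dist(df[, c("x", "y")])  # method = "euclidean" is the default

# A dist object stores one value per unordered pair: n * (n - 1) / 2
length(distance)   # 45 for 10 points

# as.matrix() expands it into a full, symmetric n x n matrix
dmat <- as.matrix(distance)
dim(dmat)          # 10 10
dmat[1, 2]         # distance between observations 1 and 2

# Cross-check that entry against the formula
sqrt((df$x[2] - df$x[1])^2 + (df$y[2] - df$y[1])^2)
```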
Interactive Visualization
Now that we have a basic understanding of Euclidean distance in R, let’s explore interactive visualization tools. We will examine how to visualize Euclidean distances using different packages and libraries.
ggplotly() vs fviz_cluster()
Consider the following snippet, which clusters the USArrests data with k-means and plots the result with fviz_cluster() from the factoextra package:
```r
# Required libraries
library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # get_dist(), fviz_cluster() and other clustering visualization
library(plotly)     # interactive graphics

df <- USArrests
df <- na.omit(df)
df <- scale(df)

# Pairwise distances; "euclidean" is get_dist()'s default method
distance <- get_dist(df)

set.seed(123)  # k-means starts from random centers
k2 <- kmeans(df, centers = 2, nstart = 25)

# fviz_cluster() takes the kmeans result plus the data it was fitted on
fviz_cluster(k2, data = df, geom = "point") + ggtitle("k = 2")
```
Let's break down what each part of this code does:
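- We load the USArrests data set, which ships with R's built-in datasets package, into df.
- We remove any rows with missing values using na.omit().
- We scale the data using scale(), which standardizes each variable by subtracting its mean and dividing by its standard deviation, so that variables measured on very different scales contribute comparably to the distances.
- We calculate the Euclidean distances between each pair of states using the get_dist() function from the factoextra package.
- We perform k-means clustering on the scaled data using the kmeans() function, specifying two centers (k = 2). The nstart = 25 argument tells kmeans() to try 25 random sets of initial centers and keep the solution with the lowest total within-cluster sum of squares.
- We pass the kmeans() result and the scaled data to fviz_cluster(), which draws each observation colored by its cluster; because the data have more than two variables, it plots them on the first two principal components.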
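The preprocessing and clustering steps in the list above can be checked with base R alone. This sketch assumes only the built-in USArrests data; the seed value is arbitrary and just makes the random k-means starts reproducible.

```r
# Preprocess exactly as above
df <- scale(na.omit(USArrests))

# After scale(), every column has mean 0 and standard deviation 1
round(colMeans(df), 10)
apply(df, 2, sd)

# nstart = 25: 25 random starts, keeping the lowest total within-cluster SS
set.seed(123)
k2 <- kmeans(df, centers = 2, nstart = 25)
table(k2$cluster)   # number of states in each cluster
k2$tot.withinss     # the criterion k-means minimizes
```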
Alternatively, we can build the scatter plot ourselves with ggplot() and convert it into an interactive chart with ggplotly():
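```r
# ggplotly() example: two scaled variables, colored by cluster
plot_df <- as_tibble(df) %>%
  mutate(cluster = factor(k2$cluster), state = row.names(USArrests))
p1 <- ggplot(plot_df, aes(x = Murder, y = Assault,
                          text = paste0(state, "<br>Cluster: ", cluster))) +
  geom_point(aes(color = cluster)) +
  coord_equal() +
  theme_classic()
ggplotly(p1, tooltip = "text")  # hover shows only the custom text
```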
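USArrests has four variables, so a two-dimensional plot cannot show them all at once; according to the factoextra documentation, fviz_cluster() projects data with more than two variables onto the first two principal components. A base-R sketch of that projection with prcomp(), assuming the same scaling and k = 2 as above. pc_scores is an illustrative name, and factoextra's internal computation may differ in details such as the sign of an axis.

```r
df <- scale(na.omit(USArrests))
set.seed(123)
k2 <- kmeans(df, centers = 2, nstart = 25)

# The data are already centered and scaled, so prcomp() needs no extra options
pca <- prcomp(df)
pc_scores <- data.frame(
  state   = rownames(df),
  PC1     = pca$x[, 1],
  PC2     = pca$x[, 2],
  cluster = factor(k2$cluster),
  row.names = NULL
)
head(pc_scores)

# Share of total variance captured by the two plotted components
summary(pca)$importance["Proportion of Variance", 1:2]
```

A data frame like pc_scores can be passed to ggplot() and then to ggplotly(), reproducing the fviz_cluster() view with a tooltip you control.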
However, ggplotly() translates ggplot2 layers automatically, so the legend and hover text it produces often need manual adjustment. Let's explore other options for interactive visualization.
plot_ly()
We can use the plot_ly() function from the plotly package to create an interactive scatter plot directly, without going through ggplot2:
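```r
# Required libraries
library(plotly)

# Create a sample dataset (seeded for reproducibility)
set.seed(42)
df <- data.frame(x = runif(10), y = runif(10))

# Pairwise Euclidean distances, expanded to a full matrix
dmat <- as.matrix(dist(df[, c("x", "y")]))
diag(dmat) <- Inf                  # ignore each point's zero distance to itself
df$nn_dist <- apply(dmat, 1, min)  # distance to the nearest other point

# Plot the points; hovering shows each point's nearest-neighbor distance
plot_ly(df, x = ~x, y = ~y, type = 'scatter', mode = 'markers',
        text = ~paste0('Nearest-neighbor distance: ', round(nn_dist, 2)),
        hoverinfo = 'text') %>%
  layout(title = 'Euclidean Distance Visualization')
```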
In this example, we create a sample dataset df with two columns x and y, calculate the Euclidean distance between each pair of points using the dist() function, and plot the points on an interactive scatter plot.
The text argument of plot_ly() supplies a hover label for each point, and hoverinfo = 'text' tells plotly to show only that label instead of the default coordinate readout.
Finally, we customize the layout with a title.
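Besides a distance value, a hover label can also name the closest point. This base-R sketch uses three hand-checkable points, A = (0, 0), B = (3, 4) and C = (0, 1), chosen only for illustration; the resulting strings are the kind of vector plot_ly()'s text argument accepts.

```r
# Three illustrative points with row names A, B, C
pts <- data.frame(x = c(0, 3, 0), y = c(0, 4, 1), row.names = c("A", "B", "C"))

dmat <- as.matrix(dist(pts))   # dimnames come from row.names(pts)
diag(dmat) <- Inf              # a point is not its own neighbor
nn_name <- colnames(dmat)[apply(dmat, 1, which.min)]
nn_dist <- apply(dmat, 1, min)

# A's nearest is C (1), B's is C (about 4.24), C's is A (1)
labels <- paste0(rownames(pts), ": nearest is ", nn_name,
                 " at distance ", round(nn_dist, 2))
labels
```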
Shiny App
To create an even more engaging and interactive visualization, we can build a Shiny app. Shiny is an R package for building interactive web applications entirely in R code:
```r
# Required libraries
library(shiny)
library(plotly)
library(tidyverse)

# Scale the data once, outside the server function
scaled <- scale(na.omit(USArrests))

# UI definition
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput('k', 'Number of Clusters:', min = 1, max = 5, value = 2),
      actionButton('plot', 'Plot')
    ),
    mainPanel(
      # An output ID may appear only once in the UI
      plotlyOutput('distPlot')
    )
  )
)

# Server definition
server <- function(input, output) {
  # Re-run k-means when the Plot button is clicked;
  # ignoreNULL = FALSE also draws a plot when the app starts
  clustered <- eventReactive(input$plot, {
    km <- kmeans(scaled, centers = input$k, nstart = 25)
    as_tibble(scaled) %>%
      mutate(cluster = factor(km$cluster), state = rownames(scaled))
  }, ignoreNULL = FALSE)

  # A single render function for the single 'distPlot' output
  output$distPlot <- renderPlotly({
    p <- ggplot(clustered(), aes(x = Murder, y = Assault,
                                 text = paste0(state, '<br>Cluster: ', cluster))) +
      geom_point(aes(color = cluster)) +
      coord_equal() +
      theme_classic()
    ggplotly(p, tooltip = 'text')
  })
}

# Run the Shiny app
shinyApp(ui = ui, server = server)
```
In this example, the UI places user input components, a slider for the number of clusters and a Plot button, in the sidebar, and an interactive plotly chart, created with plotlyOutput(), in the main panel. The server re-runs kmeans() with the chosen number of clusters and redraws the chart in response to these inputs.
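Inside a Shiny server it can help to keep the clustering step in an ordinary R function, so it can be checked in the console without launching the app. A sketch, where cluster_states() is a hypothetical name rather than part of Shiny or any other package; a server could call it inside a reactive expression and pass the result to ggplot().

```r
# cluster_states() is a hypothetical helper, not part of any package
cluster_states <- function(k, data = USArrests) {
  scaled <- scale(na.omit(data))
  km <- kmeans(scaled, centers = k, nstart = 25)
  data.frame(
    state   = rownames(scaled),
    Murder  = scaled[, "Murder"],
    Assault = scaled[, "Assault"],
    cluster = factor(km$cluster),
    row.names = NULL
  )
}

set.seed(1)  # arbitrary seed for reproducible random starts
res <- cluster_states(3)
head(res)
table(res$cluster)   # number of states per cluster
```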
Interactive Visualization of Euclidean Distance
We have explored various options for interactive visualization of Euclidean distance in R. Each option has its strengths and weaknesses:
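- ggplotly() converts an existing ggplot2 plot, such as the output of fviz_cluster(), into an interactive plotly chart. It is convenient if you already build figures with ggplot2, although the legend and hover text may need adjustment.
- plot_ly() builds interactive charts directly and gives fine-grained control over hover text and other features.
- Shiny apps provide a comprehensive platform for building interactive web-based applications in R, where inputs such as sliders and buttons drive the visualization.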
We hope this post has provided you with the knowledge to create your own interactive Euclidean distance visualization in R.
Last modified on 2024-10-03