Title: | Additive Profile Clustering Algorithms |
---|---|
Description: | Obtain overlapping clustering models for object-by-variable data matrices using the Additive Profile Clustering (ADPROCLUS) method. Also contains the low dimensional ADPROCLUS method for simultaneous dimension reduction and overlapping clustering. For reference see Depril, Van Mechelen, Mirkin (2008) <doi:10.1016/j.csda.2008.04.014> and Depril, Van Mechelen, Wilderjans (2012) <doi:10.1007/s00357-012-9112-5>. |
Authors: | Henry Heppe [aut, cre, cph], Julian Rossbroich [aut], Jeffrey Durieux [aut], Tom Wilderjans [aut] |
Maintainer: | Henry Heppe <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0.0.9000 |
Built: | 2025-02-14 04:27:11 UTC |
Source: | https://github.com/henry-heppe/adproclus |
Yields an object of class adpc, which can be printed, plotted and summarized by the corresponding methods. Mandatory input are the membership matrix $A$ and the profile matrix $P$ (where the number of columns of $A$ corresponds to the number of rows in $P$), if the object is to represent a full dimensional ADPROCLUS model. For a low dimensional ADPROCLUS model, the matrices $C$ and $B$ have to be provided and $P$ can be inferred from those. All other inputs are optional but may be included so that the output from summary(), print() and plot() is complete.
For further details on the (low dimensional) ADPROCLUS model and what every element of the object means, see adproclus and adproclus_low_dim.
adpc( A, P, sse = NULL, totvar = NULL, explvar = NULL, iterations = NULL, timer = NULL, timer_one_run = NULL, initial_start = NULL, C = NULL, B = NULL, runs = NULL, parameters = NULL )
A | Membership matrix A. |
P | Profile matrix P. |
sse | Sum of Squared Error. |
totvar | Total variance. |
explvar | Explained variance. |
iterations | Number of iterations. |
timer | Time needed to run the complete algorithm. |
timer_one_run | Time to complete this single algorithm start. |
initial_start | List containing the type of start and the initial membership matrix. |
C | Low dimensional profiles matrix C. |
B | Matrix of base vectors connecting low dimensional components with original variables B. |
runs | List of suboptimal models. |
parameters | List of algorithm parameters. |
Object of class adpc.
# Create the information needed for a minimal object of class adpc
x <- stackloss
result <- adproclus(x, 3)
A <- result$A
P <- result$P
# Use constructor to obtain object of class adpc
result_object <- adpc(A, P)
Perform additive profile clustering (ADPROCLUS) on object-by-variable data. Creates a model that assigns the objects to overlapping clusters which are characterized in terms of the variables by the so-called profiles.
adproclus( data, nclusters, start_allocation = NULL, nrandomstart = 3, nsemirandomstart = 3, algorithm = "ALS2", save_all_starts = FALSE, seed = NULL )
data | Object-by-variable data matrix of class matrix or data.frame. |
nclusters | Number of clusters to be used. Must be a positive integer. |
start_allocation | Optional matrix of binary values as starting allocation for first run. Default is NULL. |
nrandomstart | Number of random starts (see get_random). Default is 3. |
nsemirandomstart | Number of semi-random starts (see get_semirandom). Default is 3. |
algorithm | Character string "ALS1" or "ALS2" (default), denoting the type of alternating least squares algorithm. |
save_all_starts | Logical. If TRUE, the results of all algorithm starts are returned. Default is FALSE. |
seed | Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility. |
In this function, Mirkin's (1987, 1990) Additive Profile Clustering (ADPROCLUS) method is used to obtain an unrestricted overlapping clustering model of the object by variable data provided by data.
The ADPROCLUS model approximates an $I \times J$ object by variable data matrix $X$ by an $I \times J$ model matrix $M$ that can be decomposed into an $I \times K$ binary cluster membership matrix $A$ and a $K \times J$ real-valued cluster profile matrix $P$, with $K$ indicating the number of overlapping clusters. The aim of an ADPROCLUS analysis is therefore, given a number of clusters $K$, to estimate a model matrix $M = AP$ which reconstructs the data matrix $X$ as closely as possible in a least squares sense (i.e. with minimal sum of squared residuals). For a detailed illustration of the ADPROCLUS model and associated loss function, see Wilderjans et al. (2011).
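As a minimal illustration of the model $M = AP$ and its least squares loss, the following base R sketch builds a toy membership matrix and profile matrix by hand. All sizes and values are made up for illustration and no package function is used.

```r
# Toy sizes (illustrative only): 6 objects, 3 variables, 2 clusters
set.seed(1)
n_objects <- 6
n_vars <- 3
n_clusters <- 2

# Binary membership matrix A; a row with several 1s is an object
# that belongs to several clusters (overlap)
A <- matrix(c(1, 0,
              1, 1,
              0, 1,
              0, 0,
              1, 1,
              1, 0), nrow = n_objects, ncol = n_clusters, byrow = TRUE)

# Real-valued profile matrix P (clusters by variables)
P <- matrix(c(2, 0, 1,
              1, 3, 0), nrow = n_clusters, ncol = n_vars, byrow = TRUE)

# Model matrix M = A P: each row is the sum of the profiles of the
# clusters the object belongs to
M <- A %*% P

# Simulated data = model + noise; the loss is the sum of squared residuals
X <- M + matrix(rnorm(n_objects * n_vars, sd = 0.1), nrow = n_objects)
sse <- sum((X - M)^2)
```

Object 2 belongs to both clusters, so its model row is the sum of both profiles, while object 4 belongs to no cluster and its model row is zero.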
The alternating least squares algorithms ("ALS1" and "ALS2") that can be used for minimization of the loss function were proposed by Depril et al. (2008). In "ALS2", starting from an initial random or rational estimate of $A$ (see get_random and get_semirandom), $A$ and $P$ are alternately re-estimated conditionally upon each other until convergence. The "ALS1" algorithm differs from the previous one in that each row in $A$ is updated independently and that the conditionally optimal $P$ is recalculated after each row update, instead of at the end of the matrix. For a discussion and comparison of the different algorithms, see Depril et al. (2008).
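The alternating scheme can be sketched in base R as follows. This is not the package's implementation: it assumes that the conditional update of $P$ is an ordinary least squares fit and that the conditional update of $A$ enumerates all $2^K$ binary patterns per row. The function names (binary_patterns, update_A, update_P, als2_sketch) and the toy data are hypothetical.

```r
# All 2^K binary membership patterns, one per row
binary_patterns <- function(K) {
  as.matrix(expand.grid(rep(list(c(0, 1)), K)))
}

# Conditionally optimal A given P: for every object pick the pattern
# whose summed profiles are closest to the object's data row
update_A <- function(X, P) {
  patterns <- binary_patterns(nrow(P))
  candidates <- patterns %*% P                 # 2^K candidate model rows
  best <- apply(X, 1, function(x) {
    which.min(rowSums(sweep(candidates, 2, x)^2))
  })
  unname(patterns[best, , drop = FALSE])
}

# Conditionally optimal P given A: least squares via qr.solve().
# qr.solve() stops with an error if A is rank deficient (e.g. an empty
# cluster); a real implementation would have to handle that case.
update_P <- function(X, A) {
  qr.solve(A, X)
}

# Alternate the two updates until the loss no longer decreases
als2_sketch <- function(X, A, max_iter = 50, tol = 1e-8) {
  sse_old <- Inf
  for (iter in seq_len(max_iter)) {
    P <- update_P(X, A)
    A <- update_A(X, P)
    sse <- sum((X - A %*% P)^2)
    if (sse_old - sse < tol) break
    sse_old <- sse
  }
  list(A = A, P = P, sse = sse, iterations = iter)
}

# Toy data with a known structure and a slightly wrong starting allocation
set.seed(1)
A_true <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0),
                 ncol = 2, byrow = TRUE)
P_true <- rbind(c(4, 0, 1), c(0, 5, 2))
X <- A_true %*% P_true + matrix(rnorm(24, sd = 0.05), nrow = 8)
A_start <- A_true
A_start[1, 2] <- 1
fit <- als2_sketch(X, A_start)
```

Under these assumptions the row update inspects $2^K$ candidate patterns, which is one way to see why the cost grows quickly with the number of clusters.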
Warning: Computation time increases exponentially with increasing number of clusters, $K$. We recommend determining the computation time of a single start for each specific dataset and $K$ before increasing the number of starts.
adproclus() returns a list with the following components, which describe the best model (from the multiple starts):
model: matrix. The obtained overlapping clustering model $M$ of the same size as data.
A: matrix. The membership matrix $A$ of the clustering model. Clusters are sorted by size.
P: matrix. The profile matrix $P$ of the clustering model.
sse: numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.
totvar: numeric. The total sum of squares of data.
explvar: numeric. The proportion of variance in data that is accounted for by the clustering model.
iterations: numeric. The number of algorithm iterations until convergence of the relevant single start.
timer_one_run: numeric. The amount of time (in seconds) the relevant single start ran for.
initial_start: list. Contains the initial membership matrix, as well as the type of start that was used to obtain the clustering solution (as returned by get_random or get_semirandom).
runs: list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information for the respective start.
parameters: list. Contains the parameters used for the model.
timer: numeric. The amount of time (in seconds) the complete algorithm ran for.
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.
Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.
Mirkin, B. G. (1987). The method of principal clusters. Automation and Remote Control, 10:131-143.
Mirkin, B. G. (1990). A sequential fitting procedure for linear data analysis models. Journal of Classification, 7(2):167-195.
adproclus_low_dim for low dimensional ADPROCLUS
get_random for generating random starts
get_semirandom for generating semi-random starts
get_rational for generating rational starts
# Loading a test dataset into the global environment
x <- stackloss

# Quick clustering with K = 2 clusters
clust <- adproclus(data = x, nclusters = 2)

# Clustering with K = 3 clusters,
# using the ALS2 algorithm,
# with 2 random and 2 semi-random starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, algorithm = "ALS2"
)

# Saving the results of all starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, save_all_starts = TRUE
)

# Clustering using a user-defined rational start profile matrix
# (here the first 4 rows of the data)
start <- get_rational(x, x[1:4, ])$A
clust <- adproclus(x, 4, start_allocation = start)
Perform low dimensional additive profile clustering (ADPROCLUS) on object by variable data. Use case: data to cluster consists of a large set of variables, where it can be useful to interpret the cluster profiles in terms of a smaller set of components that represent the original variables well.
adproclus_low_dim( data, nclusters, ncomponents, start_allocation = NULL, nrandomstart = 3, nsemirandomstart = 3, save_all_starts = FALSE, seed = NULL )
data | Object-by-variable data matrix of class matrix or data.frame. |
nclusters | Number of clusters to be used. Must be a positive integer. |
ncomponents | Number of components (dimensions) to which the profiles should be restricted. Must be a positive integer. |
start_allocation | Optional matrix of binary values as starting allocation for first run. Default is NULL. |
nrandomstart | Number of random starts (see get_random). Default is 3. |
nsemirandomstart | Number of semi-random starts (see get_semirandom). Default is 3. |
save_all_starts | Logical. If TRUE, the results of all algorithm starts are returned. Default is FALSE. |
seed | Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility. |
In this function, an extension by Depril et al. (2012) of Mirkin's (1987, 1990) additive profile clustering method is used to obtain a low dimensional overlapping clustering model of the object by variable data provided by data.
More precisely, the low dimensional ADPROCLUS model approximates an $I \times J$ object by variable data matrix $X$ by an $I \times J$ model matrix $M$. For $K$ overlapping clusters, $M$ can be decomposed into an $I \times K$ binary cluster membership matrix $A$ and a $K \times J$ real-valued cluster profile matrix $P$ s.t. $M = AP$. With the simultaneous dimension reduction, $P$ is restricted to be of reduced rank $S$, such that it can be decomposed into $P = CB'$, with $B$ a $J \times S$ matrix and $C$ a $K \times S$ matrix. Now, a row in $C$ represents the profile values associated with the respective cluster in terms of the $S$ components, while the entries of $B$ can be used to interpret the components in terms of the complete set of variables. The aim of a low dimensional ADPROCLUS analysis is therefore, given a number of clusters $K$ and a number of dimensions $S$, to estimate a model matrix $M = ACB'$ that reconstructs the data matrix $X$ as closely as possible in a least squares sense and simultaneously reduces the dimensions of the data.
For a detailed illustration of the low dimensional ADPROCLUS model and associated loss function, see Depril et al. (2012).
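The reduced rank decomposition can be illustrated with a small base R sketch. The matrices below are random toy values, not output of adproclus_low_dim(), and the sizes are chosen arbitrarily.

```r
set.seed(2)
n_objects <- 5
K <- 4   # clusters
J <- 6   # variables
S <- 2   # components

C <- matrix(rnorm(K * S), nrow = K)   # K x S: cluster profiles on the components
B <- matrix(rnorm(J * S), nrow = J)   # J x S: variables by components

# Full profile matrix implied by the low dimensional parts (note the transpose)
P <- C %*% t(B)                       # K x J

# A toy binary membership matrix and the two model matrices
A <- matrix(rbinom(n_objects * K, size = 1, prob = 0.5), nrow = n_objects)
model <- A %*% P                      # objects by variables
model_lowdim <- A %*% C               # objects by components

# The rank of P cannot exceed the number of components
rank_P <- qr(P)$rank
```

Because $P = CB'$, the full model can equivalently be written as the low dimensional model multiplied by $B'$.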
Warning: Computation time increases exponentially with increasing number of clusters, $K$. We recommend determining the computation time of a single start for each specific dataset and $K$ before increasing the number of starts.
adproclus_low_dim() returns a list with the following components, which describe the best model (from the multiple starts):
model: matrix. The obtained overlapping clustering model $M$ of the same size as data.
model_lowdim: matrix. The obtained low dimensional clustering model $AC$ of size $I \times S$.
A: matrix. The membership matrix $A$ of the clustering model. Clusters are sorted by size.
P: matrix. The profile matrix $P$ of the clustering model.
C: matrix. The profile values in terms of the low dimensional components.
B: matrix. Variables-by-components matrix. Base vectors connecting low dimensional components with original variables. Warning: for computing $P$ use $B'$.
sse: numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.
totvar: numeric. The total sum of squares of data.
explvar: numeric. The proportion of variance in data that is accounted for by the clustering model.
iterations: numeric. The number of algorithm iterations until convergence of the relevant single start.
timer_one_run: numeric. The amount of time (in seconds) the relevant single start ran for.
initial_start: list. Contains the initial membership matrix, as well as the type of start that was used to obtain the clustering solution (as returned by get_random or get_semirandom).
runs: list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information.
parameters: list. Contains the parameters used for the model.
timer: numeric. The amount of time (in seconds) the complete algorithm ran for.
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
adproclus for full dimensional ADPROCLUS
get_random for generating random starts
get_semirandom for generating semi-random starts
get_rational for generating rational starts
# Loading a test dataset into the global environment
x <- stackloss

# Low dimensional clustering with K = 3 clusters
# where the resulting profiles can be characterized in S = 1 dimensions
clust <- adproclus_low_dim(x, 3, ncomponents = 1)
A computer generated object-by-variable dataset with an underlying nonrestricted overlapping clustering structure. For illustrative purposes within the ADPROCLUS package only.
CGdata
A data frame with 100 rows and 15 variables
Obtain a cluster-by-variable dataframe where the values are the cluster means for the given variables. Takes as input a (low dimensional) ADPROCLUS model of class adpc and a dataset. This dataset must have the same number of rows as the cluster membership matrix $A$ of the model. The variables can be different from the ones the model was trained on. The function uses the cluster membership matrix of the model to compute, per cluster, the mean of the variables in the dataset. In the output matrix of cluster means, the last row Cl0 corresponds to the baseline cluster consisting of all the observations that were not assigned to a cluster, if this cluster is not empty. This function effectively computes column means of the dataset separately for each cluster.
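The computation described above can be sketched in base R without the package. The membership matrix, the dataset and the row labels Cl1 and Cl2 are made up for illustration; only the label Cl0 for the baseline cluster is taken from the description.

```r
# Toy membership matrix: 5 objects, 2 overlapping clusters;
# object 5 belongs to no cluster
A <- matrix(c(1, 0,
              1, 1,
              0, 1,
              1, 1,
              0, 0), ncol = 2, byrow = TRUE)

# Any dataset with one row per object
dat <- data.frame(height = c(1, 2, 3, 4, 10),
                  weight = c(10, 20, 30, 40, 100))

# Column means of the rows that belong to each cluster
means <- t(apply(A, 2, function(member) {
  colMeans(dat[member == 1, , drop = FALSE])
}))
rownames(means) <- paste0("Cl", seq_len(ncol(A)))

# Baseline row for objects without any cluster, if there are any
unassigned <- rowSums(A) == 0
if (any(unassigned)) {
  means <- rbind(means, Cl0 = colMeans(dat[unassigned, , drop = FALSE]))
}
means <- round(means, 3)
```

Objects 2 and 4 belong to both clusters and therefore contribute to both cluster means.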
cluster_means(data, model, digits = 3)
data | Object-by-variable matrix. Can contain other variables than the ADPROCLUS model. IMPORTANT: The number of rows must be equal to the number of observations in the ADPROCLUS model. |
model | ADPROCLUS solution (class: adpc). |
digits | Integer. The number of decimal places that all decimal numbers will be rounded to. |
It is worth noting that the output of this function is different from the last output matrix in the summary() method applied to an ADPROCLUS model. The former computes the means over the original variable values while the latter computes them over the approximated model variable values.
Cluster-by-variable dataframe where the values are the cluster means for the given variable.
# Obtain data, compute model, report cluster means
x <- CGdata
model <- adproclus(x, 3)
cluster_means(data = x, model = model)
Generate an initial random start for the (low dimensional) Additive Profile Clustering algorithm (see adproclus and adproclus_low_dim).
get_random(data, nclusters, seed = NULL)
data | Object-by-variable data matrix of class matrix or data.frame. |
nclusters | Number of clusters to be used. Must be a positive integer. |
seed | Integer. Seed for the random number generator. Default: NULL |
get_random generates a random initial binary membership matrix $A$ such that each entry is an independent draw from a Bernoulli distribution with $\pi = 0.5$.
For generating an initial start from random draws from the data, see get_semirandom. For generating an initial start based on a specific set of initial cluster centers, see get_rational.
Warning: This function does not obtain an ADPROCLUS model. To perform additive profile clustering, see adproclus.
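A random start of this kind can be sketched in base R as independent Bernoulli draws. The function name random_start_sketch is hypothetical, and the success probability of 0.5 is an assumption of this sketch rather than a statement about the package internals.

```r
# Hypothetical helper: binary matrix with independent Bernoulli entries
random_start_sketch <- function(n_objects, n_clusters, prob = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  matrix(rbinom(n_objects * n_clusters, size = 1, prob = prob),
         nrow = n_objects, ncol = n_clusters)
}

# One row per object of the stackloss data, three clusters
A0 <- random_start_sketch(nrow(stackloss), 3, seed = 1)
```

With independent draws, some rows can end up all zero (an object in no cluster) and some with several ones (an object in overlapping clusters).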
get_random() returns a list with the following components:
type: A character string denoting the type of start ('Random Start')
A: A randomly generated initial membership matrix
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.
Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
adproclus, adproclus_low_dim for details about membership and profile matrices
get_semirandom for generating semi-random starts
get_rational for generating rational starts
# Obtain data from data set "Stackloss" and generate start allocation
start_allocation <- get_random(stackloss, 3)$A
If cluster profiles are given a priori, this function can be used to compute the conditionally optimal cluster membership matrix A which can then be used as a rational starting allocation for the (low dimensional) ADPROCLUS procedure (see adproclus and adproclus_low_dim).
get_rational(data, starting_profiles)
data | Object-by-variable data matrix of class matrix or data.frame. |
starting_profiles | A matrix where each row represents the profile values for a cluster. Needs to be of same dimensions as $P$. |
The function uses the same quadratic loss function and minimization method as the (low dimensional) ADPROCLUS procedure does to find the next conditionally optimal membership matrix A (for details, see Depril et al., 2012). For the full dimensional ADPROCLUS it uses the algorithm ALS2 and not ALS1.
Warning: This function does not obtain an ADPROCLUS model. To perform additive profile clustering, see adproclus.
get_rational() returns a list with the following components:
type: A character string denoting the type of start ('Rational Start')
A: An initial membership matrix
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
adproclus, adproclus_low_dim for details about membership and profile matrices
get_random for generating random starts
get_semirandom for generating semi-random starts
# Obtain data from standard data set "Stackloss"
x <- stackloss

# Obtaining a user-defined rational start profile matrix
# (here the first 4 rows of the data)
start_allocation <- get_rational(x, x[1:4, ])$A
Generate an initial semi-random start for the (low dimensional) Additive Profile Clustering algorithm (see adproclus and adproclus_low_dim).
get_semirandom(data, nclusters, seed = NULL)
data | Object-by-variable data matrix of class matrix or data.frame. |
nclusters | Number of clusters to be used. Must be a positive integer. |
seed | Integer. Seed for the random number generator. Default: NULL |
An initial cluster membership matrix $A$ is generated by finding the best $A$ conditional on an initial profile matrix $P$ generated by drawing $K$ randomly chosen, distinct rows from data (for details, see Depril et al., 2012).
Warning: This function does not obtain an ADPROCLUS model. To perform additive profile clustering, see adproclus.
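The idea of a semi-random start can be sketched in base R as follows. This is an illustration only: the function name semirandom_start_sketch is hypothetical, the rows are drawn by index (so rows with identical values could still be drawn), and the conditional membership matrix is found by enumerating all binary patterns, which is an assumption about how such an update can be done rather than the package's code.

```r
semirandom_start_sketch <- function(X, K, seed = NULL) {
  X <- as.matrix(X)
  if (!is.null(seed)) set.seed(seed)
  # Initial profiles: K rows of the data, drawn without replacement
  P0 <- X[sample(nrow(X), K), , drop = FALSE]
  # All 2^K membership patterns and the model rows they imply
  patterns <- as.matrix(expand.grid(rep(list(c(0, 1)), K)))
  candidates <- patterns %*% P0
  # For each object keep the pattern with the smallest squared error
  best <- apply(X, 1, function(x) {
    which.min(colSums((t(candidates) - x)^2))
  })
  list(type = "Semi-random Start (sketch)",
       A = unname(patterns[best, , drop = FALSE]),
       P0 = P0)
}

start <- semirandom_start_sketch(stackloss, 3, seed = 1)
```

Since the all-zero pattern is always among the candidates, the resulting fit is never worse than assigning every object to no cluster.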
get_semirandom() returns a list with the following components:
type: A character string denoting the type of start ('Semi-random Start')
A: An initial membership matrix
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.
Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
adproclus, adproclus_low_dim for details about membership and profile matrices
get_random for generating random starts
get_rational for generating rational starts
# Obtain data from data set "Stackloss" and generate start allocation
start_allocation <- get_semirandom(stackloss, 3)$A
Performs ADPROCLUS for the number of clusters from min_nclusters to max_nclusters. This replaces the need to manually estimate multiple models to select the best number of clusters and returns the results in a format compatible with plot_scree_adpc to obtain a scree plot. Output is also compatible with select_by_CHull to automatically select a suitable number of clusters. The compatibility with both functions is only given if return_models = FALSE.
mselect_adproclus( data, min_nclusters, max_nclusters, return_models = FALSE, unexplvar = TRUE, start_allocation = NULL, nrandomstart = 1, nsemirandomstart = 1, algorithm = "ALS2", save_all_starts = FALSE, seed = NULL )
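data | Object-by-variable data matrix of class matrix or data.frame. |
min_nclusters | Minimum number of clusters to estimate. |
max_nclusters | Maximum number of clusters to estimate. |
return_models | Boolean. If TRUE, a list of the estimated models is returned instead of the matrix of fit scores. Default is FALSE. |
unexplvar | Boolean. If TRUE (default), the fit scores are unexplained variance; if FALSE, they are SSE. |
start_allocation | Optional starting cluster membership matrix to be passed to the ADPROCLUS procedure. See adproclus. |
nrandomstart | Number of random starts computed for each model. |
nsemirandomstart | Number of semi-random starts computed for each model. |
algorithm | Character string "ALS1" or "ALS2" (default), denoting the type of alternating least squares algorithm. |
save_all_starts | Logical. If TRUE, the results of all algorithm starts are returned. Default is FALSE. |
seed | Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility. |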
Depends on the choice of return_models. If FALSE (default), a matrix with one column of SSE or unexplained variance scores for all estimated models is returned; row names are the value of the cluster parameter for the relevant model. If TRUE, a list of estimated models is returned.
adproclus for the actual ADPROCLUS procedure
plot_scree_adpc for plotting the model fits
select_by_CHull for automatic model selection via CHull method
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4, seed = 10)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)
Performs low dimensional ADPROCLUS for the number of clusters from min_nclusters to max_nclusters and the number of components from min_ncomponents to max_ncomponents. This replaces the need to manually estimate multiple models to select the best number of clusters and components and returns the results in a format compatible with plot_scree_adpc to obtain a scree plot / multiple scree plots. Output is also compatible with select_by_CHull to automatically select a suitable number of components for each number of clusters. The compatibility with both functions is only given if return_models = FALSE.
mselect_adproclus_low_dim( data, min_nclusters, max_nclusters, min_ncomponents, max_ncomponents, return_models = FALSE, unexplvar = TRUE, start_allocation = NULL, nrandomstart = 1, nsemirandomstart = 1, save_all_starts = FALSE, seed = NULL )
data | Object-by-variable data matrix of class matrix or data.frame. |
min_nclusters | Minimum number of clusters to estimate. |
max_nclusters | Maximum number of clusters to estimate. |
min_ncomponents | Minimum number of components to estimate. Must be smaller or equal than |
max_ncomponents | Maximum number of components to estimate. Must be smaller or equal than |
return_models | Boolean. If TRUE, a list of the estimated models is returned instead of the matrix of fit scores. Default is FALSE. |
unexplvar | Boolean. If TRUE (default), the fit scores are unexplained variance; if FALSE, they are SSE. |
start_allocation | Optional starting cluster membership matrix to be passed to the low dimensional ADPROCLUS procedure. See adproclus_low_dim. |
nrandomstart | Number of random starts computed for each model. |
nsemirandomstart | Number of semi-random starts computed for each model. |
save_all_starts | Logical. If TRUE, the results of all algorithm starts are returned. Default is FALSE. |
seed | Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility. |
Depends on the choice of return_models. If FALSE (default), a number of clusters by number of components matrix is returned, where the values are SSE or unexplained variance scores for all estimated models; row names are the value of the cluster parameter for the relevant model and column names contain the value of the components parameter. If TRUE, a list of estimated models is returned.
adproclus_low_dim for the actual low dimensional ADPROCLUS procedure
plot_scree_adpc for plotting the model fits
select_by_CHull for automatic model selection via CHull method
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
# and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x, 1, 4, 1, 4, seed = 1)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)
Produce a representation of a (low dimensional) ADPROCLUS solution, where each cluster is a vertex and the edge between two vertices represents the overlap between the corresponding clusters. The size of a vertex corresponds to the cluster size. The overlap is represented through color, width and numerical label of the edge. The numerical edge labels can be relative (number of overlap observations / total observations) or absolute (number of observations in both clusters).
NOTE: This function can be called through the plot(model, type = "network") function with model an object of class adpc.
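The overlap quantities that the edge labels describe can be computed from a membership matrix with base R alone. The sketch below uses a made-up membership matrix and is not the plotting code of the package.

```r
# Toy membership matrix: 5 objects, 3 overlapping clusters
A <- matrix(c(1, 1, 0,
              1, 0, 0,
              1, 1, 1,
              0, 1, 1,
              0, 0, 1), ncol = 3, byrow = TRUE)

# t(A) %*% A: the diagonal holds the cluster sizes, the off-diagonal
# entries hold the number of objects shared by two clusters
overlap_abs <- crossprod(A)

# Relative overlap: shared objects divided by the total number of objects
overlap_rel <- overlap_abs / nrow(A)
```

Here clusters 1 and 2 share two objects, so their absolute overlap is 2 and their relative overlap is 2 / 5.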
plot_cluster_network( model, title = NULL, relative_overlap = TRUE, filetype = NULL, filename = "network_plot", ... )
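model | ADPROCLUS solution (class: adpc). |
title | String. Optional title. |
relative_overlap | Logical. If TRUE (default), the edge labels show the relative overlap (number of overlap observations / total observations); if FALSE, the absolute number of observations in both clusters. |
filetype | Optional. Choose type of file to save the plot. Possible choices: |
filename | Optional. Name of the file without extension. Default: "network_plot" |
... | Additional arguments passing to the |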
Invisibly returns the input model.
# Loading a test dataset into the global environment
x <- stackloss

# Quick low dimensional clustering with K = 3 clusters and S = 1 dimensions
clust <- adproclus_low_dim(x, 3, 1)

# Plot the overlapping clusters
plot_cluster_network(clust)
Produce a representation of profile matrix $P$ (or $C$ for low dimensional solution) of an ADPROCLUS solution of class adpc. The plot displays the profiles in the style of a correlation plot.
NOTE: This function can also be called through the plot(model, type = "profiles") function with model an object of class adpc.
plot_profiles(model, title = NULL, label_color = "black", ...)
model | Object of class adpc. |
title | String. Optional title. |
label_color | String. The color of the text labels. Default: "black" |
... | Additional arguments passing to the |
Invisibly returns the input model.
# Loading a test dataset into the global environment
x <- stackloss

# Quick clustering with K = 3 clusters
clust <- adproclus(x, 3)

# Plot the profile scores of each cluster
plot_profiles(clust)
Used for scree-plot based model selection. Visualizes a set of ADPROCLUS models in terms of their number of clusters and model fit (SSE or unexplained variance). For low dimensional ADPROCLUS models, plots are made with the number of components on the x-axis for each given number of clusters. One can then choose to have them displayed all in one plot (grid = FALSE) or next to each other in separate plots (grid = TRUE).
plot_scree_adpc(model_fit, title = NULL, grid = FALSE, digits = 3)
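model_fit | Matrix of SSE or unexplained variance scores as given by the output of mselect_adproclus or mselect_adproclus_low_dim. |
title | String. Optional title. |
grid | Boolean. If FALSE (default), all scree plots are displayed in one plot; if TRUE, they are displayed next to each other in separate plots. |
digits | Integer. The number of decimal places to display. |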
Invisibly returns the ggplot2 object.
mselect_adproclus to obtain the model_fit input from the possible ADPROCLUS models
mselect_adproclus_low_dim to obtain the model_fit input from the possible low dimensional ADPROCLUS models
select_by_CHull for automatic model selection via CHull method
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4, seed = 1)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)

# Estimating models with cluster parameter values ranging from 1 to 4
# and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x, 1, 4, 1, 4, seed = 1)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)
To be used when one has selected a number of components for each number
of clusters. Plots the remaining set of models to compare SSE or unexplained
variance. The input model_fit is expected to be the output of the
select_by_CHull function applied to the output of the
mselect_adproclus_low_dim function.
plot_scree_adpc_preselected(model_fit, title = NULL, digits = 3)
model_fit |
Matrix with SSE or unexplained variance values.
Can be obtained from |
title |
String. Optional title. |
digits |
Integer. The number of decimal places to display. |
Returns the ggplot2 object.
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
# and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x, 1, 4, 1, 4, seed = 1)

# Choosing for each number of clusters the best number of components
model_fits_preselected <- select_by_CHull(model_fits)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc_preselected(model_fits_preselected)
Produces a representation of the variable-to-component matrix
of a low dimensional ADPROCLUS solution of class adpc. The plot displays
the scores in the style of a correlation plot.
NOTE: This function can also be called through
plot(model, type = "vars_by_comp"), with model an object of class adpc.
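As a rough illustration of what this matrix connects, the following base R sketch builds a small made-up low dimensional solution. The matrices C (cluster-by-component profiles) and B (variable-by-component matrix) and all their values are hypothetical, and the relation P = C B' is assumed from the description of the low dimensional ADPROCLUS model rather than taken from the package's code.

```r
# Hypothetical low dimensional solution:
# K = 3 clusters, S = 2 components, J = 4 variables
C <- matrix(c(1, 0,
              0, 2,
              1, 1), nrow = 3, byrow = TRUE)   # cluster-by-component profiles
B <- matrix(c(1, 0,
              0, 1,
              1, 1,
              2, 0), nrow = 4, byrow = TRUE)   # variable-by-component matrix
rownames(B) <- paste0("var", 1:4)

# t(B) is the component-by-variable matrix (B') that the plot visualizes
t(B)

# Full dimensional cluster profiles implied by C and B (assumed: P = C B')
P <- C %*% t(B)
dim(P)  # clusters by variables
```

Each row of t(B) shows how strongly one component relates to each original variable, which is the information the correlation-style plot conveys.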
plot_vars_by_comp(model, title = NULL, label_color = "black", ...)
model |
Object of class |
title |
String. Optional title. |
label_color |
String. The color of the text labels. Default: "black" |
... |
Additional arguments passed to the
|
Invisibly returns the input model.
# Loading a test dataset into the global environment
x <- stackloss

# Quick low dimensional clustering with K = 3 clusters and S = 1 dimensions
clust <- adproclus_low_dim(x, 3, 1)

# Plot the matrix B', connecting components with variables
plot_vars_by_comp(clust)
When passing a (low dimensional) ADPROCLUS solution of class adpc to
the generic plot(), this method plots the solution in one of the
following three ways:
type = "network": Each cluster is a vertex and the edge between two vertices represents the overlap between the corresponding clusters. The size of a vertex corresponds to the cluster size. The overlap is represented through the color, width and numerical label of the edge. The numerical edge labels can be relative (number of overlapping observations / total observations) or absolute (number of observations in both clusters).
type = "profiles": Plot the profile matrix (P for the full dimensional model, C for the low dimensional model) in the style of a correlation plot to visualize the relation of each cluster with each variable.
type = "vars_by_comp": Plot the low dimensional component-by-variable matrix (B') in the style of a correlation plot to visualize the relation of each component with each original variable. NOTE: Only works for low dimensional ADPROCLUS.
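The quantities shown in the network plot can be sketched in base R from a binary membership matrix. The matrix A below is made up for illustration, and the code only reproduces the definitions given above (cluster size, absolute and relative overlap); it is not the package's plotting code.

```r
# Hypothetical binary membership matrix: 6 observations, 3 overlapping clusters
A <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 0,
              0, 1, 1,
              1, 1, 1,
              0, 0, 1), nrow = 6, byrow = TRUE,
            dimnames = list(NULL, c("Cl1", "Cl2", "Cl3")))

# Cross-product A'A: the diagonal holds the cluster sizes (vertex sizes),
# the off-diagonal entries hold the number of observations in both clusters
overlap_abs <- crossprod(A)
cluster_sizes <- diag(overlap_abs)

# Relative overlap: shared observations divided by the total number of observations
overlap_rel <- overlap_abs / nrow(A)

overlap_abs
overlap_rel
```

The off-diagonal entries of overlap_abs correspond to the absolute edge labels, those of overlap_rel to the relative ones.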
## S3 method for class 'adpc'
plot(x, type = "network", ...)
x |
Object of class |
type |
Choice for type of plot: one of |
... |
additional arguments will be passed on to the functions
|
Invisibly returns the input model.
# Loading a test dataset into the global environment
x <- stackloss

# Quick low dimensional clustering with K = 3 clusters and S = 1 dimensions
clust <- adproclus_low_dim(x, 3, 1)

# Produce three plots of the model
plot(clust, type = "network")
plot(clust, type = "profiles")
plot(clust, type = "vars_by_comp")
For an object of class adpc as input, this method prints basic
information about the ADPROCLUS solution represented by the object.
Works for both full and low dimensional solutions. Adjust the parameters
digits, matrix_rows, matrix_cols to change the level of detail printed.
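To give a feel for what these three parameters control, here is a minimal base R sketch. The helper preview_matrix() is hypothetical and not part of the package; it only assumes that matrix_rows and matrix_cols cap the displayed rows and columns and that digits sets the rounding, as the argument descriptions suggest.

```r
# Hypothetical helper, not part of the package: show at most `matrix_rows` rows
# and `matrix_cols` columns of a matrix, rounded to `digits` decimal places
preview_matrix <- function(M, digits = 3, matrix_rows = 10, matrix_cols = 15) {
  rows <- seq_len(min(nrow(M), matrix_rows))
  cols <- seq_len(min(ncol(M), matrix_cols))
  round(M[rows, cols, drop = FALSE], digits)
}

# stackloss has 21 rows and 4 columns; dividing by 7 creates decimals to round
M <- as.matrix(stackloss) / 7
preview_matrix(M, digits = 2, matrix_rows = 4, matrix_cols = 2)
```

Using drop = FALSE keeps the result a matrix even when only one row or column is requested.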
## S3 method for class 'adpc'
print(
  x,
  title = "ADPROCLUS solution",
  digits = 3,
  matrix_rows = 10,
  matrix_cols = 15,
  ...
)
x |
ADPROCLUS solution (class: |
title |
String. Default: "ADPROCLUS solution" |
digits |
Integer. The number of decimal places that all decimal numbers will be rounded to. |
matrix_rows |
Integer. The number of matrix rows to display. OPTIONAL |
matrix_cols |
Integer. The number of matrix columns to display. OPTIONAL |
... |
ignored |
No return value, called for side effects.
# Obtain data, compute model, print model
x <- stackloss
model <- adproclus(x, 3)
print(model)
Prints an object of class summary.adpc to represent and summarize a
(low dimensional) ADPROCLUS solution. Parameters controlling how the
results are printed can be passed as arguments to summary.adpc(),
which then passes them on to this method. This method
does not take a model of class adpc directly as input.
## S3 method for class 'summary.adpc'
print(x, ...)
x |
Object of class |
... |
ignored |
Invisibly returns the object of class summary.adpc.
# Obtain data, compute model, print summary of model
x <- stackloss
model <- adproclus(x, 3)
print(summary(model))
For a set of full dimensional ADPROCLUS models (each with a different number of clusters),
this function finds the "elbow" in the scree plot by using the
CHull procedure (Wilderjans, Ceulemans & Meers, 2013) implemented in
the multichull package.
For a matrix of low dimensional ADPROCLUS models
(each with a different number of clusters and components),
this function finds the "elbow" in the scree plot for each
number of clusters with the CHull method.
That is, it reduces the number of models to choose from to the number of
different cluster parameter values by choosing the "elbow" number of
components for a given number of clusters. The resulting list can in turn
be visualized with plot_scree_adpc_preselected.
For this procedure to work, the SSE or unexplained variance values must be
decreasing in the number of clusters (components). If that is not the case,
increasing the number of (semi-)random starts can help.
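The monotonicity requirement can be checked by hand before running the selection. The sketch below uses a made-up named vector of SSE values (one per number of clusters); it does not rely on the actual layout of the model_fit matrix, which is not reproduced here.

```r
# Hypothetical SSE values for models with 1 to 5 clusters
sse <- c(`1` = 120.4, `2` = 61.0, `3` = 63.2, `4` = 40.8, `5` = 38.1)

# TRUE only if every additional cluster lowers the SSE
is_decreasing <- all(diff(sse) < 0)

# Numbers of clusters whose SSE is not lower than that of the preceding,
# simpler model; these estimates likely need more (semi-)random starts
violations <- names(sse)[-1][diff(sse) >= 0]

is_decreasing
violations
```

In this made-up example the three-cluster model fits worse than the two-cluster model, which suggests that its estimation ended in a poor local optimum.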
select_by_CHull(model_fit, percentage_fit = 1e-04, ...)
model_fit |
Matrix containing SSEs or unexplained variance of all models
as in the output of |
percentage_fit |
Required proportion of increase in fit of a more complex model. |
... |
Additional parameters to be passed on to |
This procedure cannot choose the model with the largest or smallest number of clusters (components), i.e. for a set of three models it will always choose the middle one. If for a given number of clusters exactly two models were estimated, this function chooses the model with the lower SSE/unexplained variance.
The name of the model fit criterion is propagated from the input matrix based on the first column name. It is either "SSE" or "Unexplained_Variance".
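Why the first and last model can never be chosen becomes clearer from the scree-test ratio that CHull maximizes: it compares the fit gain before a model with the fit gain after it, so both neighbours must exist. The following base R sketch is a simplified illustration, not the multichull implementation. The helper scree_ratio() is hypothetical, it skips the convex hull filtering step, and using the number of clusters as the complexity measure is an assumption made here.

```r
# Simplified scree-ratio sketch (hypothetical helper, not the multichull code).
# Assumes the models are sorted by complexity, the misfit values are strictly
# decreasing, and every model lies on the convex hull.
scree_ratio <- function(complexity, misfit) {
  n <- length(misfit)
  stopifnot(n >= 3, length(complexity) == n)
  st <- rep(NA_real_, n)  # undefined for the first and the last model
  for (i in 2:(n - 1)) {
    gain_before <- (misfit[i - 1] - misfit[i]) / (complexity[i] - complexity[i - 1])
    gain_after  <- (misfit[i] - misfit[i + 1]) / (complexity[i + 1] - complexity[i])
    st[i] <- gain_before / gain_after
  }
  st
}

nclusters <- 1:4
sse <- c(100, 40, 30, 25)  # made-up values
st <- scree_ratio(nclusters, sse)
st
nclusters[which.max(st)]   # model with the sharpest elbow
```

With only three models, the ratio is defined for the middle model alone, which matches the behaviour described above.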
For full dimensional ADPROCLUS, a CHull object describing the
chosen model.
For low dimensional ADPROCLUS, a matrix containing the list of chosen models
and the relevant model parameters, compatible with
plot_scree_adpc_preselected.
Wilderjans, T. F., Ceulemans, E., & Meers, K. (2013). CHull: A generic convex hull based model selection method. Behavior Research Methods, 45, 1-15.
mselect_adproclus to obtain the model_fit input from the possible ADPROCLUS models
mselect_adproclus_low_dim to obtain the model_fit input from the possible low dimensional ADPROCLUS models
plot_scree_adpc for plotting the model fits
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4)

# Use and visualize CHull method
selected_model <- select_by_CHull(model_fits)
selected_model
plot(selected_model)

# Estimating low dimensional models with cluster parameter values
# ranging from 1 to 4 and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x, 1, 4, 1, 4, nsemirandomstart = 10, seed = 1)

# Using the CHull method
pre_selection <- select_by_CHull(model_fits)

# Visualize pre-selected models
plot_scree_adpc_preselected(pre_selection)
For an object of class adpc as input, this method yields a summary
object of class summary.adpc including group characteristics of the
clusters in the solution in terms of the model variables.
Works for both full and low dimensional solutions.
Adjust the parameters digits, matrix_rows, matrix_cols to change the
level of detail for the printing of the summary.
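One plausible reading of "group characteristics in terms of the model variables" is a per-cluster summary of the member observations. The base R sketch below computes cluster sizes and variable means from a made-up membership matrix; whether the summary object contains exactly these statistics is an assumption, and the membership matrix A is hypothetical rather than an estimated solution.

```r
X <- as.matrix(stackloss)

# Hypothetical membership matrix with two overlapping clusters:
# observations 1-12 belong to Cl1, observations 9-21 belong to Cl2
obs <- seq_len(nrow(X))
A <- cbind(Cl1 = as.integer(obs <= 12),
           Cl2 = as.integer(obs >= 9))

# Number of observations per cluster
cluster_sizes <- colSums(A)

# Mean of every variable over the members of each cluster (clusters in rows)
cluster_means <- t(apply(A, 2, function(member) {
  colMeans(X[member == 1, , drop = FALSE])
}))

cluster_sizes
round(cluster_means, 3)
```

Because the clusters overlap, observations 9 to 12 contribute to the means of both clusters.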
## S3 method for class 'adpc'
summary(
  object,
  title = "ADPROCLUS solution",
  digits = 3,
  matrix_rows = 10,
  matrix_cols = 5,
  ...
)
object |
ADPROCLUS solution (class: |
title |
String. Default: "ADPROCLUS solution" |
digits |
Integer. The number of decimal places that all decimal numbers will be rounded to. |
matrix_rows |
Integer. The number of matrix rows to display. OPTIONAL |
matrix_cols |
Integer. The number of matrix columns to display. OPTIONAL |
... |
ignored |
Invisibly returns the object of class summary.adpc.
# Obtain data, compute model, summarize model
x <- stackloss
model <- adproclus(x, 3)
model_summary <- summary(model)