tocher performs Tocher's optimization clustering (Rao, 1952) on a distance matrix. The cophenetic distance matrix of a Tocher clustering can also be computed, using the methodology proposed by Silva & Dias (2013).
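To show the mechanics, the following is a minimal base-R sketch of the original procedure. It is not the biotools source. The helper name `tocher_sketch` is hypothetical. The sketch also assumes three rules taken from common textbook descriptions such as Cruz et al. (2011): a cluster is seeded with the closest pair of unclustered objects; a candidate joins the cluster while its mean distance to the current members does not exceed the criterion; and leftover objects that cannot seed a pair within the criterion become singletons.

```r
# Minimal sketch of Tocher's original procedure in base R (hypothetical
# helper, NOT the biotools implementation). Assumed rules:
#  * theta = largest of the nearest-neighbour distances;
#  * a cluster is seeded with the closest pair of unclustered objects;
#  * the candidate with the smallest mean distance to the cluster joins it
#    while that mean distance does not exceed theta;
#  * leftovers that cannot seed a pair within theta become singletons.
tocher_sketch <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                          # ignore self-distances
  theta <- max(apply(m, 1, min))          # clustering criterion
  left <- seq_len(nrow(m))                # indices not yet clustered
  clusters <- list()
  while (length(left) > 0) {
    if (length(left) == 1) {              # a lone leftover is a singleton
      clusters <- c(clusters, list(left))
      break
    }
    sub <- m[left, left]
    pair <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    if (sub[pair[1], pair[2]] > theta) {  # no admissible seed pair remains
      clusters <- c(clusters, as.list(left))
      break
    }
    members <- left[pair]                 # seed with the closest pair
    left <- setdiff(left, members)
    while (length(left) > 0) {
      # mean distance from each candidate to the current members
      avg <- rowMeans(m[left, members, drop = FALSE])
      best <- which.min(avg)
      if (avg[best] > theta) break        # joining would violate theta
      members <- c(members, left[best])
      left <- left[-best]
    }
    clusters <- c(clusters, list(members))
  }
  clusters
}

# Toy usage (made-up data): two tight pairs and one isolated point
tocher_sketch(dist(c(0, 1, 3, 10, 11)))
```

With this toy input the sketch returns three clusters of object indices: {1, 2}, {4, 5} and the singleton {3}.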

# S3 method for dist
tocher(d, algorithm = c("original", "sequential"))
# S3 method for tocher
print(x, ...)
# S3 method for tocher
cophenetic(x)

Arguments

d

an object of class "dist".

algorithm

a character string indicating the algorithm to be used for clustering objects. It must be either "original" (default) or "sequential". The latter is the method proposed by Vasconcelos et al. (2007), sometimes called the "modified" Tocher method.

x

an object of class "tocher".

...

optional further arguments passed to or from other methods.

Value

An object of class "tocher": a list with the following components:

call

the call which produced the result.

algorithm

character; the algorithm used, as given in the input.

clusters

a list of length k (the number of clusters), containing the labels of the objects in d for each cluster.

class

a numeric vector indicating the class (the cluster) of each object in d.

criterion

a numeric vector containing the clustering criterion, i.e., the largest of the smallest distances involving each object in d. If algorithm = "original", this vector contains a single value, since the same criterion is used at every clustering step.

distClust

a matrix of average distances within (diagonal) and between (off-diagonal) clusters.

d

the input object.
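The criterion and distClust components can be illustrated with a small base-R sketch. The helpers `tocher_criterion` and `cluster_distances` are hypothetical and not part of biotools. The sketch makes two assumptions. The diagonal of distClust averages the distinct within-cluster pairs, giving 0 for singletons as in the examples. The off-diagonal entries average all pairs with one object in each cluster.

```r
# Sketch (not the biotools source): recompute a 'criterion' value of the
# original algorithm and a 'distClust'-style matrix from a "dist" object
# and a vector of cluster memberships. Helper names are hypothetical.
tocher_criterion <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                 # ignore each object's zero self-distance
  max(apply(m, 1, min))          # largest of the nearest-neighbour distances
}

cluster_distances <- function(d, class) {
  m <- as.matrix(d)
  k <- sort(unique(class))
  lab <- paste("cluster", k)
  out <- matrix(0, length(k), length(k), dimnames = list(lab, lab))
  for (i in seq_along(k)) {
    a <- which(class == k[i])
    for (j in seq_along(k)) {
      b <- which(class == k[j])
      if (i != j) {
        # assumed: mean over all pairs with one object in each cluster
        out[i, j] <- mean(m[a, b])
      } else if (length(a) > 1) {
        # assumed: mean over distinct pairs inside the cluster
        out[i, i] <- mean(m[a, a][upper.tri(m[a, a])])
      }                          # singletons keep a diagonal of 0
    }
  }
  out
}

# Toy usage (made-up data, not a package dataset)
x <- c(0, 1, 3, 10)
d <- dist(x)
tocher_criterion(d)              # returns 7
cluster_distances(d, c(1, 1, 1, 2))
```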

Warning

Clustering a large number of objects (say, 300 or more) can be time-consuming.

References

Cruz, C.D.; Ferreira, F.M.; Pessoni, L.A. (2011) Biometria aplicada ao estudo da diversidade genetica. Visconde do Rio Branco: Suprema.

Rao, C.R. (1952) Advanced statistical methods in biometric research. New York: John Wiley & Sons.

Sharma, J.R. (2006) Statistical and biometrical techniques in plant breeding. Delhi: New Age International.

Silva, A.R. & Dias, C.T.S. (2013) A cophenetic correlation coefficient for Tocher's method. Pesquisa Agropecuaria Brasileira, 48:589-596.

Vasconcelos, E.S.; Cruz, C.D.; Bhering, L.L.; Resende Junior, M.F.R. (2007) Alternative methodology for the cluster analysis. Pesquisa Agropecuaria Brasileira, 42:1421-1428.

Author

Anderson Rodrigo da Silva <anderson.agro@hotmail.com>

Examples

# example 1
data(garlicdist)
(garlic <- tocher(garlicdist))
#>
#> Tocher's Clustering
#>
#> Call: tocher.dist(d = garlicdist)
#>
#> Cluster algorithm: original
#> Number of objects: 17
#> Number of clusters: 6
#> Most contrasting clusters: cluster 3 and cluster 5, with
#> average intercluster distance: 11.78786
#>
#> $`cluster 1`
#> [1]  8  9 12  4 10  2  7 15
#>
#> $`cluster 2`
#> [1]  1  6 14
#>
#> $`cluster 3`
#> [1] 11 13
#>
#> $`cluster 4`
#> [1] 3 5
#>
#> $`cluster 5`
#> [1] 16
#>
#> $`cluster 6`
#> [1] 17
#>
garlic$distClust # cluster distances
#>           cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6
#> cluster 1  1.745434  4.333530  3.264753  7.070493  8.816863  3.045773
#> cluster 2  4.333530  1.930265  7.525301  4.156222  3.476651  3.560654
#> cluster 3  3.264753  7.525301  2.317785  8.019206 11.787861  6.596850
#> cluster 4  7.070493  4.156222  8.019206  2.324152  4.043741  8.484307
#> cluster 5  8.816863  3.476651 11.787861  4.043741  0.000000  5.441962
#> cluster 6  3.045773  3.560654  6.596850  8.484307  5.441962  0.000000
# example 2
data(USArrests)
(usa <- tocher(dist(USArrests)))
#>
#> Tocher's Clustering
#>
#> Call: tocher.dist(d = dist(USArrests))
#>
#> Cluster algorithm: original
#> Number of objects: 50
#> Number of clusters: 8
#> Most contrasting clusters (first 4 objects):
#>   cluster 7 (size 2) cluster 8 (size 1)
#> 1            Florida             Hawaii
#> 2     North Carolina
#> ... with average intercluster distance: 291.5144
usa$distClust
#>           cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7
#> cluster 1  26.14235  63.73447 189.02239 134.11401 225.11576 103.81282 273.19341
#> cluster 2  63.73447  27.40611 129.52450  74.90038 165.32046  45.33134 214.23667
#> cluster 3 189.02239 129.52450  28.01131  59.47827  44.18989  91.36375  89.73627
#> cluster 4 134.11401  74.90038  59.47827  25.12962  93.97672  40.87804 142.65993
#> cluster 5 225.11576 165.32046  44.18989  93.97672  26.66918 126.36387  55.76879
#> cluster 6 103.81282  45.33134  91.36375  40.87804 126.36387  25.32464 175.26789
#> cluster 7 273.19341 214.23667  89.73627 142.65993  55.76879 175.26789  38.52791
#> cluster 8  41.09477  79.71761 206.11320 151.11604 241.91103 118.42070 291.51441
#>           cluster 8
#> cluster 1  41.09477
#> cluster 2  79.71761
#> cluster 3 206.11320
#> cluster 4 151.11604
#> cluster 5 241.91103
#> cluster 6 118.42070
#> cluster 7 291.51441
#> cluster 8   0.00000
# cophenetic correlation
cophUS <- cophenetic(usa)
cor(cophUS, dist(USArrests))
#> [1] 0.9662597
# using the sequential algorithm
(usa2 <- tocher(dist(USArrests), algorithm = "sequential"))
#>
#> Tocher's Clustering
#>
#> Call: tocher.dist(d = dist(USArrests), algorithm = "sequential")
#>
#> Cluster algorithm: sequential
#> Number of objects: 50
#> Number of clusters: 4
#> Most contrasting clusters (first 4 objects):
#>   cluster 3 (size 19) cluster 4 (size 1)
#> 1            Illinois             Hawaii
#> 2            New York
#> 3            Michigan
#> 4           Louisiana
#> ... with average intercluster distance: 217.3266
usa2$criterion
#> [1] 38.52791 59.93071 155.29601 155.29601
# example 3
data(eurodist)
(euro <- tocher(eurodist))
#>
#> Tocher's Clustering
#>
#> Call: tocher.dist(d = eurodist)
#>
#> Cluster algorithm: original
#> Number of objects: 21
#> Number of clusters: 6
#> Most contrasting clusters: cluster 4 and cluster 5, with
#> average intercluster distance: 3587
#>
#> $`cluster 1`
#> [1] Geneva          Lyons           Milan           Marseilles
#> [5] Paris           Munich          Brussels        Calais
#> [9] Hook of Holland Cologne         Cherbourg
#>
#> $`cluster 2`
#> [1] Copenhagen Hamburg    Stockholm
#>
#> $`cluster 3`
#> [1] Barcelona Madrid
#>
#> $`cluster 4`
#> [1] Gibraltar Lisbon
#>
#> $`cluster 5`
#> [1] Athens Rome
#>
#> $`cluster 6`
#> [1] Vienna
#>
euro$distClust
#>           cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6
#> cluster 1   689.400 1337.3333  1324.545   2071.00  2053.045  1127.273
#> cluster 2  1337.333  686.3333  2498.000   3142.00  2781.333  1571.667
#> cluster 3  1324.545 2498.0000   636.000    960.75  2704.750  2105.500
#> cluster 4  2071.000 3142.0000   960.750    676.00  3587.000  2955.500
#> cluster 5  2053.045 2781.3333  2704.750   3587.00   817.000  1600.000
#> cluster 6  1127.273 1571.6667  2105.500   2955.50  1600.000     0.000
# End (not run)
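The cophenetic correlation in example 2 compares a cophenetic matrix with the original distances. Below is a minimal base-R sketch of such a matrix. It rests on an assumption, our reading of Silva & Dias (2013): two objects in the same cluster get that cluster's average within-cluster distance, and two objects in different clusters get the average distance between their clusters. `tocher_cophenetic` is a hypothetical helper, not the package's cophenetic method.

```r
# Sketch (not the biotools source). Assumption: the cophenetic distance
# between two objects is the average within-cluster distance if they share
# a cluster, and the average between-cluster distance otherwise.
tocher_cophenetic <- function(d, class) {
  m <- as.matrix(d)
  n <- nrow(m)
  coph <- matrix(0, n, n, dimnames = dimnames(m))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      a <- which(class == class[i])
      b <- which(class == class[j])
      coph[i, j] <- if (class[i] == class[j]) {
        mean(m[a, a][upper.tri(m[a, a])])   # average within-cluster distance
      } else {
        mean(m[a, b])                       # average between-cluster distance
      }
    }
  }
  as.dist(coph)                             # keep the lower triangle only
}

# Toy usage: cophenetic correlation for a hand-made two-cluster partition
x <- c(0, 1, 3, 10)
coph <- tocher_cophenetic(dist(x), c(1, 1, 1, 2))
cor(coph, dist(x))
```

As in example 2, `cor` is applied directly to the two "dist" objects, i.e., to their lower-triangle entries.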