Downloads & Free Reading Options - Results

Data Clustering by Guojun Gan

Read "Data Clustering" by Guojun Gan through these free online access and download options.

Books Results

Source: The Internet Archive

The Internet Archive Search Results

Books available to download or borrow from the Internet Archive

1. Macrostate Data Clustering

We develop an effective nonhierarchical data clustering method using an analogy to the dynamic coarse graining of a stochastic system. Analyzing the eigensystem of an interitem transition matrix identifies fuzzy clusters corresponding to the metastable macroscopic states (macrostates) of a diffusive system. A "minimum uncertainty criterion" determines the linear transformation from eigenvectors to cluster-defining window functions. Eigenspectrum gap and cluster certainty conditions identify the proper number of clusters. The physically motivated fuzzy representation and associated uncertainty analysis distinguish macrostate clustering from spectral partitioning methods. Macrostate data clustering solves a variety of test cases that challenge other methods.

“Macrostate Data Clustering” Metadata:

  • Title: Macrostate Data Clustering

Downloads Information:

This item is available for download in "texts" format. File size: 13.93 MB. Downloads: 68. Made public: Wed Sep 18 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF

2. Microsoft Research Video 104362: Dealing With Data: Classification, Clustering And Ranking

This talk will focus on the following three pieces of work that we have done.

How to utilize unlabeled data in classification? In many real-world machine learning problems, such as web categorization, only a few labeled examples may be available, since labeling requires human labor, while unlabeled data are far easier to obtain. So, naturally, one may wonder whether we can utilize unlabeled data in our classification tasks. I will present a simple, powerful and mathematically clean approach to this problem, and demonstrate its good experimental results, provided by a third party, on a number of machine learning benchmarks. Our approach has been considered state of the art in the machine learning literature.

How to partition directed graphs like the Web? Spectral clustering for undirected graphs has been extensively studied since the mathematician Fiedler's seminal work in the 1970s. The spectral method is so powerful that many people have attempted to generalize it to directed graphs. Among these attempts, the most popular is perhaps Jon Kleinberg's HITS algorithm for both ranking web pages and detecting web communities. In 2003, Monika Henzinger, the former research director at Google Inc., listed this generalization issue as one of six algorithmic challenges in web search engines. I will show how we thoroughly solve this problem via Markov chain theory, and also the application of our approach to real-world web data. This approach can be implemented with several lines of Matlab code.

How to rank objects like images and texts? Link-based ranking has enjoyed huge success in web search engines. However, in practice, many types of data, for instance texts and images, have no link structure but are modeled as vectors in Euclidean spaces. A principled way of ranking those kinds of data is to explore and exploit their intrinsic geometrical or manifold structure. I will show how we address this issue in a simple mathematical framework. Our approach has been widely used by different communities, from image retrieval to bioinformatics.

In addition, I will also talk about some theoretical analysis around those approaches, and discuss future extensions. ©2006 Microsoft Corporation. All rights reserved.

“Microsoft Research Video 104362: Dealing With Data: Classification, Clustering And Ranking” Metadata:

  • Title: Microsoft Research Video 104362: Dealing With Data: Classification, Clustering And Ranking
  • Language: English

Downloads Information:

This item is available for download in "movies" format. File size: 618.43 MB. Downloads: 84. Made public: Fri May 02 2014.

Available formats:
Animated GIF - Archive BitTorrent - Item Tile - Metadata - Ogg Video - Thumbnail - Windows Media - h.264


3. The Clustering Of Galaxies In The Completed SDSS-III Baryon Oscillation Spectroscopic Survey: Double-probe Measurements From BOSS Galaxy Clustering & Planck Data -- Towards An Analysis Without Informative Priors

We develop a new methodology called double-probe analysis with the aim of minimizing informative priors in the estimation of cosmological parameters. We extract the dark-energy-model-independent cosmological constraints from the joint data sets of Baryon Oscillation Spectroscopic Survey (BOSS) galaxy sample and Planck cosmic microwave background (CMB) measurement. We measure the mean values and covariance matrix of $\{R$, $l_a$, $\Omega_b h^2$, $n_s$, $\log(A_s)$, $\Omega_k$, $H(z)$, $D_A(z)$, $f(z)\sigma_8(z)\}$, which give an efficient summary of Planck data and 2-point statistics from BOSS galaxy sample, where $R=\sqrt{\Omega_m H_0^2}\,r(z_*)$, and $l_a=\pi r(z_*)/r_s(z_*)$, $z_*$ is the redshift at the last scattering surface, and $r(z_*)$ and $r_s(z_*)$ denote our comoving distance to $z_*$ and sound horizon at $z_*$ respectively. The advantage of this method is that we do not need to put informative priors on the cosmological parameters that galaxy clustering is not able to constrain well, i.e. $\Omega_b h^2$ and $n_s$. Using our double-probe results, we obtain $\Omega_m=0.304\pm0.009$, $H_0=68.2\pm0.7$, and $\sigma_8=0.806\pm0.014$ assuming $\Lambda$CDM; and $\Omega_k=0.002\pm0.003$ and $w=-1.00\pm0.07$ assuming o$w$CDM. The results show no tension with the flat $\Lambda$CDM cosmological paradigm. By comparing with the full-likelihood analyses with fixed dark energy models, we demonstrate that the double-probe method provides robust cosmological parameter constraints which can be conveniently used to study dark energy models. We extend our study to measure the sum of neutrino mass and obtain $\Sigma m_\nu

“The Clustering Of Galaxies In The Completed SDSS-III Baryon Oscillation Spectroscopic Survey: Double-probe Measurements From BOSS Galaxy Clustering & Planck Data -- Towards An Analysis Without Informative Priors” Metadata:

  • Title: The Clustering Of Galaxies In The Completed SDSS-III Baryon Oscillation Spectroscopic Survey: Double-probe Measurements From BOSS Galaxy Clustering & Planck Data -- Towards An Analysis Without Informative Priors

Downloads Information:

This item is available for download in "texts" format. File size: 1.44 MB. Downloads: 23. Made public: Fri Jun 29 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF


4. OCA: Overlapping Clustering Application Unsupervised Approach For Data Analysis

In this paper, a new data analysis tool called Overlapping Clustering Application (OCA) is presented. It was developed to identify overlapping clusters and outliers in an unsupervised manner. The main function of OCA is composed of three phases. The first phase detects abnormal values (outliers) in the datasets using the median absolute deviation. The second phase segments data objects into clusters using the k-means algorithm. The last phase identifies overlapping clusters; it uses maxdis as a predictor of data objects that can belong to multiple clusters. Experimental results showed that OCA is capable of detecting overlapping clusters and outliers.
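The first phase described above, outlier detection with the median absolute deviation (MAD), can be illustrated with a minimal sketch. The abstract does not state the scoring rule or cutoff, so the modified z-score with the constant 0.6745 and the threshold 3.5 are assumptions taken from common practice, and the function name `mad_outliers` is hypothetical rather than part of OCA.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Assumption: the modified z-score 0.6745 * (x - median) / MAD with a
    cutoff of 3.5 is a common convention, not necessarily OCA's rule.
    """
    med = median(values)
    # Median of absolute deviations from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations are (mostly) zero; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


readings = [10, 11, 12, 11, 10, 12, 11, 95]  # hypothetical data
print(mad_outliers(readings))
```

Because both the center and the spread are medians, a single extreme reading such as 95 does not inflate the scale estimate the way it would inflate a standard deviation.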

“OCA: Overlapping Clustering Application Unsupervised Approach For Data Analysis” Metadata:

  • Title: OCA: Overlapping Clustering Application Unsupervised Approach For Data Analysis
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 11.46 MB. Downloads: 134. Made public: Tue May 18 2021.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR


5. Projection-Based Clustering Through Self-Organization And Swarm Intelligence: Combining Cluster Analysis With The Visualization Of High-Dimensional Data

Cluster Analysis; Dimensionality Reduction; Swarm Intelligence; Visualization; Unsupervised Machine Learning; Data Science; Knowledge Discovery; 3D Printing; Self-Organization; Emergence; Game Theory; Advanced Analytics; High-Dimensional Data; Multivariate Data; Analysis of Structured Data

“Projection-Based Clustering Through Self-Organization And Swarm Intelligence: Combining Cluster Analysis With The Visualization Of High-Dimensional Data” Metadata:

  • Title: Projection-Based Clustering Through Self-Organization And Swarm Intelligence: Combining Cluster Analysis With The Visualization Of High-Dimensional Data
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 168.91 MB. Downloads: 21. Made public: Sun Jun 02 2024.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR


6. Analysis Of Clustering And Association Using Data Mining Technique For Elderly Health Condition Dataset

An annual survey of elderly health conditions aims to assess the performance of elderly health care and to evaluate the health and health promotion of the elderly. The analysis relied mainly on data mining techniques to evaluate health conditions. This study first analyzed the data by clustering and then mined association rules from each group. The results showed that the elderly health condition data could be divided into four groups: cluster 1 (25%) were elderly men with high blood pressure who smoked cigarettes; cluster 2 (25%) were elderly women with no congenital disease whose eyesight examination found them to be long-sighted; cluster 3 (24%) were elderly women with no congenital disease but with insomnia and osteoarthritis; and cluster 4 (26%) were elderly women with high blood pressure and diabetes. Each group also yielded rules relating the data within the group, with a minimum confidence of 0.8 and a minimum support of at least 0.5.
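The support and confidence thresholds mentioned above can be made concrete with a short sketch. The records, the attribute names, and the helper functions `support` and `confidence` below are hypothetical illustrations and are not taken from the study's dataset; only the thresholds 0.5 and 0.8 come from the abstract.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the records."""
    base = support(records, antecedent)
    joint = support(records, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


# Hypothetical records, one set of observed conditions per person.
records = (
    [{"high_blood_pressure", "diabetes"}] * 4
    + [{"high_blood_pressure"}]
    + [{"insomnia"}] * 3
)

rule_support = support(records, {"high_blood_pressure", "diabetes"})
rule_confidence = confidence(records, {"high_blood_pressure"}, {"diabetes"})

# Keep the rule only if it meets the thresholds quoted in the abstract.
keep_rule = rule_support >= 0.5 and rule_confidence >= 0.8
print(rule_support, rule_confidence, keep_rule)
```

In this toy data, four of eight records contain both conditions and five contain high blood pressure, so the rule "high blood pressure implies diabetes" sits exactly at both thresholds.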

“Analysis Of Clustering And Association Using Data Mining Technique For Elderly Health Condition Dataset” Metadata:

  • Title: Analysis Of Clustering And Association Using Data Mining Technique For Elderly Health Condition Dataset
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 8.30 MB. Downloads: 15. Made public: Wed Feb 07 2024.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR


7. A Modified Overlapping Partitioning Clustering Algorithm For Categorical Data Clustering

Clustering is an important approach: it enables the grouping of unlabeled data by partitioning data into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. An overlapping partitioning clustering (OPC) algorithm can only handle numerical data. Hence, novel clustering algorithms have been studied extensively to overcome this issue. By increasing the number of objects belonging to one cluster and the distance between cluster centers, the study aimed to cluster textual data without losing the main functions. The study used a dataset of over twenty newsgroups, comprising approximately 20,000 text documents. By introducing some modifications to the traditional algorithm, clusters with an acceptable level of homogeneity and completeness were generated. Modifications were made to the pre-processing phase and the data representation, along with a number of methods that influence the primary function of the algorithm. Subsequently, the results were evaluated and compared with the k-means algorithm on the training and test datasets. The results indicated that the modified algorithm could successfully handle categorical data and produce satisfactory clusters.

“A Modified Overlapping Partitioning Clustering Algorithm For Categorical Data Clustering” Metadata:

  • Title: A Modified Overlapping Partitioning Clustering Algorithm For Categorical Data Clustering
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 12.48 MB. Downloads: 77. Made public: Fri Nov 06 2020.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF


8. A Pairwise Likelihood Approach To Simultaneous Clustering And Dimensional Reduction Of Ordinal Data

The literature on clustering for continuous data is rich and wide; by contrast, the literature on categorical data is still limited. In some cases, the problem is made more difficult by the presence of noise variables/dimensions that do not contain information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that is able to detect the discriminative dimensions and discard the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To recognize discriminative and noise dimensions, these variables are considered to be linear combinations of two independent sets of second-order latent variables, where only one contains the information about the cluster structure while the other contains noise dimensions. The model specification involves multidimensional integrals that make the maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, the parameter estimation is carried out through an EM-like algorithm maximizing a pairwise log-likelihood. Applications of the model to real and simulated data show the effectiveness of the proposal.

“A Pairwise Likelihood Approach To Simultaneous Clustering And Dimensional Reduction Of Ordinal Data” Metadata:

  • Title: A Pairwise Likelihood Approach To Simultaneous Clustering And Dimensional Reduction Of Ordinal Data
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 13.49 MB. Downloads: 23. Made public: Wed Jun 27 2018.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


9. Data Clustering And Graph Partitioning Via Simulated Mixing

Spectral clustering approaches have led to well-accepted algorithms for finding accurate clusters in a given dataset. However, their application to large-scale datasets has been hindered by the computational complexity of eigenvalue decompositions. Several algorithms have been proposed in the recent past to accelerate spectral clustering; however, they compromise on the accuracy of the spectral clustering to achieve faster speed. In this paper, we propose a novel spectral clustering algorithm based on a mixing process on a graph. Unlike the existing spectral clustering algorithms, our algorithm does not require computing eigenvectors. Specifically, it finds the equivalent of a linear combination of eigenvectors of the normalized similarity matrix weighted with corresponding eigenvalues. This linear combination is then used to partition the dataset into meaningful clusters. Simulations on real datasets show that partitioning datasets based on such linear combinations of eigenvectors achieves better accuracy than standard spectral clustering methods as the number of clusters increases. Our algorithm can easily be implemented in a distributed setting.
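The general idea of obtaining a weighted combination of eigenvectors without an eigendecomposition can be illustrated with a toy sketch. Repeatedly replacing each node's value by the weighted average of its neighbors applies the random-walk matrix to a vector, which scales each eigenvector component by a power of its eigenvalue. This is not the paper's algorithm: the graph, the starting vector, the step count, and the largest-gap split are all assumptions chosen for illustration.

```python
def mix(weights, x, steps):
    """Replace each node's value by the weighted mean of its neighbors, `steps` times."""
    n = len(x)
    for _ in range(steps):
        x = [
            sum(w * x[j] for j, w in weights[i].items()) / sum(weights[i].values())
            for i in range(n)
        ]
    return x


def split_at_largest_gap(x):
    """Two-way partition: cut the sorted values at their largest gap."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    gaps = [x[order[k + 1]] - x[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(x)
    for i in order[cut + 1:]:
        labels[i] = 1
    return labels


# Hypothetical similarity graph: two triangles joined by one weak edge (2-3).
weights = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 0.1},
    3: {2: 0.1, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}

mixed = mix(weights, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], steps=10)
labels = split_at_largest_gap(mixed)
print(labels)
```

Values equalize quickly inside each tightly connected triangle but only slowly across the weak edge, so after a few steps the two groups are separated by a clear gap. Each update uses only a node's neighbors, which is why a scheme of this kind lends itself to a distributed setting.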

“Data Clustering And Graph Partitioning Via Simulated Mixing” Metadata:

  • Title: Data Clustering And Graph Partitioning Via Simulated Mixing

Downloads Information:

This item is available for download in "texts" format. File size: 0.74 MB. Downloads: 24. Made public: Fri Jun 29 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF


10. Distance For Functional Data Clustering Based On Smoothing Parameter Commutation

We propose a novel method to determine the dissimilarity between subjects for functional data clustering. Spline smoothing or interpolation is commonly used to deal with data of this type. Instead of estimating the best-representing curve for each subject as fixed during clustering, we measure the dissimilarity between subjects based on varying curve estimates with commutation of smoothing parameters pair-by-pair (of subjects). The intuitions are that the smoothing parameters of smoothing splines reflect inverse signal-to-noise ratios and that, when an identical smoothing parameter is applied, the smoothed curves for two similar subjects are expected to be close. The effectiveness of our proposal is shown through simulations comparing it to other dissimilarity measures. It also has several pragmatic advantages. First, missing values or irregular time points can be handled directly, thanks to the nature of smoothing splines. Second, conventional dissimilarity-based clustering methods can be employed straightforwardly, and the dissimilarity also serves as a useful tool for outlier detection. Third, the implementation is convenient, since subroutines for smoothing splines and numerical integration are widely available. Fourth, the computational complexity does not increase and is comparable to that of calculating the Euclidean distance between curves estimated by smoothing splines.

“Distance For Functional Data Clustering Based On Smoothing Parameter Commutation” Metadata:

  • Title: Distance For Functional Data Clustering Based On Smoothing Parameter Commutation

Downloads Information:

This item is available for download in "texts" format. File size: 0.46 MB. Downloads: 20. Made public: Fri Jun 29 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF


11. A Complex Networks Approach For Data Clustering

Many methods have been developed for data clustering, such as k-means, expectation maximization and algorithms based on graph theory. In this latter case, graphs are generally constructed by taking into account the Euclidean distance as a similarity measure, and partitioned using spectral methods. However, these methods are not accurate when the clusters are not well separated. In addition, it is not possible to automatically determine the number of clusters. These limitations can be overcome by taking into account network community identification algorithms. In this work, we propose a methodology for data clustering based on complex networks theory. We compare different metrics for quantifying the similarity between objects and take into account three community finding techniques. This approach is applied to two real-world databases and to two sets of artificially generated data. By comparing our method with traditional clustering approaches, we verify that the proximity measures given by the Chebyshev and Manhattan distances are the most suitable metrics to quantify the similarity between objects. In addition, the community identification method based on the greedy optimization provides the smallest misclassification rates.
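The distance measures named above can be sketched in a few lines, together with one simple way to turn pairwise distances into a graph. The radius-based edge rule, the sample points, and the function names are assumptions for illustration; the abstract does not say how the networks were actually constructed.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(p - q) for p, q in zip(a, b))


def euclidean(a, b):
    """Straight-line distance (math.dist needs Python 3.8 or later)."""
    return math.dist(a, b)


def similarity_edges(points, dist, radius):
    """Link every pair of points closer than `radius` (an assumed rule)."""
    return [
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if dist(points[i], points[j]) <= radius
    ]


points = [(0, 0), (1, 1), (5, 5)]  # hypothetical data
print(manhattan((0, 0), (3, 4)), chebyshev((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))
print(similarity_edges(points, manhattan, radius=2))
```

A community detection algorithm would then be run on the resulting edge list; that step is omitted here.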

“A Complex Networks Approach For Data Clustering” Metadata:

  • Title: A Complex Networks Approach For Data Clustering

Downloads Information:

This item is available for download in "texts" format. File size: 7.86 MB. Downloads: 87. Made public: Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


12. The VIMOS Public Extragalactic Redshift Survey (VIPERS). Galaxy Clustering And Redshift-space Distortions At Z=0.8 In The First Data Release

We present in this paper the general real- and redshift-space clustering properties of galaxies as measured in the first data release of the VIPERS survey. VIPERS is a large redshift survey designed to probe the distant Universe and its large-scale structure at $0.5 < z < 1.2$. We describe in this analysis the global properties of the sample and discuss the survey completeness and associated corrections. This sample allows us to measure the galaxy clustering with an unprecedented accuracy at these redshifts. From the redshift-space distortions observed in the galaxy clustering pattern we provide a first measurement of the growth rate of structure at $z = 0.8$: $f\sigma_8 = 0.47 \pm 0.08$. This is completely consistent with the predictions of standard cosmological models based on Einstein gravity, although this measurement alone does not discriminate between different gravity models.

“The VIMOS Public Extragalactic Redshift Survey (VIPERS). Galaxy Clustering And Redshift-space Distortions At Z=0.8 In The First Data Release” Metadata:

  • Title: The VIMOS Public Extragalactic Redshift Survey (VIPERS). Galaxy Clustering And Redshift-space Distortions At Z=0.8 In The First Data Release
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 25.56 MB. Downloads: 61. Made public: Mon Sep 23 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


13. Multivariate Multinomial Mixtures: A Data-driven Penalized Criterion For Variable Selection And Clustering

We consider the problem of estimating the number of components and the relevant variables in a multivariate multinomial mixture. This kind of model arises in particular when dealing with multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained. Further, under weak assumptions on the true probability underlying the observations, the selected model is asymptotically consistent. On the practical side, the shape of our proposed penalty function is defined up to a multiplicative parameter, which is calibrated using the slope heuristics in an automatic data-driven procedure. Using simulated data, we found that this procedure improves the performance of the selection procedure with respect to classical criteria such as BIC and AIC. The new criterion gives an answer to the question "Which criterion for which sample size?".

“Multivariate Multinomial Mixtures: A Data-driven Penalized Criterion For Variable Selection And Clustering” Metadata:

  • Title: Multivariate Multinomial Mixtures: A Data-driven Penalized Criterion For Variable Selection And Clustering
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 9.87 MB. Downloads: 91. Made public: Fri Sep 20 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


14. State-Space Dynamics Distance For Clustering Sequential Data

This paper proposes a novel similarity measure for clustering sequential data. We first construct a common state-space by training a single probabilistic model with all the sequences in order to get a unified representation for the dataset. Then, distances are obtained from the transition matrices induced by each sequence in that state-space. This approach solves some of the usual overfitting and scalability issues of the existing semi-parametric techniques that rely on training a model for each sequence. Empirical studies on both synthetic and real-world datasets illustrate the advantages of the proposed similarity measure for clustering sequences.

“State-Space Dynamics Distance For Clustering Sequential Data” Metadata:

  • Title: State-Space Dynamics Distance For Clustering Sequential Data
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 11.82 MB. Downloads: 62. Made public: Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


15. Model-Based Clustering Using Multi-allelic Loci Data With Loci Selection

We propose a Model-Based Clustering (MBC) method combined with loci selection using multi-allelic loci genetic data. The loci selection problem is regarded as a model selection problem, and models in competition are compared with the Bayesian Information Criterion (BIC). The resulting procedure selects the subset of clustering loci and the number of clusters, and estimates the proportion of each cluster and the allelic frequencies within each cluster. We prove that the selected model converges in probability to the true model under a single realistic assumption as the size of the sample tends to infinity. The proposed method, named MixMoGenD (Mixture Model using Genetic Data), was implemented in the C++ programming language. Numerical experiments on simulated data sets were conducted to highlight the interest of the proposed loci selection procedure.

“Model-Based Clustering Using Multi-allelic Loci Data With Loci Selection” Metadata:

  • Title: Model-Based Clustering Using Multi-allelic Loci Data With Loci Selection
  • Language: English

Downloads Information:

This item is available for download in "texts" format. File size: 10.01 MB. Downloads: 59. Made public: Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


16. Revealing Spatial Variability Structures Of Geostatistical Functional Data Via Dynamic Clustering

In several environmental applications data are functions of time, essentially continuous, observed and recorded discretely, and spatially correlated. Most of the methods for analyzing such data are extensions of spatial statistical tools which deal with spatially dependent functional data. In this framework, this paper introduces a new clustering method. The main features are that it finds groups of functions that are similar to each other in terms of their spatial functional variability and that it locates a set of centers which summarize the spatial functional variability of each cluster. The method optimizes, through an iterative algorithm, a best fit criterion between the partition of the curves and the representative element of the clusters, assumed to be a variogram function. The performance of the proposed clustering method was evaluated by studying the results obtained through its application to simulated and real datasets.

“Revealing Spatial Variability Structures Of Geostatistical Functional Data Via Dynamic Clustering” Metadata:

  • Title: ➤  Revealing Spatial Variability Structures Of Geostatistical Functional Data Via Dynamic Clustering
  • Authors:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 8.65 Mbs, the file-s for this book were downloaded 59 times, the file-s went public at Sat Sep 21 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


17Variable Selection For Model-based Clustering Using The Integrated Complete-data Likelihood

By

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which realize a trade-off between the clustering accuracy and the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term can suffer from criticisms. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are often greedy because they need multiple calls of EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require any estimate and its maximization is simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation. Then, parameter inference is needed only for the unique selected model. This approach is used for the variable selection of a Gaussian mixture model with conditional independence assumption. The numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection.

“Variable Selection For Model-based Clustering Using The Integrated Complete-data Likelihood” Metadata:

  • Title: ➤  Variable Selection For Model-based Clustering Using The Integrated Complete-data Likelihood
  • Authors:
  • Language: English

“Variable Selection For Model-based Clustering Using The Integrated Complete-data Likelihood” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 7.69 Mbs, the file-s for this book were downloaded 36 times, the file-s went public at Tue Jun 26 2018.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


18Data Spectroscopy: Eigenspaces Of Convolution Operators And Clustering

By

This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new understanding into their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.

“Data Spectroscopy: Eigenspaces Of Convolution Operators And Clustering” Metadata:

  • Title: ➤  Data Spectroscopy: Eigenspaces Of Convolution Operators And Clustering
  • Authors:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 12.35 Mbs, the file-s for this book were downloaded 109 times, the file-s went public at Mon Jul 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


19A Fast Clustering – Based High-Dimensional Data By Using Text Classification

By

Most existing popular text segregation methods have adopted term-based approaches. They classify terms into categories and update term weights based on their specificity and their distributions in patterns. The field of text mining seeks to extract useful information from unstructured textual data through the identification and exploration of interesting patterns. The discovery of relevant features in real-world data for describing user information needs or preferences is a new challenge in text mining. Relevance of a feature indicates that the feature is always necessary for an optimal subset; it cannot be removed without affecting the original conditional class distribution. In this paper, an adaptive method for relevance feature discovery is discussed, to find useful features available in a feedback set, including both positive and negative documents, for describing what users need. Thus, this paper discusses methods for relevance feature discovery using the simulated annealing approximation and a genetic algorithm, which evolves a population of candidate solutions to an optimization problem toward better solutions.

“A Fast Clustering – Based High-Dimensional Data By Using Text Classification” Metadata:

  • Title: ➤  A Fast Clustering – Based High-Dimensional Data By Using Text Classification
  • Author: ➤  
  • Language: English

“A Fast Clustering – Based High-Dimensional Data By Using Text Classification” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 5.25 Mbs, the file-s for this book were downloaded 73 times, the file-s went public at Sat Feb 09 2019.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


20Non-centroid-based Discrete Differential Evolution For Data Clustering

By

Data clustering can find similarities and hidden patterns within data. Given a predefined number of groups, most partitional clustering algorithms use representative centers to determine their corresponding clusters. These algorithms, such as K-means and optimization-based algorithms, create and update centroids to give (hyper) spherical shape clusters. This research proposes a non-centroid-based discrete differential evolution (NCDDE) algorithm to solve clustering problems and provide non-spherical shape clusters. The algorithm directs the population of discrete vectors to search for data group labels. It uses a novel discrete mutation strategy analogous to the continuous mutation in classical differential evolution. It also combines a sorting mutation to enhance convergence speed. The algorithm adaptively selects crossover rates in high and low ranges. We use the UCI datasets to compare the NCDDE with other continuous centroid-based algorithms by intra-cluster distance and clustering accuracy. The results show that NCDDE outperforms the compared algorithms overall by intra-cluster distance and achieves the best accuracy for several datasets.
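
To make the label-vector view concrete, the sketch below scores a candidate vector of group labels by intra-cluster distance. It is not the NCDDE algorithm: the function name and the definition used here (sum of Euclidean distances from each point to the mean of its group) are assumptions for illustration, and the paper may define the measure differently.

```python
import math
from collections import defaultdict

def intra_cluster_distance(points, labels):
    """Sum of Euclidean distances from each point to the mean of its group.

    `labels[i]` is the group label of `points[i]`; lower totals mean tighter groups.
    This definition is an assumption, not taken from the paper.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    total = 0.0
    for members in groups.values():
        # Coordinate-wise mean of the group's members.
        center = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, center) for p in members)
    return total

# Toy data (hypothetical): two well-separated pairs on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (14.0, 0.0)]
good = intra_cluster_distance(points, [0, 0, 1, 1])
bad = intra_cluster_distance(points, [0, 1, 0, 1])
```

A search over discrete label vectors, such as the one the abstract describes, would prefer the first labeling because its total is smaller.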

“Non-centroid-based Discrete Differential Evolution For Data Clustering” Metadata:

  • Title: ➤  Non-centroid-based Discrete Differential Evolution For Data Clustering
  • Author: ➤  
  • Language: English

“Non-centroid-based Discrete Differential Evolution For Data Clustering” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 7.30 Mbs, the file-s for this book were downloaded 9 times, the file-s went public at Thu Dec 26 2024.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR -


21A Functional Clustering Algorithm For The Analysis Of Dynamic Network Data

By

We formulate a novel technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.

“A Functional Clustering Algorithm For The Analysis Of Dynamic Network Data” Metadata:

  • Title: ➤  A Functional Clustering Algorithm For The Analysis Of Dynamic Network Data
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 9.64 Mbs, the file-s for this book were downloaded 64 times, the file-s went public at Wed Sep 18 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


22A Review On Data Clustering Using Spiking Neural Network (SNN) Models

By

The recent evolution of Artificial Neural Networks has given researchers an interest in exploring deep learning through Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than a traditional neural network because they incorporate the notion of time into their functioning model [1]. The aim of this paper is to review studies related to clustering problems that employ Spiking Neural Network models. Even though many algorithms are used to solve clustering problems, most of the methods are only suitable for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is potential for improvement. Therefore, this paper summarizes the significant results obtained by applying SNN models in different clustering approaches. The findings of this paper could thus demonstrate the purpose of clustering methods using SNN to fellow researchers from various disciplines who seek to discover and understand complex data.

“A Review On Data Clustering Using Spiking Neural Network (SNN) Models” Metadata:

  • Title: ➤  A Review On Data Clustering Using Spiking Neural Network (SNN) Models
  • Author: ➤  
  • Language: English

“A Review On Data Clustering Using Spiking Neural Network (SNN) Models” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 8.02 Mbs, the file-s for this book were downloaded 89 times, the file-s went public at Sat May 22 2021.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR -


23A Review Study Of Various Data Mining Classification & Clustering Techniques

By

Data mining applications include a variety of methodologies that have been developed by commercial & research centers. These techniques have been used for industrial, commercial and scientific purposes. Data mining is most useful in an exploratory analysis scenario in which there are no prearranged notions about what will compose an "interesting" outcome. WEKA contains a set of visualization tools & algorithms for data analysis and predictive modeling, together with graphical user interfaces for simple access to this functionality. Karambeer Kaur | Mr Surender Singh, "A Review Study of various Data Mining Classification & Clustering Techniques", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1 | Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd135.pdf Article URL: http://www.ijtsrd.com/engineering/computer-engineering/135/a-review-study-of-various-data-mining-classification-and-clustering-techniques/karambeer-kaur

“A Review Study Of Various Data Mining Classification & Clustering Techniques” Metadata:

  • Title: ➤  A Review Study Of Various Data Mining Classification & Clustering Techniques
  • Author: ➤  
  • Language: English

“A Review Study Of Various Data Mining Classification & Clustering Techniques” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 6.77 Mbs, the file-s for this book were downloaded 118 times, the file-s went public at Thu Aug 30 2018.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


24Efficient Hierarchical Clustering For Continuous Data

By

We present a new sequential Monte Carlo sampler for coalescent based Bayesian hierarchical clustering. Our model is appropriate for modeling non-i.i.d. data and offers a substantial reduction of computational cost when compared to the original sampler without resorting to approximations. We also propose a quadratic complexity approximation that in practice shows almost no loss in performance compared to its counterpart. We show that as a byproduct of our formulation, we obtain a greedy algorithm that exhibits performance improvement over other greedy algorithms, particularly in small data sets. In order to exploit the correlation structure of the data, we describe how to incorporate Gaussian process priors in the model as a flexible way to model non-i.i.d. data. Results on artificial and real data show significant improvements over closely related approaches.

“Efficient Hierarchical Clustering For Continuous Data” Metadata:

  • Title: ➤  Efficient Hierarchical Clustering For Continuous Data
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 9.30 Mbs, the file-s for this book were downloaded 87 times, the file-s went public at Sat Sep 21 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


25NASA Technical Reports Server (NTRS) 19760014571: Computer-aided Analysis Of LANDSAT-1 MSS Data: A Comparison Of Three Approaches, Including A Modified Clustering Approach. [Ludwig Mt. In San Juan Mountain Range, Colorado

By

There are no author-identified significant results in this report.

“NASA Technical Reports Server (NTRS) 19760014571: Computer-aided Analysis Of LANDSAT-1 MSS Data: A Comparison Of Three Approaches, Including A Modified Clustering Approach. [Ludwig Mt. In San Juan Mountain Range, Colorado” Metadata:

  • Title: ➤  NASA Technical Reports Server (NTRS) 19760014571: Computer-aided Analysis Of LANDSAT-1 MSS Data: A Comparison Of Three Approaches, Including A Modified Clustering Approach. [Ludwig Mt. In San Juan Mountain Range, Colorado
  • Author: ➤  
  • Language: English

“NASA Technical Reports Server (NTRS) 19760014571: Computer-aided Analysis Of LANDSAT-1 MSS Data: A Comparison Of Three Approaches, Including A Modified Clustering Approach. [Ludwig Mt. In San Juan Mountain Range, Colorado” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 11.68 Mbs, the file-s for this book were downloaded 98 times, the file-s went public at Mon Jul 18 2016.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


26An Analysis Of Vessel Waypoint Behavior Through Data Clustering

By

In this thesis, we cluster stop points into stop-point regions using one month’s Automatic Identification System (AIS) data from the Gulf of Mexico and Caribbean Sea to characterize vessel behavior in an area with diverse traffic patterns. Initial cleaning of the dataset is necessary to address multiple issues common to AIS transponders. We consider methods for computing inter-point distances. In particular, we study a promising method for combining geospatial coordinates with other vessel attributes. We use the Ordering Points To Identify the Cluster Structure (OPTICS) clustering algorithm because it can identify outliers, and it constructs clusters of varying shapes and densities. Our best results come from dividing the area of interest into seven zones of equal size, and analyzing the results over each zone. Using classification trees to develop a classification tool, we illustrate an approach for predicting the cluster membership of a new observation. Due to the reduction in computation time and accuracy of results, we recommend that further research utilize the methods from this study as the foundation for an automated threat detection system.
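
One common way to compute inter-point distances between geospatial coordinates is the haversine great-circle formula, sketched below. This is an illustration only: the abstract does not say which distance the thesis adopts, and the function name and the mean Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation (assumed value)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# One degree of longitude along the equator, roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

A distance like this could serve as the metric passed to a density-based clustering routine; combining it with other vessel attributes, as the abstract mentions, would need an additional weighting scheme that is not shown here.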

“An Analysis Of Vessel Waypoint Behavior Through Data Clustering” Metadata:

  • Title: ➤  An Analysis Of Vessel Waypoint Behavior Through Data Clustering
  • Author:
  • Language: English

“An Analysis Of Vessel Waypoint Behavior Through Data Clustering” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 20.21 Mbs, the file-s for this book were downloaded 42 times, the file-s went public at Sat May 04 2019.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


27Joint Clustering And Registration Of Functional Data

By

Curve registration and clustering are fundamental tools in the analysis of functional data. While several methods have been developed and explored for either task individually, limited work has been done to infer functional clusters and register curves simultaneously. We propose a hierarchical model for joint curve clustering and registration. Our proposal combines a Dirichlet process mixture model for clustering of common shapes, with a reproducing kernel representation of phase variability for registration. We show how inference can be carried out applying standard posterior simulation algorithms and compare our method to several alternatives in both engineered data and a benchmark analysis of the Berkeley growth data. We conclude our investigation with an application to time course gene expression.

“Joint Clustering And Registration Of Functional Data” Metadata:

  • Title: ➤  Joint Clustering And Registration Of Functional Data
  • Authors:

“Joint Clustering And Registration Of Functional Data” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 0.36 Mbs, the file-s for this book were downloaded 18 times, the file-s went public at Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF -


28Towards A Precise Measurement Of Matter Clustering: Lyman-alpha Forest Data At Redshifts 2-4

By

We measure the filling factor, correlation function, and power spectrum of transmitted flux in a large sample of Lya forest spectra, comprised of 30 Keck HIRES spectra and 23 Keck LRIS spectra. We infer the linear matter power spectrum P(k) from the flux power spectrum P_F(k), using an improved version of the method of Croft et al. (1998) that accounts for the influence of z-space distortions, non-linearity, and thermal broadening on P_F(k). The evolution of the shape and amplitude of P(k) over the range z=2-4 is consistent with gravitational instability, implying that non-gravitational fluctuations do not make a large contribution. Our fiducial measurement of P(k) comes from data with <z> = 2.72. It has amplitude Delta^2(k_p)=0.74^+0.20_-0.16 at wavenumber k_p=0.03 (km/s)^-1 and is well described by a power-law of index -2.43 +/- 0.06 or by a CDM-like power spectrum with shape parameter Gamma'=1.3^+0.7_-0.5*10^-3 (km/s) at z=2.72. For Omega_m=0.4, Omega_Lam=0.6, the best-fit Gamma=0.16 (h^-1 Mpc)^-1, in good agreement with the 2dF Galaxy Redshift Survey, and the best-fit sigma_8=0.82 (Gamma/0.15)^-0.44. Matching the observed cluster mass function and our Delta^2(k_p) in spatially flat models requires Omega_m=0.38^+0.10_-0.08 + 2.2 (Gamma-0.15). Matching Delta^2(k_p) in COBE-normalized, flat CDM models with no tensor fluctuations requires Omega_m = (0.29 +/- 0.04) n^-2.89 h_65^-1.9. The Lya forest complements other probes of P(k) by constraining a regime of redshift and lengthscale not accessible by other means, and the consistency of these inferred parameters with independent estimates provides further support for inflation, cold dark matter, and vacuum energy (abridged).

“Towards A Precise Measurement Of Matter Clustering: Lyman-alpha Forest Data At Redshifts 2-4” Metadata:

  • Title: ➤  Towards A Precise Measurement Of Matter Clustering: Lyman-alpha Forest Data At Redshifts 2-4
  • Authors: ➤  
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 34.05 Mbs, the file-s for this book were downloaded 132 times, the file-s went public at Wed Sep 18 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


29CSAL: Self-adaptive Labeling Based Clustering Integrating Supervised Learning On Unlabeled Data

By

Supervised classification approaches can predict labels for unknown data because of the supervised training process. The success of classification is heavily dependent on the labeled training data. In contrast, clustering is effective in revealing the aggregation property of unlabeled data, but the performance of most clustering methods is limited by the absence of labeled data. In real applications, however, it is time-consuming and sometimes impossible to obtain labeled data. The combination of clustering and classification is a promising and active approach which can largely improve the performance. In this paper, we propose an innovative and effective clustering framework based on self-adaptive labeling (CSAL) which integrates clustering and classification on unlabeled data. Clustering is first employed to partition data, and a certain proportion of the clustered data is selected by our proposed labeling approach for training classifiers. In order to refine the trained classifiers, an iterative Expectation-Maximization process is incorporated into the proposed clustering framework CSAL. Experiments are conducted on publicly available data sets to test different combinations of clustering algorithms and classification models as well as various training data labeling methods. The experimental results show that our approach along with the self-adaptive method outperforms other methods.

“CSAL: Self-adaptive Labeling Based Clustering Integrating Supervised Learning On Unlabeled Data” Metadata:

  • Title: ➤  CSAL: Self-adaptive Labeling Based Clustering Integrating Supervised Learning On Unlabeled Data
  • Authors:
  • Language: English

“CSAL: Self-adaptive Labeling Based Clustering Integrating Supervised Learning On Unlabeled Data” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 7.56 Mbs, the file-s for this book were downloaded 36 times, the file-s went public at Tue Jun 26 2018.

Available formats:
Abbyy GZ - Archive BitTorrent - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


30Gamma-ray DBSCAN: A Clustering Algorithm Applied To Fermi-LAT Gamma-ray Data. I. Detection Performances With Real And Simulated Data

By

The Density Based Spatial Clustering of Applications with Noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose the use of this method for the detection of sources in gamma-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. We investigate the detection performance of the gamma-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters, using a parametric approach, and exploring a large volume of the gamma-ray DBSCAN parameter space. By means of simulated data we statistically characterize the gamma-ray DBSCAN, finding signatures that differentiate purely random fields from fields with sources. We define a significance level for the detected clusters, and we successfully test this significance with our simulated data. We apply the method to real data, and we find an excellent agreement with the results obtained with simulated data. We find that the gamma-ray DBSCAN can be successfully used in the detection of clusters in gamma-ray data. The significance returned by our algorithm is strongly correlated with that provided by the Maximum Likelihood analysis with standard Fermi-LAT software, and can be used to safely remove spurious clusters. The positional accuracy of the reconstructed cluster centroid compares to that returned by standard Maximum Likelihood analysis, allowing one to look for astrophysical counterparts in narrow regions, minimizing the chance probability in the counterpart association. We find that gamma-ray DBSCAN is a powerful tool in the detection of clusters in gamma-ray data. This method can be used to look for both point-like and extended sources, and can potentially be applied to any astrophysical field related to the detection of clusters in data.
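
For readers unfamiliar with DBSCAN, the following is a minimal sketch of the generic algorithm on points in a flat plane. It is not the authors' gamma-ray DBSCAN: it uses plain Euclidean distance rather than angular separation on the sky, includes no significance estimate, and the names `eps` and `min_pts` plus the toy coordinates are assumptions for illustration.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for background."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Toy field (hypothetical): two dense groups plus one isolated background point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy field the two dense groups become clusters 0 and 1 and the isolated point is labeled as noise, which mirrors the idea of separating sources from background.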

“Gamma-ray DBSCAN: A Clustering Algorithm Applied To Fermi-LAT Gamma-ray Data. I. Detection Performances With Real And Simulated Data” Metadata:

  • Title: ➤  Gamma-ray DBSCAN: A Clustering Algorithm Applied To Fermi-LAT Gamma-ray Data. I. Detection Performances With Real And Simulated Data
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 14.34 Mbs, the file-s for this book were downloaded 57 times, the file-s went public at Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


31Time-frequency Clustering For Burst Gravitational Waves Search In TAMA300 Data

By

We have developed a 'time-frequency (TF) clustering' method to find burst gravitational waves in TAMA data analysis. The TF clustering method applied to a sonogram (spectrogram) shows some characteristics of short-duration signals. Using parameters which represent the cluster shape, we can efficiently identify some predicted gravitational waveforms and can exclude typical unstable spike-like noises due to detector instruments. The requirements on some cluster parameters achieved roughly 50% average efficiency for injected DFM waveforms of $h_{rss}\sim 2 \times 10^{-19}$ for type I bursts. Also, the reduction of signals caused by spike noise shows more than one order of magnitude improvement for SNR$>$100.

“Time-frequency Clustering For Burst Gravitational Waves Search In TAMA300 Data” Metadata:

  • Title: ➤  Time-frequency Clustering For Burst Gravitational Waves Search In TAMA300 Data
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 4.42 Mbs, the file-s for this book were downloaded 162 times, the file-s went public at Sat Jul 20 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


32Robust Clustering For Functional Data Based On Trimming And Constraints

By

Many clustering algorithms for data that are curves or functions have recently been proposed. However, the presence of contamination in the sample of curves can influence the performance of most of them. In this work we propose a robust, model-based clustering method based on an approximation to the "density function" for functional data. The robustness results from the joint application of trimming, for reducing the effect of contaminated observations, and constraints on the variances, for avoiding spurious clusters in the solution. The proposed method has been evaluated through a simulation study. Finally, an application to a real data problem is given.

“Robust Clustering For Functional Data Based On Trimming And Constraints” Metadata:

  • Title: ➤  Robust Clustering For Functional Data Based On Trimming And Constraints
  • Authors:

“Robust Clustering For Functional Data Based On Trimming And Constraints” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 1.13 Mbs, the file-s for this book were downloaded 25 times, the file-s went public at Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF -


33The Clustering Of Galaxies In The SDSS-III Baryon Oscillation Spectroscopic Survey: Modeling Of The Luminosity And Colour Dependence In The Data Release 10

By

We investigate the luminosity and colour dependence of clustering of CMASS galaxies in the Sloan Digital Sky Survey-III Baryon Oscillation Spectroscopic Survey Tenth Data Release. The halo occupation distribution framework is adopted to model the projected two-point correlation function measurements on small and intermediate scales (from $0.02$ to $60\,h^{-1}{\rm {Mpc}}$) and to interpret the observed trends and infer the connection of galaxies to dark matter halos. We find that luminous red galaxies reside in massive halos of mass $M{\sim}10^{13}$--$10^{14}\,h^{-1}{\rm M_\odot}$ and more luminous galaxies are more clustered and hosted by more massive halos. The strong small-scale clustering requires a fraction of these galaxies to be satellites in massive halos, with the fraction at the level of 5--8 per cent and decreasing with luminosity. The characteristic mass of a halo hosting on average one satellite galaxy above a luminosity threshold is about a factor $8.7$ larger than that of a halo hosting a central galaxy above the same threshold. At a fixed luminosity, progressively redder galaxies are more strongly clustered on small scales, which can be explained by having a larger fraction of these galaxies in the form of satellites in massive halos. Our clustering measurements on scales below $0.4\,h^{-1}{\rm {Mpc}}$ allow us to study the small-scale spatial distribution of satellites inside halos. While the clustering of luminosity-threshold samples can be well described by a Navarro-Frenk-White (NFW) profile, that of the reddest galaxies prefers a steeper or more concentrated profile. Finally, we also use galaxy samples of constant number density at different redshifts to study the evolution of luminous galaxies, and find the clustering to be consistent with passive evolution in the redshift range of $0.5 \lesssim z \lesssim 0.6$.

“The Clustering Of Galaxies In The SDSS-III Baryon Oscillation Spectroscopic Survey: Modeling Of The Luminosity And Colour Dependence In The Data Release 10” Metadata:

  • Title: ➤  The Clustering Of Galaxies In The SDSS-III Baryon Oscillation Spectroscopic Survey: Modeling Of The Luminosity And Colour Dependence In The Data Release 10
  • Authors: ➤  

“The Clustering Of Galaxies In The SDSS-III Baryon Oscillation Spectroscopic Survey: Modeling Of The Luminosity And Colour Dependence In The Data Release 10” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 0.69 Mbs, the file-s for this book were downloaded 24 times, the file-s went public at Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF -


34Excisive Hierarchical Clustering Methods For Network Data

By

We introduce two practical properties of hierarchical clustering methods for (possibly asymmetric) network data: excisiveness and linear scale preservation. The latter enforces imperviousness to change in units of measure whereas the former ensures local consistency of the clustering outcome. Algorithmically, excisiveness implies that we can reduce computational complexity by only clustering a data subset of interest while theoretically guaranteeing that the same hierarchical outcome would be observed when clustering the whole dataset. Moreover, we introduce the concept of representability, i.e. a generative model for describing clustering methods through the specification of their action on a collection of networks. We further show that, within a rich set of admissible methods, requiring representability is equivalent to requiring both excisiveness and linear scale preservation. Leveraging this equivalence, we show that all excisive and linear scale preserving methods can be factored into two steps: a transformation of the weights in the input network followed by the application of a canonical clustering method. Furthermore, their factorization can be used to show stability of excisive and linear scale preserving methods in the sense that a bounded perturbation in the input network entails a bounded perturbation in the clustering output.

“Excisive Hierarchical Clustering Methods For Network Data” Metadata:

  • Title: ➤  Excisive Hierarchical Clustering Methods For Network Data
  • Authors:

“Excisive Hierarchical Clustering Methods For Network Data” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 0.49 Mbs, the file-s for this book were downloaded 30 times, the file-s went public at Fri Jun 29 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF -


35Bayesian Clustering Of Replicated Time-course Gene Expression Data With Weak Signals

By

To identify novel dynamic patterns of gene expression, we develop a statistical method to cluster noisy measurements of gene expression collected from multiple replicates at multiple time points, with an unknown number of clusters. We propose a random-effects mixture model coupled with a Dirichlet-process prior for clustering. The mixture model formulation allows for probabilistic cluster assignments. The random-effects formulation allows for attributing the total variability in the data to the sources that are consistent with the experimental design, particularly when the noise level is high and the temporal dependence is not strong. The Dirichlet-process prior induces a prior distribution on partitions and helps to estimate the number of clusters (or mixture components) from the data. We further tackle two challenges associated with Dirichlet-process prior-based methods. One is efficient sampling. We develop a novel Metropolis-Hastings Markov Chain Monte Carlo (MCMC) procedure to sample the partitions. The other is efficient use of the MCMC samples in forming clusters. We propose a two-step procedure for posterior inference, which involves resampling and relabeling, to estimate the posterior allocation probability matrix. This matrix can be directly used in cluster assignments, while describing the uncertainty in clustering. We demonstrate the effectiveness of our model and sampling procedure through simulated data. Applying our method to a real data set collected from Drosophila adult muscle cells after five-minute Notch activation, we identify 14 clusters of different transcriptional responses among 163 differentially expressed genes, which provides novel insights into underlying transcriptional mechanisms in the Notch signaling pathway. The algorithm developed here is implemented in the R package DIRECT, available on CRAN.
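
As an illustration of how a Dirichlet-process prior induces a distribution on partitions, the following minimal sketch draws a random partition with the Chinese restaurant process. It is not the Metropolis-Hastings sampler described in the abstract and not code from the DIRECT package; the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions chosen for this example.

```python
import random

def sample_crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    labels = []   # cluster label of each item
    counts = []   # current size of each cluster
    for i in range(n):
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                labels.append(k)
                counts[k] += 1
                break
        else:
            # u fell in the last slice of width alpha: open a new cluster
            labels.append(len(counts))
            counts.append(1)
    return labels

labels = sample_crp_partition(20, alpha=1.0)
print(labels, "number of clusters:", max(labels) + 1)
```

Larger values of `alpha` tend to produce more clusters, which is why the number of clusters does not have to be fixed in advance under this kind of prior.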

“Bayesian Clustering Of Replicated Time-course Gene Expression Data With Weak Signals” Metadata:

  • Title: ➤  Bayesian Clustering Of Replicated Time-course Gene Expression Data With Weak Signals
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 19.37 Mbs, the file-s for this book were downloaded 82 times, the file-s went public at Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


36Precision Growth Index Using The Clustering Of Cosmic Structures And Growth Data

By

We use the clustering properties of Luminous Red Galaxies (LRGs) and the growth rate data provided by the various galaxy surveys in order to constrain the growth index ($\gamma$) of the linear matter fluctuations. We perform a standard $\chi^2$-minimization procedure between theoretical expectations and data, followed by a joint likelihood analysis and we find a value of $\gamma=0.56\pm 0.05$, perfectly consistent with the expectations of the $\Lambda$CDM model, and $\Omega_{m0} =0.29\pm 0.01$, in very good agreement with the latest Planck results. Our analysis provides significantly more stringent growth index constraints with respect to previous studies, as indicated by the fact that the corresponding uncertainty is only $\sim 0.09 \gamma$. Finally, allowing $\gamma$ to vary with redshift in two manners (Taylor expansion around $z=0$, and Taylor expansion around the scale factor), we find that the combined statistical analysis between our clustering and literature growth data alleviates the degeneracy and obtain more stringent constraints with respect to other recent studies.
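
For readers unfamiliar with the growth index, the sketch below evaluates the commonly used parametrization $f(z) = \Omega_m(z)^{\gamma}$ with the best-fit values quoted in the abstract. The flat $\Lambda$CDM form of $\Omega_m(z)$ and the function names are assumptions made for this example; the abstract does not state the exact expressions used in the analysis.

```python
def omega_m(z, omega_m0):
    """Matter density parameter at redshift z, assuming a flat LambdaCDM background."""
    matter = omega_m0 * (1.0 + z) ** 3
    return matter / (matter + 1.0 - omega_m0)

def growth_rate(z, omega_m0=0.29, gamma=0.56):
    """Growth rate under the growth-index parametrization f(z) = Omega_m(z)**gamma."""
    return omega_m(z, omega_m0) ** gamma

for z in (0.0, 0.5, 1.0):
    print(f"z = {z:.1f}  f = {growth_rate(z):.3f}")

# Relative uncertainty implied by gamma = 0.56 +/- 0.05
print(f"sigma_gamma / gamma = {0.05 / 0.56:.2f}")
```

The last line reproduces the relative uncertainty of about $0.09\gamma$ mentioned in the abstract.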

“Precision Growth Index Using The Clustering Of Cosmic Structures And Growth Data” Metadata:

  • Title: ➤  Precision Growth Index Using The Clustering Of Cosmic Structures And Growth Data
  • Authors:

“Precision Growth Index Using The Clustering Of Cosmic Structures And Growth Data” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 0.34 Mbs, the file-s for this book were downloaded 27 times, the file-s went public at Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF -


37The DEEP2 Galaxy Redshift Survey: Clustering Of Galaxies In Early Data

By

We measure the two-point correlation function xi(r) using a sample of 2219 galaxies in an area of 0.32 degrees^2 at z=0.7-1.35 from the first season of the DEEP2 Galaxy Redshift Survey. We find that xi(r) can be approximated by a power-law, xi(r)=(r/r_0)^-gamma, on scales 0.1-20 Mpc/h. In a sample with an effective redshift of z_eff=0.82, for a ΛCDM cosmology we find r_0=3.53 +/-0.81 Mpc/h (comoving) and gamma=1.66 +/-0.12, while in a higher-redshift sample with z_eff=1.14 we find r_0=3.14 +/-0.72 Mpc/h and gamma=1.61 +/-0.11. We find that red, absorption-dominated, passively-evolving galaxies have a larger clustering scale length, r_0, and more prominent "fingers of God" than blue, emission-line, actively star-forming galaxies. Intrinsically brighter galaxies also cluster more strongly than fainter galaxies at z~1, with a significant luminosity-bias seen for galaxies fainter than M*. Our results are suggestive of evolution in the galaxy clustering within our survey volume and imply that the DEEP2 galaxies, with a median brightness one magnitude fainter than M* have an effective bias b=0.97 +/-0.13 if sigma_{8 DM}=1 today or b=1.20 +/-0.16 if sigma_{8 DM}=0.8 today. Given the strong luminosity-dependence in the bias that we measure at z~1, the galaxy bias at M* may be significantly greater. We note that our star-forming sample at z~1 has very similar selection criteria as the Lyman-break galaxies at z~3 and that our red, absorption-line sample displays a clustering strength comparable to the expected clustering of the Lyman-break galaxy descendants at z~1. Our results demonstrate that the clustering properties in the galaxy distribution seen in the local Universe were largely in place by z~1.
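
The power-law form xi(r)=(r/r_0)^-gamma quoted above is linear in log-log space, which makes r_0 and gamma easy to recover by a straight-line fit. The sketch below illustrates this on noise-free synthetic values generated from the z_eff=0.82 parameters in the abstract. It is only an illustration of the formula, not the survey's estimation procedure, and the function names are assumptions.

```python
import math

def xi_power_law(r, r0, gamma):
    """Power-law correlation function xi(r) = (r / r0) ** (-gamma)."""
    return (r / r0) ** (-gamma)

def fit_power_law(rs, xis):
    """Least-squares line through (log r, log xi); returns (r0, gamma).

    log xi = -gamma * log r + gamma * log r0, so the slope gives gamma
    and the intercept gives r0.
    """
    xs = [math.log(r) for r in rs]
    ys = [math.log(x) for x in xis]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    gamma = -slope
    r0 = math.exp(intercept / gamma)
    return r0, gamma

# Synthetic, noise-free values on the 0.1-20 Mpc/h range mentioned above
rs = [0.1, 0.5, 1.0, 5.0, 20.0]
xis = [xi_power_law(r, r0=3.53, gamma=1.66) for r in rs]
print(fit_power_law(rs, xis))
```

By construction xi(r_0) = 1, so r_0 is the scale at which the clustering excess over a random distribution equals unity.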

“The DEEP2 Galaxy Redshift Survey: Clustering Of Galaxies In Early Data” Metadata:

  • Title: ➤  The DEEP2 Galaxy Redshift Survey: Clustering Of Galaxies In Early Data
  • Authors: ➤  
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 16.12 Mbs, the file-s for this book were downloaded 71 times, the file-s went public at Tue Sep 17 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


38AN EFFICIENT PRE CLUSTERING ALGORTHIM USING AN UNLABELLED DATA SETS

By

Cluster analysis is one of the primary data analysis methods. Pre-clustering algorithms are used to estimate the number of clusters in unlabelled data sets. Selecting the number of clusters is an important and challenging issue in cluster analysis. A number of attempts have been made to estimate the number of clusters c in a given data set; they choose the best partition from a set of alternative partitions. In contrast, tendency assessment attempts to estimate c before clustering occurs. The project focuses on pre-clustering tendency and determines the number of clusters in unlabelled data sets during cluster analysis using the proposed methodology, the Trusted Pre cluster Count Algorithm.

“AN EFFICIENT PRE CLUSTERING ALGORTHIM USING AN UNLABELLED DATA SETS” Metadata:

  • Title: ➤  AN EFFICIENT PRE CLUSTERING ALGORTHIM USING AN UNLABELLED DATA SETS
  • Author: ➤  
  • Language: english-handwritten

“AN EFFICIENT PRE CLUSTERING ALGORTHIM USING AN UNLABELLED DATA SETS” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 2.72 Mbs, the file-s for this book were downloaded 1 times, the file-s went public at Fri Aug 22 2025.

Available formats:
Archive BitTorrent - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR -


39Java Data Structure Optimization Of Talent Training Performance Evaluation System Based On Distributed Hierarchical Data Clustering Algorithm

By

In the implementation scheme, the data source uses the message middleware Kafka, and the intermediate results are stored in the in-memory database Redis through serialization technology. Multi-faceted experimental results and analysis show that social environmental factors, social value orientation utilitarian. Based on this, the following main countermeasures are put forward to enhance the endogenous motivation of applied talents to cultivate college students: to stimulate their own endogenous motivation from the student level, to improve their own subject consciousness, to adjust their own needs structure, and therefore to optimize the design of a Java-based demonstration system for teaching mixed data structures. First, we should optimize the design of the framework of the hybrid data structure teaching demonstration system, and then optimize the design of the teaching demonstration database, that is, optimize the design of the database tables.

“Java Data Structure Optimization Of Talent Training Performance Evaluation System Based On Distributed Hierarchical Data Clustering Algorithm” Metadata:

  • Title: ➤  Java Data Structure Optimization Of Talent Training Performance Evaluation System Based On Distributed Hierarchical Data Clustering Algorithm
  • Author: ➤  

“Java Data Structure Optimization Of Talent Training Performance Evaluation System Based On Distributed Hierarchical Data Clustering Algorithm” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 3.54 Mbs, the file-s for this book were downloaded 16 times, the file-s went public at Sat Sep 23 2023.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR -


40Computer-Aided Clustering Analysis Of Short-Term Interactive Data Of Industry Microblog Marketing Effect And Number Of Fans Based On Quantum Evolutionary Game Algorithm

By

Computer-Aided Clustering Analysis of Short-Term Interactive Data of Industry Microblog Marketing Effect and Number of Fans Based on Quantum Evolutionary Game Algorithm

“Computer-Aided Clustering Analysis Of Short-Term Interactive Data Of Industry Microblog Marketing Effect And Number Of Fans Based On Quantum Evolutionary Game Algorithm” Metadata:

  • Title: ➤  Computer-Aided Clustering Analysis Of Short-Term Interactive Data Of Industry Microblog Marketing Effect And Number Of Fans Based On Quantum Evolutionary Game Algorithm
  • Author:

“Computer-Aided Clustering Analysis Of Short-Term Interactive Data Of Industry Microblog Marketing Effect And Number Of Fans Based On Quantum Evolutionary Game Algorithm” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 3.53 Mbs, the file-s for this book were downloaded 26 times, the file-s went public at Sat Sep 23 2023.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR -


41DTIC ADA366202: Density Biased Sampling: An Improved Method For Data Mining And Clustering

By

Data mining in large data sets often requires a sampling or summarization step to form an in-core representation of the data that can be processed more efficiently. Uniform random sampling is frequently used in practice and also frequently criticized because it will miss small clusters. Many natural phenomena are known to follow Zipf's distribution and the inability of uniform sampling to find small clusters is of practical concern. Density Biased Sampling is proposed to probabilistically under-sample dense regions and over-sample light regions. A weighted sample is used to preserve the densities of the original data. Density biased sampling naturally includes uniform sampling as a special case. A memory efficient algorithm is proposed that approximates density biased sampling using only a single scan of the data. We empirically evaluate density biased sampling using synthetic data sets that exhibit varying cluster size distributions. Our proposed method scales linearly and out performs uniform samples when clustering realistic data sets.
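
The idea of under-sampling dense regions while keeping a weight that preserves the original densities can be sketched as follows. This is a minimal in-memory illustration, not the single-scan, memory-efficient algorithm from the report. The grid binning, the inclusion probability proportional to n_i^(-e) for a bin holding n_i points, and all function and parameter names are assumptions made for this example; with e = 0 it reduces to uniform sampling.

```python
import random
from collections import defaultdict

def inclusion_probabilities(bin_sizes, sample_size, e):
    """Per-point inclusion probability for each bin: p_i = alpha * n_i ** (-e).

    alpha is chosen so that the expected sample size, sum_i n_i * p_i,
    equals sample_size (before clipping probabilities at 1).
    """
    norm = sum(n ** (1.0 - e) for n in bin_sizes.values())
    alpha = sample_size / norm
    return {b: min(1.0, alpha * n ** (-e)) for b, n in bin_sizes.items()}

def density_biased_sample(points, cell, sample_size, e, seed=0):
    """Return (point, weight) pairs; weight = 1 / inclusion probability."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for p in points:
        # Assign each point to a grid cell of side length `cell`
        bins[tuple(int(x // cell) for x in p)].append(p)
    sizes = {b: len(members) for b, members in bins.items()}
    probs = inclusion_probabilities(sizes, sample_size, e)
    sample = []
    for b, members in bins.items():
        for p in members:
            if rng.random() < probs[b]:
                sample.append((p, 1.0 / probs[b]))
    return sample

# 90 points in one dense cell and 10 points in a sparse cell
rng = random.Random(1)
dense = [(rng.random(), rng.random()) for _ in range(90)]
sparse = [(5.0 + rng.random(), 5.0 + rng.random()) for _ in range(10)]
picked = density_biased_sample(dense + sparse, cell=1.0, sample_size=20, e=1.0)
print(len(picked), "points sampled")
```

With e = 1 each bin contributes the same expected number of points, so the small cluster is not lost, and the weights let a downstream clustering step account for how many original points each sampled point represents.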

“DTIC ADA366202: Density Biased Sampling: An Improved Method For Data Mining And Clustering” Metadata:

  • Title: ➤  DTIC ADA366202: Density Biased Sampling: An Improved Method For Data Mining And Clustering
  • Author: ➤  
  • Language: English

“DTIC ADA366202: Density Biased Sampling: An Improved Method For Data Mining And Clustering” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 28.92 Mbs, the file-s for this book were downloaded 74 times, the file-s went public at Tue Apr 24 2018.

Available formats:
Abbyy GZ - Additional Text PDF - Archive BitTorrent - DjVuTXT - Djvu XML - Image Container PDF - JPEG Thumb - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - chOCR - hOCR -


42Improving Clustering With Metabolic Pathway Data.

By

This article is from BMC Bioinformatics, volume 15. Abstract. Background: It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps to find functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. Results: A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neuron centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Conclusions: Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.

“Improving Clustering With Metabolic Pathway Data.” Metadata:

  • Title: ➤  Improving Clustering With Metabolic Pathway Data.
  • Authors:
  • Language: English

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 20.71 Mbs, the file-s for this book were downloaded 91 times, the file-s went public at Wed Oct 22 2014.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - JSON - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


43A Data-driven Investigation Of Childhood Adversity And Neural Development: Examining Longitudinal Clustering Of Deprivation, Threat, And Neural Structure Using Network Analysis

By

Childhood adversity exposure is common and is associated with unfavorable physical and mental health outcomes (Cicchetti & Toth, 1995; McLaughlin et al., 2012). While cumulative risk models of adversity have been instrumental in demonstrating that children with more adversity exposure are at greatest risk for unfavorable outcomes (Felitti et al., 1998), they do not elucidate the mechanisms by which adversity confers this risk. The Dimensional Model of Adversity and Psychopathology (DMAP) proposes two dimensions of adversity exposure – deprivation and threat – which differentially impact developmental outcomes through separate neurodevelopmental pathways (McLaughlin, Sheridan, & Lambert, 2014; Sheridan & McLaughlin, 2014). The model proposes that deprivation is associated with neural structure in areas of the brain that support complex cognitive functions (e.g., executive function, language, associative learning), like the frontoparietal control and dorsal attention networks. Another recent conceptual model proposes that early deprivation exposure may impact early-developing visual regions in the ventral visual stream that may scaffold the development of higher-order cognitive skills (Rosen et al., 2019). The DMAP further proposes that threat exposures are associated with structure in areas of the brain that support fear learning, emotion regulation, and threat perception like the limbic network, salience network, default mode network, amygdala, and hippocampus. Previous hypothesis-driven work has supported that deprivation and threat exposures differentially predict neural structure separately in childhood and adolescence (Busso et al., 2017; McLaughlin et al., 2016; 2019; McLaughlin, Sheridan, Winter, et al., 2014; Rosen et al., 2018; Sheridan, Copeland, et al., 2019). There is, however, a lack of consistent longitudinal evidence supporting the association between adversity exposures and neurodevelopmental trajectories. 
Furthermore, data-driven tests of the DMAP could support its relevance in understanding developmental outcomes following adversity exposures relative to cumulative risk models (e.g., Sheridan, Shi, et al., 2019). In the present study, we will take a network analytical approach to examine clusters of adversity exposures and structural neural development in a longitudinal neuroimaging sample of children and adolescents. The study involved two study visits starting with youth aged 8-16 years in the Seattle area. Participants were recruited for increased likelihood of maltreatment exposure. At the first visit, participants reported on lifetime deprivation and threat exposures and underwent structural magnetic resonance imaging (MRI). Approximately two years later, subjects completed a follow-up neuroimaging assessment using the same scanner.

“A Data-driven Investigation Of Childhood Adversity And Neural Development: Examining Longitudinal Clustering Of Deprivation, Threat, And Neural Structure Using Network Analysis” Metadata:

  • Title: ➤  A Data-driven Investigation Of Childhood Adversity And Neural Development: Examining Longitudinal Clustering Of Deprivation, Threat, And Neural Structure Using Network Analysis
  • Authors:

Edition Identifiers:

Downloads Information:

The book is available for download in "data" format, the size of the file-s is: 0.14 Mbs, the file-s for this book were downloaded 2 times, the file-s went public at Fri Aug 20 2021.

Available formats:
Archive BitTorrent - Metadata - ZIP -


44Microsoft Research Audio 104362: Dealing With Data: Classification, Clustering And Ranking

By

This talk will focus on the following three pieces of work that we have done. How to utilize unlabeled data in classification? In many real-world machine learning problems, such as web categorization, only a few labeled examples are available since labeling needs human labor, while unlabeled data are far easier to obtain. So, naturally, one may wonder if we can utilize unlabeled data in our classification tasks. I will present a simple, powerful and mathematically clean approach to this problem, and demonstrate its good experimental results provided by a third party on a number of machine learning benchmarks. Our approach has been considered state of the art in the machine learning literature. How to partition directed graphs like the Web? Spectral clustering for undirected graphs has been extensively studied since the mathematician Fiedler’s seminal work in the 1970s. The spectral method is so powerful that many people have attempted to generalize it to directed graphs. Among them the most popular one is perhaps Jon Kleinberg’s HITS algorithm for both ranking web pages and detecting web communities. In 2003, Monika Henzinger, the former research director at Google Inc., listed this generalization issue as one of six algorithmic challenges in web search engines. I will show how we thoroughly solve this problem via Markov chain theory, and also the application of our approach to real-world web data. This approach can be implemented with several lines of Matlab code. How to rank objects like images and texts? Link-based ranking has enjoyed huge success in web search engines. However, in practice, many types of data have no link structure but are modeled as vectors in Euclidean spaces, for instance, texts and images. A principled way of ranking those kinds of data is to explore and exploit their intrinsic geometrical or manifold structure. I will show how we address this issue in a simple mathematical framework. Our approach has been widely used by different communities from image retrieval to bioinformatics. In addition, I will also talk about some theoretical analysis around those approaches, and discuss future extensions. ©2006 Microsoft Corporation. All rights reserved.

“Microsoft Research Audio 104362: Dealing With Data: Classification, Clustering And Ranking” Metadata:

  • Title: ➤  Microsoft Research Audio 104362: Dealing With Data: Classification, Clustering And Ranking
  • Author:
  • Language: English

“Microsoft Research Audio 104362: Dealing With Data: Classification, Clustering And Ranking” Subjects and Themes:

Edition Identifiers:

Downloads Information:

The book is available for download in "audio" format, the size of the file-s is: 39.43 Mbs, the file-s for this book were downloaded 5 times, the file-s went public at Sat Nov 23 2013.

Available formats:
Archive BitTorrent - Item Tile - Metadata - Ogg Vorbis - PNG - VBR MP3 -


45Clustering Data By Inhomogeneous Chaotic Map Lattices

By

A new approach to clustering, based on the physical properties of inhomogeneous coupled chaotic maps, is presented. A chaotic map is assigned to each data-point and short range couplings are introduced. The stationary regime of the system corresponds to a macroscopic attractor independent of the initial conditions. The mutual information between couples of maps serves to partition the data set in clusters, without prior assumptions about the structure of the underlying distribution of the data. Experiments on simulated and real data sets show the effectiveness of the proposed algorithm.

“Clustering Data By Inhomogeneous Chaotic Map Lattices” Metadata:

  • Title: ➤  Clustering Data By Inhomogeneous Chaotic Map Lattices
  • Authors:

Edition Identifiers:

Downloads Information:

The book is available for download in "texts" format, the size of the file-s is: 4.94 Mbs, the file-s for this book were downloaded 68 times, the file-s went public at Sun Sep 22 2013.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - Item Tile - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF -


46. ERIC ED479825: Developing Learner Concentric Learning Outcome Typologies Using Clustering And Decision Trees Of Data Mining.

This study aims to address learning outcomes from the perspective of learners. The research questions asked were: (1) What learner concentric ideas can be used to indicate the outcomes of learning? (2) How would the new learning outcome index be used to generate typologies? (3) What are the inner relationships of the typologies? and (4) How can the typologies be applied? The author argues that typology is fundamental to science, yet it is seriously underused and under-researched in social science. Only a handful of authors have worked on the subject, which has created an insurmountable gap between what has been done and what needs to happen. The study examined first-time college students enrolled in spring 1996 at a suburban college on the West coast with enrollment of 15,000 per semester. The study tracked the students for 6 years for their enrollment behavior, graduation status, and transfer status. The study theorizes that what may be important to the learner may not be important to the institution. The author argues that the "big three" (gender, age, race) are not good predictors of typologies, and that learners' behaviors as reflected by the OIndex are a better way of describing learner outcomes. (Contains 11 references.) (NB)

“ERIC ED479825: Developing Learner Concentric Learning Outcome Typologies Using Clustering And Decision Trees Of Data Mining.” Metadata:

  • Title: ERIC ED479825: Developing Learner Concentric Learning Outcome Typologies Using Clustering And Decision Trees Of Data Mining.
  • Language: English

Downloads Information:

The book is available for download in "texts" format. The files total 17.50 MB, have been downloaded 101 times, and went public on Wed Jan 20 2016.

Available formats:
Abbyy GZ - Animated GIF - Archive BitTorrent - DjVu - DjVuTXT - Djvu XML - JPEG Thumb - Metadata - Scandata - Single Page Processed JP2 ZIP - Text PDF


47. Classification And Clustering For Samples Of Event Time Data Using Non-homogeneous Poisson Process Models

Data in the form of event times arise in various applications. A simple model for such data is a non-homogeneous Poisson process (NHPP), which is specified by a rate function that depends on time. We consider the problem of having access to multiple independent samples of event time data, observed on a common interval, from which we wish to classify or cluster the samples according to their rate functions. Each rate function is unknown but assumed to belong to a small set of rate functions defining distinct classes. We model the rate functions using a spline basis expansion, the coefficients of which need to be estimated from data. The classification approach uses training data with known class membership to calculate maximum likelihood estimates of the coefficients for each group, then assigns test samples to a class by a maximum likelihood criterion. For clustering, by analogy to the Gaussian mixture model approach for Euclidean data, we consider a mixture of NHPP models and use the expectation maximisation algorithm to estimate the coefficients of the rate functions for the component models and the probability of membership of each sample in each model. The classification and clustering approaches perform well on both the synthetic and real-world data sets considered. Code associated with this paper is available at https://github.com/duncan-barrack/NHPP.
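The maximum likelihood classification step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes, for simplicity, a piecewise-constant rate on equal-width bins in place of the spline basis expansion, and the names `nhpp_log_likelihood`, `classify`, and the example rates are hypothetical. It relies on the standard NHPP log-likelihood, the sum of log rates at the event times minus the integral of the rate over the observation interval.

```python
import math


def nhpp_log_likelihood(event_times, rates, horizon):
    """Log-likelihood of event times under an NHPP on [0, horizon].

    Assumption for illustration: the rate is piecewise constant, with
    rates[k] (strictly positive) holding on the k-th of len(rates)
    equal-width bins. The paper itself uses a spline basis expansion.
    """
    n_bins = len(rates)
    width = horizon / n_bins
    # Minus the integral of the rate function over [0, horizon].
    log_lik = -sum(rate * width for rate in rates)
    for t in event_times:
        k = min(int(t / width), n_bins - 1)  # bin containing event time t
        log_lik += math.log(rates[k])
    return log_lik


def classify(event_times, class_rates, horizon):
    """Assign a sample to the class whose rate function maximises the likelihood."""
    return max(
        class_rates,
        key=lambda name: nhpp_log_likelihood(event_times, class_rates[name], horizon),
    )


# Hypothetical classes: events concentrated early or late in [0, 2].
class_rates = {"early": [4.0, 1.0], "late": [1.0, 4.0]}
sample = [0.1, 0.3, 0.5, 0.9, 1.5]
predicted = classify(sample, class_rates, horizon=2.0)
```

Here four of the five events fall in the first half of the interval, so `predicted` is `"early"`. In the setting the abstract describes, the per-class rates would first be estimated from labelled training samples rather than fixed by hand.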

“Classification And Clustering For Samples Of Event Time Data Using Non-homogeneous Poisson Process Models” Metadata:

  • Title: Classification And Clustering For Samples Of Event Time Data Using Non-homogeneous Poisson Process Models

Downloads Information:

The book is available for download in "texts" format. The files total 1.35 MB, have been downloaded 18 times, and went public on Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF


48. Massive Data Clustering In Moderate Dimensions From The Dual Spaces Of Observation And Attribute Data Clouds

Cluster analysis of very high dimensional data can benefit from the properties of such high dimensionality. Informally expressed, in this work, our focus is on the analogous situation when the dimensionality is moderate to small, relative to a massively sized set of observations. Mathematically expressed, these are the dual spaces of observations and attributes. The point cloud of observations is in attribute space, and the point cloud of attributes is in observation space. In this paper, we begin by summarizing various perspectives related to methodologies that are used in multivariate analytics. We draw on these to establish an efficient clustering processing pipeline, both partitioning and hierarchical clustering.

“Massive Data Clustering In Moderate Dimensions From The Dual Spaces Of Observation And Attribute Data Clouds” Metadata:

  • Title: Massive Data Clustering In Moderate Dimensions From The Dual Spaces Of Observation And Attribute Data Clouds

Downloads Information:

The book is available for download in "texts" format. The files total 0.32 MB, have been downloaded 21 times, and went public on Sat Jun 30 2018.

Available formats:
Archive BitTorrent - Metadata - Text PDF


49. Ensembled Elbow And Bray-Curtis Fuzzy C-Means Clustering For Energy Efficient Data Aggregation In WSN

A wireless sensor network (WSN) comprises distributed sensors that aggregate and organize data. Data aggregation is a major concern in WSNs since it depends on several factors, namely the energy constraints of sensors, network topology, link conditions and so on. Conventional approaches do not aggregate data efficiently because of the limited battery power of nodes, which degrades network lifetime. To improve data aggregation and network lifetime, an Energy-Efficient Ensembled Elbow Fuzzy C-means Clustering based Data Aggregation (EEEEFCC-DA) method is designed. Initially, the residual energy of each sensor node (SN) is calculated. The elbow method is used within the fuzzy c-means clustering algorithm to determine the number of clusters. Then, a centroid value is calculated for every cluster to group SNs. The Bray-Curtis similarity index is used to compute the similarity between each SN and the centroid value of a cluster, and SNs are grouped according to this similarity value. The process is iterated until every SN is assigned to a suitable cluster. After that, the SN with the higher residual energy is selected as the cluster head (CH). The CH gathers data from each SN and sends it to the sink node. This helps to enhance data gathering accuracy and lessen energy consumption. Simulation of the EEEEFCC-DA method is carried out with various metrics, namely energy consumption, network lifetime, data aggregation accuracy (DAA) and data aggregation time, against the number of SNs and the number of data packets (DP). Results show that the EEEEFCC-DA method performs better in terms of DAA, network lifetime, energy consumption and data aggregation time than conventional methods.
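Two steps named in this abstract, grouping sensor nodes by Bray-Curtis similarity to cluster centroids and choosing the node with the highest residual energy as cluster head, can be sketched in Python as follows. This is an illustrative sketch only, not the EEEEFCC-DA method itself: it uses a hard nearest-centroid assignment rather than fuzzy c-means memberships, omits the elbow method, and all function names, coordinates, and energy values are hypothetical. The Bray-Curtis similarity is taken as one minus the Bray-Curtis dissimilarity, which assumes non-negative feature values.

```python
def bray_curtis_similarity(x, y):
    """1 - sum|x_i - y_i| / sum(x_i + y_i), for non-negative feature vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        return 1.0  # both vectors are all zeros, so treat them as identical
    return 1.0 - numerator / denominator


def assign_to_clusters(nodes, centroids):
    """Give each node the index of its most similar centroid (hard assignment)."""
    return [
        max(range(len(centroids)), key=lambda j: bray_curtis_similarity(node, centroids[j]))
        for node in nodes
    ]


def select_cluster_heads(labels, residual_energy):
    """For each cluster, pick the index of the node with the highest residual energy."""
    heads = {}
    for node_index, label in enumerate(labels):
        if label not in heads or residual_energy[node_index] > residual_energy[heads[label]]:
            heads[label] = node_index
    return heads


# Hypothetical node positions, centroids, and residual energies.
nodes = [(1, 1), (2, 1), (9, 8), (8, 9)]
centroids = [(1.5, 1.0), (8.5, 8.5)]
residual_energy = [0.4, 0.9, 0.7, 0.2]

labels = assign_to_clusters(nodes, centroids)
heads = select_cluster_heads(labels, residual_energy)
```

With these values the first two nodes join cluster 0 and the last two join cluster 1, and the cluster heads are node 1 and node 2 respectively.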

“Ensembled Elbow And Bray-Curtis Fuzzy C-Means Clustering For Energy Efficient Data Aggregation In WSN” Metadata:

  • Title: Ensembled Elbow And Bray-Curtis Fuzzy C-Means Clustering For Energy Efficient Data Aggregation In WSN
  • Language: English

Downloads Information:

The book is available for download in "texts" format. The files total 18.75 MB, have been downloaded 111 times, and went public on Fri Mar 19 2021.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR


50. An Improved Clustering Based On K-means For Hotspots Data

Riau province is one of the provinces in Indonesia where forest fires occur frequently every year. Hotspot data are geothermal points that can be used as an indicator of forest fires. Clustering methods can be used to analyze potential forest fires from the cluster patterns of hotspot data. In this study, a hybrid of genetic algorithm polygamy with K-means (GAP K-means) was used for hotspot data clustering. GA polygamy was used to determine the initial centroids of K-means, in order to address the sensitivity of K-means to the initial centroids and to find the optimal solution faster. The performance of GAP K-means, GA K-means, and K-means was compared experimentally on the hotspot data, two artificial datasets, and three real-life datasets. Sum of squared errors (SSE), Davies-Bouldin index (DBI), silhouette coefficient (SC) and F-measure are used to evaluate the clustering. Based on this experiment, GAP K-means outperforms K-means, but GAP K-means is still slower to converge than GA K-means.
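The sensitivity of K-means to its initial centroids, which motivates the genetic-algorithm initialisation in this abstract, can be seen in a small Python sketch. It implements plain K-means (Lloyd's iterations) only; the GA polygamy step is not reproduced, and the function names and the four-point dataset are hypothetical choices for illustration. The sum of squared errors (SSE) is used to compare the two runs.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain K-means started from the given initial centroids."""
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assign(points, centroids)


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Four corners of a wide rectangle: the natural split is left versus right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good_centroids, good_labels = kmeans(points, [(0, 0), (4, 0)])
poor_centroids, poor_labels = kmeans(points, [(2, 0), (2, 1)])
good_sse = sse(points, good_centroids, good_labels)
poor_sse = sse(points, poor_centroids, poor_labels)
```

Starting from one centroid on each side, K-means converges to the left/right split with an SSE of 1.0. Starting from the two centroids stacked in the middle, it stays at a bottom/top split with an SSE of 16.0, a local optimum that a better initialisation strategy is meant to avoid.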

“An Improved Clustering Based On K-means For Hotspots Data” Metadata:

  • Title: An Improved Clustering Based On K-means For Hotspots Data
  • Language: English

Downloads Information:

The book is available for download in "texts" format. The files total 7.37 MB, have been downloaded 13 times, and went public on Tue Dec 10 2024.

Available formats:
Archive BitTorrent - DjVuTXT - Djvu XML - Item Tile - Metadata - OCR Page Index - OCR Search Text - Page Numbers JSON - Scandata - Single Page Processed JP2 ZIP - Text PDF - chOCR - hOCR

