Discover Top Posts Tagged with #icml

At ICML 2024, Consider Machine Learning’s Social Impacts

Observing Silicon AI researchers have strong social concerns. Scientists at ICML 2024 are split between exploring the limits of machine learning and considering the social implications of this technology.

ICML

The first keynote address at the 41st International Conference on Machine Learning (ICML) 2024 in Vienna covered the changing nature of the machine learning field: from collaborative to increasingly competitive, it is becoming more exclusive and expensive due to the resources and compute power needed for this research.

AI was mostly applied in university and industrial lab research until 2022. However, due to its quick user uptake, the primarily academic subject became extremely profitable.

AI technology was primarily used for research projects at university and corporate labs before to 2022. ChatGPT was primarily intended to be a test when OpenAI made it available to the public in late 2022, but its quick user uptake transformed the primarily academic sector into a goldmine.

Speaking to 500 people in one of the largest rooms at the venue, Soumith Chintala, who oversees AI projects at Meta in New York, stated, “GPU compute and pretraining of data needs the deep pockets of billionaires. VCs and large corporates.”

Companies have been ceasing open research more frequently as computing and engineering become more expensive and concerns about data safety, legality, and societal impact arise. Big firms care about data safety, legality, and societal impact, he continued, even though they are racing to be leaders in this research.

Twenty years ago, Katherine Heller, a research scientist in Responsible AI at Google Research and the program chair of ICML 2024, said, “they started off as a close-knit worldwide community of about 500 researchers.” Heller has been attending this AI conference on a regular basis for the past twenty years.

ICML 2024 Results

She has been selecting seminars and tutorials for the week-long event, inviting speakers, scheduling events, and vetting entries for the previous few months. “This year, they evaluated over 10,000 papers from 12,000 authors, and they chose roughly 2,600 to present,” she stated, characterizing the ML community’s expansion as “exponential.”

At ICML 2024, the International Conference on Machine Learning, scientists must choose a choice. The community is divided between those who are eager to push machine learning’s boundaries and those who believe it is crucial to take the social implications of this powerful technology into account.

Extending the Boundaries of Innovations and Advancements in Machine Learning: This team of academics aims to improve the technical aspects of machine learning. They are passionate in improved model structures and algorithms as well as cutting-edge applications in fields like finance, healthcare, and autonomous systems. Through their efforts, unprecedented degrees of automation, pattern recognition, and predictive analytics are becoming possible.

ICML Instruction 2024

Developing more effective and efficient neural networks is one way to improve deep learning.

Looking into how machine learning could be changed by quantum computing.

Improving the explainability and interpretability of complex models to make them more understandable.

Scalability is the capacity to create algorithms that can handle big datasets with ease.

The process of combining edge computing, IoT, and machine learning with other state-of-the-art technologies is known as AI integration.

Thinking About the Impacts on Society

Ethics and Responsibility

Still, many scientists emphasize how important it is to consider the broader implications of machine learning. They argue that ethical concerns, biases in AI systems, and the potential for abuse must be addressed in addition to scientific advancements. These experts call for a more controlled and cautious approach to ensure that machine learning benefits society as a whole.

ICML Workshops 2024

Ensuring that algorithms do not perpetuate or reinforce societal biases is a matter of fairness and bias.

Privacy: Ensuring user data security and maintaining confidentiality in machine learning applications.

Machine learning transparency: Explaining decisions and procedures to non-experts.

Regulation and governance involve creating ethical AI rules.

Social Impact: Examining how machine learning affects employment, equity, and social dynamics.

Crossing the Divide

Many attendees of ICML 2024 are aware of how important it is to balance these points of view. They argue that improving machine learning and thinking about how it affects society are complementing goals rather than antagonistic ones. Multidisciplinary research, broad education on the sociological aspects of machine learning, and cooperative efforts between engineers and ethicists are thought to be crucial next stages.

Suggestions and Discussions

Panels & Workshops: Dedicated talks about the implications of machine learning for society, law, and morality.

Collaborative research refers to studies including social scientists, engineers, and policymakers.

Public Engagement: Attempting to educate the broader population about the benefits and risks associated with machine learning.

To sum up

ICML 2024 highlights a pivotal point in the field of machine learning. An increasing awareness that, even as they push the boundaries of technological capabilities, they also need to make sure that these innovations are consistent with moral standards and cultural norms is evident in the ongoing disputes and conversations. This comprehensive approach ought to contribute to the development of a future where machine learning is both responsible and creative.

Read more on govindhtech.com

#ICML2024 #ConsiderMachine #ICML #machinelearning #AItechnology #AIprojects #AI #Google #ICMLInstruction2024 #CrossingDivide #news #ICMLWorkshops2024 #technology #technews #govindhtech

Improving generative modelling with Shortest Path Diffusion (ICML paper)

Diffusion models are currently the most popular algorithms for data generation, with applications in image synthesis, video generation, and molecule design. MediaTek Research has just published a paper showing that the performance of diffusion models is boosted by optimizing the diffusion process to minimize the path taken from the initial to the final data distribution

Website:

https://www.mediatek.com/blog/improving-generative-modelling-with-shortest-path-diffusion-icml-paper

#mediatekresearch #medaitek #ICMLpaper #ICML #generativemodelling

ICML Car Batteries

Keep your car running smoothly with ICML car batteries. With a variety of options to choose from, you can find the perfect battery for your car. Whether you need a new battery for your daily driver or a spare for your weekend getaway, ICML has the right battery for you.

#ICML #Car Batteries #ICML Car Batteries

#machine_learning #ICML #CVPR

ICML2016 notes

Note:write about ICML2016 for me.

http://icml.cc/2016/?page_id=1649

Tutorials

Deep Reinforcement Learning David Silver (Google DeepMind)

A major goal of artificial intelligence is to create general-purpose agents that can perform effectively in a wide range of challenging tasks. To achieve this goal, it is necessary to combine reinforcement learning (RL) agents with powerful and flexible representations. The key idea of deep RL is to use neural networks to provide this representational power. In this tutorial we will present a family of algorithms in which deep neural networks are used for value functions, policies, or environment models. State-of-the-art results will be presented in a variety of domains, including Atari games, 3D navigation tasks, continuous control domains and the game of Go.

Graph Sketching, Streaming, and Space-Efficient Optimization Sudipto Guha (University of Pennsylvania) and Andrew McGregor (University of Massachusetts Amherst)

Graphs are one of the most commonly used data representation tools but existing algorithmicapproaches are typically not appropriate when the graphs of interest are dynamic, stochastic, ordo not ﬁt into the memory of a single machine. Such graphs are often encountered as machinelearning techniques are increasingly deployed to manage graph data and large-scale graph opti-mization problems. Graph sketching is a form of dimensionality reduction for graph data that isbased on using random linear projections and exploiting connections between linear algebra andcombinatorial structure. The technique has been studied extensively over the last ﬁve years andcan be applied in many computational settings. It enables small-space online and data streamcomputation where we are permitted only a few passes (ideally only one) over an input sequence ofupdates to a large underlying graph. The technique parallelizes easily and can naturally be appliedin various distributed settings. It can also be used in the context of convex programming to enablemore eﬃcient algorithms for combinatorial optimization problems such as correlation clustering.One of the main goals of the research on graph sketching is understanding and characterizing thetypes of graph structure and features that can be inferred from compressed representations of therelevant graphs.

Workshop

Abstraction in Reinforcement Learning Daniel Mankowitz, Shie Mannor (Technion Israel Institute of Technology), Timothy Mann (Google Deepmind)

Many real-world domains can be modeled using some form of abstraction. An abstraction is an important tool that enables an agent to focus less on the lower level details of a task and more on solving the task at hand. Temporal abstraction (i.e., options or skills) as well as spatial abstraction (i.e., state space representation) are two important examples. The goal of this workshop is to provide a forum to discuss the current challenges in designing as well as learning abstractions in real-world Reinforcement Learning (RL). http://rlabstraction2016.wix.com/icml

Neural Networks Back To The Future

Léon Bottou (Facebook), David Grangier (Facebook), Tomas Mikolov (Facebook), John Platt (Google)

As research in deep learning is extremely active today, we could take a step back and examine its foundations. We propose to have a critical look at previous work on neural networks, and try to have a better understanding of the differences with today’s work. Previous work can point at promising directions to follow, pitfalls to avoid, ideas and assumptions to revisit. Similarly, today’s progress can allow a critical examination of what should still be investigated, what has been answered… https://sites.google.com/site/nnb2tf

Accepted Papers

Graph embeddings, Clustering周りで面白そうなの集めた

Revisiting Semi-Supervised Learning with Graph Embeddings ,Zhilin Yang Carnegie Mellon University, Ruslan Salakhudinov U. of Toronto, William Cohen CMU

We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.

Compressive Spectral Clustering ,Nicolas TREMBLAY INRIA Rennes, Gilles Puy INRIA, Remi Gribonval INRIA, Pierre Vandergheynst EPFL

Spectral clustering has become a popular technique due to its high performance in many contexts. It comprises three main steps: create a similarity graph between N objects to cluster, compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object, and run k-means on these features to separate objects into k classes. Each of these three steps becomes computationally intensive for large N and/or k. We propose to speed up the last two steps based on recent results in the emerging field of graph signal processing: graph filtering of random signals, and random sampling of bandlimited graph signals. We prove that our method, with a gain in computation time that can reach several orders of magnitude, is in fact an approximation of spectral clustering, for which we are able to control the error. We test the performance of our method on artificial and real-world network data.

k-variates++: more pluses in the k-means++

kk-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, kk-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, \textit{and} a generalisation of the well known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a \textit{bias+variance} approximation bound of the \textit{global} optimum. This approximation exhibits a reduced dependency on the “noise” component with respect to the optimal potential — actually approaching the statistical lower bound. We show that kk-variates++ \textit{reduces} to efficient (biased seeding) clustering algorithms tailored to specific frameworks; these include distributed, streaming and on-line clustering, with \textit{direct} approximation results for these algorithms. Finally, we present a novel application of kk-variates++ to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there is little to no prior results on the direct application of kk-means++ and its approximation bounds — state of the art contenders appear to be significantly more complex and / or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is \textit{no} closed form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performances vs state of the art.

CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy ,Nathan Dowlin Princeton, Ran , Kim Laine Microsoft Research, Kristin Lauter Microsoft Research, Michael Naehrig Microsoft Research, John Wernsing Microsoft Research

Applying machine learning to a problem which involves medical, financial, or other types of sensitive data, not only requires accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we will show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the prediction it made. We demonstrate CryptoNets on the MNIST optical character recognition tasks. CryptoNets achieve 99% accuracy and can make more than 51000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions.

The Variational Nystrom method for large-scale spectral problems ,Max Vladymyrov Yahoo Labs, Miguel Carreira-Perpinan UC Merced

Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and quality of the resulting solution.

Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity ,Ohad Shamir Weizmann Institute of Science

We study the convergence properties of the VR-PCA algorithm introduced by (Shamir, 2015) for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the runtime of stochastic methods, and what are the convexity and non-convexity properties of the underlying optimization problem.

Convergence of Stochastic Gradient Descent for PCA ,Ohad Shamir Weizmann Institute of Science

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in \cite{hardt2014noisy}. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Low-Rank Matrix Approximation with Stability ,Dongsheng Li IBM Research – China, Chao Chen , Qin Lv , Junchi Yan , Li Shang , Stephen Chu

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, incomplete and noisy, introduces challenges to the algorithm stability — small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in recommendation task.

Fast Methods for Estimating the Numerical Rank of Large Matrices ,Shashanka Ubaru University of Minnesota, Yousef Saad University of Minnesota

We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing ,Ke Li UC Berkeley, Jitendra Malik

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by almost all existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality and sub-linear in the size of the dataset and takes space constant in dimensionality and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in dataset density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.

Community Recovery in Graphs with Locality ,Yuxin Chen Stanford University, Govinda Kamath Stanford University, Changho Suh KAIST, David Tse Stanford University

Motivated by applications in domains such as social networks and computational biology, we study the problem of community recovery in graphs with locality. In this problem, pairwise noisy measurements of whether two nodes are in the same community or different communities come mainly or exclusively from nearby nodes rather than uniformly sampled between all nodes pairs, as in most existing models. We present an algorithm that runs linearly in the number of measurements and which achieves the information theoretic limit for exact recovery.

Fast k-means with accurate bounds ,James Newling Idiap Research Institute, Francois Fleuret Idiap research institute

Our first contribution is an accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3x faster. Our second contribution is a general improvement of existing state-of-the-art accelerated exact k-means algorithms, which relies on better estimates of distance bounds used to reduce the required number of distance calculations, and leads to a speedup in 36 of 44 experiments, running up to 1.8x faster. We have conducted experiments with our own implementations of existing state-of-the-art methods to ensure homogeneous evaluation of performance, and we show that these new implementations perform as well or better than existing available implementations. We also propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments.

KK-Means Clustering with Distributed Dimensions ,Hu Ding State University of New York at Buffalo, Lingxiao Huang , Yu Liu Tsinghua University, Jian Li

Distributed clustering has attracted significant attention in recent years. In this paper, we study the kk-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.

Non-negative Matrix Factorization under Heavy Noise ,Jagdeep Pani Indian Institute of Science, Ravindran Kannan Microsoft Reseach India, Chiranjib Bhattacharya , Navin Goyal Microsoft Research India

The Noisy Non-negative Matrix factorization (NMF) is: given a data matrix A (d x n), find non-negative matrices B;C (d x k, k x n respy.) so that A = BC +N, where N is a noise matrix. Existing polynomial time algorithms with proven error guarantees require EACH column N_.j to have l1 norm much smaller than ||(BC).j ||_1, which could be very restrictive. In important applications of NMF such as Topic Modeling as well as theoretical noise models (e.g. Gaussian with high sigma), almost EVERY column of N.j violates this condition. We introduce the heavy noise model which only requires the average noise over large subsets of columns to be small. We initiate a study of Noisy NMF under the heavy noise model. We show that our noise model subsumes noise models of theoretical and practical interest (for e.g. Gaussian noise of maximum possible sigma). We then devise an algorithm TSVDNMF which under certain assumptions on B,C, solves the problem under heavy noise. Our error guarantees match those of previous algorithms. Our running time of O(k.(d+n)^2) is substantially better than the O(d.n^3) for the previous best. Our assumption on B is weaker than the “Separability” assumption made by all previous results. We provide empirical justification for our assumptions on C. We also provide the first proof of identifiability (uniqueness of B) for noisy NMF which is not based on separability and does not use hard to check geometric conditions. Our algorithm outperforms earlier polynomial time algorithms both in time and error, particularly in the presence of high noise.

Autoencoding beyond pixels using a learned similarity metric ,Anders B. L. Larsen Technical University of Denmar, Søren Kaae Sønderby University of Copenagen, Hugo Larochelle Twitter, Ole Winther Technical University of Denmark

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.

Matrix Eigendecomposition via Doubly Stochastic Riemannian Optimization ,Zhiqiang Xu Institute of Infocomm Research, Peilin Zhao I2R

Matrix eigendecomposition plays a significant role in machine learning, which is a non-convex quadratic optimization problem. In this paper, we propose a doubly stochastic Riemannian gradient DSRG method to address this problem. Specifically, the double stochasticity comes from the Riemannian counterparts of stochastic Euclidean gradient ascent and stochastic Euclidean coordinate ascent. With the double stochasticity, DSRG can greatly reduce the complexity, and completely avoid the matrix inversion, so that it is very suitable to large matrices eigendecomposition. We theoretically analyze its convergence properties and empirically validate it on real-world datasets. Encouraging experimental results demonstrate its advantages over the non-stochastic counterpart.

Persistence weighted Gaussian kernel for topological data analysis ,Genki Kusano Tohoku University, Yasuaki Hiraoka , Kenji

Topological data analysis (TDA) is an emerging mathematical concept for characterizing shapes in complex data. In TDA, persistence diagrams are widely recognized as a useful descriptor of data, and can distinguish robust and noisy topological properties. This paper proposes a kernel method on persistence diagrams to develop a statistical framework in TDA. The proposed kernel satisfies the stability property and provides explicit control on the effect of persistence. Furthermore, the method allows a fast approximation technique. The method is applied into practical data on proteins and oxide glasses, and the results show the advantage of our method compared to other relevant methods on persistence diagrams.

Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user’s needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering while still utilizing the geometry of the data by sampling a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.

Interactive Bayesian Hierarchical Clustering ,Sharad Vikram UCSD, Sanjoy Dasgupta UCSD

Cross-graph Learning of Multi-relational Associations ,Hanxiao Liu , Yiming Yan

Cross-graph Relational Learning (CGRL) refers to the problem of predicting the strengths or labels of multi-relational tuples of heterogeneous object types, through the joint inference over multiple graphs which specify the internal connections among each type of objects. CGRL is an open challenge in machine learning due to the daunting number of all possible tuples to deal with when the numbers of nodes in multiple graphs are large, and because the labeled training instances are extremely sparse as typical. Existing methods such as tensor factorization or tensor-kernel machines do not work well because of the lack of convex formulation for the optimization of CGRL models, the poor scalability of the algorithms in handling combinatorial numbers of tuples, and/or the non-transductive nature of the learning methods which limits their ability to leverage unlabeled data in training. This paper proposes a novel framework which formulates CGRL as a convex optimization problem, enables transductive learning using both labeled and unlabeled tuples, and offers a scalable algorithm that guarantees the optimal solution and enjoys a constant time complexity with respect to the sizes of input graphs. In our experiments with a subset of DBLP publication records and an Enzyme multi-source dataset, the proposed method successfully scaled to the large cross-graph inference problem, and outperformed other representative approaches significantly.

Speeding up k-means by approximating Euclidean distances via block vectors ,Thomas Bottesch Ulm University, Thomas Bühler Avira, Markus Kächele Ulm University

This paper introduces a new method to approximate Euclidean distances between points using block vectors in combination with the Holder inequality. By defining lower bounds based on the proposed approximation, cluster algorithms can be considerably accelerated without loss of quality. In extensive experiments, we show a considerable reduction in terms of computational time in comparison to standard methods and the recently proposed Yinyang k-means. Additionally we show that the memory consumption of the presented clustering algorithm does not depend on the number of clusters, which makes the approach suitable for large scale problems.

Faster Eigenvector Computation via Shift-and-Invert Preconditioning ,Dan Garber TTI Chicago, Elad Hazan Princeton University, Chi Jin UC Berkeley, Sham , Cameron Musco Massachusetts Institute of Technology, Praneeth Netrapalli Microsoft Research, Aaron Sidford Microsoft Research

We give faster algorithms and improved sample complexities for the fundamental problem of estimating the top eigenvector of a given matrix. In particular, given an explicit matrix A∈\Rn×dA∈\Rn×d such that Σ=A⊤AΣ=A⊤A, we show how to compute an ϵϵ approximate top eigenvector in time \otilde([\nnz(A)+d\sr(A)\gap2]⋅log1/ϵ)\otilde([\nnz(A)+d\sr(A)\gap2]⋅log⁡1/ϵ). Here \nnz(A)\nnz(A) is the number of nonzeros in AA, \sr(A)\sr(A) is the stable rank, and \gap\gap is the relative eigengap. We also consider an online setting in which, given a stream of i.i.d. samples from a distribution DD with covariance matrix ΣΣ and a vector x0x0 which is an O(\gap)O(\gap) approximate top eigenvector for ΣΣ, we show how to refine x0x0 to an ϵϵ approximation using O(\var(D)\gap⋅ϵ)O(\var(D)\gap⋅ϵ) samples from DD. Here \var(D)\var(D) is a natural notion of variance. Combining our algorithm with previous work to initialize x0x0, we obtain improved sample complexity and runtime results under a variety of assumptions on \D\D. Notably, we show that, for general distributions, we achieve asymptotically optimal accuracy as a function of sample size as the number of samples grows large. We achieve our results by combining a new robust analysis of the classic method of \emph{shift-and-invert} preconditioning, which reduces eigenvector computations to \emph{approximately} solving a sequence of linear systems, with fast stochastic gradient based system solvers to achieve our claims.

Efficient Algorithms for Large-scale Generalized Eigenvector Computation and CCA ,Rong Ge , Chi Jin UC Berkeley, Sham , Praneeth Netrapalli Microsoft Research, Aaron Sidford Microsoft Research

This paper considers the problem of canonical-correlation analysis (CCA) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics. We provide simple iterative algorithms, with improved runtimes, for solving these problems that are globally linearly convergent with moderate dependencies on the condition numbers and eigenvalue gaps of the matrices involved. We obtain our results by reducing CCA to the top-kk generalized eigenvector problem. We solve this problem through a general framework that simply requires black box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent we obtain a running time of \orderzkκ√ρlog(1/ϵ)log(kκ/ρ)\orderzkκρlog⁡(1/ϵ)log⁡(kκ/ρ) where zz is the total number of nonzero entries, κκ is the condition number and ρρ is the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and the number of components kk. This is essential for handling large-scale matrices that appear in practice. To the best of our knowledge this is the first such algorithm with global linear convergence. We hope that our results prompt further research and ultimately improve the practical running time for performing these important data analysis procedures on large data sets.

#ICML #machine-learning #Conference

Microsoft Neural Net & Deep Learning

C O N T E N T S: KEY TOPICS With this group, Hinton hatched a conspiracy: “rebrand” the frowned-upon filed of neural nets with the moniker “Deep Learning”.(More…) Transfer learning for Latin and Chinese characters with deep neural networks.(More…) POSSIBLY USEFUL Yoshua Bengio et al. followed up on this work in 2007 with “Greedy Layer-Wise Training of Deep Networks” 4, in which they present a…

View On WordPress

#AMAX #GPU #ICML #MNIST #Nvidia Drivenet

Con mucho calor #uach #Chile #icml #portrait #pompadour #selfie #vscocam

#icml #pompadour #uach #chile #portrait #vscocam #selfie

New website for dental studio i'm working on.

#icml #implant centre masnic #dental tourism