HDM'26
The 13th ICDM Workshop for High Dimensional Data Mining
Date: November 12 - November 15, 2026
Location: Shenyang, China
Overview
Unprecedented technological advances are leading to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and social network analysis, among many others. Propelled by a new awareness of the importance of data, practitioners in all areas maintain large repositories of high-dimensional data, although only some of these data are tagged or labelled; most are unlabelled raw data waiting to be exploited. The number of features in such data is often of the order of thousands or millions, which is much larger than the available (labelled or unlabelled) sample size.
Moreover, high dimensional data with limited sample sizes come with a number of challenges:
- High dimensional geometry defeats our intuition rooted in low dimensional experiences, which makes data presentation and visualisation particularly challenging.
- Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon whereby the contrast between pairwise distances can vanish as the dimensionality increases.
- Spurious correlations and misleading estimates may result when fitting complex models whose effective dimensionality is too large compared to the number of data points available.
- The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
- The computational cost of processing high dimensional data, or of carrying out optimisation over high dimensional parameter domains, is often prohibitive.
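The distance concentration phenomenon mentioned above can be observed directly in simulation. The following sketch (an illustration on synthetic i.i.d. uniform data; the sample size and dimensions are arbitrary choices, not from this call) measures how the contrast between the largest and smallest pairwise Euclidean distances shrinks as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(X):
    """Spread of pairwise Euclidean distances relative to the smallest one."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.clip(d2, 0.0, None)          # guard against tiny negative rounding errors
    iu = np.triu_indices(X.shape[0], k=1)  # distinct pairs only
    d = np.sqrt(d2[iu])
    return (d.max() - d.min()) / d.min()

# As the dimension grows, the max/min contrast between pairwise distances shrinks,
# so nearest-neighbour distinctions become less meaningful.
contrasts = {d: relative_contrast(rng.uniform(size=(200, d))) for d in (2, 100, 10000)}
for d, c in contrasts.items():
    print(f"d={d:>5}: relative contrast = {c:.3f}")
```

In low dimension the nearest pair of points is far closer than the farthest pair; in very high dimension all pairwise distances become nearly equal, which is exactly why distance-based mining methods degrade.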
In addition to the classical challenges, modern machine learning systems introduce a new need: high-dimensional representations should not only be compact and predictive, but also interpretable. In many real-world applications, dimensionality reduction methods produce latent embeddings that are difficult to understand, validate, and trust. As a result, there is growing interest in interpretable dimensionality reduction, where reduced representations preserve semantic meaning and enable human understanding.
Aims and Scope
This workshop aims to bring together researchers from databases, data science, machine learning, and statistics to cross-pollinate ideas, facilitate collaboration, and expand the breadth and reach of methods and technology that address the curses and exploit the blessings of high dimensionality in data mining, and to forge new directions in data mining research.
This year we would like to particularly encourage work that explores interpretable high dimensional data mining as well as work that counters the issues of low sample size and takes advantage of unlabelled or auxiliary data for high dimensional data mining. Topics of interest include (but are not limited to) the following:
- Learning and mining with weak supervision, exploiting unlabelled data in high dimensional settings.
- Prototype-based dimensionality reduction and explainable representation learning. Interpretable prototype models for high-dimensional data mining.
- Managing the trade-off between computational cost and statistical efficiency.
- Models of low intrinsic structure, such as sparse representation, manifold models, latent structure models, overparametrised models, compressible models.
- Effects of noise and the curse of dimensionality on data mining methods, and their mitigation.
- Theoretical underpinning of data mining where the data dimension can be larger than the sample size.
- New data mining techniques that exploit properties of high dimensional data spaces.
- Adaptive and non-adaptive dimensionality reduction for high dimensional data sets.
- Random projections, and random matrix theory applied to high dimensional data mining.
- Functional data mining.
- Data mining applications to real problems in science, engineering, businesses, and the humanities, where the data is high dimensional.
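Among the topics above, random projections are perhaps the simplest to demonstrate. A minimal sketch in the spirit of the Johnson-Lindenstrauss lemma follows, using synthetic Gaussian data (the data, dimensions, and seed are illustrative assumptions): points in a high dimension D are mapped down to dimension k by a scaled Gaussian matrix, and pairwise distances are approximately preserved.

```python
import numpy as np

rng = np.random.default_rng(1)

# n points in dimension D, projected down to dimension k (illustrative sizes).
n, D, k = 100, 5000, 500
X = rng.normal(size=(n, D))
R = rng.normal(size=(D, k)) / np.sqrt(k)   # scaling gives E[||xR||^2] = ||x||^2
Y = X @ R                                  # projected data: 10x fewer features

# Check the distortion of a few pairwise distances after projection.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in [(0, 1), (2, 3), (4, 5)]]
print("distance ratios after projection:", [round(r, 3) for r in ratios])
```

The ratios come out close to 1, illustrating why such data-oblivious projections are a cheap, theoretically grounded preprocessing step for high dimensional data mining.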
Submission
Submissions are solicited for oral presentation at the workshop. The author guidelines and paper format of the main conference should be followed. All submissions will be peer-reviewed. Accepted papers will be included in the ICDM Workshop Proceedings (separate from the ICDM Main Conference Proceedings), and each accepted workshop paper requires a full registration. Duplicate submissions of the same paper to more than one ICDM workshop are forbidden.
Important Dates
- Submission deadline: To be announced
- Author notification: To be announced
Organisation
Program Committee
To be announced.
Organisers
Jakramate Bootkrajang
Associate Professor at the Department of Computer Science, Chiang Mai University. He received a BSc and an MSc in Computer Science from Seoul National University in 2008 and 2010, respectively, and a PhD in Computer Science from the University of Birmingham in 2014. His research interests include statistical machine learning, learning from unreliable data, and high-dimensional data analysis and its applications in astrophysics. He has co-authored over 40 refereed publications. He is chair of the IEEE CIS Task Force on High Dimensional Data Mining (2024-present).
jakramate.b@cmu.ac.th
Ata Kaban
Professor at the School of Computer Science, University of Birmingham, UK. She holds a PhD in Computing Science (2001) and a PhD in Musicology (1999). Her research interests include both theoretical and practical aspects of high dimensional machine learning and data mining, dimensionality reduction, randomised data projections, and probabilistic modelling of data. She has co-authored over 90 refereed publications in these areas and has received Best Paper Awards. She has been co-chair and a founding member of the IEEE CIS Task Force on High Dimensional Data Mining since 2014. She is Chair of the IEEE CIS Data Mining and Big Data Analytics Technical Committee for 2025-2026.
a.kaban@bham.ac.uk