8th International Conference on Web Intelligence, Mining and Semantics
June 25 – 27 2018, Novi Sad, Serbia
This tutorial offers a blend of theory and practice on dimensionality reduction methods and graph mining algorithms for dealing with challenging issues in recommender systems, such as scalability, data noise, and sparsity. Matrix and tensor decomposition methods have proven to be among the most accurate (e.g., in the Netflix Prize) and efficient approaches for handling big data. For each method (SVD, SVD++, HOSVD, CUR, etc.) we will provide the theoretical mathematical background and a step-by-step analysis, using a single toy example that runs through all parts of the tutorial and helps the audience clearly understand the differences among factorization methods. Moreover, this tutorial surveys important research on a new family of recommender systems aimed at serving multi-dimensional social networks. We will review related work on similarity search in graphs, and we will present random walk-based algorithms (e.g., PageRank, SimRank, Katz) that can be used to provide contextual recommendations in multi-dimensional graphs, where there are many participating entities (users, locations, products, and the time dimension).
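As a rough illustration of how a random walk-based algorithm can produce recommendations on a graph, the following minimal sketch runs a random walk with restart (the idea behind personalized PageRank) on a tiny user-item graph. The graph, the node names, the function name `random_walk_with_restart` and the restart probability are all hypothetical choices made for this example; they are not taken from the tutorial material.

```python
def random_walk_with_restart(edges, source, restart=0.15, iterations=100):
    """Score every node by its proximity to `source` on an undirected graph."""
    # Build adjacency sets from the (undirected) edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Start with all probability mass on the source node.
    scores = {node: 0.0 for node in neighbors}
    scores[source] = 1.0

    for _ in range(iterations):
        updated = {node: 0.0 for node in neighbors}
        for node, score in scores.items():
            # With probability (1 - restart) the walker moves to a random neighbor.
            share = (1.0 - restart) * score / len(neighbors[node])
            for nb in neighbors[node]:
                updated[nb] += share
        # With probability `restart` the walker jumps back to the source.
        updated[source] += restart
        scores = updated
    return scores


# Hypothetical toy data: which user interacted with which item.
edges = [
    ("user_1", "item_a"), ("user_1", "item_b"),
    ("user_2", "item_b"), ("user_2", "item_c"),
    ("user_3", "item_c"), ("user_3", "item_d"),
]
scores = random_walk_with_restart(edges, source="user_1")

# Recommend items that user_1 has not interacted with yet, closest first.
seen = {item for user, item in edges if user == "user_1"}
candidates = [n for n in scores if n.startswith("item_") and n not in seen]
ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
print(ranked)
```

In this toy graph, `item_c` is reachable from `user_1` through a shared item with `user_2`, so it receives a higher score than the more distant `item_d`.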
Panagiotis Symeonidis is an assistant professor at the Faculty of Computer Science (scientific sector INF/01) of the Free University of Bozen-Bolzano. Before moving to Bolzano, he worked for 8 years as an assistant professor at the Department of Informatics of the Aristotle University of Thessaloniki, Greece. He received a Bachelor's degree (BA) in Applied Informatics from Macedonia University of Greece in 1996 and a Master's degree (MSc) in Information Systems from the same university in 2004. He received his PhD in Web Mining and Information Retrieval for Personalization from the Department of Informatics of the Aristotle University of Thessaloniki, Greece, in 2008. His research interests include web mining (usage mining, content mining and graph mining), information retrieval, collaborative filtering, recommender systems, social media in Web 2.0 and location-based social networks. He is the co-author of 3 international books, 1 Greek book, 4 book chapters, 18 journal publications and 29 conference/workshop publications. His articles have received more than 1800 citations from other scientific publications.
Time-series classification is the common denominator in various recognition tasks, such as signature verification, person identification based on keystroke dynamics, and detection of cardiovascular diseases and brain disorders (e.g., early-stage Alzheimer's disease or dementia). This tutorial aims to give an overview of the most prominent challenges (tasks), methods, evaluation protocols and biomedical applications related to time-series classification. Besides the "conventional" time-series classification task, early classification and semi-supervised classification will be considered. Both preprocessing techniques (Fourier transformation, SAX, etc.) and the most prominent classifiers (similarity-based, feature-based and motif/shapelet-based classifiers, as well as convolutional neural networks) will be covered. It will be pointed out that carefully designed evaluation protocols are required in order to assess the quality of the models fairly. Depending on the application scenario, this includes realistic assumptions about the availability of training data, careful (e.g., patient-based) train and test splits, etc. Selected applications will be explained, such as classification of functional magnetic resonance imaging (fMRI) data and person identification based on keystroke dynamics.
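To make the idea of a patient-based train and test split concrete, here is a minimal sketch under stated assumptions: each record is a dictionary with hypothetical keys `patient`, `series` and `label`, and the helper name `patient_based_split` is invented for this example. The point is only that all recordings of one patient end up on the same side of the split, so the test set contains no patient already seen during training.

```python
import random


def patient_based_split(records, test_fraction=0.25, seed=0):
    """Split records so that no patient appears in both train and test."""
    # Shuffle the distinct patient identifiers, not the individual records.
    patients = sorted({record["patient"] for record in records})
    rng = random.Random(seed)
    rng.shuffle(patients)

    # Hold out a fraction of the patients (at least one) for testing.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for r in records if r["patient"] not in test_patients]
    test = [r for r in records if r["patient"] in test_patients]
    return train, test


# Hypothetical toy data: two short recordings for each of four patients.
records = [
    {"patient": pid, "series": [0.1 * pid, 0.2 * pid, 0.3 * pid], "label": pid % 2}
    for pid in (1, 2, 3, 4)
    for _ in range(2)
]
train, test = patient_based_split(records)

train_patients = {r["patient"] for r in train}
test_patients = {r["patient"] for r in test}
print(sorted(train_patients), sorted(test_patients))
```

A split made at the level of individual recordings could place two recordings of the same patient in both sets, which tends to give an overly optimistic estimate of how well a model works on new patients.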
Krisztian Buza is currently a post-doc research assistant at the University of Bonn. He obtained his Diploma in Computer Science from the Budapest University of Technology and Economics in 2007 and his Ph.D. from the University of Hildesheim in 2011. He is a co-author of more than 40 publications, including the "best paper" of the IEEE Conference on Computational Science and Engineering (2010) for his work on individualized error prediction for time series classification. His research focuses on time series classification and biomedical applications of machine learning and data mining.
Methods for identifying outliers in traffic data use different techniques and formulations, analyzing and translating the traffic data in different ways in order to apply statistical techniques, similarity-based techniques, or techniques based on frequent pattern mining. In this tutorial, we give a structured overview that relates these approaches to some fundamental outlier detection models. These classic methods (such as the "Local Outlier Factor") are well understood and have a clear mathematical formulation. By relating complex methods (adapted and tailored to a specific application such as traffic data) to abstract and fundamental methods, we can better understand their intuition, limitations, and benefits. As a result, practitioners get some guidance for selecting the most suitable methods for their case.
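For readers unfamiliar with the Local Outlier Factor mentioned above, the following minimal sketch computes it for a handful of 2D points. It is a simplified illustration, not the presenters' implementation: it assumes Euclidean distance, uses exactly k neighbors per point (ties broken by index rather than including all tied points), and assumes there are no duplicate points. The function name and the toy data are hypothetical.

```python
import math


def local_outlier_factors(points, k):
    """Return the LOF score of every point (values near 1 indicate inliers)."""
    n = len(points)
    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
        k_distance.append(ranked[k - 1][0])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to its neighbor j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]

    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy data: a small square cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
lof = local_outlier_factors(points, k=2)
most_outlying = max(range(len(points)), key=lambda i: lof[i])
print(points[most_outlying])
```

The four clustered points have the same density as their neighbors and therefore a score of 1, while the distant point lies in a much sparser region than its neighbors and receives a score well above 1.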
Youcef Djenouri is a postdoc in the Department for Mathematics and Computer Science (IMADA) at University of Southern Denmark (SDU), in Odense, Denmark. Previously, he was granted a post-doctoral fellowship from UNIST in South Korea, and worked as an assistant professor at USDB university in Blida, Algeria. Youcef holds bachelor and master-level degrees in Computer Science, involving studies at USDB and USTHB universities in Algeria, where he was ranked fourth and first in his class, respectively. He finished his Ph.D. thesis in computer science on "Parallel Association Rule Mining" at USTHB in December 2014. During his Ph.D. he was granted a short-term research visit at ENSMEA University in Poitiers, France. His research interests include data mining, machine learning, parallel computing and artificial intelligence as well as bio-inspired computing. He has published more than 30 papers at peer-reviewed international conferences and in international journals. He received the "Best Paper Award" at PAKDD's BDM Workshop 2017. Youcef has presented several papers at different data mining and parallel computing venues (PAKDD, WIC, PDP, PPAM, and DCAI).
Arthur Zimek is Associate Professor in the Department for Mathematics and Computer Science (IMADA) at University of Southern Denmark (SDU), in Odense, Denmark. Previously he worked as a Privatdozent in the database systems and data mining group at Ludwig-Maximilians-University Munich (LMU), Germany, as a guest professor at Technical University Vienna, Austria, and as a postdoctoral fellow in the department for Computing Science at University of Alberta, Edmonton, Canada. Arthur holds master-level degrees in bioinformatics, philosophy, and theology, involving studies at universities in Germany (TUM, HfPh, LMU Munich, and JGU Mainz) as well as Austria (LFU Innsbruck). He finished his Ph.D. thesis in informatics on "Correlation Clustering" at LMU in summer 2008. For this work he received the "SIGKDD Doctoral Dissertation Award (runner-up)" in 2009. His research interests include ensemble techniques for unsupervised learning, clustering, outlier detection, and high-dimensional data, developing data mining methods as well as evaluation methodology. He has published more than 70 papers at peer-reviewed international conferences and in international journals. Together with his co-authors, he received the "Best Paper Honorable Mention Award" at SDM 2008 and the "Best Demonstration Paper Award" at SSTD 2011. Arthur has presented tutorials on different data mining topics at several conferences (SIGKDD, VLDB, PAKDD, ICDM, SDM, ECMLPKDD).
The tendency of k-nearest neighbor graphs constructed from tabular data using some distance measure to contain hubs, i.e. points with in-degree much higher than expected, has drawn a fair amount of attention in recent years due to the observed impact on techniques used in many application domains. This tutorial will be organized into three parts: (1) Origins, which will discuss the causes of the emergence of hubs (and their low in-degree counterparts, the anti-hubs), and their relationships with dimensionality, neighborhood size, distance concentration, and the notion of centrality; (2) Applications, where we will present some notable effects of (anti-)hubs on techniques for machine learning, data mining and information retrieval, identify two different approaches to handling hubs adopted by researchers – through fighting or embracing their existence – and review techniques and applications belonging to the two groups; and (3) Challenges, which will discuss work in progress, open problems, and areas with significant opportunities for hub-related research.
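The notion of in-degree in a k-nearest neighbor graph can be made concrete with a small sketch. The code below counts, for every point, how many other points include it among their k nearest neighbors (often called the k-occurrence count). The function name, the Euclidean distance and the index-based tie-breaking are assumptions made for this example, and the five 2D points only illustrate the definition; they do not reproduce the high-dimensional effects discussed in the tutorial.

```python
import math


def k_occurrences(points, k):
    """Count how often each point appears in the k-NN lists of other points."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Rank all other points by distance to p; ties are broken by index.
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in ranked[:k]:
            counts[j] += 1
    return counts


# Hypothetical toy data: one central point surrounded by four others.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
counts = k_occurrences(points, k=1)
print(counts)
```

Every point has out-degree k, so the counts always sum to n times k and the expected in-degree is k. Here the central point is the nearest neighbor of all four surrounding points and thus acts as a hub, while three of the surrounding points are never chosen and play the role of anti-hubs.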
Miloš Radovanović is an associate professor at the Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Serbia, where he received his B.Sc., M.Sc. and Ph.D. degrees. He has (co)authored three programming textbooks, a research monograph, and over 60 papers in data mining, machine learning, and related fields. His research focuses mainly on phenomena pertaining to high-dimensional data and their effects on various data mining and machine learning algorithms and applications. His other research interests include time-series and complex-network analysis, as well as computer-science education. Since 2009, he has been managing editor of the Computer Science and Information Systems (ComSIS) journal.
In order to understand, control or improve a complex system composed of a large number of interrelated parts, it is necessary to quantify, characterize and comprehend the structure and evolution of the underlying complex networks. The last two decades have witnessed rapid growth in the number of studies analyzing networks that represent complex real-world systems, leading to the emergence of network science, an interdisciplinary research field focused on metrics, algorithms and models that can reveal, reproduce and explain frequently observed structural and evolutionary characteristics of real-world networks. This tutorial covers methods for analyzing social and information networks in which nodes are augmented with various kinds of attributes. Special emphasis is given to methods for analyzing subgraphs in annotated networks, constructing their block models and mining attachment preferences. The presented methods will be demonstrated on two case studies related to the evaluation of modularized semantic web ontologies and research collaboration.
Miloš Savić is an assistant professor at the Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, where he received his BSc, MSc and PhD degrees in the field of computer science. He defended his PhD thesis entitled "Extraction and analysis of complex networks from different domains" in 2015. His research interests are in the field of complex network analysis with a focus on social, information, ontology and software networks. He is a co-author of 30 research papers published in international journals and proceedings of international conferences. He received the faculty award “Aleksandar Saša Popović” for exceptional research work in the field of computer science in 2011. He has been an editorial assistant of the ComSIS journal since 2014 and a teaching associate at the Petnica Science Center since 2003. He is also a program committee member of several international conferences.