Data Mining and Visual Analytics

Introduction
In recent years, the data mining research community has shifted towards using visual representations of information for analytical reasoning, knowledge extraction and decision making. This shift has given birth to a new research domain called Visual Analytics. The terms Information Visualization and Visual Data Mining are also used to refer to this new and exciting field.

 

The classical visualization pipeline for visual analytics is shown in Figure 1. Several steps lie between the raw data and a visual representation that a user can interact with to acquire knowledge, as shown in the figure. Research in all these areas is actively pursued by scientists around the world.

Figure 1: Visualization Pipeline for Visual Analytics

The goal of visual analytics is to let users interactively search for information, deduce important facts and identify interesting patterns, which in turn can be used by domain experts for decision making. Visualization supports this entire process by engaging the human capacity to perceive, abstract and understand complex data and information.

 

The use of visualization is fast becoming a crucial analysis technique in a number of different areas. These areas include but certainly are not limited to:

  • Economics: Stock Market Patterns and Analysis.
  • Sociology: Social Network Analysis.
  • Technology: Exploration of Information on the Web.
  • Transportation: Optimization of Air, Road and Sea travel across the globe.
  • Geography: Migration behavior for cities, countries and continents.
  • Bioinformatics: Analysis and Mining of Biological Networks.

Figure 2 shows visual layouts of data from three of the above-mentioned fields. A recent U.S. report to the funding agencies NIH and NSF provides strong arguments in favor of the development of visualization as a research field:

“Visualization is indispensable to the solution of complex problems in every sector, from traditional medical, science and engineering domains to such key areas as financial markets, national security, and public health. Advances in visualization enable researchers to analyze and understand unprecedented amounts of experimental, simulated, and observational data and through this understanding to address problems previously deemed intractable or beyond imagination.”
[from the Executive summary of (Johnson, Moorhead et al. 2006)]

 

Figure 2: Molecular Structure, Social Network of Hollywood Actors and Metabolic Pathways

Aims and Objectives
The Data Mining and Visual Analytics (DaMiVA) research group aims to develop algorithms, models and systems for technological advancements in the area of Data Mining and Visual Analytics.

Our goal is to focus on large relational datasets and to develop fast, efficient algorithms that extract knowledge, discover hidden patterns and support interactive data mining through user interaction.

Relational data can often be represented as graphs and networks. The term ‘network’ means different things to people from different walks of life. It is used extensively to represent systems such as social networks, electrical circuits, economic networks, chemical compounds, transportation systems, epidemic spreading, metabolic pathways, food webs, the Internet, the World Wide Web, software classes and so on. Although seemingly diverse, these fields have strong common methodological foundations and share methods to analyze, model, understand and organize these networks. We want to focus on these real-world datasets and address the domain-specific issues of each field.
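As a small illustration of how relational records map onto a graph, the sketch below builds an undirected adjacency map from pairs of related entities and finds the most connected node. It is only a minimal example in plain Python, not the group's method and not the Tulip API; the function names `build_graph` and `degrees` and the co-starring records are hypothetical, chosen to echo the actor network mentioned above.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    adjacency = defaultdict(set)
    for source, target in relations:
        # Each relation becomes an edge, stored in both directions.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def degrees(adjacency):
    """Return the number of neighbours of every node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Hypothetical relational records: pairs of actors who appeared together.
co_starring = [
    ("Actor A", "Actor B"),
    ("Actor A", "Actor C"),
    ("Actor B", "Actor C"),
    ("Actor C", "Actor D"),
]

graph = build_graph(co_starring)
node_degrees = degrees(graph)

# The node with the highest degree is the simplest notion of a "hub".
most_connected = max(node_degrees, key=node_degrees.get)
print(most_connected, node_degrees[most_connected])
```

The same adjacency structure would serve equally for a transportation network or a metabolic pathway; only the meaning of nodes and edges changes, which is the sense in which these fields share methodological foundations.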
 
The idea of the research group is to build on the platform provided by the Tulip software. Tulip is open source software dedicated to the analysis and visualization of relational data. It aims to provide the developer with a complete C++ library supporting the design of interactive information visualization applications for relational data that can be tailored to specific problems. The software is released under the LGPL licence and can be freely downloaded from its website.

         
Figure 3: Tulip Software with different views for Data Analysis and Mining