
	Working Group on Knowledge Discovery and Management COST Action 282
Coordinators: Carlos Bento, Werner Dubitzky
Headlines
News for those attending the Group First Meeting in Coleraine.
Group First Meeting: 24-25 May 2002
Werner Dubitzky (Chairman)
University of Ulster, School of Biomedical Sciences, Cromore Road, Coleraine BT52 1SA, Northern Ireland
Phone: +44-(0)28-70-324478, Fax: +44-(0)28-70-324965
		Main Goals
	The importance of knowledge discovery and knowledge management is evidenced by the growing number of scientific publications and events devoted to the subject each year. Both the scientific and the industrial communities have recognized the importance of this area and are working on knowledge discovery and management (KD&M). According to the Knowledge Management Forum, the architecture of a KD&M system comprises four components: the creation, retention, transfer and utilization of knowledge. This architecture is depicted in Fig. 1.

Fig. 1 Architecture for KD&M system.

Knowledge discovery is a fundamental task in knowledge creation, while case-based reuse and maintenance are important approaches for knowledge utilization and retention. The COST 282 working group is devoted to the research and application of knowledge discovery and case-based techniques for KD&M. Areas of application interest within this group include, but are not limited to, life science (biology, chemistry, medicine), software design, and geo-referenced multimedia and the Internet.

Given the complexity of the domains tackled, the data and knowledge representations will be correspondingly more complex than those handled by current knowledge discovery algorithms. Domain knowledge will also play an increasingly critical role in the overall modeling. This working group will therefore pursue knowledge discovery and case-based techniques that can incorporate domain knowledge and use it effectively to discover novel, non-trivial and useful knowledge from related data sets. Because knowledge discovery and utilization also involve important creative aspects, we are strongly interested in developing synergies with COST 282 WG4 on Computational Creativity.

Application Examples

Life/Biological Science

Novel high-throughput technologies such as DNA microarrays are generating an overwhelming volume of biological data, yet classical statistical and data mining methods were not developed to address the specific requirements of life science applications. First, the analysis of gene expression microarray data is hampered by the high dimensionality of the feature space: the number of features (genes) often exceeds the number of samples by a factor of 1,000 or more, a property that data in traditional statistical and knowledge discovery applications rarely exhibit. Second, gene expression data are very noisy, which represents another challenge.
Typical statistical and standard data mining methods are very sensitive to noise. Third, most existing methods operate in weak-theory domains and thus lack adequate mechanisms for effectively and efficiently integrating background knowledge into the discovery process. The life science community has accumulated huge amounts of background knowledge, most of which is accessible through the Internet in electronic form (databases, information bases, knowledge bases and document bases). Integrating this information into an automated discovery process is a formidable challenge because of the physical (global, across the Internet) and conceptual distribution of the information, and because of the sheer scale of the available knowledge. Finally, although artificial intelligence methods are closely related to statistics and statistical principles, most techniques do not address the life scientist's requirement to qualify analytical results by means of confidence measures or p-values.

Software Design

Software design is a complex activity that involves skills in software analysis, programming and reuse. In general, software designers do not start a design from scratch. They search for pieces of software useful for their task, then adapt and integrate them into a new application, which in turn constitutes new knowledge to be added to a software library. Discovering the most suitable pieces of software (knowledge) for a specific task, together with the most effective adaptation and reuse procedures, within a framework typical of a KD&M process is a challenging activity. Within this working group we intend to research methods and architectures suitable for KD&M in software design.

Geo-referenced Multimedia and Internet

Multimedia databases and the Internet are probably among the information sources for which the problems of KD&M are most evident, and much attention has consequently been devoted to the subject.
One aspect we want to work on is the development of discovery and management algorithms for heterogeneous sources of information (such as multimedia and the Internet) in the presence of geo-referenced data (data comprising geographical information). We are particularly interested in the application of these techniques to the domains of medicine and biology.
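As a small illustration of why geo-referencing helps when sources are heterogeneous, the following minimal Python sketch links records purely by location: it computes the great-circle (haversine) distance between two points and returns every record within a given radius of a query point, whatever source the record comes from. The `GeoRecord` type, its fields, the function names `haversine_km` and `within_radius`, and the sample coordinates are all hypothetical, and a spherical Earth with a mean radius of 6,371 km is assumed. It is a sketch under these assumptions, not a method proposed by the working group.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius in km


@dataclass
class GeoRecord:
    """Hypothetical record from any source, tagged with a location in degrees."""
    source: str  # e.g. "image", "web page", "clinical record"
    label: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(records, lat: float, lon: float, radius_km: float):
    """Return (distance_km, record) pairs within radius_km of the query, closest first."""
    hits = [(haversine_km(lat, lon, r.lat, r.lon), r) for r in records]
    hits = [h for h in hits if h[0] <= radius_km]
    hits.sort(key=lambda h: h[0])  # sort on distance only; records need not be comparable
    return hits


if __name__ == "__main__":
    # Hypothetical records from heterogeneous sources; coordinates are made up.
    records = [
        GeoRecord("clinical record", "case 17", 0.0, 0.0),
        GeoRecord("image", "satellite tile", 0.0, 1.0),
        GeoRecord("web page", "regional report", 0.0, 2.0),
    ]
    for dist, rec in within_radius(records, 0.0, 0.5, 100.0):
        print(f"{rec.source}: {rec.label} ({dist:.1f} km)")
```

The linear scan keeps the sketch short; for large collections a spatial index would normally replace it, and richer schemes could combine geographic distance with content similarity between records.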