[ecoop-info] CfPart: Workshop on Mining Unstructured Data (MUD)

Sat Sep 4 06:50:18 CEST 2010

==========================
   CALL FOR PARTICIPATION
==========================

Dear Colleagues:

We would like to invite you to participate in the

	2010 Workshop on Mining Unstructured Data (MUD)
        "… because mining unstructured data is like fishing in muddy waters!"

	co-located with the 17th Working Conference on Reverse Engineering (WCRE)
	October 15th, 2010 (13.30h - 17.00h), Beverly, Massachusetts, USA

	http://sailhome.cs.queensu.ca/mud

In software development, the knowledge of developers, architects and end users
is spread out across dozens of structured and unstructured development
artifacts. Although structured development artifacts such as source code have
traditionally been the primary focus of software engineering research, research
on unstructured data, such as free-form text requirements and specifications,
mailing lists and bug reports, has increased dramatically in recent years.

Mining unstructured data is very challenging, since it typically requires
dealing with source code snippets embedded in natural language fragments. 
Research communities in information retrieval, 
data mining and natural language processing have explored different techniques 
to mine unstructured data. These techniques are usually limited in scope
and intended for use in specific scenarios.

We feel that the knowledge gathered by the different unstructured data
research efforts should be consolidated and propagated to the
empirical software engineering community at large.

TOPICS AND EXPECTED OUTCOMES

The MUD (Mining Unstructured Data) workshop aims to provide a highly interactive
forum for researchers and developers to put challenges of, solutions for and
experiences with mining unstructured data into a common reference frame and to
build connections between the various communities.

The topics addressed by this workshop include, but are not limited to:

 * classifying techniques for extracting unstructured data
 * identifying open research challenges
 * dealing with imperfect data
 * evaluating the performance of MUD extractors
 * cross-linking unstructured data artifacts

The intended outcomes of the workshop are to:

 * build connections between the various communities that mine unstructured data
 * build a taxonomy of existing techniques and methodologies for mining unstructured data
 * help practitioners find the right tool for their needs
 * identify open problems and challenges for mining unstructured data

We hope that the outcomes of the workshop will provide the basis for a roadmap on 
future research in mining unstructured data.

FORMAT

MUD is a half-day workshop, consisting of a keynote and a panel
session with semi-structured group discussions.

There is NO PAPER SUBMISSION. Instead, we strongly encourage active participation
in the panel and discussion sessions.

KEYNOTE SPEAKER

David Lo: "Mining Execution Traces and Bug Reports: Challenges and Opportunities"

	Abstract:

	There is a huge mass of unstructured data available to be mined, including
	execution traces, bug reports, software forums, etc. For these datasets,
	traditional static analysis techniques cannot be used. Thus there is a
	need for new specialized techniques and for the application of techniques
	borrowed from areas like data mining, natural language processing, and
	information retrieval to help analyze these diverse varieties
	of unstructured data.

	This talk focuses on mining from traces collected during the execution of a 
	program and bug reports expressed in natural language. It highlights 
	various challenges due to the diversity of the data, the quality of the 
	data, the suitability of various off-the-shelf mining algorithms, and the 
	complexity of various mining techniques when applied to software engineering 
	datasets. The talk then proceeds to describe some potential generic 
	techniques that could be used to address the issues. Opportunities in terms 
	of open technical problems and potential benefits of mining execution traces 
	and bug reports will also be highlighted.

David Lo is an assistant professor in the School of Information Systems,
Singapore Management University. His research interests include dynamic program
analysis, specification mining, and pattern mining. He has worked on the
extraction of behavioural models from execution logs and the analysis of textual
bug reports, both of which involve mining of unstructured data. For these
problems, he has investigated the use of data mining, information retrieval, and
natural language processing techniques. Lo received a PhD in computer science
from the National University of Singapore.

VENUE

The workshop is co-located with the 2010 Working Conference on Reverse
Engineering (WCRE), which is the premier research conference on the theory and
practice of recovering information from existing software and systems.

For further details, please see <http://reengineer.org/wcre2010/>.

ORGANIZATION

* Nicolas Bettenburg, Software Analysis and Intelligence Lab (SAIL), 
        Queen's University, Kingston, Canada

* Bram Adams, Software Analysis and Intelligence Lab (SAIL),
        Queen's University, Kingston, Canada

We look forward to welcoming you to the MUD workshop in Beverly!

	 -The MUD 2010 Organization Committee-