[ecoop-info] CfPart: Workshop on Mining Unstructured Data (MUD)
bram at cs.queensu.ca
Sat Sep 4 06:50:18 CEST 2010
CALL FOR PARTICIPATION
We would like to invite you to participate in the
2010 Workshop on Mining Unstructured Data (MUD)
"… because mining unstructured data is like fishing in muddy waters!"
co-located with the 17th Working Conference on Reverse Engineering (WCRE)
October 15th (13.30h - 17.00h), 2010, Beverly, Massachusetts, USA
In software development, the knowledge of developers, architects and end users
is spread out across dozens of structured and unstructured development
artifacts. Although structured development artifacts such as source code have traditionally
been the primary focus of software engineering research, research on unstructured data, such
as free-form text requirements and specifications, mailing lists and bug reports, has
seen a dramatic increase recently.
Mining unstructured data is very challenging, since it typically requires
dealing with source code snippets embedded in natural language fragments.
Research communities in information retrieval,
data mining and natural language processing have explored different techniques
to mine unstructured data. These techniques are usually limited in scope
and intended for use in specific scenarios.
We feel that the knowledge gathered by the different unstructured data
research efforts should be consolidated and propagated to the
empirical software engineering community at large.
TOPICS AND EXPECTED OUTCOMES
The MUD (Mining Unstructured Data) workshop aims to provide a highly interactive
forum for researchers and developers to place the challenges of, solutions for, and
experiences with mining unstructured data in a common frame of reference, and to
build connections between the various communities.
The topics addressed by this workshop include, but are not limited to:
* classifying techniques for extracting unstructured data
* identifying open research challenges
* dealing with imperfect data
* evaluating the performance of MUD extractors
* cross-linking unstructured data artifacts
The intended outcomes of the workshop are to:
* build connections between the various communities that mine unstructured data
* build a taxonomy of existing techniques and methodologies for mining unstructured data
* help practitioners find the right tool for their needs
* identify open problems and challenges for mining unstructured data
We hope that the outcomes of the workshop will provide the basis for a roadmap on
future research in mining unstructured data.
MUD is a half-day workshop, consisting of a keynote and a panel
session with semi-structured group discussions.
There is NO PAPER SUBMISSION. Instead, we strongly encourage active participation
in the panel and discussion sessions.
KEYNOTE
David Lo: "Mining Execution Traces and Bug Reports: Challenges and Opportunities"
There is a huge mass of unstructured data available to be mined, including
execution traces, bug reports, software forums, etc. Traditional static
analysis techniques cannot be applied to these datasets. Thus, there is a
need for new, specialized techniques, and for the application of techniques
borrowed from areas such as data mining, natural language processing, and
information retrieval, to help analyze these diverse kinds of unstructured
data.
This talk focuses on mining from traces collected during the execution of a
program and bug reports expressed in natural language. It highlights
various challenges due to the diversity of the data, the quality of the
data, the suitability of various off-the-shelf mining algorithms, and the
complexity of various mining techniques when applied to software engineering
datasets. The talk then proceeds to describe some potential generic
techniques that could be used to address these issues. Opportunities, in terms
of open technical problems and the potential benefits of mining execution traces
and bug reports, will also be highlighted.
David Lo is an assistant professor in the School of Information Systems,
Singapore Management University. His research interests include dynamic program
analysis, specification mining, and pattern mining. He has worked on the
extraction of behavioural models from execution logs and the analysis of textual
bug reports, both of which involve mining of unstructured data. For these
problems, he has investigated the use of data mining, information retrieval, and
natural language processing techniques. Lo received a PhD in computer science
from the National University of Singapore.
The workshop is co-located with the 2010 Working Conference on Reverse
Engineering (WCRE), which is the premier research conference on the theory and
practice of recovering information from existing software and systems.
For further details, please see <http://reengineer.org/wcre2010/>.
ORGANIZERS
* Nicolas Bettenburg, Software Analysis and Intelligence Lab (SAIL),
Queen's University, Kingston, Canada
* Bram Adams, Software Analysis and Intelligence Lab (SAIL),
Queen's University, Kingston, Canada
We look forward to welcoming you at the MUD workshop in Beverly!
-The MUD 2010 Organization Committee-