Benno Stein is chair of the Web-Technology and Information Systems Group at the Bauhaus-Universität Weimar. His research focuses on modeling and solving data- and knowledge-intensive information processing tasks. He has developed algorithms and tools for information retrieval, data mining, and knowledge processing, as well as for engineering design and simulation (patents granted). He has received scientific and commercial prizes for several of his research achievements. He serves on scientific boards and as a reviewer for various conferences and journals, and he is the initiator and a co-chair of PAN, an excellence network and evaluation lab on text forensics with a focus on authorship analysis, profiling, and reuse detection. He is cofounder and spokesman of the forthcoming Digital Bauhaus Lab Weimar, an interdisciplinary research lab for Computer Science, Media, and Engineering. He is also cofounder (1996) and scientific director of Art Systems Software Ltd, a world-leading company for simulation technology in fluidic engineering.
Professional background: Study at the University of Karlsruhe (1984-1989). Dissertation (1995) and Habilitation (2002) in computer science at the University of Paderborn. Appointment as a full professor for Web Technology and Information Systems at the Bauhaus-Universität Weimar (2005). Research stays at IBM, Germany, and the International Computer Science Institute, Berkeley.
To paraphrase means to rewrite content whilst preserving the original meaning. Paraphrasing is important in fields such as text reuse in journalism, anonymising work, and improving the quality of customer-written reviews, among others. Paraphrasing is often considered an analysis problem that asks the following question: Are these two sentences (or paragraphs) paraphrases of each other?
In our talk we will take the synthesis view and consider the problem of automatically paraphrasing a text. To illustrate both the principles and the potential of our approach, we consider the reformulation of a given text so that it contains a given acrostic. A text contains an acrostic if the first letters of a range of consecutive lines form a word or phrase. Our approach turns this paraphrasing task into an optimization problem: we use various existing and new paraphrasing techniques as operators that are applicable to intermediate versions of a text (e.g., replacing synonyms), and we search for an operator sequence with minimum loss of text quality. The experimental analysis shows that we can solve the acrostic generation problem both effectively and efficiently. However, our main contribution lies in the presented technology paradigm: a novel and promising combination of methods from Information Retrieval, Computational Linguistics, and Artificial Intelligence. The approach naturally generalizes to related paraphrasing problems as they occur in shortening or simplifying a given text, writing style obfuscation, answer grading, or e-journalism.
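To make the optimization view concrete, the following is a minimal sketch in Python and not the system presented in the talk. It assumes only two operators (placing a line break and replacing a line's first word with a synonym), a hypothetical toy synonym table SYNONYMS, a quality loss of 1 per synonym replacement, and an acrostic that starts on the first line. The names has_acrostic and make_acrostic and the parameters min_len and max_len are illustrative. The search is a plain uniform-cost search over states (next word index, acrostic letters placed).

```python
import heapq
from itertools import count

# Hypothetical toy synonym table; a real system would draw on a thesaurus.
SYNONYMS = {"big": ["large", "huge"], "eat": ["consume"]}


def has_acrostic(lines, word):
    """True if the first letters of consecutive lines spell `word`."""
    # Empty lines contribute a space so they cannot be skipped over.
    initials = "".join((line.lstrip()[:1].lower() or " ") for line in lines)
    return word.lower() in initials


def make_acrostic(words, target, min_len=1, max_len=4):
    """Uniform-cost search for line breaks and synonym swaps.

    Returns (lines, loss) with minimum loss, or None if no solution exists.
    Assumption: the acrostic starts on the first line.
    """
    target = target.lower()
    tie = count()  # tie-breaker so the heap never compares line lists
    heap = [(0, next(tie), 0, 0, [])]  # (loss, tie, word index, letters placed, lines)
    seen = set()
    while heap:
        cost, _, i, k, lines = heapq.heappop(heap)
        if k == len(target):
            # All letters placed; the remaining words form one final line.
            rest = [" ".join(words[i:])] if i < len(words) else []
            return lines + rest, cost
        if (i, k) in seen or i >= len(words):
            continue
        seen.add((i, k))
        letter = target[k]
        # Operators for the first word: keep it (loss 0) or swap a synonym (loss 1).
        options = [(words[i], 0)]
        options += [(s, 1) for s in SYNONYMS.get(words[i].lower(), [])]
        for first, loss in options:
            if not first.lower().startswith(letter):
                continue
            # Operator: end the current line after n words.
            for n in range(min_len, max_len + 1):
                if i + n > len(words):
                    break
                line = " ".join([first] + words[i + 1:i + n])
                heapq.heappush(
                    heap, (cost + loss, next(tie), i + n, k + 1, lines + [line])
                )
    return None


if __name__ == "__main__":
    text = "big apples taste great and people eat them".split()
    result = make_acrostic(text, "lap")
    if result is not None:
        lines, loss = result
        print("\n".join(lines))
        print("quality loss:", loss)
```

In this toy setting the only way to obtain the initial letter "l" is to replace "big" with "large", so any optimal solution has a loss of 1. A realistic system would need a far richer operator set and a quality measure grounded in language statistics, which this sketch does not attempt.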