|
Within the collaborative research centre Sonderforschungsbereich 732 (SFB 732), Lingenio has designed a
research prototype for underspecified syntactic-semantic analysis of corpus text and makes it available, in particular, to project B3 of SFB 732,
"Disambiguation of nominalizations in the context of extracting linguistic data from corpus text".
The company takes part in research on the sortal properties of German nominalizations in -ung and on the conditions that trigger (partial) disambiguation of such nominalizations in context.
The research prototype makes it possible to assign reliable analyses and (underspecified) representations to a very large percentage of the sentences in the corpora examined (which combine sentences and texts from widely differing domains).
This is possible because the grammars and dictionaries of the Lingenio systems have been continuously extended and improved, driven by the need to survive in the market; today, their coverage is so large that freely available software can hardly compete.
The research prototype also offers a number of other functions. It allows the analyses to be partially disambiguated to the degree required by the task at hand, and it computes the corresponding structural and sortal consequences.
Furthermore, the system is used to automatically extract context elements that are considered relevant for resolving the sortal ambiguities under investigation. After manual classification, these elements are integrated into the lexicon of the system, so that the added lexical information licenses more detailed and more specific analyses.
This is used to test linguistic hypotheses (mainly about the sortal consequences of nominal derivation). A compact description of the fundamental functions and intentions of the design can be found
in the presentation given at the DGfS conference in 2008;
for further descriptions and results, see also
the list of publications in B3.
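
The cycle described above (an underspecified analysis, manually classified context elements added to the lexicon, and a more specific analysis as a result) can be illustrated with a small Python sketch. It is not Lingenio's implementation: the sort inventory, the `Lexicon` class, its methods, and the example words with their classifications are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable

# Hypothetical sort inventory (an assumption; the source does not list its sorts).
EVENT, STATE, OBJECT = "event", "state", "object"


@dataclass
class Lexicon:
    # Noun -> set of sortal readings that are still possible (underspecified entry).
    readings: Dict[str, FrozenSet[str]]
    # Context element -> sorts it is compatible with (result of manual classification).
    indicators: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def add_indicator(self, element: str, sorts: Iterable[str]) -> None:
        """Integrate a manually classified context element into the lexicon."""
        self.indicators[element] = frozenset(sorts)

    def disambiguate(self, noun: str, context: Iterable[str]) -> FrozenSet[str]:
        """Narrow the reading set of `noun` using known context elements.

        Unknown elements leave the representation untouched, so the result
        stays underspecified when the lexicon has no relevant information.
        """
        remaining = self.readings[noun]
        for element in context:
            allowed = self.indicators.get(element)
            if allowed is None:
                continue
            narrowed = remaining & allowed
            if narrowed:  # never narrow down to an empty (contradictory) set
                remaining = narrowed
        return remaining


# Illustrative entry: an -ung noun left ambiguous between three sorts.
lexicon = Lexicon({"Absperrung": frozenset({EVENT, STATE, OBJECT})})
before = lexicon.disambiguate("Absperrung", ["dauerte"])

# After (hypothetical) manual classification of the context element:
lexicon.add_indicator("dauerte", {EVENT, STATE})
after = lexicon.disambiguate("Absperrung", ["dauerte"])
```

In this sketch `before` still contains all three readings, while `after` contains only the two that are compatible with the classified context element. The disambiguation is partial: it goes only as far as the lexical information justifies.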
The project is important for Lingenio because it allows the robustness and quality of the analysis component to be evaluated carefully against large corpora, and because it allows the schemata for underspecified representations to be extended, which can help to significantly improve translation quality.
In addition, the project provides opportunities to develop a number of monolingual applications that extend the company's portfolio in an interesting and valuable way.
|