Institute for Information Transmission Problems of the RAS (Kharkevich Institute)
Phraseology in computational linguistics
We will explain how to handle idiomatic expressions, which do not obey the usual rules and therefore require special approaches. An ordinary sentence can be represented as a dependency tree: there is a single root word, every other word is dominated by the root directly or indirectly, and each non-root word has exactly one parent. In phraseology, however, the situation may be entirely different. For example, in Russian sentences of the type "видеть не видел, но..." the meaning can be described as perception that falls short of actually seeing, as in "видеть я его не видел, но много слышал о нем" ("I have heard a lot about him, even though I have never seen him in person"). This meaning is expressed not by specific words but by the repetitive construction as a whole. We will discuss such phenomena with an emphasis on algorithmic approaches to handling them when processing natural-language text.
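As a rough illustration of the two ideas above, the following Python sketch checks the dependency-tree property on a list of head indices and then looks for the repeated-verb pattern (infinitive, then "не" followed by a finite form of the same verb) in a lemmatized token list. It is a minimal sketch, not the method discussed in the talk: the `Token` class, the function names, the `"Inf"`/`"Fin"` labels, and the `max_gap` window are all hypothetical, and lemmas and verb forms are assumed to come from some external morphological analyzer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str       # surface form
    lemma: str      # dictionary form (assumed to be supplied by an analyzer)
    verb_form: str  # "Inf", "Fin", or "" for non-verbs (hypothetical labels)


def is_dependency_tree(heads: List[Optional[int]]) -> bool:
    """heads[i] is the parent index of word i; None marks the root.

    The list representation already gives each word at most one parent,
    so it remains to check for a single root, valid indices, and no cycles.
    """
    n = len(heads)
    if sum(h is None for h in heads) != 1:
        return False
    for start in range(n):
        seen = set()
        node = start
        while heads[node] is not None:
            if node in seen:
                return False  # cycle: this word never reaches the root
            seen.add(node)
            node = heads[node]
            if not 0 <= node < n:
                return False  # parent index points outside the sentence
    return True


def find_repeated_verb_negation(tokens: List[Token],
                                max_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (infinitive index, finite verb index) pairs for the pattern
    INF ... не FIN, where both verbs share a lemma and at most max_gap
    tokens stand between the infinitive and the particle."""
    hits = []
    n = len(tokens)
    for i, first in enumerate(tokens):
        if first.verb_form != "Inf":
            continue
        # j is the position of the particle; j + 1 must hold the finite verb
        for j in range(i + 1, min(i + 2 + max_gap, n - 1)):
            second = tokens[j + 1]
            if (tokens[j].form.lower() == "не"
                    and second.verb_form == "Fin"
                    and second.lemma == first.lemma):
                hits.append((i, j + 1))
                break
    return hits


# "видеть я его не видел": hand-annotated for illustration only
sentence = [
    Token("видеть", "видеть", "Inf"),
    Token("я", "я", ""),
    Token("его", "он", ""),
    Token("не", "не", ""),
    Token("видел", "видеть", "Fin"),
]
print(find_repeated_verb_negation(sentence))  # [(0, 4)]
```

The matcher deliberately works on the flat token sequence and not on the tree, since the construction as a whole, not any single head-dependent link, carries the meaning. A real system would need a more careful treatment of intervening material and of other verbs that enter the same pattern.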