DLIA99 Discussion summary

I: Document Retrieval

Discussion chair: Larry Spitz (Document recognition technologies)

 

How can we develop objective means of evaluating system performance
when human relevance judgments vary so much from one judge to another?
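
As an illustration of how judge-to-judge variability might be quantified
before it is used to score a system, the sketch below computes Cohen's
kappa for two judges who labelled the same set of documents. The choice
of kappa, the function name cohen_kappa, and the sample labels are
assumptions made for this example; the discussion did not name a
specific agreement measure.

```python
def cohen_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges (Cohen's kappa).

    judge_a and judge_b are equal-length lists of labels, one per
    document (for example 1 = relevant, 0 = not relevant).
    """
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge_a)
    # Fraction of documents on which the judges gave the same label.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Agreement expected by chance, from each judge's label frequencies.
    labels = set(judge_a) | set(judge_b)
    expected = sum(
        (judge_a.count(label) / n) * (judge_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both judges used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance judgments for eight documents.
judge_a = [1, 1, 0, 0, 1, 0, 1, 0]
judge_b = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(judge_a, judge_b)
```

A value near 1 indicates that the judges largely agree; a value near 0
indicates agreement no better than chance, in which case a single
judge's labels are a weak basis for ranking systems.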

How well do humans do at this task? Some investigators have not taken
duplicate documents into account.

Even if we had perfect recognition of non-text, we do not have a query
technology that can effectively be used to access this information.

Users, in general, do not have a consistent model of the functionality
they need to support their applications, and they do not care about the
underlying technology. It is therefore difficult, at best, to determine
system capability requirements from user surveys.

We can divide user needs into two categories with quite different
characteristics: 1. The creator of a document wants to find a document
that he created earlier. 2. The user is searching for documents that
are unknown to him.

Algorithms and techniques adopted from the computer vision or
psychophysics communities might provide some increased efficiency
(speed-up) of current capabilities but are insufficient to provide new
solutions.

Is it possible to apply measurements of eye movement, as an indicator
of attractiveness (or interest), to layout analysis?

It is surprising that comparisons of pixel intensity histograms are
hard to beat in categorizing document images.
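
To make the histogram baseline concrete, here is a minimal sketch that
assigns a grayscale image to the category whose reference histogram it
overlaps most. The function names, the 16-bin quantization, the
histogram-intersection similarity, and the toy pixel data are all
assumptions for illustration; the discussion did not specify which
comparison was used.

```python
def intensity_histogram(pixels, bins=16, max_value=255):
    """Normalized histogram of grayscale values in the range 0..max_value."""
    counts = [0] * bins
    for p in pixels:
        # Map the pixel value to a bin index, clamping the top value.
        idx = min(p * bins // (max_value + 1), bins - 1)
        counts[idx] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else counts


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def categorize(pixels, prototypes, bins=16):
    """Return the label whose prototype histogram overlaps the image most."""
    h = intensity_histogram(pixels, bins)
    return max(
        prototypes,
        key=lambda label: histogram_intersection(h, prototypes[label]),
    )


# Toy data: a mostly white "text page" and a flat-spectrum "photo".
text_page = [255] * 90 + [0] * 10
photo = list(range(256))
prototypes = {
    "text": intensity_histogram(text_page),
    "photo": intensity_histogram(photo),
}
query = [255] * 85 + [0] * 15  # another mostly white page
label = categorize(query, prototypes)
```

The sketch uses no layout or content information at all, which is what
makes the strength of such a baseline surprising.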

A system developed at Columbia was more favorably rated by users when
it returned random selections than when it returned selections thought
to be responsive to the query.

How can we encode the "points of interest" in a document?