Associated Publication: Gaurav Baruah, Mark D. Smucker, and Charles L. A. Clarke, "Evaluating Streams of Evolving News Events", SIGIR 2015, 10 pages [PDF]
Test Collection: Temporal Summarization Track 2014
Corpus: KBA Stream Corpus 2014
People track news events according to their interests and available time. For a major event of great personal interest, they might check for updates several times an hour, taking time to keep abreast of all aspects of the evolving event. For minor events of marginal interest, they might check back once or twice a day for a few minutes to learn about the most significant developments. Systems generating streams of updates about evolving events can improve user performance by appropriately filtering these updates, making it easy for users to track events in a timely manner without undue information overload. Unfortunately, predicting user performance on these systems poses a significant challenge. Standard evaluation methodology, designed for Web search and other ad hoc retrieval tasks, adapts poorly to this context. In this paper, we develop a simple model that simulates users checking the system from time to time to read updates. For each simulated user, we generate a trace of their activity, alternating between away times and reading times. These traces are then applied to measure system effectiveness. We test our model using data from the TREC 2014 Temporal Summarization Track (TST), comparing it to the effectiveness measures used in that track. The primary TST measure corresponds most closely to a modeled user that checks back once a day on average for an average of one minute. Users checking more frequently for longer times may view the relative performance of participating systems quite differently. In light of this sensitivity to user behavior, we recommend that future experiments be built around clearly stated assumptions regarding user interfaces and access patterns, with effectiveness measures reflecting these assumptions.
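The simulated-user trace described above can be sketched as follows. This is a minimal illustration, assuming exponentially distributed away times and reading times; the actual distributions and parameters used in the paper may differ, and all function and parameter names here are our own for illustration.

```python
import random

def simulate_user_trace(duration, mean_away, mean_read, seed=None):
    """Generate one simulated user's trace of (arrival_time, reading_time)
    sessions over an event stream lasting `duration` seconds.

    Away and reading times are drawn from exponential distributions with
    the given means; this is an illustrative assumption, not necessarily
    the distributions used in the paper.
    """
    rng = random.Random(seed)
    trace = []
    t = rng.expovariate(1.0 / mean_away)  # time of the user's first check
    while t < duration:
        read = rng.expovariate(1.0 / mean_read)  # length of this session
        trace.append((t, read))
        # next arrival: after this session plus another away period
        t += read + rng.expovariate(1.0 / mean_away)
    return trace

# A user checking roughly once a day for about a minute, over a 10-day event:
trace = simulate_user_trace(duration=10 * 86400, mean_away=86400,
                            mean_read=60, seed=42)
```

Applying many such traces to a system's stream of updates, and scoring what each simulated user would have read, yields the effectiveness estimate.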
We present Modeled Stream Utility (MSU), a user-oriented measure for evaluating systems that filter information from streams.
We demonstrate our metric using the data (runs, qrels) from the Temporal Summarization Track 2014. The track requires that submitted runs list the sentence IDs of the sentences that constitute a temporal summary for a topic.
The lengths of all submitted sentences are required for computing MSU. We extracted these lengths from the KBA Stream Corpus 2014.
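The sentence lengths feed into the user model: during each session, a simulated user can only read as many words as their reading time allows. A minimal sketch of this step, assuming a fixed words-per-second reading speed (an illustrative value, not necessarily the one used in the paper) and hypothetical update IDs:

```python
def updates_read(pending, reading_time, words_per_second=3.5):
    """Return the IDs of pending updates a simulated user reads in one session.

    `pending` is a list of (update_id, length_in_words) in arrival order;
    `reading_time` is the session length in seconds. The reading speed is
    an illustrative assumption. Updates are read in order until the word
    budget for the session is exhausted.
    """
    budget = reading_time * words_per_second
    read = []
    for update_id, length in pending:
        if length > budget:
            break  # not enough time left to read this update
        budget -= length
        read.append(update_id)
    return read

# A one-minute session at 3.5 words/sec allows a 210-word budget:
print(updates_read([("u1", 100), ("u2", 90), ("u3", 50)], reading_time=60))
# → ['u1', 'u2']  (190 words read; u3 waits for a later session)
```

Updates left unread at the end of a session remain pending for the user's next check, which is why per-sentence lengths are needed for every submitted update.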
ts-2014-submitted-updates-length.tar.gz (14MB gzipped)