foodbaby + relevance   37

Click data as implicit relevance feedback in web search
Search sessions consist of a person presenting a query to a search engine, examining the search results, selecting some of those results for further review, possibly following a series of hyperlinks, and perhaps backtracking to previously viewed pages in the session. The series of pages selected for viewing in a search session, sometimes called the click data, is intuitively a source of relevance feedback information for the search engine. We are interested in how that relevance feedback can be used to improve the quality of search results for all users, not just the current user. For example, the search engine could learn which documents are frequently visited when certain search queries are given.

In this article, we address three issues related to using click data as implicit relevance feedback: (1) how click data beyond the search results page might be more reliable than clicks from the search results page alone; (2) whether we can further subselect from this click data to get even more reliable relevance feedback; and (3) how the reliability of click data for relevance feedback changes when the goal becomes finding one document that completely meets the user's information needs (if possible). We refer to such documents as strictly relevant to the query.

Our conclusions are based on empirical data from a live website with manual assessment of relevance. We found that considering all of the click data in a search session as relevance feedback has the potential to increase both the precision and recall of the feedback data. We further found that, when the goal is identifying strictly relevant documents, it can be useful to focus on the last visited documents rather than all documents visited in a search session.
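The two feedback-extraction strategies the article compares can be sketched as follows, assuming sessions are recorded as ordered lists of clicked pages per query (the session format and function name here are illustrative, not from the article):

```python
from collections import Counter

def feedback_counts(sessions, last_only=False):
    """Count implicit relevance votes per (query, url) pair.

    sessions: iterable of (query, clicked_urls) tuples, where
    clicked_urls is the ordered list of pages visited in the session.
    last_only=True keeps only the last visited page of each session,
    the heuristic the article suggests when the goal is identifying
    *strictly* relevant documents.
    """
    votes = Counter()
    for query, clicked_urls in sessions:
        if not clicked_urls:
            continue  # session with no clicks gives no feedback
        urls = clicked_urls[-1:] if last_only else clicked_urls
        for url in urls:
            votes[(query, url)] += 1
    return votes
```

A search engine could then treat high-vote (query, url) pairs as relevance feedback when ranking results for future users issuing the same query.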
IR  relevance  LTR  click  data 
10 weeks ago by foodbaby
[1602.01137] A Dual Embedding Space Model for Document Ranking
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
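A minimal sketch of the scoring described in the abstract: query words are mapped into the word2vec input (IN) space, document words into the output (OUT) space, and cosine similarities are aggregated across all query-document word pairs, optionally mixed with a term-matching score. This is a toy reading of the abstract, not the paper's exact formulation; all names and the mixture weight are illustrative:

```python
import numpy as np

def desm_score(query_words, doc_words, in_emb, out_emb):
    """DESM-style score: mean cosine similarity over all
    (query word, document word) pairs, with query words looked up in
    the IN embedding table and document words in the OUT table.
    Words missing from a table are skipped."""
    unit = lambda v: v / np.linalg.norm(v)
    q = [unit(in_emb[w]) for w in query_words if w in in_emb]
    d = [unit(out_emb[w]) for w in doc_words if w in out_emb]
    if not q or not d:
        return 0.0
    # dot product of unit vectors == cosine similarity
    return float(np.mean([qv @ dv for qv in q for dv in d]))

def mixture_score(desm, term_match, alpha=0.5):
    """Linear mixture of the DESM score with a term-matching score
    (e.g. TF-IDF), the remedy the paper proposes for false positives
    when ranking large candidate sets; alpha is an illustrative weight."""
    return alpha * desm + (1 - alpha) * term_match
```

Retaining both projection matrices is the key design choice: the IN-OUT pairing rewards words that co-occur with the query terms (aboutness), not just words that are interchangeable with them.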
word  embeddings  IR  relevance  papers  2016 
december 2017 by foodbaby
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
This article examines the reliability of implicit feedback generated from clickthrough data and query reformulations in World Wide Web (WWW) search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average. We find that such relative preferences are accurate not only between results from an individual query, but across multiple sets of results within chains of query reformulations.
click  evaluation  relevance  IR  papers 
december 2017 by foodbaby
Relevance Feedback-based Query Expansion Model using Ranks Combining and Word2Vec...
Query expansion is a well-known method for improving the performance of information retrieval systems. Pseudo-relevance feedback (PRF)-based query expansion assumes that the top-ranked retrieved documents are relevant. Adding all the terms from the PRF documents is not appropriate for expanding the original user query, so selecting proper expansion terms is crucial for retrieval performance. Various individual expansion term selection methods have been widely investigated, and each has its own strengths and weaknesses. To minimize the weaknesses and exploit the strengths of the individual methods, we use multiple term selection methods together. First, this paper explores the possibility of improving overall system performance using individual query expansion term selection methods. Next, a rank-aggregation method, Borda count, is used to combine multiple term selection methods. Finally, a Word2vec approach is used to select terms semantically similar to the query after the Borda count combination. Experimental results on both the TREC and FIRE data sets demonstrate that the proposed approaches achieve significant improvements over each individual term selection method and other related state-of-the-art methods.
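The Borda count step in the abstract can be sketched as follows: each individual term selection method produces a ranked list of candidate expansion terms, each list awards points by rank position, and terms are re-ranked by total points. This is an illustrative sketch of standard Borda counting only; the individual scoring methods and the Word2vec filtering step are out of scope:

```python
def borda_combine(rankings, k=10):
    """Combine ranked lists of candidate expansion terms by Borda count.

    rankings: list of ranked term lists, best term first. Each list
    awards len(list) - position points to the term at that position;
    terms are then sorted by total points. Returns the top-k terms.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, term in enumerate(ranking):
            points[term] = points.get(term, 0) + (n - pos)
    return sorted(points, key=lambda t: -points[t])[:k]
```

The top-k combined terms would then be filtered for semantic similarity to the query (e.g. by Word2vec cosine similarity) before being appended to it.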
relevance  feedback  query  expansion  word  embeddings  IR  papers 
december 2017 by foodbaby
A Comparative Study of Pseudo Relevance Feedback for Ad-hoc Retrieval
This paper presents an initial investigation into the relative effectiveness of different popular pseudo relevance feedback (PRF) methods. The retrieval performance of the relevance model and of two KL-divergence-based divergence from randomness (DFR) feedback methods generalized from Rocchio's algorithm is compared by extensive experiments on standard TREC test collections. Results show that a KL-divergence-based DFR method (denoted KL1), combined with the classical Rocchio algorithm, has the best retrieval effectiveness of the three methods studied in this paper.
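The classical Rocchio component that the paper combines with the DFR feedback models can be sketched as follows, assuming queries and documents are term-weight vectors over a shared vocabulary (the function name and default weights are conventional choices, not taken from the paper):

```python
import numpy as np

def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75):
    """Rocchio pseudo-relevance feedback: shift the query vector
    toward the centroid of the top-ranked documents, which PRF
    assumes to be relevant.

    query_vec: term-weight array for the original query.
    feedback_docs: 2-D array, one term-weight row per feedback document.
    alpha/beta are the usual Rocchio weights; the non-relevant term is
    omitted, as is standard in PRF.
    """
    centroid = np.mean(feedback_docs, axis=0)
    return alpha * np.asarray(query_vec) + beta * centroid
```

The expanded vector is then used to re-score the collection; the DFR methods in the paper differ in how they weight the feedback terms before this combination.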
IR  pseudo  relevance  feedback 
november 2017 by foodbaby
