foodbaby + relevance   33

[1602.01137] A Dual Embedding Space Model for Document Ranking
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
word  embeddings  IR  relevance  papers  2016 
december 2017 by foodbaby
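The scoring idea in the DESM abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it takes the abstract literally and averages cosine similarity over all query-document word pairs, and the names `desm_score`, `mixed_score`, the weight `alpha`, and the tiny two-dimensional embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def desm_score(query_terms, doc_terms, in_emb, out_emb):
    """Mean cosine similarity over all (query word, document word) pairs.

    Query words are looked up in the input (IN) space and document words
    in the output (OUT) space; words missing from either table are skipped.
    """
    sims = [
        cosine(in_emb[q], out_emb[d])
        for q in query_terms if q in in_emb
        for d in doc_terms if d in out_emb
    ]
    return sum(sims) / len(sims) if sims else 0.0

def mixed_score(desm, term_score, alpha=0.5):
    """Linear mixture of the embedding score and a term-matching score.

    alpha is an assumed, illustrative mixing weight.
    """
    return alpha * desm + (1.0 - alpha) * term_score

# Toy, hand-made embeddings (assumptions for illustration only).
IN_EMB = {"cambridge": [1.0, 0.0]}
OUT_EMB = {"university": [1.0, 0.0], "giraffe": [0.0, 1.0]}

on_topic = desm_score(["cambridge"], ["university"], IN_EMB, OUT_EMB)
off_topic = desm_score(["cambridge"], ["giraffe"], IN_EMB, OUT_EMB)
```

With these toy vectors the on-topic document scores higher than the off-topic one; in practice `term_score` would come from a term-matching feature such as TF-IDF, normalized to a range comparable with the cosine-based score.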
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
This article examines the reliability of implicit feedback generated from clickthrough data and query reformulations in World Wide Web (WWW) search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average. We find that such relative preferences are accurate not only between results from an individual query, but across multiple sets of results within chains of query reformulations.
click  evaluation  relevance  IR  papers 
december 2017 by foodbaby
Relevance Feedback-based Query Expansion Model using Ranks Combining and Word2Vec... | Request PDF
Query expansion is a well-known method for improving the performance of information retrieval systems. Pseudo-relevance feedback (PRF)-based query expansion assumes that the top-ranked retrieved documents are relevant. Adding every term from the PRF documents is neither necessary nor appropriate for expanding the original user query, so selecting suitable expansion terms is very important for retrieval performance. Various individual expansion term selection methods have been widely investigated, and each has its own strengths and weaknesses. To minimize the weaknesses and exploit the strengths of the individual methods, we use multiple term selection methods together. First, this paper explores the possibility of improving overall system performance with individual query expansion term selection methods. Next, a rank-aggregation method, Borda count, is used to combine multiple term selection methods. Finally, Word2vec is used to select terms semantically similar to the query after the Borda count rank combination. Our experimental results on the TREC and FIRE data sets show that the proposed approaches achieve significant improvement over each individual term selection method and over related state-of-the-art methods.
relevance  feedback  query  expansion  word  embeddings  IR  papers 
december 2017 by foodbaby
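The Borda count step mentioned in the abstract above can be illustrated with a short sketch. It shows one common variant of the rule and is not taken from the paper: with n distinct candidates overall, a term at 0-based position i in a ranking earns n - i points, terms absent from a ranking earn nothing, and ties are broken alphabetically. The function name `borda_count` and the sample rankings are hypothetical.

```python
from collections import defaultdict

def borda_count(rankings):
    """Combine several ranked lists of candidate terms with a Borda count.

    rankings: list of lists, each ordered from best to worst.
    Returns (term, points) pairs sorted by points descending,
    then by term so that ties are resolved deterministically.
    """
    candidates = set().union(*rankings)
    n = len(candidates)
    points = defaultdict(int)
    for ranking in rankings:
        for position, term in enumerate(ranking):
            points[term] += n - position
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))

# Three hypothetical term selection methods ranking the same expansion terms.
rankings = [
    ["engine", "search", "index"],
    ["search", "engine", "index"],
    ["search", "index", "engine"],
]
combined = borda_count(rankings)
# "search" collects 2 + 3 + 3 = 8 points, "engine" 3 + 2 + 1 = 6, "index" 1 + 1 + 2 = 4.
```

The combined list could then be filtered by embedding similarity to the query, as the abstract describes for the Word2vec step.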
A Comparative Study of Pseudo Relevance Feedback for Ad-hoc Retrieval | SpringerLink
This paper presents an initial investigation into the relative effectiveness of different popular pseudo relevance feedback (PRF) methods. The retrieval performance of the relevance model and of two KL-divergence-based divergence from randomness (DFR) feedback methods generalized from Rocchio’s algorithm is compared through extensive experiments on standard TREC test collections. Results show that a KL-divergence-based DFR method (denoted as KL1), combined with the classical Rocchio’s algorithm, has the best retrieval effectiveness of the three methods studied in this paper.
IR  pseudo  relevance  feedback 
november 2017 by foodbaby
