Studies of user typology on the site have revealed that some users answer from personal knowledge ("specialists") while others use external sources to construct answers ("synthesists"), with synthesists tending to accumulate more reward points. These studies also show that answer length is a good predictor of which answer is chosen as the "best answer". […] Yahoo! Answers is not very deep. Yahoo! Answers has been criticized for its reputation of being a source of entertainment rather than a fact-based question and answer platform [31] [32], and for the reliability, validity, and relevance of its answers.

A study found that Yahoo! Answers is suboptimal for questions requiring factual answers, and that answer quality decreases as the number of users increases.

Several characteristics of the service Yahoo! Answers provides, particularly the persistence of inaccuracies, the inability to correct them, and a point structure that rewards participation more readily than accuracy, all indicate that the site is oriented toward encouraging use of the site rather than offering accurate answers to questions. There are also indications that Yahoo! Answers attracts a large number of trolls.

The site does not have a system that filters correct answers from incorrect ones. Originally, once the "best answer" on Yahoo! Answers was chosen, there was no way to add more answers, nor to improve or challenge the best answer chosen by the question asker; each answer displayed a thumbs-up or thumbs-down count, but viewers could not vote.

In April, this was changed to allow additional answers after a best answer is chosen, but the best answer still cannot be changed. Also, while best answers can be briefly commented upon, the comments are not visible by default and are therefore rarely read.

In collaborative filtering (CF), the only required information is the past behavior of users, such as their previous transactions or product ratings.

The two primary schools of CF are (1) latent factor models, which explain ratings by characterizing both items and users on factors inferred from rating patterns, and (2) neighborhood methods.
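As a minimal sketch of a latent factor model, assume the plain inner-product form, where a rating is predicted as the dot product of a user factor vector and an item factor vector, fitted by stochastic gradient descent with L2 regularization. The function names, the number of factors `k`, the learning rate, the regularization weight and the toy ratings are all illustrative assumptions, not values from the source.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: inner product of user factors P[u] and item factors Q[i]."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05, epochs=500, seed=0):
    """Fit r_ui ~= p_u . q_i by SGD over observed (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root-mean-square error of the model on the given triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical toy data: users 0-1 favour items 0-1, users 2-3 favour items 2-3.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 5),
           (2, 3, 4), (3, 2, 4), (3, 3, 5), (0, 2, 1), (2, 0, 1)]
P0, Q0 = train_mf(ratings, 4, 4, epochs=0)  # untrained baseline
P, Q = train_mf(ratings, 4, 4)
print(rmse(P0, Q0, ratings), rmse(P, Q, ratings))
```

Only observed ratings contribute to the loss, which is what lets the factors be "inferred from rating patterns" without filling in the missing entries.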

The original neighborhood methods (user-user) compute the relationships among users, estimating unknown ratings based on the recorded ratings of like-minded users [11].
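A minimal user-user sketch, assuming cosine similarity computed over co-rated items and a similarity-weighted average of the k most similar users who rated the target item. The similarity measure, `k`, the function names and the toy ratings are illustrative choices; the source does not specify them.

```python
import math


def cosine(a, b):
    """Cosine similarity restricted to the keys two sparse rating dicts share."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    num = sum(a[x] * b[x] for x in common)
    den = math.sqrt(sum(a[x] ** 2 for x in common)) * math.sqrt(sum(b[x] ** 2 for x in common))
    return num / den if den else 0.0


def predict_user_user(ratings, user, item, k=2):
    """Estimate ratings[user][item] from the k most similar users who rated item.

    ratings: {user: {item: rating}}. Returns None when no neighbor qualifies.
    """
    neighbors = sorted(
        ((cosine(ratings[user], row), row[item])
         for other, row in ratings.items() if other != user and item in row),
        reverse=True,
    )
    top = [(s, r) for s, r in neighbors[:k] if s > 0]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


ratings = {
    "alice": {"q1": 5, "q2": 3},
    "bob": {"q1": 5, "q2": 3, "q3": 4},
    "carol": {"q1": 1, "q2": 5, "q3": 2},
}
print(predict_user_user(ratings, "alice", "q3", k=1))
print(predict_user_user(ratings, "alice", "q3", k=2))
```

Alice's ratings match Bob's exactly, so with `k=1` the prediction is just Bob's rating of `q3`; adding Carol as a weaker neighbor pulls it down.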

An analogous alternative is the item-item approach that estimates ratings by exploiting previous ratings made by the same user on similar items [17, 20].
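The item-item variant can be sketched the same way under the same assumptions (cosine similarity over co-rating users, weighted average over the k most similar items). Here the prediction is built from the target user's own ratings of items similar to the target item. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity restricted to the keys two sparse vectors share."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    num = sum(a[x] * b[x] for x in common)
    den = math.sqrt(sum(a[x] ** 2 for x in common)) * math.sqrt(sum(b[x] ** 2 for x in common))
    return num / den if den else 0.0


def by_item(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def predict_item_item(ratings, user, item, k=2):
    """Predict from the user's own ratings on the k items most similar to item."""
    items = by_item(ratings)
    target = items.get(item, {})
    scored = sorted(
        ((cosine(target, items[j]), r) for j, r in ratings[user].items() if j != item),
        reverse=True,
    )
    top = [(s, r) for s, r in scored[:k] if s > 0]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


ratings = {
    "alice": {"q1": 5, "q2": 3},
    "bob": {"q1": 5, "q2": 3, "q3": 4},
    "carol": {"q1": 1, "q2": 5, "q3": 2},
}
print(predict_item_item(ratings, "alice", "q3"))
```

Note that item similarities need several users to have rated both items, which is hard to obtain for short-lived items.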

In Yahoo! Answers, the items (questions) are volatile; hence, user-user methods are more relevant to our task. We must also consider scenarios in which a question has few or no interactions, since the most interesting questions are often those that have not attracted mass user interactions and have not yet been answered in a satisfactory manner. Hence, we also resort to the other class of recommender system techniques, namely content analysis (CA). Our method introduces a novel, symmetric integration of CF with CA that allows exploiting behavioral signals together with user and question attributes.
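The paragraph does not spell out how the symmetric integration works, so the following is only a generic hybrid sketch, not the authors' method: it linearly blends a CF score with a content-based (CA) score, and falls back to content alone when a question has no behavioral signal yet. The names `jaccard`, `hybrid_score` and the weight `alpha` are hypothetical.

```python
from typing import Optional


def jaccard(text_a: str, text_b: str) -> float:
    """Content-analysis signal: word-set overlap between two texts, in [0, 1]."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(cf_score: Optional[float], content_score: float, alpha: float = 0.5) -> float:
    """Blend a CF score with a CA score; use CA alone when CF has no signal."""
    if cf_score is None:  # e.g. a brand-new question nobody has interacted with yet
        return content_score
    return alpha * cf_score + (1.0 - alpha) * content_score


profile = "python list sort dict"     # hypothetical user profile text
question = "how to sort a python list"
print(hybrid_score(None, jaccard(profile, question)))
print(hybrid_score(0.8, jaccard(profile, question), alpha=0.7))
```

A fixed `alpha` is the simplest possible design; in practice the weight would be tuned on held-out data.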

Some prior works have also addressed the need to combine CA with CF, in what are sometimes called hybrid recommenders. Among the more recent notable works are [1, 10], both of which are based on latent factor models in which the factors are initialized and centered on item and user attributes.
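One way to read "factors centered on item and user attributes" is as a prior that places each latent vector around a linear function of the corresponding attribute vector. The notation below (attribute vectors $\mathbf{x}_u$, $\mathbf{y}_i$, projection matrices $G$, $D$, variances $\sigma_u^2$, $\sigma_i^2$) is introduced here only for illustration and is not claimed to be the exact formulation of [1] or [10].

```latex
\hat{r}_{ui} = \mathbf{p}_u^{\top}\mathbf{q}_i,
\qquad
\mathbf{p}_u = G\,\mathbf{x}_u + \boldsymbol{\delta}_u,
\qquad
\mathbf{q}_i = D\,\mathbf{y}_i + \boldsymbol{\epsilon}_i,
\qquad
\boldsymbol{\delta}_u \sim \mathcal{N}(\mathbf{0}, \sigma_u^2 I),\;
\boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \sigma_i^2 I)
```

For a user or item with no ratings, the residual term has nothing to learn from, so the factor falls back to its attribute-based mean; this is what makes such models usable in cold-start settings.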

Our approach is different: we chose not to rely on latent factor modeling, for two key reasons. First, learning such a model is non-trivial when dealing with volatile questions and relatively new users. […] Such an approach, while critical to success, renders the known factor models less usable; hence the need to develop an alternative methodology. Our setup also requires a different treatment from expert search tasks such as [18, 4], which try to identify an authoritative answer that would satisfy most people; in our context, the key objective is to satisfy the individual asker, across a variety of questions, some factoid but many subjective, where the notion of expertise is irrelevant.


Finally, an interesting related work deals with Google News recommendation [6]. The problem there is similar to ours in terms of scalability and of the volatility of items, which have a high churn rate. The authors use a blend of three separate CF methods in a highly scalable fashion. We follow a different direction, combining CA with CF and addressing separate types of user feedback.
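The paragraph does not say how the three CF methods are combined. Assuming a simple weighted linear combination of per-item scores, a blend might look like the following; the method scores and weights are made-up values, and `blend` is a hypothetical helper.

```python
def blend(method_scores, weights):
    """Linearly combine per-item scores from several recommenders.

    method_scores: one {item: score} dict per method; a missing item counts as 0.
    weights: one weight per method.
    """
    if len(method_scores) != len(weights):
        raise ValueError("need exactly one weight per method")
    items = set().union(*method_scores)
    return {
        item: sum(w * scores.get(item, 0.0) for w, scores in zip(weights, method_scores))
        for item in items
    }


scores = [{"a": 1.0}, {"a": 0.5, "b": 1.0}, {"b": 0.2}]  # hypothetical outputs of 3 methods
combined = blend(scores, [0.5, 0.3, 0.2])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Treating a missing score as 0 is one choice among several; renormalizing over the methods that did score an item is another.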

Beyond these, the task of recommending questions brings additional, less well-addressed challenges, which drive the unique design criteria for our model.

We believe that such challenges are, and will continue to be, typical of other recommendation scenarios, especially those involving user-generated content. A question may be described by textual attributes, which come from the question itself and from its answers.

Another question descriptor may be its category, as selected by the asker. Incorporating these different kinds of signals helps compensate for the severe sparseness of the data. Combining them becomes even more challenging once we consider the multiple channels through which users can interact with a question, as detailed below. A second factor comes from the need to account for the multiple kinds of interactions, of different intensities, between users and questions.
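The question descriptors above (text from the question, text from its answers, and the asker-selected category) can be collected into one sparse feature vector. This is a minimal sketch assuming a crude lowercase tokenizer and prefixed bag-of-words features; the prefixes `q:`, `a:`, `cat:` and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_features(question_text, answers, category):
    """Sparse bag-of-words features for a question.

    Prefixes keep the three sources apart: 'q:' for the question's own text,
    'a:' for text taken from its answers, 'cat:' for the asker-selected category.
    """
    feats = Counter()
    for tok in tokenize(question_text):
        feats["q:" + tok] += 1
    for answer in answers:
        for tok in tokenize(answer):
            feats["a:" + tok] += 1
    feats["cat:" + category] = 1
    return feats


f = question_features("How do I sort a list?", ["Use sorted(list)."], "Programming")
print(sorted(f.items()))
```

For a brand-new question the `answers` list is empty, so only the question text and category contribute, which is one reason text and category signals matter under sparse data.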

When data per user and per item are scarce, exploiting these diverse types of user-item interactions is vital. Yet care is needed when integrating the different signals, since some types of interactions are expected to be more indicative than others. For example, we would expect answering a question to be a more meaningful interaction than mailing it. Our approach induces hundreds of interactions, so we must rely on an automated procedure to combine them effectively.
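The source does not name its combination procedure; as one simple automated option, logistic regression can learn a weight per interaction type from examples labeled by whether the user later answered. The interaction types (`answered`, `starred`, `mailed`), the toy data, the learning rate and the epoch count are hypothetical.

```python
import math

# Hypothetical interaction types between a user and a question category.
INTERACTIONS = ["answered", "starred", "mailed"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_interaction_weights(examples, labels, lr=0.1, epochs=2000):
    """Learn one weight per interaction type, plus a bias, by batch gradient
    descent on the logistic loss. Each example is a list of interaction counts."""
    w = [0.0] * len(INTERACTIONS)
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(examples, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return dict(zip(INTERACTIONS, w)), b


def score(weights, bias, counts):
    """Probability-like score that the user will answer, given interaction counts."""
    return sigmoid(bias + sum(weights[name] * c for name, c in zip(INTERACTIONS, counts)))


# Toy data: label 1 means the user later answered a question in that category.
X = [[3, 0, 1], [2, 1, 0], [0, 0, 2], [0, 1, 1], [4, 0, 0], [0, 0, 3]]
y = [1, 1, 0, 0, 1, 0]
weights, bias = fit_interaction_weights(X, y)
print(weights)
```

On this toy data the learned weight for answering ends up far above the weight for mailing, mirroring the intuition in the text without hand-tuning.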

Finally, we have a major interest in new questions, since these are the questions that most need to solicit answerers. Such a setup usually dedicates distinct parameters to each item, e.g. […]
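A brand-new question has no interactions, so any per-item CF parameters for it would be untrained. A content-only fallback can still rank candidate answerers; this sketch assumes each user's profile is simply the concatenated text of questions they answered before and uses bag-of-words cosine similarity. The names and the toy history are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_answerers(new_question, user_history, top_n=2):
    """Rank users for a question with no interactions yet, using content only."""
    q = bag_of_words(new_question)
    scores = {user: cosine(q, bag_of_words(" ".join(texts)))
              for user, texts in user_history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


history = {
    "ann": ["how to bake bread", "sourdough starter tips"],
    "ben": ["python list sort", "python dict merge"],
    "cy": ["best hiking trails"],
}
print(rank_answerers("how do I sort a python list", history, top_n=1))
```

Once the question starts collecting answers, stars or other interactions, behavioral CF signals can be blended in alongside this content score.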