A Rover by any other name would taste just as sweet.

As it turns out, though, Solr doesn’t make synonym expansion as easy as you might like. And there are lots of good ways to shoot yourself in the foot.

Solr provides a cool-sounding SynonymFilterFactory, which can be fed a simple text file containing comma-separated synonyms. You can even choose whether to expand your synonyms reciprocally or to specify a particular directionality.

For instance, you can make “dog,” “hound,” and “pooch” all expand to “dog | hound | pooch,” or you can specify that “dog” maps to “hound” but not vice-versa, or you can make them all collapse to “dog.” This part of the synonym handling is very flexible and works quite well.

Where it gets complicated is when you have to decide where to fit the SynonymFilterFactory: into the query analyzer or the index analyzer?

Index-time vs. query-time

The graphic below summarizes the basic differences between index-time and query-time expansion. Our problem is specific to Solr, but the choice between these two approaches can apply to any information retrieval system.

[Graphic: index-time vs. query-time expansion]

Your first, intuitive choice might be to put the SynonymFilterFactory in the query analyzer. In theory, this should have several advantages:

- Your synonyms can be swapped out at any time, without having to update the index.
- Synonyms work instantly; there’s no need to re-index.

However, according to the Solr docs, this is a Very Bad Thing to Do(™), and apparently you should put the SynonymFilterFactory into the index analyzer instead, despite what your instincts would tell you. They explain that query-time synonym expansion has the following negative side effects:

- Multi-word synonyms won’t work as phrase queries.
- The IDF of rare synonyms will be boosted, causing unintuitive results.
- Multi-word synonyms won’t be matched in queries.

This is kind of complicated, so it’s worth stepping through each of these problems in turn.

Multi-word synonyms won’t work as phrase queries

At Health On the Net, our search engine uses MeSH terms for query expansion. MeSH is a medical ontology that works pretty well to provide some sensible synonyms for the health domain. Consider, for example, the synonyms for “breast cancer,” such as “breast neoplasms” and “cancer of the breast.” In a normal SynonymFilterFactory setup with expand="true", a query for “breast cancer” becomes:

+((breast breast breast breast breast cancer cancer) (cancer neoplasm neoplasms tumor tumors) breast breast)

…which matches documents containing “breast neoplasms,” “cancer of the breast,” etc.

However, this also means that, if you’re doing a phrase query (i.e. “breast cancer” with the quotes), your document must literally match something like “breast cancer breast breast” in order to work.

Huh? What’s going on here? Well, it turns out that the SynonymFilterFactory isn’t expanding your multi-word synonyms the way you might think. Intuitively, if we were to represent this as a finite-state automaton, you might think that Solr is building up something like this (ignoring plurals):

[Figure: finite-state automaton for the expected expansion]

In reality, the synonyms are stacked position by position, as the expanded query above shows, and your poor, unlikely document must match all four terms in sequence.

Similarly, the mm parameter (minimum “should” match) in the DisMax and EDisMax query parsers will not work as expected. In the example above, setting mm=100% will require that all four terms be matched:

+((breast breast breast breast breast cancer cancer) (cancer neoplasm neoplasms tumor tumors) breast breast)~4

The IDF of rare synonyms will be boosted

Even if you don’t have multi-word synonyms, the Solr docs mention a second good reason to avoid query-time expansion: unintuitive IDF boosting. Consider our “dog,” “hound,” and “pooch” example. In this case, a query for any one of the three will be expanded into a query on all three terms.

Since “hound” and “pooch” are much less common words, though, this means that documents containing them will always be artificially high in the search results, regardless of the query. This could create havoc for your poor users, who may be wondering why weird documents about hounds and pooches are appearing so high in their search for “dog.”

Index-time expansion supposedly fixes this problem by giving the same IDF values for “dog,” “hound,” and “pooch,” regardless of what the document originally said.

Multi-word synonyms won’t be matched in queries

Finally, and most seriously, the SynonymFilterFactory will simply not match multi-word synonyms in user queries if you do any kind of tokenization.
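To make the phrase-query problem with “breast cancer” more concrete, here is a minimal Python sketch of position-by-position synonym stacking. It is a toy model, not Solr's actual implementation: the `SYNONYMS` list is an assumption inferred from the expanded query quoted in the text (it is not copied from MeSH), the `STOPWORDS` set is hypothetical, and the function names `stack_by_position` and `render` are made up for illustration.

```python
from collections import defaultdict

# Assumption: a toy stopword list; real analyzers are configurable.
STOPWORDS = {"of", "the"}

# Assumption: synonym entries inferred from the expanded query in the text,
# not taken from MeSH or from any real Solr configuration.
SYNONYMS = [
    "breast cancer",
    "breast neoplasm",
    "breast neoplasms",
    "breast tumor",
    "breast tumors",
    "cancer of breast",
    "cancer of the breast",
]


def stack_by_position(phrases):
    """Overlay every phrase on the same token positions.

    Stopwords are dropped but still consume a position, so the tokens
    after them keep their original offsets.
    """
    positions = defaultdict(list)
    for phrase in phrases:
        for offset, token in enumerate(phrase.split()):
            if token not in STOPWORDS:
                positions[offset].append(token)
    return [sorted(positions[i]) for i in sorted(positions)]


def render(stacked):
    """Format the stacked positions roughly like the parsed query in the text."""
    clauses = []
    for tokens in stacked:
        text = " ".join(tokens)
        clauses.append(f"({text})" if len(tokens) > 1 else text)
    return "+(" + " ".join(clauses) + ")"


stacked = stack_by_position(SYNONYMS)
print(render(stacked))
print(len(stacked), "positions, each of which a phrase query must match")
```

Under these assumptions the sketch prints the same expanded query quoted in the text, with four positions. Because each synonym is flattened into shared positions rather than kept as its own path, a phrase query has to satisfy every position in order, which is why a document would need something like “breast cancer breast breast” to match.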