
Semantic Search

one of the OrganikComponents

The Semantic Search component supports browsing, searching, retrieving and displaying knowledge resources. OrganiK follows an approach of knowledge commonality, which is driven by adding much content and tagging that content automatically, so the system is expected to store a large number of knowledge items. Users will access this information through a semantic search engine that provides advanced search functionality based on the semantics available in the taxonomies.

One case is a user looking for new information, for instance a user who searches for knowledge on the process of preparing marketing material and did not know about it before. To return only relevant links, a ranking algorithm will give more weight to results that are tagged with relevant tags (tagging component), are accessed and tagged by many users (collaborative), and match the search terms well (content analyser). Additionally, the entered search terms will be compared with previously identified synonyms for query expansion, so that more results can be retrieved: for example, a search for “marketing” can also find documents containing the words “promotion” or “advertisement”. Similarly, synonymous tags will be found.
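The ranking described above can be illustrated with a minimal sketch. The source does not say how the three signals are combined; the weighted sum, the weights, the 0..1 normalisation and all names below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    tag_match: float      # 0..1, how well the item's tags match the query (tagging component)
    popularity: float     # 0..1, normalised number of users who accessed or tagged the item
    content_match: float  # 0..1, how well the text matches the search terms (content analyser)


# Hypothetical weights: the source does not specify how the signals are weighted.
WEIGHTS = {"tag_match": 0.4, "popularity": 0.2, "content_match": 0.4}


def overall_score(c: Candidate) -> float:
    """Combine the three signals into one relevance score (assumed weighted sum)."""
    return (WEIGHTS["tag_match"] * c.tag_match
            + WEIGHTS["popularity"] * c.popularity
            + WEIGHTS["content_match"] * c.content_match)


def rank(candidates):
    """Return candidates ordered from most to least relevant."""
    return sorted(candidates, key=overall_score, reverse=True)


results = rank([
    Candidate("Office party photos", tag_match=0.0, popularity=0.9, content_match=0.1),
    Candidate("Preparing marketing material", tag_match=1.0, popularity=0.5, content_match=0.8),
])
print([c.title for c in results])
```

With these toy values the well-tagged, well-matching item outranks the merely popular one, which is the behaviour the paragraph asks for.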

In a different case, users will try to re-find information that they already know to exist in the system. This is a different process from finding new information and can be supported by other means. A personal ranking of items is employed to find items that the user often accesses (user behavior analysis) or has tagged often.
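A personal ranking of this kind could be sketched as follows. The event log format, the equal treatment of "access" and "tag" events, and the function name are assumptions; the source does not describe how the user behavior analysis stores its observations.

```python
from collections import Counter

# Hypothetical event log of (user, item, action) tuples.
events = [
    ("alice", "Marketing checklist", "access"),
    ("alice", "Marketing checklist", "access"),
    ("alice", "Brochure template", "tag"),
    ("bob", "Brochure template", "access"),
    ("alice", "Marketing checklist", "tag"),
]


def personal_ranking(user, candidates, events):
    """Order candidate items by how often this user accessed or tagged them."""
    counts = Counter(item for u, item, _action in events if u == user)
    # Counter returns 0 for items the user never touched; sorted() is stable,
    # so untouched items keep their incoming order.
    return sorted(candidates, key=lambda item: counts[item], reverse=True)


ranked = personal_ranking(
    "alice",
    ["Brochure template", "Press kit", "Marketing checklist"],
    events,
)
print(ranked)
```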

Personalization and information awareness shorten the list of search results after the query has been executed. To this end, the knowledge about the user (i.e. the knowledge level in a particular topic) and the recognized knowledge of the user (i.e. well-known documents) are used. Additionally, all OrganiK client tools will employ RSS (Really Simple Syndication or Rich Site Summary, a set of XML-based web-content distribution and republication/syndication protocols) in order to announce recent content additions and updates to a knowledge worker's web-based workspace.
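As an illustration of the RSS announcement, the following sketch builds a small RSS 2.0 document for recent additions with Python's standard library. The function name, the channel texts and the example.org URLs are placeholders, not part of OrganiK.

```python
import xml.etree.ElementTree as ET


def recent_additions_to_rss(additions):
    """Build an RSS 2.0 feed string from (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Required channel elements in RSS 2.0: title, link, description.
    ET.SubElement(channel, "title").text = "Recent additions"
    ET.SubElement(channel, "link").text = "http://example.org/"  # placeholder URL
    ET.SubElement(channel, "description").text = "Recently added or updated content"
    for title, url in additions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


feed = recent_additions_to_rss([
    ("Brochure template", "http://example.org/node/1"),  # placeholder entries
    ("Marketing checklist", "http://example.org/node/2"),
])
print(feed)
```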

Features

  • Query expansion based on synonyms and relations (broader, narrower, related) managed by the SemanticApi.
  • Enrichment of the words linked to documents in the search index, using the latent topics extracted by the ContentAnalyser.
  • Ranking search results by an overall relevancy rank based on TF/IDF.
  • Ranking search results by an individual relevancy based on observations recorded by UserBehaviourAnalyser.
  • Provide an RSS interface to search results.
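The TF/IDF-based overall relevancy mentioned in the list can be sketched as below. This is a textbook variant (term frequency normalised by document length, logarithmic inverse document frequency) on toy documents; the exact formula used by the search index is not stated in the source.

```python
import math
from collections import Counter

# Toy documents, invented for illustration.
docs = {
    "doc1": "marketing plan for the new product",
    "doc2": "promotion and marketing material marketing budget",
    "doc3": "cardiology ward schedule",
}


def tf_idf_scores(query_terms, docs):
    """Score every document against the query terms with a basic TF-IDF sum."""
    tokenised = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenised)
    scores = {}
    for name, tokens in tokenised.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for t in tokenised.values() if term in t)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[name] = score
    return scores


scores = tf_idf_scores(["marketing"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The document that mentions "marketing" twice ranks above the one that mentions it once, and the unrelated document scores zero.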

In the node-search user interface of Drupal, query expansion using alternative spellings from the taxonomy module is used. This implements the SemanticSearchComponent.

How to use it

Semantic search only works in the advanced search window. The quickest way to get from the start page to the advanced search window is to press the "search" button next to the search field.

  • Check "semantic" in the search GUI.
  • Enter a term that is a word in the taxonomy. For example, enter "cardiology": if your taxonomy has a term "cardiology" with a sub-term "heart sounds", the search will also search for "heart sounds".
  • If you have taxonomy entries consisting of multiple words, you must put quotation marks (") around the entry, such as around "heart sounds". The semantic search will then also include "cardiology" in your search.
  • The semantic search also includes alternative spellings, if you have configured alternative spellings of taxonomy words.
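The quoting rule for multi-word taxonomy entries can be illustrated with a small query splitter. This is only a sketch of the idea; the function name and the regular expression are assumptions, and the actual module may parse queries differently.

```python
import re


def split_query(query):
    """Split a query into terms, keeping text inside double quotes as one term."""
    terms = []
    # First alternative captures a quoted phrase, second a single unquoted word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        terms.append(phrase or word)
    return terms


print(split_query('cardiology "heart sounds"'))
print(split_query('heart sounds'))
```

Without the quotation marks, "heart" and "sounds" are treated as two separate words, so neither would match the taxonomy entry "heart sounds".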

source:trunk/demo/semanticsearchScreenshot.png

How it works

Each word in the query is compared to the entries in the taxonomy. If it matches a term, the query is expanded with the alternative spellings of that term. The direct parents and children of the term are also included in the query.
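The expansion step can be sketched as follows. The taxonomy is modelled as a plain dictionary with one parent per term; this data layout, the alternative spelling and the function names are assumptions for illustration and do not reflect how the Drupal taxonomy module stores its terms.

```python
# Toy taxonomy: each term has an optional parent and a list of alternative spellings.
TAXONOMY = {
    "cardiology": {"parent": None, "alt": ["cardiologie"]},  # hypothetical alternative spelling
    "heart sounds": {"parent": "cardiology", "alt": []},
}


def expand_term(term, taxonomy):
    """Return the term plus its alternative spellings, direct parent and direct children."""
    entry = taxonomy.get(term)
    if entry is None:
        return [term]  # not a taxonomy term: leave the word unchanged
    expanded = [term]
    expanded += entry["alt"]
    if entry["parent"]:
        expanded.append(entry["parent"])
    expanded += [name for name, e in taxonomy.items() if e["parent"] == term]
    return expanded


def expand_query(terms, taxonomy):
    """Expand every query term; each inner list is a group of alternatives."""
    return [expand_term(t.lower(), taxonomy) for t in terms]


print(expand_query(["cardiology"], TAXONOMY))
print(expand_query(["heart sounds", "patient"], TAXONOMY))
```

Each inner list can be read as a group of alternatives for one original term, any of which may match a document.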

Semantic search is achieved by implementing Drupal's hook_search in modules organik_semantic_search, organik_tag_search and organik_shoutbox in order to search all pages, all tags and all shoutbox posts respectively.

There, keywords entered by the user are expanded based on the broader/narrower relationships between taxonomy terms.

(DEPRECATED: It is implemented as a patch to the core Drupal node.module. The node_search function is extended and adds additional "OR" statements before the query is processed. The user interface is adapted in node_form_alter.)

Developers

Implemented by RemziCelebi and LeoSauermann for Drupal.

Code

Last modified on 09/09/10 17:41:22