From 8e14a01686922b438eb0f724ac63ed9b247452de Mon Sep 17 00:00:00 2001
From: Kasun Gajasinghe
Date: Sat, 14 Aug 2010 11:11:42 +0000
Subject: [PATCH] Developer Docs: Search

---
 xsl/webhelp/docsrc/readme.xml | 73 +++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/xsl/webhelp/docsrc/readme.xml b/xsl/webhelp/docsrc/readme.xml
index c2654474a..ad448501b 100755
--- a/xsl/webhelp/docsrc/readme.xml
+++ b/xsl/webhelp/docsrc/readme.xml
@@ -1,6 +1,6 @@
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
 Web-based Help from DocBook XML Readme
@@ -900,7 +900,74 @@ persist: "cookie"
-
-
+
+ Search
+ Overview of the design of the search mechanism.
+
+ Search is implemented entirely on the client side;
+ no server is involved. When a user enters a query,
+ it is processed by JavaScript inside the browser, which displays the matching results by
+ comparing the query with a generated 'index' that also resides in the client-side web browser.
+
+ The search mechanism has two main parts.
+
+
+ Indexing: First we need to traverse the content in the doc/content folder and index
+ the words in it. This is done by nw-cms.jar. You can invoke it with the
+ ant index command from the root of the webhelp directory. You can recompile it
+ and rebuild the jar file with ant build-indexer. The indexer supports
+ features such as stemming of words. It has extensive support for English, German, and
+ French. Extensive support here means that the text is stemmed
+ first to get the root of each word, and the roots are then indexed. For CJK (Chinese, Japanese, Korean)
+ languages, it uses bi-gram tokenizing to break up the words. (CJK languages do not have
+ spaces between words.)
+
+
+ When we run ant index, it generates five output files:
+
+
+ htmlFileList.js - This contains an array named fl which stores details of
+ all the files indexed by the indexer.
+
+
+
+ htmlFileInfoList.js - This includes some metadata about the indexed files in an array
+ named fil. It includes the file name, the file (html) title, and a summary
+ of the content. The format looks like:
+ fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";
+
+
+
+
+ index-*.js (Three index files) - These three files store the actual index of the content.
+ The index is added to an array named w.
+
+
+
+
+
+
+
+
+ Querying: Query processing happens entirely on the client side. The following JavaScript files handle it.
+
+
+ nwSearchFnt.js - This handles the user query and returns the search results.
It tokenizes the query into
+ words, drops unnecessary punctuation and common words, performs stemming if the DocBook language
+ supports it, etc.
+
+
+ {$indexer-language-code}_stemmer.js - This includes the stemming library.
+ nwSearchFnt.js calls the stemmer method in this file for stemming.
+ ex: var stem = stemmer(foobar);
+
+
+
+
+
+
+
+
+
-- 2.40.0
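
As an illustration of two details described in the patch above, the following sketch splits an htmlFileInfoList.js entry into its fields and shows what bi-gram tokenizing of unspaced text could look like. Both function names (parseFileInfo, bigrams) are hypothetical and are not part of webhelp. The real indexer is the Java code in nw-cms.jar and may tokenize differently. The only thing taken from the readme is the "file@@@title@@@summary" entry format.

```javascript
// Hypothetical helper (not part of webhelp): split one htmlFileInfoList.js
// entry of the form "file@@@title@@@summary" into named fields.
function parseFileInfo(entry) {
  var parts = entry.split("@@@");
  return {
    file: parts[0],
    title: parts[1],
    // Re-join the rest in case the summary itself contains the separator.
    summary: parts.slice(2).join("@@@")
  };
}

// Hypothetical sketch of bi-gram tokenizing: every pair of adjacent
// characters becomes one token. This is an assumption about the general
// technique, not a copy of what the Java indexer does.
function bigrams(text) {
  var chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length < 2) {
    return chars.length === 1 ? [chars[0]] : [];
  }
  var tokens = [];
  for (var i = 0; i < chars.length - 1; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}

// Sample entry copied from the readme text in the patch.
var fil = [];
fil["4"] = "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";

var info = parseFileInfo(fil["4"]);
console.log(info.file + " | " + info.title);
console.log(bigrams("日本語").join(" "));
```

Re-joining everything after the second separator is a defensive choice so that a summary containing "@@@" is not truncated. Whether real summaries can contain the separator is not stated in the readme.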