From d762c2edba3eb0ae4d250d6b19c57de941a963f8 Mon Sep 17 00:00:00 2001
From: Kasun Gajasinghe
Date: Fri, 9 Sep 2011 18:40:30 +0000
Subject: [PATCH] webhelp - some updates to the documentation about search

---
 xsl/webhelp/docsrc/readme.xml | 49 +++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/xsl/webhelp/docsrc/readme.xml b/xsl/webhelp/docsrc/readme.xml
index 7d703a9c6..69280d3d3 100755
--- a/xsl/webhelp/docsrc/readme.xml
+++ b/xsl/webhelp/docsrc/readme.xml
@@ -117,10 +117,15 @@
- Provides content search of the documentation. Shows the search results with
+ Provides full content search of the documentation. Shows the search results with
  links to chunked pages, and descriptions taken from the abstract in the chapters
  or from a para with role="summary"
+
+ Word scoring/rating - For a particular word, the pages are weighted according to
+ how much that word appears in it, is it bold or not, is in index terms etc. The
+ score out of 5 is shown by small colored boxes after each search-result
+
  Stemming support for English, French, and German. Stemming support can be added
  for other languages by implementing a stemmer.
@@ -130,7 +135,7 @@
  Support for Chinese, Japanese, and Korean using code from the Lucene search
- engine.
+ engine
  Search highlighting shows where the searched for term appears in the results.
@@ -749,26 +754,31 @@ persist: "cookie"
  Search Overview design of Search mechanism.
- The searching is a fully client-side implementation of querying texts for content
- searching, and no server is involved. That means when a user enters a query, it is processed
- by JavaScript inside the browser, and displays the matching results by comparing the query
- with a generated 'index', which too reside in the client-side web browser. Mainly the search
- mechanism has two parts.
+ The serching is a fully client-side implementation of querying texts for content
+ searching. There's no server involved. So, the search queries by the users are processed by
+ JavaScript inside the browser, and displays the matching results by comparing the query with
+ a simplified 'index' that too resides in JavaScript. Mainly the search mechanism has two
+ parts.
  Indexing: First we need to traverse the content in the docs/content folder and
- index the words in it. This is done by nw-cms.jar. You can invoke
- it by ant index command from the root of webhelp of directory. You can
- recompile it again and build the jar file by ant build-indexer. Indexer
- has some extensive support for such as stemming of words. Indexer has extensive
- support for English, German, French languages. By extensive support, what I meant is
- that those texts are stemmed first, to get the root word and then indexes them. For
- CJK (Chinese, Japanese, Korean) languages, it uses bi-gram tokenizing to break up the
- words. (CJK languages does not have spaces between words.)
- When we run ant index, it generates five output files:
+ index the words in it. This is done by webhelpindexer.jar in
+ xsl/extentions/ folder. You can invoke it by ant
+ index command from the root of webhelp of directory. The source of
+ webhelpindexer is now moved to it's own location at
+ trunk/xsl-webhelpindexer/. Checkout the Docbook trunk svn
+ directory to get this source. Then, do your changes and recompile it by simply running
+ ant command.
My assumption is that it can be opened by Netbeans IDE by
+ one click. Or if you are using IntelliJ Idea, you can simply create a new project from
+ existing sources. Indexer has extensive support for features such as word scoring,
+ stemming of words, and support for languages English, German, French. For CJK
+ (Chinese, Japanese, Korean) languages, it uses bi-gram tokenizing to break up the
+ words (since CJK languages does not have spaces between words).
+ When ant index is run, it generates five output files:
  htmlFileList.js - This contains an array named fl which stores details
  all the files indexed by the indexer.
-
+ Further, the doStem in it defines whether stemming should be used. It defaults
+ to false.
  htmlFileInfoList.js - This includes some meta data
@@ -783,8 +793,7 @@ persist: "cookie"
  actually stores the index of the content. Index is added to an array named w.
-
-
+
  Querying: Query processing happens totally in client side. Following JavaScript
@@ -872,7 +881,7 @@ private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko",
  docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/.
- initialize correct stemmer based on the
+ <title>Initialize correct stemmer based on the
  <code>webhelp.indexer.language</code> specified
  SnowballStemmer stemmer;
--
2.40.0