Technote 1141
Mac OS 8.5 includes several enhanced searching capabilities, known collectively as Sherlock. Previously, the Mac OS Find application allowed users to search mounted disk volumes for files based on information such as name, modification date, and file type. Sherlock retains this functionality and extends the user's search options to include both the contents of files and the Internet.
Overview

To perform an Internet search, the Sherlock application sends query information to one or more Internet search sites. Sherlock interprets the information returned by the search sites and displays it for perusal. Because each Internet search site has its own format for queries and responses, Sherlock uses plug-ins that describe the data formats expected and provided by individual search sites, and relies on them to format queries and parse response data. Internet search site providers interested in building their own search plug-ins will find directions for doing so in the Internet Search Plug-ins section.

AppleScript commands are available for accessing the new content-based search and Internet search facilities provided by the Sherlock application. These include commands for searching by content, a command for indexing volumes, and commands for performing Internet searches. These commands are discussed in greater detail in the AppleScript Support section.

When asked to open a file that was found by way of a content-oriented search, the Sherlock application attaches information about the search, and about why the file was selected, to the 'odoc' Apple event it passes to the Finder. The Finder passes this information along to applications as a property associated with the 'odoc' Apple event.

Find By Content is a new system-level facility implemented as a Code Fragment Manager library. The Sherlock application is a client of Find By Content and uses its search facilities for performing content-based searches. Developers interested in using the Find By Content services from within their applications may do so by linking against the PowerPC Code Fragment Manager library named "Find By Content" (without the quotes). Routine descriptions and examples are provided in the Find By Content section below.
Internet Search Plug-ins

The "Search Internet" feature in the Sherlock application allows users to perform Internet searches using one or more Internet search engines. The Sherlock application itself contains no information about the exact data formats expected or generated by individual search engines; when accessing a particular Internet search site, it uses a search plug-in file that describes both the format the site expects for queries and the format of the site's responses.

Search plug-in files are written in the Internet Search Interface Language (ISIL), so that Internet search site administrators can provide their own search plug-in files. ASCII text describing the search site is contained in a search plug-in's data fork. The resource fork may be used for custom icons, Finder strings, and so on. Search plug-in files have the creator code …

ISIL is modeled closely on the HTML it is used to describe, so HTML authors should have little or no trouble creating their own search plug-in files. An exact specification of the language can be found in the Internet Search Interface Language BNF section, and the sections that follow discuss the language in greater detail.

To create a search plug-in file, you will need a text editor (SimpleText will do) and a utility that lets you change the plug-in's file type. The basic steps for editing a search plug-in file are:
If your text editor edits any file regardless of type and does not change the types of the files it edits, you can skip steps 3 and 5. The Sherlock application scans the "Internet Search Sites" folder only once, when it starts up, so you should restart Sherlock each time you want to test your search plug-in file.
Listing 1. Typical layout for a SEARCH block in a search plug-in file:

<SEARCH name = "<search engine name>"
        method = ["get" | "post"]
        action = "<url to address>"
        [update = "<url containing update file>"]
        [updateCheckDays = "<days between update pings>"]
        [description = "<human-readable-description>"]
        [bannerImage = "<url containing banner image>"]
        [bannerLink = "<url to load when banner clicked>"]>
....
<INPUT name = "<input name>" value = "<value>" [mode = "results"]>
<INPUT name = "<input name>" value = "<value>" [mode = "browser"]>
....
<INPUT name = "<input name>" user>
....
<INTERPRET [bannerStart = "<text>"] [bannerEnd = "<text>"]
           [relevanceStart = "<text>"] [relevanceEnd = "<text>"]
           [resultListStart = "<text>"] [resultListEnd = "<text>"]
           [resultItemStart = "<text>"] [resultItemEnd = "<text>"]
           [skipLocal = true]
           [charset = "<text>"]
           [resultEncoding = <integer>]
           [resultTranslationEncoding = <integer>]
           [resultTranslationFont = "<text>"]>
....
</SEARCH>
Search blocks begin with the <SEARCH ....> tag (containing a number of attributes, as described in Table 1) and end with a </SEARCH> tag. Within a typical search block describing an Internet search site, there will be one or more INPUT tags and an INTERPRET tag. The SEARCH block attributes describe the search site, how it is to be accessed, and where updates to the search plug-in file can be found.
Table 1. SEARCH block attributes

Attribute name | Description
name | A human-readable name for the search plug-in.
method | The type of HTTP command used to communicate with the HTTP server. Currently, either "GET" or "POST" can be specified.
action | The full URL used to access the search server. Any relative links in the result list are resolved against this URL.
update | Optional. The URL where the most recent version of the search plug-in file is kept. If provided, the Sherlock application periodically checks this URL for changes. If the file at this URL is more recent than the one currently installed, Sherlock prompts the user to download the new file and installs it automatically. The file at this URL should be in BinHex format (but not otherwise compressed or encoded).
updateCheckDays | Optional. The number of days between checks of the update URL for more recent versions of the search plug-in file. If this attribute is not present, the default value of 30 days is used.
description | Optional. Text describing the search engine, its capabilities, and the content type of its search results. This text may be displayed in the user interface.
bannerImage | Optional. The URL of an image that is displayed in the details pane when any result from a query using this search plug-in is selected. Note: the banner properties of the INTERPRET tag override this setting when there is a conflict.
bannerLink | Optional. A URL that is loaded when the banner image is clicked. Note: the banner properties of the INTERPRET tag override this setting when there is a conflict.
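The updateCheckDays attribute shown in Listing 1 amounts to an elapsed-time comparison against a 30-day default. The following C sketch assumes Sherlock simply compares the time since its last check with the attribute value; the function and constant names are hypothetical, and this is not Sherlock's actual scheduling code.

```c
#include <time.h>

/* Hypothetical names; the 30-day default is the documented one. */
#define DEFAULT_UPDATE_CHECK_DAYS 30
#define SECONDS_PER_DAY 86400.0

/* Returns nonzero when a plug-in's update URL is due to be checked again.
   update_check_days <= 0 stands for "attribute absent", so the default
   applies. A sketch under stated assumptions, not Sherlock's own logic. */
int update_check_due(time_t last_check, time_t now, int update_check_days)
{
    int days = update_check_days > 0 ? update_check_days
                                     : DEFAULT_UPDATE_CHECK_DAYS;
    double elapsed = difftime(now, last_check);   /* in seconds */

    return elapsed >= days * SECONDS_PER_DAY;
}
```

With no updateCheckDays attribute, a check 29 days after the previous one is not yet due, while one at 30 days is.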
The optional mode attribute of an INPUT tag controls when that input is sent to the server. For example:

<input name="sv" value="AP" mode = "results">
<input name="sv" value="IS" mode = "browser">

Here, &sv=AP is sent to the server when the Sherlock application will display the results, and &sv=IS is sent when a web browser will display the results.
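The following C sketch shows how INPUT tags, including the mode attribute, could be turned into a GET query URL of the form action?name=value&name=value. It assumes ordinary form-style percent-encoding and that inputs are sent in the order they appear; Sherlock's exact encoding behavior is not specified here, and the types and functions are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical model of an ISIL INPUT tag (names are illustrative only). */
typedef enum { MODE_ANY, MODE_RESULTS, MODE_BROWSER } InputMode;

typedef struct {
    const char *name;
    const char *value;   /* NULL marks a "user" input */
    InputMode   mode;    /* MODE_ANY when no mode attribute is given */
} IsilInput;

/* Appends one character, always leaving room for the terminating NUL. */
static int append_char(char *buf, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap) return -1;
    buf[(*len)++] = c;
    buf[*len] = '\0';
    return 0;
}

/* Appends s using form-style percent-encoding (space becomes '+'). */
static int append_encoded(char *buf, size_t cap, size_t *len, const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        int rc;
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*')
            rc = append_char(buf, cap, len, (char)c);
        else if (c == ' ')
            rc = append_char(buf, cap, len, '+');
        else
            rc = append_char(buf, cap, len, '%') ||
                 append_char(buf, cap, len, hex[c >> 4]) ||
                 append_char(buf, cap, len, hex[c & 0x0F]);
        if (rc != 0) return -1;
    }
    return 0;
}

/* Builds "action?name=value&..." for a GET query, skipping inputs whose
   mode does not match where results will be shown (Sherlock or browser).
   Returns 0 on success, -1 if buf is too small. */
int build_get_url(const char *action, const IsilInput *inputs, size_t count,
                  const char *user_query, InputMode display,
                  char *buf, size_t cap)
{
    size_t len = 0, i;
    int first = 1;

    if (cap == 0) return -1;
    buf[0] = '\0';
    for (; *action != '\0'; action++)
        if (append_char(buf, cap, &len, *action) != 0) return -1;

    for (i = 0; i < count; i++) {
        const IsilInput *in = &inputs[i];
        const char *v = in->value ? in->value : (user_query ? user_query : "");

        if (in->mode != MODE_ANY && in->mode != display) continue;
        if (append_char(buf, cap, &len, first ? '?' : '&') != 0 ||
            append_encoded(buf, cap, &len, in->name) != 0 ||
            append_char(buf, cap, &len, '=') != 0 ||
            append_encoded(buf, cap, &len, v) != 0)
            return -1;
        first = 0;
    }
    return 0;
}
```

With the two sv inputs above plus qt (user) and nh=10, displaying in Sherlock yields ...?qt=coffee&nh=10&sv=AP, and displaying in a browser yields ...&sv=IS instead. A POST request would send the same name=value pairs in the request body rather than after a "?".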
Table 2. INTERPRET tag attributes

Attribute name | Description
resultListStart | The text pattern at the beginning of the list of search results in the result page returned by the server. If resultListStart is not specified, the Sherlock application assumes the result list begins at the top of the result page.
resultListEnd | The text pattern at the end of the list of search results in the result page returned by the server. If resultListEnd is not specified, Sherlock assumes the result list ends at the bottom of the result page. Together, resultListStart and resultListEnd delimit the list of results.
resultItemStart | A text pattern at the beginning of each individual item in the list of results. When this pattern is matched in the result page, only links immediately following it are included in the list of results displayed to the user.
resultItemEnd | A text pattern at the end of the text describing an item in the list of results. Text between a result's link and this pattern is presented in the details pane. Together, resultItemStart and resultItemEnd delimit the individual items in the list of results returned by the server.
bannerStart | A text pattern used to locate the banner image to be displayed with the search results. The first link following the pattern is used as the bannerLink, and the first image following it is used as the bannerImage. If bannerStart is specified and the pattern is matched, this bannerLink and bannerImage override those specified in the SEARCH tag.
bannerEnd | A text pattern marking the end of the banner information; the search for a bannerImage and bannerLink does not proceed beyond this pattern in the result page. Together, bannerStart and bannerEnd delimit the banner information that may be present in the result page. If banner information is found in the result page, it is used instead of any banner information specified in the SEARCH tag; otherwise, the default banner information specified in the SEARCH tag is used.
relevanceStart | A text pattern marking the beginning of the relevance information provided for each item in the list of results. When present, the first numeric text found after the pattern is interpreted as the relevance of the item. Relevance scores should be numbers between 0 and 100.
relevanceEnd | A text pattern marking the end of the relevance information; the search for relevance information does not proceed beyond this pattern. Together, relevanceStart and relevanceEnd delimit the relevance score for each individual search result.
skipLocal | A boolean attribute. If skipLocal is true, the Sherlock application ignores links that refer to the same host as the one specified in the action attribute of the SEARCH tag.
charset | The expected encoding of the HTML results. This attribute may be set to any value appropriate for the charset HTML meta tag.
resultEncoding | The encoding of the HTML results. This may be any integer constant defined in <TextCommon.h>.
resultTranslationEncoding | The encoding the HTML results should be translated to. This may be any integer constant defined in <TextCommon.h>.
resultTranslationFont | The preferred font for the translated text.
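The relevanceStart and relevanceEnd rules can be sketched in C as below: start after the first match of the start pattern, take the first run of digits, and never look past the end pattern. The function is hypothetical; clamping to 100 and stopping at a decimal point are assumptions, since how Sherlock treats out-of-range or fractional scores is not described here.

```c
#include <ctype.h>
#include <string.h>

/* Returns the relevance score found in item, or -1 if there is none.
   Starts after the first rel_start match, takes the first run of digits,
   and never looks past rel_end (when rel_end is given and found).
   Scores above 100 are clamped to 100. */
int parse_relevance(const char *item, const char *rel_start, const char *rel_end)
{
    const char *p = strstr(item, rel_start);
    const char *limit = NULL;
    int score = 0;

    if (p == NULL) return -1;
    p += strlen(rel_start);
    if (rel_end != NULL && *rel_end != '\0')
        limit = strstr(p, rel_end);
    if (limit == NULL)
        limit = p + strlen(p);          /* no end pattern: scan to the end */

    while (p < limit && !isdigit((unsigned char)*p))
        p++;                            /* skip to the first numeric text */
    if (p == limit) return -1;

    while (p < limit && isdigit((unsigned char)*p)) {
        score = score * 10 + (*p - '0');
        if (score > 100) score = 100;   /* clamp; also prevents overflow */
        p++;
    }
    return score;
}
```

For an item containing <SMALL>90%</SMALL> with relevanceStart="<SMALL>" and relevanceEnd="</SMALL>", the sketch returns 90; if the end pattern comes before any digits, it reports no score.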
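The charset, resultEncoding, resultTranslationEncoding, and resultTranslationFont attributes control how the Sherlock application handles the text encoding of result pages. It is possible, though, that Sherlock will not be able to recognize a text encoding by name. For these cases, search plug-in creators can explicitly specify the character encoding used in responses to queries by using the resultEncoding attribute.

For example, if a result page returned from a search site was encoded using the "euc-jp" character set, the following INTERPRET tag specifies that encoding numerically (2336, or 0x0920, is kTextEncodingEUC_JP) and asks for the results to be translated to Mac Japanese (1, kTextEncodingMacJapanese) and displayed in the Osaka font:

<interpret resultEncoding = 2336 resultTranslationEncoding = 1 resultTranslationFont = "Osaka">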
An Example

In this hypothetical example, we assume the Internet search site that we are writing the search plug-in file for is located at the URL <http://clarus.apple.com>. (As of this writing, this site does not exist, although the following text is written as if it does. If the site did exist, it would presumably enable visitors to search for information regarding Clarus the Dogcow. An explanation of how visitors other than dogcattle would make use of the search results is beyond the scope of this document and is left as an exercise for the reader.)
Step 1: Describe the site in the opening SEARCH tag.
Step 2: Define the INPUT tags.

There are two ways to determine what inputs an Internet search site expects. The first is to perform a query manually and look at the URL that is sent to the server. The second is to pick through the site's HTML.

The Query Method. Looking at the query information is the simplest method. For example, if we go to the search site in a web browser, type the query string "coffee", and start a search, we may observe a URL that looks like this:

http://clarus.apple.com/Titles?qt=coffee&nh=10

From this URL we can locate the inputs: they come after the "?" and are separated by ampersand characters (&). In this query, the inputs are:

qt=coffee
nh=10

Using this information, we can construct the following two INPUT tags:

<input name="qt" user>
<input name="nh" value="10">

A search site may offer optional parameters, so trying different options and queries may yield more useful information.

The HTML Method. If the inputs are not present in the URL, they must be determined by looking at the HTML source. Here, we look for the FORM tag that submits the search:

<form action="/Titles" method="get" name="Search">
<table width="100%" cellspacing=0 cellpadding=3 border=0>
<tr><td colspan=4> Search</td>
<td align=center><a href="/Help?pg=Help.HTML"><b>Tips</b></a>
</td></tr>
<tr><td colspan=5>
<input type="text" name="qt" value="" size="25" MAXLENGTH=255>
</td></tr>
<INPUT TYPE=hidden NAME="nh" VALUE="10">
</table>
</form>

Between the <form> and </form> tags, we find these inputs:

<input type="text" name="qt" value="" size="25" MAXLENGTH=255>
<INPUT TYPE=hidden NAME="nh" VALUE="10">

Again, this information can be used to construct the following two INPUT tags:

<input name="qt" user>
<input name="nh" value="10">

Experimenting with these input parameters and writing different types of query URLs can provide useful information about their meaning and use.
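The query method is mechanical enough to sketch in C: split the text after "?" on "&", and turn the pair whose value matches the typed query into a user input. The helper below is hypothetical; it copies values as-is without decoding percent-escapes, and it assumes the typed query appears verbatim in the URL.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one ISIL INPUT tag per name=value pair in the
   query part of url (the text after '?'). The pair whose value equals the
   text typed into the search field becomes a "user" input. Values are
   copied as-is (percent-escapes are not decoded). Returns 0 on success,
   -1 if out is too small. */
int inputs_from_url(const char *url, const char *typed_query,
                    char *out, size_t cap)
{
    const char *q = strchr(url, '?');
    size_t used = 0;

    if (cap == 0) return -1;
    out[0] = '\0';
    if (q == NULL) return 0;                 /* no inputs in this URL */
    q++;

    while (*q != '\0') {
        size_t pair_len = strcspn(q, "&");   /* inputs are separated by '&' */
        const char *eq = memchr(q, '=', pair_len);
        const char *val = eq ? eq + 1 : q + pair_len;
        int name_len = (int)((eq ? eq : q + pair_len) - q);
        int val_len = (int)(q + pair_len - val);
        int n;

        if (pair_len > 0) {
            if (typed_query != NULL && strlen(typed_query) == (size_t)val_len &&
                strncmp(val, typed_query, (size_t)val_len) == 0)
                n = snprintf(out + used, cap - used,
                             "<input name=\"%.*s\" user>\n", name_len, q);
            else
                n = snprintf(out + used, cap - used,
                             "<input name=\"%.*s\" value=\"%.*s\">\n",
                             name_len, q, val_len, val);
            if (n < 0 || (size_t)n >= cap - used) return -1;  /* out of room */
            used += (size_t)n;
        }
        q += pair_len;
        if (*q == '&') q++;
    }
    return 0;
}
```

Given http://clarus.apple.com/Titles?qt=coffee&nh=10 and the typed query "coffee", it produces the same two INPUT tags derived by hand above.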
For instance, after writing several variations of the query URL, we settled on the following INPUT tags:

<input name="qt" user>
<input name="nh" value="25">

Now that the inputs have been determined, there is enough information to put together a complete search plug-in file:

<search name="Clarus Test" description = "The Clarus Search Site" action="http://clarus.apple.com/Titles/" method=get>
<input name="qt" user>
<input name="nh" value="25">
</search>

In this form, queries can be sent and results displayed, but the plug-in does not yet include an INTERPRET tag describing how the result page is laid out.
Step 3: Describe the results in the INTERPRET tag.
Listing 2. A sample HTML response to a query:

<HTML>
<HEAD><TITLE>Sample Results</TITLE></HEAD>
<BODY>
<A HREF="http://www.apple.com">
<IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer">
</A>
<P>
<SMALL>90%</SMALL>
<A HREF="http://www.apple.com/hotnews/">Hot News</A>
Apple Hot News - http://www.apple.com/hotnews
<BR><A HREF="http://www.apple.com">Apple Computer</A>
</P>
<P>
<SMALL>85%</SMALL>
<A HREF="http://www.apple.com/products/">Apple Products</A>
Apple - Products - http://www.apple.com/products
<BR><A HREF="http://www.apple.com">Apple Computer</A>
</P>
</BODY>
</HTML>
From this information, we can see that the banner section is delimited by the text patterns "<BODY>" and "<P>":

bannerStart="<BODY>" bannerEnd="<P>"

The list of results is delimited by the text patterns "</A>" (the first "</A>" on the page closes the banner's anchor) and "</BODY>":

resultListStart="</A>" resultListEnd="</BODY>"

Each item in the list of results is bracketed by the text patterns "<P>" and "</P>":

resultItemStart="<P>" resultItemEnd="</P>"

And the relevance score for each item is bracketed by the text patterns "<SMALL>" and "</SMALL>":

relevanceStart="<SMALL>" relevanceEnd="</SMALL>"

Putting this all together, the complete search plug-in file has the following contents:

<search name="Clarus Test" description = "The Clarus Search Site" action="http://clarus.apple.com/Titles/" method=get>
<input name="qt" user>
<input name="nh" value="25">
<interpret bannerStart="<BODY>" bannerEnd="<P>"
           resultListStart="</A>" resultListEnd="</BODY>"
           resultItemStart="<P>" resultItemEnd="</P>"
           relevanceStart="<SMALL>" relevanceEnd="</SMALL>">
</search>
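To make the delimiter rules concrete, here is a C sketch that applies resultListStart/resultListEnd and resultItemStart/resultItemEnd to a page and collects the first HREF in each item. It is a hypothetical illustration of the rules in Table 2, not Sherlock's parser; it assumes exact, case-sensitive matching of the patterns (but case-insensitive matching of the HREF attribute name) and requires non-empty item delimiters.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical model of the INTERPRET attributes used for result lists. */
typedef struct {
    const char *resultListStart, *resultListEnd;   /* may be NULL */
    const char *resultItemStart, *resultItemEnd;   /* required, non-empty */
} Interpret;

typedef struct { const char *url; size_t len; } Link;

/* Finds needle in [from, to); returns NULL if it is not there. */
static const char *find_in(const char *from, const char *to,
                           const char *needle, int ignore_case)
{
    size_t n = strlen(needle), i;
    for (; from <= to && (size_t)(to - from) >= n; from++) {
        for (i = 0; i < n; i++) {
            int a = (unsigned char)from[i], b = (unsigned char)needle[i];
            if (ignore_case) { a = tolower(a); b = tolower(b); }
            if (a != b) break;
        }
        if (i == n) return from;
    }
    return NULL;
}

/* Stores the first HREF="..." of each result item in links (up to
   max_links entries) and returns the number of items with a link.
   Missing list delimiters default to the top/bottom of the page. */
int extract_results(const char *page, const Interpret *in,
                    Link *links, int max_links)
{
    const char *end = page + strlen(page);
    const char *list = page, *list_end = end, *p;
    int count = 0;

    if (in->resultListStart != NULL &&
        (p = find_in(page, end, in->resultListStart, 0)) != NULL)
        list = p + strlen(in->resultListStart);
    if (in->resultListEnd != NULL &&
        (p = find_in(list, end, in->resultListEnd, 0)) != NULL)
        list_end = p;

    for (p = list; ; ) {
        const char *item = find_in(p, list_end, in->resultItemStart, 0);
        const char *item_end, *href, *quote;

        if (item == NULL) break;
        item += strlen(in->resultItemStart);
        item_end = find_in(item, list_end, in->resultItemEnd, 0);
        if (item_end == NULL) item_end = list_end;

        /* HTML attribute names are case-insensitive: match HREF or href. */
        href = find_in(item, item_end, "href=\"", 1);
        if (href != NULL) {
            href += 6;                                 /* skip href=" */
            quote = memchr(href, '"', (size_t)(item_end - href));
            if (quote != NULL) {
                if (count < max_links) {
                    links[count].url = href;
                    links[count].len = (size_t)(quote - href);
                }
                count++;
            }
        }
        p = item_end;
    }
    return count;
}
```

Run against a condensed copy of Listing 2 with the delimiters chosen above, the sketch finds two items whose links are http://www.apple.com/hotnews/ and http://www.apple.com/products/; the banner's link is skipped because the list starts after the first "</A>".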
Internet Search and XML Search Results

It is possible that a search engine may provide a separate machine-readable interface using a format such as Extensible Markup Language (XML).
Listing 3. A sample XML document:

<searchResponse>
  <advertisement>
    <a href="http://www.advertiser.com">
      <img src="ad.gif"/>
    </a>
  </advertisement>
  <searchResults>
    <resultItem>
      <b><relevance>67%</relevance></b>
      <link><a href="http://www.foo.com">Title</a></link><br/>
      <summary>Summary</summary>
    </resultItem>
  </searchResults>
</searchResponse>
At the time of this writing, the XML specification is still under development; however, using the current state of the standard, the Internet Search Interface can easily be configured to interpret XML result lists. For example, the XML document in Listing 3 could be interpreted using the following INTERPRET tag:

<interpret bannerStart = "<advertisement>" bannerEnd = "</advertisement>"
           resultListStart = "<searchResults>" resultListEnd = "</searchResults>"
           resultItemStart = "<resultItem>" resultItemEnd = "</resultItem>"
           relevanceStart = "<relevance>" relevanceEnd = "</relevance>">
Listing 4. A simple HTML response to a query that includes delimiting comments:

<HTML>
<HEAD><TITLE>Sample Results</TITLE></HEAD>
<BODY>
<!-- BANNER START -->
<A HREF="http://www.apple.com">
<IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer">
</A>
<!-- BANNER END -->
<!-- RESULT LIST START -->
<!-- RESULT ITEM START -->
<P>
<SMALL> <!-- RELEVANCE START --> 90% <!-- RELEVANCE END --> </SMALL>
<A HREF="http://www.apple.com/hotnews/">Hot News</A>
Apple Hot News - http://www.apple.com/hotnews
<BR><A HREF="http://www.apple.com">Apple Computer</A>
</P>
<!-- RESULT ITEM END -->
<!-- RESULT ITEM START -->
<P>
<SMALL> <!-- RELEVANCE START --> 85% <!-- RELEVANCE END --> </SMALL>
<A HREF="http://www.apple.com/products/">Apple Products</A>
Apple - Products - http://www.apple.com/products
<BR><A HREF="http://www.apple.com">Apple Computer</A>
</P>
<!-- RESULT ITEM END -->
<!-- RESULT LIST END -->
</BODY>
</HTML>
Banner Advertisements

The Sherlock application uses the first HTML anchor in the banner section that includes both a hypertext jump and an image as the banner. For best results, banner advertisements should be enclosed in an HTML anchor that includes both a hypertext jump (HREF attribute) and an IMG tag with a SRC attribute and, preferably, an ALT attribute. For example, the HTML anchor below illustrates the suggested format for banner advertisements:

<A HREF="http://www.apple.com">
<IMG SRC="http://www.apple.com/main/elements/apple.gif" ALT="Apple Computer">
</A>

Result Lists

When interpreting search results, the Sherlock application identifies results by looking for HTML anchors containing hypertext jump attributes. At least one anchor including a hypertext jump (HREF attribute) should occur between the text patterns specified in …
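Following the bannerStart and bannerEnd rules in Table 2 (the first link and the first image between the two patterns become the bannerLink and bannerImage), banner extraction can be sketched in C as below. The functions are hypothetical, and the case-insensitive matching of the patterns and attribute names is an assumption rather than documented Sherlock behavior.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive search for needle in [from, to); NULL if absent. */
static const char *find_ci(const char *from, const char *to, const char *needle)
{
    size_t n = strlen(needle), i;
    for (; from <= to && (size_t)(to - from) >= n; from++) {
        for (i = 0; i < n; i++)
            if (tolower((unsigned char)from[i]) != tolower((unsigned char)needle[i]))
                break;
        if (i == n) return from;
    }
    return NULL;
}

/* Copies the quoted value that follows prefix (e.g. href=") within
   [from, to) into out. Returns 0 on success, -1 if absent or too long. */
static int copy_quoted(const char *from, const char *to, const char *prefix,
                       char *out, size_t cap)
{
    const char *v = find_ci(from, to, prefix), *q;
    size_t len;

    if (v == NULL) return -1;
    v += strlen(prefix);
    q = memchr(v, '"', (size_t)(to - v));
    if (q == NULL) return -1;
    len = (size_t)(q - v);
    if (len >= cap) return -1;
    memcpy(out, v, len);
    out[len] = '\0';
    return 0;
}

/* The first link and first image between bannerStart and bannerEnd become
   bannerLink and bannerImage. Returns -1 if no banner is found, in which
   case the banner from the SEARCH tag would be used instead. */
int extract_banner(const char *page, const char *banner_start,
                   const char *banner_end,
                   char *link, size_t link_cap, char *image, size_t image_cap)
{
    const char *end = page + strlen(page);
    const char *from = find_ci(page, end, banner_start), *to;

    if (from == NULL) return -1;
    from += strlen(banner_start);
    to = find_ci(from, end, banner_end);
    if (to == NULL) to = end;       /* no bannerEnd match: search to the end */

    if (copy_quoted(from, to, "href=\"", link, link_cap) != 0) return -1;
    if (copy_quoted(from, to, "src=\"", image, image_cap) != 0) return -1;
    return 0;
}
```

Applied to the suggested anchor format with bannerStart="<BODY>" and bannerEnd="<P>", the sketch yields http://www.apple.com as the link and http://www.apple.com/main/elements/apple.gif as the image.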
Searching Files

Two AppleScript commands provide access to the Find By Content facilities in the Sherlock application. The first allows AppleScript scripts to perform searches based on the contents of files; the second allows scripts to create or update the index files on particular volumes that Find By Content uses. The AppleScript dictionary entry for the "search" command is shown in Definition 2, and the entry for the "index volumes" command is shown in Definition 3.

The "search" command allows AppleScript scripts to perform searches based on file contents.

Definition 2. The "search" dictionary entry from the Sherlock application

In the "search" command, the "for", "similar to", and "using" parameters are mutually exclusive and may not be used together in the same command. As in the Internet search command, the "using" parameter allows query information stored in a file to be used instead of a query string. To create such a file, use the "Save Search Criteria" command in the Sherlock application's File menu.

The direct object of the "search" command is a list of volumes or folders to search. If no list is provided and either the "search for" or the "search similar to" parameter is used, the "search" command searches all local, indexed volumes. When the "using" parameter is specified, the list of volumes is ignored.
Indexing Volumes

Before the Find By Content facilities can be used to search a volume, the volume must contain an index. Index files, which contain the information needed to perform content-based searches, are stored in an invisible folder called "TheFindByContentFolder" in the volume's root directory. AppleScript scripts can ask the Sherlock application to create or update an index file for one or more volumes.

Definition 3. The "index volumes" dictionary entry from the Sherlock application.
The Optional …
Listing 5. Retrieving the search words from an 'odoc' Apple event:
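/* GetSearchWordsFromAppleEvent copies the search words ('srwd') that the
   Sherlock application attached to an 'odoc' Apple event into theText.
   On input, *ioLength is the size of the theText buffer; on output, it is
   the actual size of the search words. If that size exceeds the buffer,
   only the first *ioLength bytes were copied. The text is not
   null-terminated. */
OSErr GetSearchWordsFromAppleEvent(AppleEvent* inAppleEvent, char* theText, long *ioLength)
{
    OSErr err;
    DescType outType;
    AERecord propData = {typeNull, NULL};

    /* check our parameters */
    if (inAppleEvent == NULL || ioLength == NULL || theText == NULL)
        return paramErr;

    /* get the optional property data record from the Apple event */
    err = AEGetParamDesc(inAppleEvent, keyAEPropData, typeAERecord, &propData);

    /* extract the search words from the property record */
    if (err == noErr)
        err = AEGetKeyPtr(&propData, 'srwd', typeChar, &outType,
                          theText, *ioLength, ioLength);

    /* clean up (safe even if propData is still a null descriptor) and return */
    AEDisposeDesc(&propData);
    return err;
}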
The example shown in Listing 5 illustrates how an application may extract the query information from an 'odoc' Apple event. Note: It is possible for …

The presence of this additional parameter will not affect the behavior of existing applications built according to the guidelines set forth in the "Responding to Apple Events" chapter of Inside Macintosh: Interapplication Communication. However, developers may choose to take advantage of this information when it is present in an Apple event, as a clue pointing to the part of the document that the user would like to see first. The presence of the …

In some cases, however, some or all of the words in the query string may not appear in the document being opened. In a normal search based on a query phrase, Find By Content locates files that contain one or more of the words in the query. But when a user selects one or more documents found in a previous search and requests "similar" documents, some of the documents found may not contain any of the words from the query string specified in the original search. Developers accessing the …
Find By Content

The Find By Content (FBC) facilities provided in Mac OS 8.5 are implemented in a PowerPC Code Fragment Manager library that resides in the "Extensions" folder. The Sherlock application is a client of FBC, accessing FBC services through this shared library. Developer applications can also access the search facilities provided by this library. This section describes how developers can create products that access the new FBC facilities through this shared library. Compiler interfaces to FBC are found in the C header file …
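Determining if Find By Content is Available

FBC defines two Gestalt selectors. The first, gestaltFBCVersion, reports the version of FBC:

enum {
    gestaltFBCVersion        = 'fbcv',
    gestaltFBCCurrentVersion = 0x0011
};

The second, gestaltFBCIndexingState, reports the state of indexing:

enum {
    gestaltFBCIndexingState    = 'fbci',
    gestaltFBCindexingSafe     = 0,
    gestaltFBCindexingCritical = 1
};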
Working with Search Sessions

FBC allows client applications to open and close a "search session". A search session contains all of the information about a search, including the list of matched files after the search is complete. Clients of FBC can obtain references to search sessions, modify them, and query their state using the routines defined in this section. References to search sessions are defined as an opaque pointer type owned by the FBC library:

typedef struct OpaqueFBCSearchSession* FBCSearchSession;

Developers should access search sessions only through the routines defined here, including the FBC routines for duplicating and disposing of search sessions. Search sessions are complex memory structures that contain pointers to other data, which may need to be copied when a session is duplicated or released when a session is disposed of.

The normal sequence of actions when using the FBC library is to create a search session, configure it to target specific volumes, perform the search, query the search results, and dispose of the session. Other possibilities include reinitializing a search session and reusing it for another search, providing backtracking by cloning search sessions and performing additional searches using the clones, and limiting search results to files found in particular directories.
Setting up a Search Session

Creating a new session and preparing it for a search, as shown in Listing 6, requires at least two calls to the FBC library. In this example, a new search session is created and configured to search all local volumes that contain index files. The call to …
Listing 6. Setting up a search session to search all local, indexed volumes:

/* SimpleSetUpSession allocates a new search session and returns
   a FBCSearchSession value in the *session parameter. If an
   error occurs, *session is left untouched. */
OSErr SimpleSetUpSession(FBCSearchSession* session)
{
    OSErr err;
    FBCSearchSession newsession;

    /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (session == NULL) return paramErr;

    /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

    /* search all available local volumes */
    err = FBCAddAllVolumesToSession(newsession, false);
    if (err != noErr) goto bail;

    /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL) FBCDestroySearchSession(newsession);
    return err;
}
FBC provides a complete set of routines for developers wanting more control over which volumes a search session will search. Listing 7 illustrates how a new search session could be configured to search a particular set of volumes.
Listing 7. Setting up a session to search a particular set of volumes:

/* SetUpVolumeSession allocates a new search session and returns
   a FBCSearchSession value in the *session parameter. If vCount
   is not zero, then vRefNums points to an array of volume reference
   numbers for volumes that are to be searched. If any of the
   vRefNums refer to a volume without an index, paramErr is returned. */
OSErr SetUpVolumeSession (FBCSearchSession* session, UInt16 vCount, SInt16 *vRefNums)
{
    OSErr err;
    UInt16 i;
    FBCSearchSession newsession;

    /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (vCount == 0) return paramErr;
    if (session == NULL) return paramErr;
    if (vRefNums == NULL) return paramErr;

    /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

    /* search the volumes specified in vRefNums */
    for (i=0; i<vCount; i++) {
        if (!FBCVolumeIsIndexed(vRefNums[i])) {
            err = paramErr;
            goto bail;
        } else {
            err = FBCAddVolumeToSession(newsession, vRefNums[i]);
            if (err != noErr) goto bail;
        }
    }

    /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL) FBCDestroySearchSession(newsession);
    return err;
}
In this example, the …

Once a search session has been configured to search a number of volumes, it can be reused after a search has been conducted without reconfiguring its target volumes. After performing a search and examining the results, the search session can be prepared for another search by calling the routine …

Making a copy of a search session using the routine …
Performing Searches

When FBC performs a search, it generates a list of the files that were matched. This list is referred to as the "hits", and it is stored inside the search session. FBC can be asked to perform a content-based search using a query string containing a list of words, a similarity search based on one or more hits obtained in a previous search, or a similarity search based on a list of example files. Listing 8 illustrates how a query-based search can be performed. Here, the query is used to search for matching files on all local indexed volumes.
Listing 8. A query-based search of all local, indexed volumes:

/* SimpleFindByQuery searches all local, indexed volumes for files
   matching query. On success, the search session holding the hits
   is returned in *session. */
OSErr SimpleFindByQuery (char *query, FBCSearchSession *session)
{
    OSErr err;
    FBCSearchSession newsession;

    /* set up locals, check parameters... */
    if (query == NULL || query[0] == 0) return paramErr;
    if (session == NULL) return paramErr;
    newsession = NULL;

    /* allocate a new search session */
    err = SimpleSetUpSession(&newsession);
    if (err != noErr) goto bail;

    /* Here is the call that does the actual search, storing the
       results in the search session. No target directories are
       given (NULL, 0); the last two arguments are the maximum number
       of hits and of hit terms. */
    err = FBCDoQuerySearch(newsession, query, NULL, 0, 100, 100);
    if (err != noErr) goto bail;

    /* save the results and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL) FBCDestroySearchSession(newsession);
    return err;
}
Searches conducted using either the routine … All three of the search routines …
Listing 9. Searching a particular set of directories:

enum {
    kMaxVols = 20,
    maxHits = 10,
    maxHitTerms = 10
};

/* RestrictedFindByQuery searches for files matching query, restricting
   the hits to the dirCount directories in dirList. On success, the
   search session holding the hits is returned in *session. */
OSErr RestrictedFindByQuery (char *query, UInt16 dirCount, FSSpec* dirList, FBCSearchSession* session)
{
    OSErr err;
    UInt16 vCount, i, j;
    SInt16 vRefNums[kMaxVols], normalVol;
    FBCSearchSession newsession;

    vCount = 0;
    newsession = NULL;
    if (dirList == NULL || dirCount == 0) return paramErr;
    if (query == NULL || *query == 0) return paramErr;
    if (session == NULL) return paramErr;

    /* collect all of the unique volume reference numbers from
       the list of FSSpecs provided in the parameters. */
    for (i=0; i<dirCount; i++) {
        Boolean found;
        HParamBlockRec pb;

        /* ensure the vRefNum is a volume reference number */
        pb.volumeParam.ioVRefNum = dirList[i].vRefNum;
        pb.volumeParam.ioNamePtr = NULL;
        pb.volumeParam.ioVolIndex = 0;
        if ((err = PBHGetVInfoSync(&pb)) != noErr) goto bail;
        normalVol = pb.volumeParam.ioVRefNum;

        /* make sure it's not already in the list */
        for (found = false, j=0; j<vCount; j++)
            if (vRefNums[j] == normalVol) { found = true; break; }

        /* add the volume to the list (volumes beyond kMaxVols are ignored) */
        if (!found && vCount < kMaxVols)
            vRefNums[vCount++] = normalVol;
    }

    /* set up a session to use the volumes we found */
    err = SetUpVolumeSession(&newsession, vCount, vRefNums);
    if (err != noErr) goto bail;

    /* Here is the call that does the actual search, storing the
       results in the search session. dirList restricts the hits
       to files in those directories. */
    err = FBCDoQuerySearch(newsession, query, dirList, dirCount, maxHits, maxHitTerms);
    if (err != noErr) goto bail;

    /* save the result and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL) FBCDestroySearchSession(newsession);
    return err;
}
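Here, the volume reference numbers are extracted from the array of FSSpec records passed in dirList, duplicates are removed, and the resulting set of volumes is used to configure the search session. The same dirList is then passed to FBCDoQuerySearch so that only files in those directories are reported as hits.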
Retrieving Information from a Search Session

After a search is conducted using a search session, the session may contain information about one or more matching files. Clients can access information about individual hits, including the file's FSSpec, its relevance score, and the words that were matched in it.
Listing 10. Enumerating all of the files found in a search session:

typedef OSErr (*HitProc) (const FSSpec *theDoc, float score, UInt32 nTerms, FBCWordList hitTerms);

/* SampleHandleHits can be called after a search to enumerate the
   search results. For each search hit, the hitFileProc function
   parameter is called with information describing the target. */
OSErr SampleHandleHits (FBCSearchSession session, HitProc hitFileProc)
{
    OSErr err;
    UInt32 hitCount, i;
    FSSpec targetDoc;
    float targetScore;
    FBCWordList targetTerms;
    UInt32 numTerms;

    /* set up locals, check parameters */
    targetTerms = NULL;
    numTerms = 0;
    if (hitFileProc == NULL) return paramErr;
    if (session == NULL) return paramErr;

    /* count the number of hits in this session */
    err = FBCGetHitCount(session, &hitCount);
    if (err != noErr) goto bail;

    /* iterate through the hits (indexed 0 through hitCount-1) */
    for (i = 0; i < hitCount; i++) {
        /* get the target document's FSSpec */
        err = FBCGetHitDocument(session, i, &targetDoc);
        if (err != noErr) goto bail;

        /* get the score for this document */
        err = FBCGetHitScore(session, i, &targetScore);
        if (err != noErr) goto bail;

        /* get a list of the words matched in this document
           (maxHitTerms is defined in Listing 9) */
        numTerms = maxHitTerms;
        err = FBCGetMatchedWords(session, i, &numTerms, &targetTerms);
        if (err != noErr) goto bail;

        /* call the callback routine provided as a parameter
           to do something with the information. */
        err = hitFileProc(&targetDoc, targetScore, numTerms, targetTerms);
        if (err != noErr) goto bail;

        /* clean up before moving to the next iteration. */
        FBCDestroyWordList(targetTerms, numTerms);
        targetTerms = NULL;
    }
    return noErr;

bail:
    if (targetTerms != NULL) FBCDestroyWordList(targetTerms, numTerms);
    return err;
}
Find By Content Reference
This section describes the CFM-based interfaces to the PowerPC FBC library. PowerPC applications using these routines link against the library named "Find By Content" (without the quotes).
Data Types
FBC provides the following data types. Storage for these types is managed by the FBC library. Clients should not attempt to allocate or deallocate these structures using Memory Manager calls.
Search sessions created by FBC are referenced through pointer variables of this type. The format of the data referred to by this pointer is private to the FBC library. Clients should not attempt to access or modify this data directly.
An ordinary C string. This type is used when retrieving information about hits from a search session.
An array of
Allocation and Initialization of Search Sessions
The following routines allocate, copy, and dispose of search sessions. Storage occupied by search sessions is owned by the FBC library, and these are the only routines that should be used to allocate, copy, and dispose of search sessions.
Configuring Search Sessions
Search sessions can be configured to limit searches to a particular set of volumes. These routines give clients access to the set of volumes that FBC will search.
Executing a Search
FBC provides three routines for conducting searches, described in this section.
If any of the example files are not indexed, the search proceeds with the remaining files, and the error code
Getting Information About Hits
Once a search is complete, its search session contains a list of the hits found during the search. The routines described in this section allow clients to access information about the hits stored in a search session. Hits are numbered 0 through count - 1, where count is the value returned by FBCGetHitCount.
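The 0 through count - 1 numbering means a client can treat hits like an ordinary C array. The hypothetical sketch below uses that numbering to find the best-scoring hit. The `scores` array stands in for values an application would gather by calling `FBCGetHitScore` for each hit number. `BestHitNumber` is not an FBC routine. Its use of `kFBCnoSuchHit` (value taken from the C summary) for an empty session is an illustrative choice, not documented FBC behavior.

```c
#include <stdint.h>

/* Value copied from the FBC error codes in the C summary. */
enum { kFBCnoSuchHit = -30532 };

/* Hypothetical helper (not part of FBC). scores[0..count-1] stands in
   for values gathered with FBCGetHitScore, one per hit number.
   Stores the hit number with the highest score in *best and returns 0,
   or returns kFBCnoSuchHit (leaving *best untouched) if count is 0.
   On ties, the lowest hit number wins. */
int BestHitNumber(const float *scores, uint32_t count, uint32_t *best)
{
    uint32_t bestIndex = 0;

    if (count == 0)
        return kFBCnoSuchHit;

    /* valid hit numbers run from 0 through count - 1 */
    for (uint32_t i = 1; i < count; i++) {
        if (scores[i] > scores[bestIndex])
            bestIndex = i;
    }
    *best = bestIndex;
    return 0;
}
```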
The matched words for a hit are stored in the hit itself, so retrieving them is fast.
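A word list returned for a hit is a plain array of C strings (`FBCWordList`, an array of `FBCWordItem`, as declared in the C summary), so it can be formatted for display with standard C. The sketch below is a hypothetical display helper, not an FBC routine. It joins the words with ", " into a caller-supplied buffer, truncating the output if the buffer is too small.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Types as declared in the Find By Content C summary. */
typedef char *FBCWordItem;
typedef FBCWordItem *FBCWordList;

/* Appends s to buf (bufSize bytes, *used chars already present),
   truncating so buf stays NUL-terminated. Returns 1 if all of s fit. */
static int AppendTruncated(char *buf, size_t bufSize, size_t *used,
                           const char *s)
{
    size_t len = strlen(s);
    size_t room = bufSize - 1 - *used;   /* space left before the NUL */
    size_t n = len < room ? len : room;

    memcpy(buf + *used, s, n);
    *used += n;
    buf[*used] = '\0';
    return n == len;
}

/* Hypothetical helper (not part of FBC): joins the first wordCount
   entries of list into buf, separated by ", ". Returns the number of
   characters written, excluding the terminating NUL. */
size_t JoinWordList(FBCWordList list, uint32_t wordCount,
                    char *buf, size_t bufSize)
{
    size_t used = 0;

    if (bufSize == 0)
        return 0;
    buf[0] = '\0';

    for (uint32_t i = 0; i < wordCount; i++) {
        if (i > 0 && !AppendTruncated(buf, bufSize, &used, ", "))
            break;                       /* buffer full */
        if (!AppendTruncated(buf, bufSize, &used, list[i]))
            break;                       /* buffer full */
    }
    return used;
}
```

A list obtained from FBCGetMatchedWords would still be released afterward with FBCDestroyWordList(list, wordCount), as in Listing 10, because FBC owns that storage.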
The list of topical words for a particular hit must be generated through the index file, so this call is significantly slower than
Summarizing Text
This call produces a summary containing the "most relevant" sentences found in the input text.
Getting Information About Volumes
FBC provides the following utility routines for accessing information about volumes.
Reserving Heap Space
Clients of FBC can reserve space in their heap zone for their callback routine before conducting a search.
Application-Defined Routine
Clients can provide a routine that FBC calls periodically during searches. This routine gives clients both information about the status of a search and an opportunity to cancel the search before it is complete. Callback routines are defined as follows:
To avoid locking up the system while a search is in progress, the callback should either directly or indirectly call
An ongoing search will be canceled if the callback function returns
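The following is a minimal sketch of a callback with the same shape as `FBCCallbackProcPtr` from the C summary. `bool` and `uint16_t` stand in for `Boolean` and `UInt16`. The `SearchProgress` struct, `PhaseName`, and `MyFBCCallback` are hypothetical names. The sketch ASSUMES that a true result requests cancellation, because the return convention is not spelled out above. It also omits the system call that the text above recommends for keeping the system responsive. In a real application, the callback would be registered with `FBCSetCallback(MyFBCCallback, &progress)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Phase codes copied from the C summary. */
enum {
    kFBCphSearching            = 6,
    kFBCphMakingAccessAccessor = 7,
    kFBCphAccessWaiting        = 8,
    kFBCphSummarizing          = 9,
    kFBCphIdle                 = 10,
    kFBCphCanceling            = 11
};

/* Hypothetical client state passed through the callback's data pointer. */
typedef struct {
    bool     userCanceled;   /* set elsewhere, e.g. by a Cancel button */
    float    lastPercent;    /* most recent percentDone value received */
    uint16_t lastPhase;      /* most recent phase code received */
} SearchProgress;

/* Hypothetical helper: readable name for a phase code. */
const char *PhaseName(uint16_t phase)
{
    switch (phase) {
    case kFBCphSearching:            return "searching";
    case kFBCphMakingAccessAccessor: return "making access accessor";
    case kFBCphAccessWaiting:        return "waiting for access";
    case kFBCphSummarizing:          return "summarizing";
    case kFBCphIdle:                 return "idle";
    case kFBCphCanceling:            return "canceling";
    default:                         return "unknown";
    }
}

/* Same shape as FBCCallbackProcPtr. Records the progress report and
   returns the user's cancel request (ASSUMPTION: true cancels). */
bool MyFBCCallback(uint16_t phase, float percentDone, void *data)
{
    SearchProgress *progress = (SearchProgress *)data;

    progress->lastPhase = phase;
    progress->lastPercent = percentDone;
    /* the system call recommended above is omitted in this sketch */
    return progress->userCanceled;
}
```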
Find By Content C Summary
Constants

```c
enum {
    gestaltFBCIndexingState    = 'fbci',
    gestaltFBCindexingSafe     = 0,
    gestaltFBCindexingCritical = 1
};

enum {
    gestaltFBCVersion        = 'fbcv',
    gestaltFBCCurrentVersion = 0x0011
};

enum /* error codes */ {
    kFBCvTwinExceptionErr     = -30500, /* miscellaneous error */
    kFBCnoIndexesFound        = -30501,
    kFBCallocFailed           = -30502, /* probably low memory */
    kFBCbadParam              = -30503,
    kFBCfileNotIndexed        = -30504,
    kFBCbadIndexFile          = -30505, /* bad FSSpec, or bad data in file */
    kFBCtokenizationFailed    = -30512, /* couldn't read from document or query */
    kFBCindexNotFound         = -30518,
    kFBCnoSearchSession       = -30519,
    kFBCaccessCanceled        = -30521,
    kFBCindexNotAvailable     = -30523,
    kFBCsearchFailed          = -30524,
    kFBCsomeFilesNotIndexed   = -30525,
    kFBCillegalSessionChange  = -30526, /* tried to add/remove vols */
                                        /* to a session that has hits */
    kFBCanalysisNotAvailable  = -30527,
    kFBCbadIndexFileVersion   = -30528,
    kFBCsummarizationCanceled = -30529,
    kFBCbadSearchSession      = -30531,
    kFBCnoSuchHit             = -30532
};

enum /* codes sent to the callback routine */ {
    kFBCphSearching            = 6,
    kFBCphMakingAccessAccessor = 7,
    kFBCphAccessWaiting        = 8,
    kFBCphSummarizing          = 9,
    kFBCphIdle                 = 10,
    kFBCphCanceling            = 11
};
```

Data Types

```c
/* A collection of state information for searching */
typedef struct OpaqueFBCSearchSession* FBCSearchSession;

/* An ordinary C string (used for hit/doc terms) */
typedef char* FBCWordItem;

/* An array of WordItems */
typedef FBCWordItem* FBCWordList;
```

Allocation and Initialization of Search Sessions

```c
OSErr FBCCreateSearchSession(FBCSearchSession* searchSession);

OSErr FBCDestroySearchSession(FBCSearchSession theSession);

OSErr FBCCloneSearchSession(FBCSearchSession original,
                            FBCSearchSession* clone);
```

Configuring Search Sessions

```c
OSErr FBCAddAllVolumesToSession(FBCSearchSession theSession,
                                Boolean includeRemote);

OSErr FBCSetSessionVolumes(FBCSearchSession theSession,
                           const SInt16 vRefNums[],
                           UInt16 numVolumes);

OSErr FBCAddVolumeToSession(FBCSearchSession theSession,
                            SInt16 vRefNum);

OSErr FBCRemoveVolumeFromSession(FBCSearchSession theSession,
                                 SInt16 vRefNum);

OSErr FBCGetSessionVolumeCount(FBCSearchSession theSession,
                               UInt16* count);

OSErr FBCGetSessionVolumes(FBCSearchSession theSession,
                           SInt16 vRefNums[],
                           UInt16* numVolumes);
```

Executing a Search

```c
OSErr FBCDoQuerySearch(FBCSearchSession theSession,
                       char* queryText,
                       const FSSpec targetDirs[],
                       UInt32 numTargets,
                       UInt32 maxHits,
                       UInt32 maxHitWords);

OSErr FBCDoExampleSearch(FBCSearchSession theSession,
                         const UInt32* exampleHitNums,
                         UInt32 numExamples,
                         const FSSpec targetDirs[],
                         UInt32 numTargets,
                         UInt32 maxHits,
                         UInt32 maxHitWords);

OSErr FBCBlindExampleSearch(FSSpec examples[],
                            UInt32 numExamples,
                            const FSSpec targetDirs[],
                            UInt32 numTargets,
                            UInt32 maxHits,
                            UInt32 maxHitWords,
                            Boolean allIndexes,
                            Boolean includeRemote,
                            FBCSearchSession* theSession);
```

Getting Information About Hits

```c
OSErr FBCGetHitCount(FBCSearchSession theSession,
                     UInt32* count);

OSErr FBCGetHitDocument(FBCSearchSession theSession,
                        UInt32 hitNumber,
                        FSSpec* theDocument);

OSErr FBCGetHitScore(FBCSearchSession theSession,
                     UInt32 hitNumber,
                     float* score);

OSErr FBCGetMatchedWords(FBCSearchSession theSession,
                         UInt32 hitNumber,
                         UInt32* wordCount,
                         FBCWordList* list);

OSErr FBCGetTopicWords(FBCSearchSession theSession,
                       UInt32 hitNumber,
                       UInt32* wordCount,
                       FBCWordList* list);

OSErr FBCDestroyWordList(FBCWordList theList,
                         UInt32 wordCount);

OSErr FBCReleaseSessionHits(FBCSearchSession theSession);
```

Summarizing Text

```c
OSErr FBCSummarize(void* inBuf,
                   UInt32 inLength,
                   void* outBuf,
                   UInt32* outLength,
                   UInt32* numSentences);
```

Getting Information About Volumes

```c
Boolean FBCVolumeIsIndexed(SInt16 theVRefNum);

Boolean FBCVolumeIsRemote(SInt16 theVRefNum);

OSErr FBCVolumeIndexTimeStamp(SInt16 theVRefNum,
                              UInt32* timeStamp);

OSErr FBCVolumeIndexPhysicalSize(SInt16 theVRefNum,
                                 UInt32* size);
```

Reserving Heap Space

```c
void FBCSetHeapReservation(UInt32 bytes);
```

Application-Defined Routine

```c
typedef Boolean (*FBCCallbackProcPtr)(UInt16 phase,
                                      float percentDone,
                                      void *data);

void FBCSetCallback(FBCCallbackProcPtr fn, void* data);
```
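When reporting failures from FBC calls, a client may want readable text for the error codes listed above. The sketch below is a hypothetical helper, not an FBC routine. The codes are copied from the C summary, and the descriptions are derived from the constant names and the comments in the summary where one exists. The sketch takes `noErr` to be 0, as it is in Mac OS, and uses `int` in place of `OSErr`.

```c
/* Error codes copied from the Find By Content C summary. */
enum {
    kFBCvTwinExceptionErr     = -30500,
    kFBCnoIndexesFound        = -30501,
    kFBCallocFailed           = -30502,
    kFBCbadParam              = -30503,
    kFBCfileNotIndexed        = -30504,
    kFBCbadIndexFile          = -30505,
    kFBCtokenizationFailed    = -30512,
    kFBCindexNotFound         = -30518,
    kFBCnoSearchSession       = -30519,
    kFBCaccessCanceled        = -30521,
    kFBCindexNotAvailable     = -30523,
    kFBCsearchFailed          = -30524,
    kFBCsomeFilesNotIndexed   = -30525,
    kFBCillegalSessionChange  = -30526,
    kFBCanalysisNotAvailable  = -30527,
    kFBCbadIndexFileVersion   = -30528,
    kFBCsummarizationCanceled = -30529,
    kFBCbadSearchSession      = -30531,
    kFBCnoSuchHit             = -30532
};

/* Hypothetical helper (not part of FBC): short description of an
   FBC result code; int stands in for OSErr. */
const char *FBCErrorDescription(int err)
{
    switch (err) {
    case 0:                         return "no error";
    case kFBCvTwinExceptionErr:     return "miscellaneous error";
    case kFBCnoIndexesFound:        return "no indexes found";
    case kFBCallocFailed:           return "allocation failed (probably low memory)";
    case kFBCbadParam:              return "bad parameter";
    case kFBCfileNotIndexed:        return "file not indexed";
    case kFBCbadIndexFile:          return "bad index file (bad FSSpec, or bad data in file)";
    case kFBCtokenizationFailed:    return "tokenization failed (couldn't read from document or query)";
    case kFBCindexNotFound:         return "index not found";
    case kFBCnoSearchSession:       return "no search session";
    case kFBCaccessCanceled:        return "access canceled";
    case kFBCindexNotAvailable:     return "index not available";
    case kFBCsearchFailed:          return "search failed";
    case kFBCsomeFilesNotIndexed:   return "some files not indexed";
    case kFBCillegalSessionChange:  return "tried to add/remove volumes to a session that has hits";
    case kFBCanalysisNotAvailable:  return "analysis not available";
    case kFBCbadIndexFileVersion:   return "bad index file version";
    case kFBCsummarizationCanceled: return "summarization canceled";
    case kFBCbadSearchSession:      return "bad search session";
    case kFBCnoSuchHit:             return "no such hit";
    default:                        return "not a documented FBC error code";
    }
}
```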
Acknowledgments
Special thanks to David Casseres, Pete Gontier, Tim Holmes, Ingrid Kelly, Michael J. Kobb, Eric Koebler, Alice Li, and Wayne Loofbourrow.
Updated: 5-April-99