UK Union Catalogue of Chinese Books
INDEXING
Title strings | Exact titles | Authors | Subjects | Shelfmarks | ISBN/ISSN
The following information applies to the current state of the database. It is likely that improvements and enhancements will be made from time to time.
The title field looks like this:
![]()
First the whole title is taken; then two bytes (the length of a Chinese character in GB2312-80) are chopped off the beginning and the resulting string is taken. This process is repeated until the whole string has been dealt with, as follows:

The same is then done for the romanisation, the cut-off point being in this case the space, and then all spaces are eliminated:

The resulting title strings are then all filed in the same index.
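As a rough illustration of the routine just described, the following Python sketch generates the title strings for a Chinese title and for its romanisation. The function names and the sample title are assumptions made for the example, not part of the catalogue software, and the real implementation may differ in detail.

```python
def chinese_title_strings(title: str) -> list[bytes]:
    """Return the GB2312-encoded title and each shorter string obtained by
    repeatedly chopping two bytes (one Chinese character) off the front."""
    data = title.encode("gb2312")
    return [data[i:] for i in range(0, len(data), 2)]


def romanised_title_strings(romanised: str) -> list[str]:
    """Do the same for the romanisation, cutting at each space,
    then eliminate the spaces from every resulting string."""
    syllables = romanised.split()
    return ["".join(syllables[i:]) for i in range(len(syllables))]


# Illustrative title only (assumed for this sketch).
chinese = chinese_title_strings("天安门")
print([s.decode("gb2312") for s in chinese])   # ['天安门', '安门', '门']
print(romanised_title_strings("tian an men"))  # ['tiananmen', 'anmen', 'men']
```

Because every such string is filed in the index, a truncated search beginning at any character (or syllable) of the title can find the record.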
The first section of the search interface makes truncated searches in this index. It follows from this logic that the most economical way of searching for a known title is by the most unusual sequence of characters within it, in this case, simply:

Titles sometimes contain not only Chinese, but also alpha-numeric characters. If the indexing logic were applied uncritically, nonsense would be produced wherever the alpha-numeric strings contained an odd number of bytes, because the subsequent two-byte cuts would fall in the middle of Chinese characters. It is therefore necessary to strip out the alpha-numeric characters before indexing the Chinese character strings. This is easily done by removing any byte with a value lower than 161 decimal (all the bytes in GB2312-80 fall in the range 161-254). It follows that a single search box in this part of the interface must not contain a mixture of Chinese and alpha-numeric characters. Thus

and

will both find

But

will find nothing, and

will find only
![]()
as the term "1925 nian" is only found as a title string in this record.
Although the Union Catalogue is designed primarily to be searched in original script, romanised title and author access points have also been provided. The China MARC standard has been taken as orthodox, so that all such access points (subfield $r in the fields where they occur) are romanised syllable by syllable in lower case, for example:
tian an men guang chang li shi dang an
and the data is then indexed by the routine described above.
However, readers often enter romanised search terms in an endless variety of forms, such as:
Tian-an-men Guang-chang li-shi dang-an
Tian'an Men Guangchang li shi dang'an
Tian'anmen Guangchang lishi dang'an
Rather than requiring the China MARC norm, the search interface converts all upper case letters to lower case and strips out all punctuation and spaces before searching the title string index. So the above permutations, and many more, will all be reduced to the search term
tiananmenguangchanglishidangan
which will currently locate the following records in the database:

If the complete title of a book is a word such as zhong guo, wen xian, or some other term which is commonly found in other titles, truncated searching is of little use as the result set will be too large. A separate section of the search interface has therefore been provided for such cases, and this section makes an exact search in a separate title index. For example:

will currently find

The first of the four titles has been found because guo wen occurs as an added title entry in the record. If the same search term is entered as a title-string search, over 2,000 records are located.
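The difference between the two kinds of search can be sketched as follows. This is only an illustration: the index contents are invented, the function names are hypothetical, and it is an assumption that the exact-title index stores romanised titles with spaces removed.

```python
from bisect import bisect_left


def truncated_search(index: list[str], term: str) -> list[str]:
    """Return every entry of a sorted index that begins with the term."""
    hits = []
    for entry in index[bisect_left(index, term):]:
        if not entry.startswith(term):
            break
        hits.append(entry)
    return hits


def exact_search(index: list[str], term: str) -> list[str]:
    """Return only the entries of a sorted index that equal the term."""
    hits = []
    for entry in index[bisect_left(index, term):]:
        if entry != term:
            break
        hits.append(entry)
    return hits


# Invented indexes: title strings for "guo wen" and "zhong guo wen xue",
# and the corresponding exact titles.
title_string_index = sorted(
    ["guowen", "wen", "zhongguowenxue", "guowenxue", "wenxue", "xue"]
)
exact_title_index = sorted(["guowen", "zhongguowenxue"])

print(truncated_search(title_string_index, "guowen"))  # ['guowen', 'guowenxue']
print(exact_search(exact_title_index, "guowen"))       # ['guowen']
```

The truncated search also picks up the longer title because one of its title strings begins with the search term, which is why common terms produce such large result sets in the title-string index.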
As with most catalogues, searches in the author section of the interface will often yield very large result sets. For example, the search

will currently locate over 450 records.
However, as the principal purpose of the interface is to enable readers to gain rapid access to known titles through the entry of minimal search terms, author searches are truncated, not exact, so that very precise results can be obtained with surprisingly little effort. For example, the simple, romanised search

will find

In future, access to a browsable author index may be provided. Note that if authors are entered in romanisation, the procedure is the same as for titles, as described above, so that
luo zhen yu
Luo, Zhenyu
Luo Zhen-yu
luo-zhen-yu
luozhenyu
&c
will all find the same author, the standard romanised form being, of course, the first.
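The reduction of romanised search terms, for titles and authors alike, might be sketched like this. It is a simplified approximation: the function name is hypothetical, and keeping only letters and digits is assumed to be equivalent to stripping "all punctuation and spaces".

```python
def normalise_romanised(term: str) -> str:
    """Convert to lower case and drop everything that is not a letter
    or a digit (spaces, hyphens, apostrophes, commas and so on)."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


for variant in ["luo zhen yu", "Luo, Zhenyu", "Luo Zhen-yu", "luo-zhen-yu", "luozhenyu"]:
    print(normalise_romanised(variant))  # luozhenyu each time
```

Since the index entries themselves have had their spaces eliminated, every variant above becomes the same search term and matches the same index entry.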
Not all the records in the database have subject headings (the allegro users are not currently producing them), and those that do use different systems: Cambridge uses National Library of China subject headings, and the libraries that use US systems (SOAS, Durham and Edinburgh) use Library of Congress Subject Headings. Both have been loaded into the subject index, and access to this index is provided in the Further Options part of the search interface. Search terms must in this case be entered exactly in their standard format: Chinese characters for the NLC system, and romanisation (American spellings, and observing capitalisation) for LCSH.
Example of an NLC subject search:

Example of an LCSH subject search:

The figures to the left of the index entry show how many records are currently attached to that entry. The result will obviously provide only a partial picture of the national book stock, owing to the different systems - if any - in use. But having obtained a result for one library, the title may then be sought in another. In future, the interface may be enhanced to enable this to be done by a simple click rather than manually entering the title as a search term.
The shelfmark index works in exactly the same way as the subject index. Search terms must be entered in their orthodox form to achieve the desired result.
The final part of the search interface makes a truncated search in the ISBN/ISSN index. Hyphens may be input, but are ignored. A full ISBN will obviously locate a unique title, but a truncated ISBN can be used to locate all titles with a particular publisher code, for example:

or

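A truncated ISBN search of this kind can be sketched as below. The numbers in the sample index are made up for illustration and are not real ISBNs from the database; the function names are hypothetical, and it is assumed that index entries are stored without hyphens.

```python
def normalise_isbn(term: str) -> str:
    """Remove hyphens, which may be typed but are ignored."""
    return term.replace("-", "")


def isbn_search(index: list[str], term: str) -> list[str]:
    """Return every index entry that begins with the normalised term."""
    key = normalise_isbn(term)
    return [entry for entry in index if entry.startswith(key)]


# Made-up ISBN-like strings, stored without hyphens (an assumption).
isbn_index = ["7101000010", "7101000029", "7532500012"]

print(isbn_search(isbn_index, "7-101"))          # ['7101000010', '7101000029']
print(isbn_search(isbn_index, "7-5325-0001-2"))  # ['7532500012']
```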