Scope: [...] The objective of this project is to investigate the European needs and problems with searching and browsing, in relation to character sets, transliteration, matching and ordering rules and other cultural specific elements. The needs for a European set of requirements in this area at the present state of technology will be investigated.
Subject and justification: The Global Information Infrastructure must be able to cover European Culturally specific requirements for searching and browsing. Browsing and searching refers to the fast-developing activity around search engines and personal agents operating on large amount of data, implemented mainly as the World Wide Web.
Ultimately, the objective must be that searching and browsing may be carried out in the multilingual environment of Europe.
Technology is moving fast in this area and there are few standards available, although a first generation of products (AltaVista, Yahoo, Lycos, etc.) is available. Consortia such as the W3C or FIPA (Personal Agents) are working in this area. This activity is considered a key one for GIS (Global Information Society) and one that should see huge developments in the near future.
The project team has also committed to its business plan (CEN/TC304/N860), which defines the scope of the project as follows:
This study aims "to investigate the European needs and problems with searching and browsing in relation to character sets, transliteration matching and ordering rules and other cultural specific elements." It will present an overview of current research projects, their approaches and results, try to isolate major problems and suggest some tentative paths towards their solution. It will try to establish where solutions seem within easy reach, where approaches are likely to be fruitful in the medium term and where no solutions can be expected in the foreseeable future.
The project team will collaborate closely not only with research institutes but also with major enterprises working in the field. Search engines require special attention here. Furthermore, it will get in contact with relevant consortia such as the W3C and FIPA.
The aim of this study is to give an overview of current practice in Europe, of ongoing research projects and of desiderata.
In accordance with the schedule, the project team shall deliver a draft report by week 16 of 1999, to be presented to the plenary of CEN TC304. This report is intended to serve only as a starting point for further research which, subject to approval, this project team aims to pursue thereafter.
Consequences from these targets
For some of the aforementioned fields, the project team has leading experts amongst its own members, amongst others for:
- ordering;
- transliterations;
- character sets.
For these fields the task of scoping the needs can be done within the pt itself.
The project team does not, however, have the expertise to cover all relevant topics by itself, and, furthermore, both N739 and N860 demand that the pt get in contact with the W3C and FIPA.
Furthermore, the pt has committed to contacting research institutes and major enterprises in the field. To do that the pt needs to draw up a list of such institutes and enterprises and establish a hierarchy among them, as it is patently out of scope for this pt to contact or even list all potentially relevant institutions (this would be the task of the TAP/LE project). Some of these are obvious, however: the European branches of AltaVista, Lycos etc. Hans van der Laan's task force also belongs to this category.
Steps taken towards accomplishing the targets
- the principal editor, in cooperation with the assistant editors, is drawing up a statement of European requirements in the fields of ordering, transliteration and character sets;
- contact has been established with Mr. Dürst, the editor of W3C's "Requirements for String Identity Matching and String Indexing" (cf. references), and with the relevant experts at the Unicode Consortium;
- TERENA and other relevant institutions have been involved;
- the findings of the "werkgroep IRT" (the search engine task force) have been taken into account.
Working modalities of the pt
The pt tries its best to follow the guidelines of CEN/TC304/N780, though with some modifications agreed upon beforehand: due to the problems arising from the resignation of the designated editors Neuville / La Bonté, the post of principal editor had to be merged with that of pt manager, and the positions of two advising editors were created. Part of their task is indeed to prepare "written comments" on the draft, but it is also their explicit task to offer written "contributions and discussions" which the principal editor will then integrate into the report itself.
The remainder of the general rules are, of course, taken into account and are taken to be the expression of sound general working principles. In particular, tasks such as offering "an introduction, explaining the background, context and application of the document" are seen as greatly enhancing the potential usefulness of the future report.
Time plan
There can be no doubt that the pt is behind schedule. It did not succeed in delivering a first draft in time for the Tübingen meeting of CEN/TC304, for the following reasons:
- the illness of Hans van der Laan, the duration of which is not yet clearly foreseeable, though his health seems to be improving;
- the unexpected amount of time it took the EOR pt (on which John Clews and Marc Küster are serving) to complete its draft, due to a large change in its base repertoire.
The EOR draft is now out for ballot, so both John Clews and Marc Küster can now concentrate on progressing the work as fast as possible and on assembling the existing results into a readable report. The situation with Hans van der Laan may be different but also seems to be changing for the better.
There are plans to use part of the funding not yet allocated to obtain high-level Russian input. The names of Mr. Bolotov (designated expert from GOST on ordering) and of Mr. Schapkin (Moscow State University, with a special interest in human-computer interfaces) have been suggested, and both should be added to the pt with 2 pt days each if possible. The information thus gained might be of considerable interest to the IT industry in the CEN countries.
The project manager therefore commits to the following revised schedule (cf. CEN/TC304 N860):
- Contact with relevant institutions: Ongoing;
- Continuous discussion among team members: Ongoing;
- Presentation of first draft: Mid September 1999 (end of week 37);
- Open discussion of draft within TC304: Till October 12th, 1999;
- Presentation of the modified first draft to plenary of TC304: Week 42, 1999;
- Presentation of final draft: Week 45, 1999.
Future physical meetings of the pt will include a meeting in Brussels during the TC304 meeting. No further meetings are currently planned; any that become necessary at short notice beforehand would be closed pt meetings only.