Personalized news content programming (PENG): A system architecture

July 5, 2017 | Author: Gabriella Pasi | Category: System Architecture


Gabriella Pasi
Università degli Studi di Milano Bicocca, Via Bicocca degli Arcimboldi 8 [email protected]

Robert Villa
ITC, Consiglio Nazionale delle Ricerche, Via Edoardo Bassini 15, Milano [email protected]

Abstract

This paper presents an overview of the PENG project and its system architecture. The objective of the project is to define and develop a news content composition and programming environment that provides news professionals and general users with an interactive and personalised tool for multimedia news gathering and delivery. This is achieved by defining and developing a flexible system for the personalised filtering, retrieval and composition of multimedia news.

1. Introduction

The huge quantity of multimedia information available on the World Wide Web has continued to stimulate the development of systems that support easy access to information relevant to a specific user's needs. These systems try to solve a decision-making problem: how can the information items that correspond to a user's information preferences be identified? The PENG project is focused on creating an environment which offers interactive and personalised tools for multimedia news gathering and delivery, to both professionals and general users.

Currently most efforts in this direction have concerned the management of textual news. In recent years various newspapers and publishers have increasingly looked at the Web as a viable publishing medium, and started to place some of their material online. More than 230 supplemental online services are operated or under development by newspapers worldwide, an increase of approximately 130% since the end of 1994. In this context, online newspapers have been conceived and developed as a means to provide a user-tailored service, generating personalised newspapers. Some of these services are available to all on the WWW, while other services address more specific communities of users.

In addition to textual news, the use of video, audio, and other multimedia news has also significantly increased in recent years, with the adoption of broadband networks and with the willingness of news providers to supply content on the WWW. These other media, such as images, speech and video, can be much more expressive than text in certain application domains, and are of particular importance in a society where visual information is more directly accessible to a larger portion of the population. For this reason, user-tailored services for the personalised composition and delivery of multimedia news are of great importance.

Traditional newspapers, magazines, radio and TV news programs are universally accepted products that have developed into highly advanced presentation media. However, this kind of presentation has deficiencies due to pre-selection: all the copies of a newspaper edition contain the same information, selected by the editors based on the perceived interests of the readers. In the same way, a TV service can present a given event in only a single way to a potentially large number of viewers. Electronic information has removed the barrier of pre-selection, allowing easy and cheap access to information that is increasingly generated and distributed over a network. The side effect is information overload: an enormous quantity of distributed information is available, so systems are required that help users select the information relevant to them.

Personalised multimedia news composition and delivery services able to selectively create customised news are of great help in the communication of news to a user. Customised news should be generated differently for distinct users, based on users' profiles, which encode the users' preferences and interests over time.
While in a paper-based newspaper a user may read only about 10% of the news collected, a personal newspaper will aim to contain only news which is of interest to the user.

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05) 1529-4188/05 $20.00 © 2005 IEEE

2. Related projects

Many personalized news services and systems have been developed, and are in active use. They vary in numerous ways, including how the user profile is stored, how the personalized user interface operates, and how the information source or sources are handled. The information flow underlying these systems makes the following assumptions:

• the information sources contain “edited” news, i.e. news filtered and presented by journalists;
• the phase of information gathering is aimed at selecting, in a personalized way, a “subset” of the news that would normally appear in a paper-based newspaper;
• the phase of presentation is based on the use of techniques which allow the emphasis of some news with respect to others. This is another aspect of personalization, which can benefit from feedback from the user.

In the following, a short description is given of some projects related to the PENG project.

Fishwrap is an experimental electronic newspaper system developed at MIT. It employs "Glue", an automated news model, to compose an individual’s personalised news.

The Krakatoa Chronicle is a personalized newspaper service available on the WWW, where user profiles can be tailored to alter both the newspaper's contents and layout. A single information source is used, and the underlying system is based on the vector space model: both the information items and the users’ profiles are represented as weighted keywords.

MyGlobalNews is an agent-based personalised public news service. Registered users receive the news they want from one or several sources, based on parameters they choose in their profiles. It allows a user to see other individuals with similar interests, which makes user communities identifiable to information producers, who can also broadcast to the standard user communities identified by the system.

The ELIN project (funded by the European Community) is centered on the development of the ELIN toolkit, an add-on to the publishing systems of media companies. The core technologies in the project are interactive video (MPEG-4 and MPEG-7), personalisation and interactive animation. These technologies are employed to support the use and production of interactive news and advertisement on demand.

BORGES was a European project with the objective of building a web-based news and WWW filtering system. The underlying system is based on the SMART retrieval system, used to access and store news articles and user profiles [c].
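The vector space representation mentioned for the Krakatoa Chronicle can be illustrated with a minimal sketch. It assumes that profiles and news items are sparse keyword-to-weight mappings and that they are compared with cosine similarity, a common choice for this model; the keywords, weights and article identifiers are hypothetical and are not taken from any of the systems described.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse keyword-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical user profile and news items, all as weighted keywords.
profile = {"football": 0.8, "transfer": 0.5, "league": 0.3}
articles = {
    "a1": {"football": 0.6, "league": 0.7, "goal": 0.2},
    "a2": {"election": 0.9, "parliament": 0.4},
}

# Rank the articles by their similarity to the profile, best first.
ranked = sorted(
    articles,
    key=lambda name: cosine_similarity(profile, articles[name]),
    reverse=True,
)
```

Because items and profiles share one representation, the same function can score an item against a profile or a profile against another profile.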

3. Main Objectives of the PENG project

The main objective of the PENG project is to define an innovative technological solution for personalised multimedia news access, composition and presentation, with an emphasis on personalised filtering, retrieval and composition of multimedia news. The proposed system aims to collect news from both newsfeeds and specialised archives in a personalised way. This is performed by pushing personalised news towards the user, and by allowing her or him to expand a selected topic by searching for additional information, or to edit the final news through a multi-document summarisation approach. This involves the fusion of personalised filtering, search and summarisation, with the final automatic editing phase being seen as a first but very important aid to the writing activity of journalists (the initial target group for PENG).

The target users of PENG are classed according to a two-dimensional schema defined in terms of their level of interest in the news and their topical interest. Possible user targets include information-intensive workers, students of communication faculties, and journalists interested in sport, culture, economy, etc. An important characteristic of the system will be its flexibility in modelling the user's topical interests and context. This means being both tolerant of the vagueness and uncertainty in the user-system interaction and adaptive in learning users' changing preferences over time.

Initially, the PENG project will be aimed at news professionals, such as journalists and editors, with a view to extending the system for more general use in the future. In this context, the term news refers to any kind of news, including information regarding leisure and entertainment. This initial system is conceived as a personal assistant, supporting journalists in all stages of the news lifecycle. Information (text, images, and videos) is gathered from different sources (including the Web) using a combination of push and pull technologies and is presented to the user in a personalised way.

[c] www.cordis.lu/libraries/en/projects/borges.html


4. PENG system's architecture

The main functionalities of the PENG system are separated into three main phases: a push phase, a pull phase and a presentation phase.

In the push phase a filtering system will be developed, by means of which a first selection of news can be made from newswires and other news archives. This filtering will be based on a dynamic user profile, which includes the user's personal trust in the information sources.

In the pull phase a user query and the user profile will be used to retrieve further and more specific information, both from the same sources used in the push phase and from additional sources automatically selected in relation to the content of the query and the user profile. A distributed information retrieval approach will be used, where the query can be automatically generated from user feedback on the information presented by the pull phase.

The presentation phase uses multi-document and multimedia visualisation and summarisation to present the results from the push and pull phases to the user. This will take into account the trust a user places in the information sources and can be viewed as the definition of a basic "electronic writer". Consequently, the summary produced by the system will be personalised not only to the user's information need but also to visualisation preferences and the subjective interpretation of the user's trust in sources of information.

Figure 1, below, presents a simplified view of the PENG architecture, with the modules corresponding to the three main phases highlighted in gray.

[Figure 1: PENG Conceptual Architecture. Labelled components: PENG system; Information Presentation; Database and communication layer; Information Filtering; Information Retrieval; User profile database; 3rd party information sources.]

The user accesses the system locally through an interface provided by the presentation module. The three main modules communicate via an intermediary layer which also manages access to the common databases required by the system, the most important of which is the user profile database (the other databases are not shown for clarity). This database and communication layer is composed of a user profile manager (which manages the user profile database) and a common database manager (which manages other common databases and coordinates the communication between the modules).

In the PENG system, a user profile will contain all data relating to a single user, split into four rough categories (based on [10]): personal information (such as name, email, etc.), information preferences (what information is relevant to the user, and from where), presentation preferences (how this information is to be displayed) and interaction history (the history of the user's interaction with the PENG system). Since a user may be interested in numerous different subjects, the information and presentation preferences will be split into a set of different user interests. Each interest is personal to the user to whom it belongs, and plays an important part in the filtering module (which is intended to route documents not only to the right users, but to the correct user interest) and in the information retrieval module (by providing a context in which a search can take place).

Importantly, the profile will store the degree to which a user trusts different information sources ('trust scores'), information hypothesised to be important in news gathering and filtering. Trust scores are conceived as indications of the potential reliability of the information sources to a specific user (or category of users) with respect to a given topical area.
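The profile structure described above can be sketched as a pair of record types. This is only an illustration under stated assumptions: the class names, the field names, the choice to store trust scores per interest (to reflect "with respect to a given topical area"), the [0, 1] trust range and the neutral default of 0.5 are all hypothetical and do not come from the PENG design.

```python
from dataclasses import dataclass, field


@dataclass
class UserInterest:
    """One topical interest, holding its own preferences and trust scores."""
    name: str
    keywords: dict[str, float] = field(default_factory=dict)    # information preferences
    presentation: dict[str, str] = field(default_factory=dict)  # presentation preferences
    trust: dict[str, float] = field(default_factory=dict)       # source -> trust in [0, 1] (assumed range)


@dataclass
class UserProfile:
    """All data about a single user, following the four rough categories."""
    name: str                                                   # personal information
    email: str                                                  # personal information
    interests: dict[str, UserInterest] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)            # interaction history

    def trust_in(self, source: str, interest: str, default: float = 0.5) -> float:
        """Trust in a source for one interest; the neutral default is an assumption."""
        topic = self.interests.get(interest)
        if topic is None:
            return default
        return topic.trust.get(source, default)


# Hypothetical usage: a journalist with a single 'sport' interest.
profile = UserProfile(name="A. Reporter", email="reporter@example.org")
profile.interests["sport"] = UserInterest(
    name="sport",
    keywords={"football": 0.8},
    presentation={"order": "date"},
    trust={"wire-a": 0.9},
)
profile.history.append("marked doc-17 as relevant to sport")
```

Keeping the information and presentation preferences inside each interest mirrors the split into user interests described in the text, so a filter or a search can be given a single interest as its context.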

4.1. Information filtering (IF)

The Information Filtering (IF) module aims at pushing relevant information to the user, and has two main elements. A non-personalised categorisation of current news aims to identify clusters of topically similar documents independent of user preferences. A personalised filtering stage will then select not only individual news stories based on a set of filtering criteria, but may also select clusters of news stories, aiding the user in categorising and understanding the news landscape at that time. It may also allow easier access to topically relevant stories which would not otherwise be classified as relevant to the user by a standard document-oriented filtering algorithm.

A gathering sub-module will first collect news stories from a range of information sources. This gathering may be passive (such as data arriving over a newsfeed) or active (such as information scraped from web sites). Gathered material will be tagged with the information source the data came from, the time it arrived, and other metadata, and will be stored and indexed by the common database manager (part of the common database and communication layer in Figure 1). This will ensure that the news can also be easily accessed by the presentation and IR modules, for display, summarisation, query creation, etc.

The next stage is the fuzzy clustering sub-module, which groups news stories into different topics, independent of individual user needs. This operates periodically, and will generate a hierarchy of recently arrived news. The fuzzy k-means algorithm is used, applied recursively to documents until a hierarchy of the required depth is created (the depth of the hierarchy is set by a PENG administrator). The use of a fuzzy algorithm is important, since many news stories may naturally be placed in more than one category. The clusters will again be stored by the common database manager, ready for the next stage of the process, the personalised filtering.

The personalised filtering sub-module will route documents or clusters to the relevant interests of each user (where each user may have multiple overlapping interests). The training of these personalised filters (one per user interest) is carried out by the final sub-module, the learning sub-module. This takes relevance feedback from a user to train the filters to better identify relevant information for that user. For this, explicit relevance information, such as the user marking a document as relevant to a user interest, may be supplemented by implicit information (e.g. reading time as a gauge of the relevance of a document).

The above process, with the exception of the filter training, will occur continuously throughout the day, with or without user involvement. Filtering results, stored in each user profile, will be kept up-to-date and ready for when a user logs onto the PENG system.
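To make the role of the fuzzy clustering step more concrete, the following is a minimal sketch of one level of fuzzy k-means (also known as fuzzy c-means) over small numeric document vectors. It is not the PENG implementation: the fuzzifier m = 2, the fixed iteration count, the seeding of centres with the first k points and the toy two-dimensional vectors are all assumptions made for illustration, and the recursive construction of the hierarchy is left out.

```python
import math


def fuzzy_k_means(points, k, m=2.0, iterations=50):
    """Return (centres, memberships) for one level of fuzzy k-means.

    memberships[i][j] is the degree in [0, 1] to which points[i] belongs
    to cluster j; each row sums to 1.
    """
    dim = len(points[0])
    # Assumption: seed the centres with the first k points (simple and deterministic).
    centres = [list(p) for p in points[:k]]
    memberships = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: nearer centres receive a larger share.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centres]
            if min(dists) == 0.0:
                # The point lies exactly on one or more centres.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ]
            memberships.append(row)
        # Centre update: mean of all points, weighted by membership ** m.
        for j in range(k):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centres[j] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return centres, memberships


# Hypothetical stories described by two topic weights each.
points = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]
centres, memberships = fuzzy_k_means(points, k=2)
```

In this toy data the last story sits between the two topics and ends up with roughly equal membership in both clusters, which is the behaviour the text motivates: a story need not be forced into a single category.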

4.2. Information Retrieval (IR)

The Information Retrieval (IR) module will use distributed information retrieval techniques to enable a PENG user to 'pull' new information from a range of information sources.

The Query Formulation sub-module is responsible for the generation of new queries, which may include using contextual information from the user profile. For example, a user searching within the context of a 'sport' interest may be presented with sport-biased search results. Queries may be generated by example (such as by the user selecting a document, or group of documents, as being representative of an information need) or they may be explicitly entered in a conventional way. The output query will then be sent to the Broker sub-module, described below.

Resource selection is the process by which the 'best' information sources are selected to be searched. This may be based on the query or other user profile information. A resource description store is used to hold information about the different resources available, and resource selection will select a subset of these resources for any given query. This selection may also be based on other criteria, such as the monetary cost of searching a resource.

The Broker sub-module is passed both the formulated query and the subset of the resources to be searched. The Broker is responsible for translating the query into a suitable form for each resource's search system. The translated queries may then be sent to the relevant resources for searching, with the results from each individual search then being passed to the data fusion sub-module.

Data Fusion combines the retrieved documents from each resource into one unified list. This fusion step will merge and rank documents, forming a final ranked list which can be displayed to the user by the presentation module. In the case of multimedia information, separate result lists will be created, one for each media type.
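The data fusion step can be sketched as follows. The text does not say which fusion method PENG uses, so this example swaps in a simple, well-known scheme as an assumption: each resource's scores are min-max normalised and then summed per document (a weighted CombSUM), with the user's trust score for the resource as the weight. The resource names, document identifiers, scores and the default trust of 0.5 are hypothetical, and documents are assumed to share identifiers across resources.

```python
def normalise(results: dict[str, float]) -> dict[str, float]:
    """Min-max scale one resource's scores to [0, 1] so lists are comparable."""
    low, high = min(results.values()), max(results.values())
    if high == low:
        return {doc: 1.0 for doc in results}
    return {doc: (score - low) / (high - low) for doc, score in results.items()}


def fuse(
    result_lists: dict[str, dict[str, float]],
    trust: dict[str, float],
) -> list[tuple[str, float]]:
    """Merge per-resource result lists into one ranked list (weighted CombSUM)."""
    fused: dict[str, float] = {}
    for resource, results in result_lists.items():
        if not results:
            continue  # a resource that returned nothing contributes nothing
        weight = trust.get(resource, 0.5)  # assumed neutral default
        for doc, score in normalise(results).items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores returned by two resources for the same query.
result_lists = {
    "wire-a": {"d1": 12.0, "d2": 7.0, "d3": 2.0},
    "archive-b": {"d2": 0.9, "d4": 0.4},
}
trust = {"wire-a": 0.9, "archive-b": 0.6}
ranking = fuse(result_lists, trust)
```

Normalising before summing matters because different search systems report scores on unrelated scales; in this example the document returned by both resources rises to the top of the fused list. Separate lists per media type could be produced by calling the same function once for each media type.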

4.3. Information Presentation (IP)

The final part of PENG is the presentation module, which provides an integrated interface for the results of the filtering and retrieval modules, and facilities for the summarisation and organisation of the retrieved material. The interface is intended to provide a flexible and customisable environment for a journalist's work.

A profile editor will allow the user to explicitly edit the user profile at any time, selecting and editing any of its elements.

The query editor and generation interface allow the user to edit and otherwise interact with the system to create queries which will be sent to the IR module. In addition to conventional text queries, this will allow the selection of documents as examples of an information need, and the use of contextual information such as a reference to an individual user interest.

The results explorer displays results from either the IR or IF modules. Using the same interface to display both filtering and retrieval results aims to reduce the complexity of the system design and, more importantly, of the user interface. Results may be displayed in an order defined by the user (location, date, source, topic, clusters, etc.) and may also be displayed individually at various levels of detail (title, summary, full text, etc.). Results may be marked as relevant, either to a user interest for use by the IF module, or to a query for use in relevance feedback (for the IR module).

The working space manager allows the user to prepare a draft of an article from elements extracted from selected documents. Documents may be imported from the results explorer, allowing the user to reorder selected documents (for instance, to reflect the order in which he/she will use the documents in writing). The working space may be saved to the personal repository or exported to a file for further editing outside the PENG system.

Both the results explorer and working space manager can make use of two other parts of the presentation module: a summarisation tool for automatically generating summaries of documents, and a clustering tool which can cluster the results from the IF or IR stages, grouping similar documents. This optional clustering stage is distinct from the IF clustering, and is intended purely as a visualisation tool. It is envisioned that a standard partitioning-style clustering algorithm, such as k-means, may be used at this stage, in a similar manner to a number of existing search systems.

Finally, the personal repository manager allows the user to organize and browse documents which have been saved for future use, operating in a similar manner to the 'bookmarks' facility in many web browsers.

5. Innovation in the PENG project

PENG has the potential to greatly contribute to the continuing development of filtering and retrieval systems, for the benefit of journalists, and ultimately for all users of news services. Professionals, such as journalists or editors, can tune the contribution of the distinct sources to their information gathering, filtering and editing tasks. This is achieved by specifying queries expressing constraints on the multimedia and time-dependent content of the news so as to focus on a particular event, and by associating distinct trust scores with the information sources. The trust scores are interpreted as indications of the reliability of an information source to a user.

This enables the tuning of a personalised gathering and presentation of news that expresses an individual's view and opinion on an event, a condition for journalism that has become predominant in recent years and a very important condition for a personalised presentation of news to the general user. In fact, while this can greatly reduce the time needed for a journalist to consult the distinct sources and to report on a given topic of interest, it also enables the presentation of news tailored to a specific user's interests.

The automatic classification of the news into thematic clusters represented by sets of keywords can be coupled with the subsequent use of PENG to yield personalised summaries on up-to-date topics. This can help in drafting a personalised multimedia newspaper and can thus be a powerful tool for the editorial staff of a journal.

Acknowledgments

PENG (“PErsonalised News content programminG Information”) is a Specific Targeted Research Project (IST-004597) funded within the Sixth Framework Program of the European Research Area.

References

[1] M. Agosti, F. Crestani and G. Pasi, eds., “Lectures on Information Retrieval”, Springer-Verlag, 2001.
[2] G. Bordogna, G. Pasi, R. Yager, “Soft Approaches to Information Retrieval on the WEB”, Int. Journal of Approximate Reasoning, 34, 105-120, (2003).
[3] G. Bordogna, G. Pasi, “Personalised indexing and retrieval of heterogeneous structured documents”, Information Retrieval Journal, in press (2004).
[4] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, M. Sartin, “Combining content-based and collaborative filters in an online newspaper”, ACM SIGIR Workshop on Recommender Systems, Aug. 19, Berkeley, (1999).
[5] F. Crestani and G. Pasi, eds., “Soft Computing in Information Retrieval: Techniques and Applications”, Physica-Verlag (Springer-Verlag), Heidelberg, Germany, 2000.
[6] F. Kilander, “A brief comparison of News filtering Software”, http://www.glue.umd.edu/enee/medlab/filter/filter.html.
[7] D. Moraru, L. Besacier, P. Mulhem and G. Quénot, “CLIPS-IMAG at TREC-11: Experiments in Video Retrieval”, 11th Text Retrieval Conference, Gaithersburg, MD, USA, 19-22 November, 2002.
[8] P. Mulhem, J. Gensel and H. Martin, “Adaptive Video Summarization”, in Handbook on Video Databases, CRC Press, to appear, 2003.
[9] G. Pasi, “Modelling users’ preferences in systems for information access”, International Journal of Intelligent Systems, 18, 793-808, (2003).
[10] G. Amato and U. Straccia, “User Profile Modeling and Applications to Digital Libraries”, 3rd European Conference on Digital Libraries, ECDL99, Paris, France, September 22-24, LNCS 1696, (1999).
