ARTICLE PART 2
DATA HIDING
A major concern for creators of digital content, whether it's Web content, music, or movies on digital disc, is to protect their work from unauthorized copying and distribution. IBM researchers have developed a technology called Data Hiding that enables the embedding of data (such as digital IDs or captions) invisibly.
Hiding Information
Data Hiding is a technology that allows owners of content to embed data invisibly or inaudibly into other digital media, such as video, audio data, or still images. When the data is embedded, it is not written as part of the data header, but rather embedded directly into the digital media itself. This process changes the original digital data, but the change is so slight that it cannot be visually detected, and has no impact on the performance of the original data.
The beauty of Data Hiding is that it gives content creators a powerful means of marking their digital creations without affecting the end user's appreciation of them. The embedded information does not change the size of the original data, and it survives normal image processing operations such as compression and format transformation.
Who Can Use It
Data Hiding can be a useful tool for anyone associated with creating or distributing digital content. In addition to authors, it could be used by copyright agencies to detect unauthorized copies of digital content, by authorized reproduction studios to ensure that they have valid copies of digital content, or even to insert comments into successive revisions of digital content.
One of the biggest markets for Data Hiding may be the emerging DVD market. Content providers have been concerned about the ability of casual users to make a clear copy illegally. Data Hiding technology could enable content creators and distributors to build systems that could effectively prevent illegal copying.
- M. V. Ramanamurthy - I MSC (IT)
FRACTAL IMAGE COMPRESSION
Data compression is nothing new; it's used by most modems. If you download information from the Internet, you will probably use some sort of utility, such as WinZip or StuffIt, to decompress the information. These utilities preserve all of the information in the original file, performing what's technically called lossless compression, which is obviously important if you're compressing a program file or formatted text document.
Compression of graphic images, on the other hand, does not preserve all of a file's data. Lossy compression sacrifices precision in order to make the resulting file more compact. The assumption is that most people don't notice the loss of small details, especially if they're watching a video or looking at a newspaper style photograph. The standard method of lossy compression employs the JPEG technology, named for the Joint Photographic Experts Group, which first approved the standard. JPEG breaks down an image into a grid and uses a fairly simple mathematical formula to simplify the visual information contained in each square of the grid. This reduces the space needed to store the image, but degrades the quality of the image, often making it look blocky. A higher compression ratio equals greater image distortion.
Fractal compression could change the assumptions behind lossy and lossless compression. Invented in the 1980s by Michael Barnsley and Alan Sloan, two mathematicians at Georgia Tech, fractal compression is based on the discovery by Benoit Mandelbrot, an IBM scientist, that a hidden geometry exists in apparently random patterns of nature. Further studies of fractals revealed that images, from mountains to clouds to snowflakes, can be built from simple fractal patterns. In fractal theory, the formula needed to create part of the structure can be used to build the entire structure. For example, the formula to create the pattern for a tiny piece of a fern can be used to create the entire fern leaf. Barnsley's discovery was that the process could be used in reverse. Barnsley patented a technology that takes real world images, analyses them, and breaks them down into groups of fractals, which can be stored as a series of fractal instructions. These instructions take up much less space than the bit-mapped images used in JPEG technology.
It took Barnsley's company, Iterated Systems, almost six years to perfect the technique of fractal compression to the point where it was commercially viable. The company claims that it can achieve compression ratios of 20,000 to 1. Fractal compression technology from Iterated Systems does indeed provide higher compression ratios and better image quality than anything else on the market. Photographic images can be compressed at ratios from 20:1 to 50:1 with no noticeable loss in resolution, and the company also claims that it can compress images with a ratio of more than 200:1 and maintain acceptable resolution. This is unmatched by JPEG or any other current technology and holds a tremendous amount of promise for delivering a wide range of graphics and multimedia technologies, from color fax transmission to full motion video over telephone lines.
Because fractal images are stored as mathematical formulas rather than as bit maps, they can be decompressed to resolutions that are higher or lower than those of the original. The ability to scale images without distortion is one of fractal compression's important advantages over JPEG. Fractal compression can also improve as you apply more processing power; you can improve both the amount of compression and the quality of the image by just letting the system process the image longer. This upfront processing requirement is fractal compression's biggest drawback. On a typical microcomputer, it would take about 900 hours to compress a single hour of video. This underscores the fact that fractal compression is an asymmetric system: it takes ages to compress, but decompressing is quick. JPEG, on the other hand, is a symmetric compression system: it takes the same amount of time to compress and decompress a file. This makes JPEG more suitable for some applications, but makes fractal compression ideal for applications like video on demand.
Iterated also has stumbled on another revolutionary aspect of the technology called fractal image enhancement, a process that can actually add details missing from the uncompressed scanned image or digital file. The process works by calculating what information was probably left out of the image when it was originally broken down into a grid of pixels. This technique could also allow images to be greatly enlarged without showing pixel chunks or otherwise losing detail; think wall-sized HDTV. Microsoft was so impressed with Iterated Systems' advances that it licensed the company's fractal compression technology for use in its Encarta CD-ROM, a multimedia encyclopedia that contains more than 10,000 color images. And the US Commerce Department recently granted the company $2 million to develop a low-cost fractal decompression chip that can keep pace with the frame rate of television.
It may be possible to improve fractal compression technology even further by refining the formulas that recognize fractal patterns. There's a problem, though: Iterated Systems has obtained a patent on its compression technology, but is currently unwilling to reveal the exact nature of the algorithms (which are trade secrets) used in the process. This means that the technology will only advance at whatever rate a single company, Iterated, decides to set.
Future Compression Technologies
Over the last couple of years there has been a great increase in the use of video in digital form due to the popularity of the Internet. We can see video segments in Web pages, we have DVDs to store video, and HDTV will use a video format for broadcast. To understand the video formats, we need to understand the characteristics of video and how they are used in defining the format.
Video is a sequence of images which are displayed in order. Each of these images is called a frame. We cannot notice small changes in the frames, like a slight difference of color, so video compression standards do not encode all the details in the video; some of the details are lost. This is called lossy compression. It is possible to get very high compression ratios when lossy compression is used.
Typically 24 to 30 frames are displayed on the screen every second, so a lot of information is repeated in consecutive frames. If a tree is displayed for one second, then 30 frames contain that tree. This redundancy can be used in the compression, and frames can be defined based on previous frames. So consecutive frames can carry information like "move this part of the tree to this place". Frames can be compressed using only the information in that frame (intra frame) or using information in other frames as well (inter frame). Intra frame coding allows random access operations like fast forward and provides fault tolerance. If a part of a frame is lost, the next intra frame and the frames after that can be displayed because they only depend on the intra frame.
Every color can be represented as a combination of red, green and blue. Images can also be represented using this color space. However, this color space, called RGB, is not suitable for compression since it does not consider the perception of humans. The YUV color space separates brightness from color: the Y component alone gives the grayscale image, while U and V carry the color information. The human eye is more sensitive to changes in Y, and this is used in compression. YUV is also used by the NTSC, PAL and SECAM composite color TV standards.
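As a rough illustration of how a coder can exploit the eye's lower sensitivity to color, the Python sketch below converts RGB pixels to YUV and then halves the resolution of a color plane. The conversion weights are the widely used ITU-R BT.601 values; the function names and the choice of 2x2 averaging are assumptions made for this example, not part of any particular standard described here.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # brightness (grayscale image)
    u = 0.492 * (b - y)                     # blue color difference
    v = 0.877 * (r - y)                     # red color difference
    return y, u, v


def subsample_2x2(plane):
    """Average each 2x2 block of a U or V plane (assumed even dimensions).

    Y is kept at full resolution; only the color planes are reduced,
    which cuts their size to a quarter.
    """
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[0]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out
```

A gray pixel such as (128, 128, 128) comes out with Y close to 128 and U and V close to zero, which is why the Y plane by itself looks like a black-and-white picture.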
Compression ratio is the ratio of the size of the original video to the size of the compressed video. To get better compression ratios, pixels are predicted based on other pixels. In spatial prediction, a pixel is predicted from pixels of the same image; in temporal prediction, the prediction of a pixel is obtained from a previously transmitted image. Hybrid coding consists of a prediction in the temporal dimension with a suitable decorrelation technique in the spatial domain. Motion compensation establishes a correspondence between elements of nearby images in the video sequence. The main application of motion compensation is providing a useful prediction for a given image from a reference image.
DCT (Discrete Cosine Transform) is used in almost all of the standardized video coding algorithms. DCT is typically done on each 8x8 block. 1-D DCT requires 64 multiplications and for an 8x8 block 8 1-D DCTs are needed. 2-D DCT requires 54 multiplications and 468 additions and shifts. 2-D DCT is used in MPEG, and hardware is also available to do DCT. When DCT is performed, the top left corner has the highest coefficients and the bottom right has the lowest, which makes compression easier. The coefficients are numbered in a zigzag order from the top left to the bottom right so that there will be many small coefficients at the end. The DCT coefficients are then divided by an integer quantization value to reduce precision. After this division it is possible to lose the lower coefficients if they are much smaller than the quantization value. At the decoder, the coefficients are multiplied by the quantization value before IDCT (inverse DCT).
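To make the idea of motion compensation concrete, here is a minimal Python sketch of exhaustive block matching, one common way of finding where a block of the current frame came from in the reference frame. The text above does not prescribe a particular search method, so the function names, the sum-of-absolute-differences cost and the small search window are all assumptions chosen for illustration.

```python
def sad(ref, cur, ref_top, ref_left, cur_top, cur_left, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(ref[ref_top + i][ref_left + j] - cur[cur_top + i][cur_left + j])
               for i in range(size) for j in range(size))


def best_motion_vector(ref, cur, top, left, size=2, search=3):
    """Find the displacement (dy, dx) into the reference frame that best
    predicts the block of `cur` whose corner is at (top, left).

    Returns (dy, dx, cost). Every candidate inside the search window is
    tried, and candidates that fall outside the frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_top, ref_left = top + dy, left + dx
            if (ref_top < 0 or ref_left < 0
                    or ref_top + size > height or ref_left + size > width):
                continue
            cost = sad(ref, cur, ref_top, ref_left, top, left, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The encoder would then send the vector (dy, dx) plus the usually small prediction error, instead of the pixels of the block themselves.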
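The transform, quantization and zigzag steps can be sketched in plain Python as follows. This is a straightforward, unoptimized row-column DCT meant only to show the idea; real coders use fast algorithms and per-coefficient quantization tables, and the single quantization step and all function names here are assumptions for the example.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT (type II) of a list of numbers."""
    n = len(values)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(values[x] * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                  for x in range(n))
            for k in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
                * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                for k in range(n))
            for x in range(n)]


def transform_2d(block, transform):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [transform(list(row)) for row in block]
    cols = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantize(coeffs, step):
    """Divide by the quantization value and round; small values become 0."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Multiply back by the quantization value before the inverse DCT."""
    return [[level * step for level in row] for row in levels]


def zigzag_order(n=8):
    """Block positions from top left to bottom right along anti-diagonals."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
```

For a completely flat 8x8 block, only the top left (DC) coefficient is non-zero, so after the zigzag scan the block is one number followed by 63 zeros, which is very cheap to store.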
MPEG-2
MPEG-2 is designed for diverse applications which require a bit rate of up to 100 Mbps. Digital high-definition TV (HDTV), DVD, interactive storage media (ISM) and cable TV (CATV) are sample applications. Multiple video formats can be used in MPEG-2 coding to support these diverse applications. MPEG-2 has bitstream scalability: it is possible to extract a lower bit stream to get lower resolution or frame rate. Decoding MPEG-2 is a costly process, and bitstream scalability allows flexibility in the required processing power for decoding. MPEG-2 is upward, downward, forward and backward compatible. Upward compatibility means the decoder can decode the pictures generated by a lower resolution encoder. Downward compatibility implies that a decoder can decode the pictures generated by a higher resolution encoder. In a forward compatible system, a new generation decoder can decode the pictures generated by an existing encoder, and in a backward compatible system, existing decoders can decode the pictures generated by new encoders.
In MPEG-2 the input data is interlaced, since it is more oriented towards television applications. Video sequence layers are similar to those of MPEG-1; the only improvements are field/frame motion compensation and DCT processing, and scalability. Macroblocks in MPEG-2 have 2 additional chrominance blocks when the 4:2:2 input format is used. The 8x8 block size is retained in MPEG-2; in scaled formats blocks can be 1x1, 2x2 or 4x4 for resolution enhancement. P and B frames have frame and field motion vectors.
MPEG-4
The success of digital television, interactive graphics applications and interactive multimedia encouraged the MPEG group to design MPEG-4, which allows the user to interact with the objects in the scene within the limits set by the author. It also brings multimedia to low bitrate networks.
MPEG-4 uses media objects to represent aural, visual or audiovisual content. Media objects can be synthetic, as in interactive graphics applications, or natural, as in digital television. These media objects can be combined to form compound media objects. MPEG-4 multiplexes and synchronizes the media objects before transmission to provide quality of service, and it allows interaction with the constructed scene at the receiver's machine.
MPEG-4 organizes the media objects in a hierarchical fashion where the lowest level has primitive media objects like still images, video objects and audio objects. MPEG-4 has a number of primitive media objects which can be used to represent 2 or 3-dimensional media objects. MPEG-4 also defines a coded representation of objects for text, graphics, synthetic sound and talking synthetic heads.
MPEG-4 provides a standardized way to describe a scene. Media objects can be placed anywhere in the coordinate system. Transformations can be used to change the geometrical or acoustical appearance of a media object. Primitive media objects can be grouped to form compound media objects. Streamed data can be applied to media objects to modify their attributes, and the user's viewing and listening points can be changed to anywhere in the scene.
The visual part of the MPEG-4 standard describes methods for compression of images and video, compression of textures for texture mapping of 2-D and 3-D meshes, compression of implicit 2-D meshes, and compression of time-varying geometry streams that animate meshes. It also provides algorithms for random access to all types of visual objects as well as algorithms for spatial, temporal and quality scalability, and content-based scalability of textures, images and video. Algorithms for error robustness and resilience in error prone environments are also part of the standard.
For synthetic objects MPEG-4 has parametric descriptions of the human face and body, and parametric descriptions for animation streams of the face and body. MPEG-4 also describes static and dynamic mesh coding with texture mapping, and texture coding with view dependent applications.
MPEG-4 supports coding of video objects with spatial and temporal scalability. Scalability allows decoding a part of a stream and constructing images with reduced decoder complexity (reduced quality), reduced spatial resolution, reduced temporal resolution, or with equal temporal and spatial resolution but reduced quality. Scalability is desired when video is sent over heterogeneous networks, or when the receiver cannot display the video at full resolution.
Robustness in error prone environments is an important issue for mobile communications. MPEG-4 has 3 groups of tools for this. Resynchronization tools enable the resynchronization of the bitstream and the decoder when an error has been detected. After synchronization, data recovery tools are used to recover the lost data. These tools are techniques that encode the data in an error resilient way. Error concealment tools are used to conceal the lost data. Efficient resynchronization is key to good data recovery and error concealment.
Fractal-based Coding
Fractal coding is a new and promising technique. In an image, the values of pixels that are close together are correlated. Transform coding takes advantage of this observation. Fractal compression takes advantage of the observation that some image features, like straight edges and constant regions, are invariant when rescaled. Representing straight edges and constant regions efficiently using fractal coding is important because transform coders cannot take advantage of these types of spatial structures. Fractal coding tries to reconstruct the image by representing the regions as geometrically transformed versions of other regions in the same image.
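The idea of describing each region as a transformed copy of another region can be tried out on a one-dimensional signal. The Python sketch below is a toy version only: it assumes fixed block sizes, a signal whose length is a multiple of the block size, a shrink-by-two step, and a brightness scale and offset fitted by least squares. It is not the proprietary algorithm of any company mentioned in this article, and all names are made up for the example.

```python
def shrink(block):
    """Halve a block's length by averaging neighbouring pairs."""
    return [(block[i] + block[i + 1]) / 2 for i in range(0, len(block), 2)]


def fit(domain, target):
    """Least-squares scale s and offset o so that s*domain + o ~ target."""
    n = len(target)
    mean_d, mean_t = sum(domain) / n, sum(target) / n
    var = sum((d - mean_d) ** 2 for d in domain)
    s = 0.0 if var == 0 else sum((d - mean_d) * (t - mean_t)
                                 for d, t in zip(domain, target)) / var
    s = max(-0.9, min(0.9, s))          # keep the mapping contractive
    o = mean_t - s * mean_d
    err = sum((s * d + o - t) ** 2 for d, t in zip(domain, target))
    return s, o, err


def encode(signal, size=4):
    """For each block, store (source position, scale, offset) only."""
    codes = []
    for start in range(0, len(signal), size):
        target = signal[start:start + size]
        best = None
        for src in range(0, len(signal) - 2 * size + 1, size):
            s, o, err = fit(shrink(signal[src:src + 2 * size]), target)
            if best is None or err < best[0]:
                best = (err, src, s, o)
        codes.append(best[1:])
    return codes


def decode(codes, length, size=4, iterations=40):
    """Start from a blank signal and apply the stored mappings repeatedly."""
    signal = [0.0] * length
    for _ in range(iterations):
        rebuilt = []
        for src, s, o in codes:
            rebuilt.extend(s * d + o for d in shrink(signal[src:src + 2 * size]))
        signal = rebuilt
    return signal
```

Note that the decoder never sees the original samples; it starts from zeros and the repeated mappings pull the signal toward the encoded one. A straight ramp, the 1-D counterpart of a straight edge, is reproduced almost exactly because it looks the same when rescaled.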
Model-based Video Coding
Model-based schemes define three-dimensional structural models of the scene. Coder and decoder use an object model. The same model is used by the coder to analyze the image, and by the decoder to generate the image. Traditionally, research in model-based video coding focuses on head modeling, head tracking, local motion tracking, and expression analysis and synthesis. Model-based video coding has been mainly used for video conferencing and video telephony, since mostly the human head is modeled. Model-based video coding has concentrated on modeling of images like the head and shoulders because it is impossible to model every object that may be in the scene. There is a lot of interest in applications such as speech driven image animation of talking heads and virtual space teleconferencing.
In model-based approaches a parameterized model is used for each object in the scene. Coding and transmission are done using the parameters associated with the objects. Tools from image analysis and computer vision are used to analyze the images and find the parameters. This analysis provides information on several parameters like size, location, and motion of the objects in the scene. Results have shown that it is possible to get good visual quality at rates as low as 16 kbps.
Scalable Video Coding
Multimedia communication systems may have nodes with limited computation power available for decoding, and heterogeneous networks such as a combination of wired and wireless networks. In these cases we need the ability to decode at a variety of bit rates. Scalable coders have this property. Layered multicast has been proposed as a way to provide scalability in video communication systems.
MPEG-2 has basic mechanisms to achieve scalability, but they are limited. Spatiotemporal resolution pyramids are a promising approach to providing scalable video coding. Open loop and closed loop pyramid coders both provide efficient video coding and allow the inclusion of multiscale motion compensation. Simple filters can be used for spatial down sampling and interpolation operations, and fast and efficient codecs can be implemented. Morphological filters can also be used to improve image quality.
Pyramid coders have a multistage quantization scheme. Allocating bits to the various quantizers depending on the image is important to get efficient compression. Optimal bit allocation is computationally infeasible when pyramids with more than two layers are used. Closed loop pyramid coders are better suited for practical applications than open loop pyramid coders, since they are less sensitive to suboptimal bit allocations and simple heuristics can be used.
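A two-layer closed loop pyramid can be sketched on a one-dimensional signal as shown below. The sketch assumes pair averaging for down sampling, sample repetition for interpolation, and two hand-picked quantization steps; the names and numbers are illustrative assumptions, not values from any standard. "Closed loop" here means the enhancement layer is computed against the base layer as the decoder will actually see it, so the coarse quantization error of the base is corrected rather than carried along.

```python
def quantize(values, step):
    """Uniform quantizer: keep only the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating every sample."""
    return [v for v in signal for _ in range(2)]


def encode(signal, base_step=8, enh_step=2):
    """Return (base layer, enhancement layer) for an even-length signal."""
    base_levels = quantize(downsample(signal), base_step)
    # Closed loop: predict from the *decoded* base layer, not the original.
    predicted = upsample(dequantize(base_levels, base_step))
    residual = [x - p for x, p in zip(signal, predicted)]
    return base_levels, quantize(residual, enh_step)


def decode(base_levels, enh_levels=None, base_step=8, enh_step=2):
    """Decode the base layer alone, or both layers for full quality."""
    coarse = upsample(dequantize(base_levels, base_step))
    if enh_levels is None:
        return coarse
    return [c + e for c, e in zip(coarse, dequantize(enh_levels, enh_step))]
```

A receiver with little bandwidth or processing power can call decode with the base layer only and get a coarse picture, while a better equipped receiver adds the enhancement layer. With both layers the error per sample is bounded by half the enhancement step, whatever step was chosen for the base.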
There are several ways to utilize multistage motion compensation. Efficiently computing motion vectors and then encoding them by hierarchical group estimation is one way. When video is sent over heterogeneous networks scalability is utilized by offering a way to reduce the bit rate of video data in case of congestion. By using priorities the network layer can reduce bitrate without knowing the content of the packet or informing the sender.
Wavelet-based Coding
Wavelet transform techniques have been investigated for low bit rate coding. Wavelet based coding has better performance than traditional DCT based coding. Much lower bit rates with reasonable performance are reported based on the application of these techniques to still images. A combination of wavelet transform and vector quantization gives better performance. The wavelet transform decomposes the image into a multi-frequency channel representation, each component of which has its own frequency characteristics and spatial orientation features that can be efficiently used for coding. Wavelet based coding has two main advantages: it is highly scalable, and a fully embedded bitstream can be easily generated. The main advantage over standard techniques such as MPEG is that video reconstruction is achieved in a fully embedded fashion. The encoding and decoding processes can stop at a predetermined bit rate. The encoded stream can be scaled to produce the desired spatial resolution and frame rate as well as the required bit rate. Vector quantization makes use of the correlation and the redundancy between nearby pixels or between frequency bands. Wavelet transform with vector quantization exploits the residual correlation among different layers of the wavelet transform domain, using block rearrangement to improve the coding efficiency. Further improvements can also be made by developing adaptive threshold techniques for classification based on the contrast sensitivity characteristics of the human visual system. Joint coding of the wavelet transform with trellis coded quantization as a joint source channel coding is an area to be considered.
Additional video coding research has applied the wavelet transform to very low bit rate communication channels. The efficiency of motion compensated prediction can be improved by overlapped motion compensation, in which the candidate regions from the previous frame are windowed to obtain a pixel value in the predicted frame. Since the wavelet transform generates multiple frequency bands, multifrequency motion estimation is available for the transformed frame. It also provides a representation of the global motion structure. Also, the motion vectors in lower frequency bands are predicted with the more specific details of higher frequency bands. This hierarchical motion estimation can also be implemented with a segmentation technique that utilises edge boundaries from the zero crossing points in the wavelet transform domain. Each frequency band can be classified as temporal activity macroblocks or no temporal activity macroblocks. The lowest band may be coded using DCT, and the other bands may be coded using vector quantization or trellis coded quantization.
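The multi-frequency decomposition and the embedded, truncatable stream can be illustrated with the simplest wavelet, the Haar wavelet, on a one-dimensional signal. The sketch below is an assumption-laden toy (signal length must be a power of two, no quantization or entropy coding, hypothetical function names), not the scheme used by any particular codec.

```python
def haar_forward(signal):
    """Repeatedly split into averages (low band) and differences (high band).

    The result lists the overall average first, then detail coefficients
    from the coarsest band to the finest.
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Rebuild the signal band by band, coarsest first."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for average, detail in zip(out[:n], out[n:2 * n]):
            merged += [average + detail, average - detail]
        out[:2 * n] = merged
        n *= 2
    return out


def truncate(coeffs, keep):
    """Simulate stopping the stream early: keep the first `keep` values."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)
```

Because the coefficients are ordered from coarse to fine, cutting the stream short still decodes to something useful: keeping half of the coefficients of an 8-sample signal gives a half-resolution version, and keeping only the first one gives the overall average.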
FUTURE USER INTERFACES
Several new user interface technologies and interaction principles seem to define a new generation of user interfaces that will move off the flat screen and into the physical world to some extent. Many of these next generation interfaces will not have the user control the computer through commands, but will have the computer adapt the dialogue to the user's needs based on its inferences from observing the user.
Most current user interfaces are fairly similar and belong to one of two common types: either the traditional alphanumeric full screen terminals with a keyboard and function keys, or the more modern WIMP workstations with windows, icons, menus, and a pointing device. In fact, most new user interfaces released after 1983 have been remarkably similar. In contrast, the next generation of user interfaces may move beyond the standard WIMP paradigm to involve elements like virtual realities, head mounted displays, sound and speech, pen and gesture recognition, animation and multimedia, limited artificial intelligence, and highly portable computers with cellular or other wireless communication capabilities. It is hard to envision the use of this hodgepodge of technologies in a single, unified user interface design, and indeed, it may be one of the defining characteristics of the next generation user interfaces that they abandon the principle of conforming to a canonical interface style and instead become more radically tailored to the requirements of individual tasks.
The fundamental technological trends leading to the emergence of several experimental and some commercial systems approaching next generation capabilities certainly include the well-known phenomenon that CPU speed, memory storage capacity, and communications bandwidth all increase exponentially with time, often doubling in as little as two years. In a few years, personal computers will be so powerful that they will be able to support very fancy user interfaces, and these interfaces will also be necessary if we are to extend the use of computers to larger numbers of people beyond the largely penetrated market of office workers.
Traditional user interfaces were function oriented: the user accessed whatever the system could do by specifying functions first and then their arguments. For example, to delete a file in a line-oriented system, the user would first issue the delete command in some way, such as typing delete. The user would then further specify the name of the item to be deleted. The typical syntax for function oriented interfaces was a verb noun syntax.
In contrast, modern graphical user interfaces are object oriented: the user first accesses the object of interest and then modifies it by operating upon it. There are several reasons for going with an object oriented interface approach for graphical user interfaces. One is the desire to continuously depict the objects of interest to the user to allow direct manipulation. Icons are good at depicting objects but often poor at depicting actions, leading objects to dominate the visual interface. Furthermore, the object oriented approach implies the use of a noun verb syntax, where the file is deleted by first selecting the file and then issuing the delete command (for example by dragging it into the recycle bin). With this syntax, the computer has knowledge of the operand at the time when the user tries to select the operator, and it can therefore help the user select a function that is appropriate for that object by only showing valid commands in menus and such. This eliminates an entire category of syntax errors due to mismatches between operator and operand.
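The benefit of the noun verb order can be shown with a tiny Python sketch. The object types and command lists below are invented for illustration and do not describe any real system; the point is only that once the object is known, the interface can restrict the menu to commands that make sense for it.

```python
# Hypothetical table of which commands are valid for which kind of object.
VALID_COMMANDS = {
    "file": ["open", "rename", "delete"],
    "folder": ["open", "rename", "delete", "new file"],
    "printer": ["open queue", "pause"],
}


def menu_for(selected_type):
    """Noun first: the menu is built only after an object is selected."""
    return VALID_COMMANDS.get(selected_type, [])


def apply_command(selected_type, command):
    """Verb second: a command that is not on the object's menu is refused."""
    if command not in menu_for(selected_type):
        raise ValueError(f"'{command}' is not valid for a {selected_type}")
    return f"{command} applied to {selected_type}"
```

In a verb noun interface the user could type a command such as pause followed by a file name and only then find out that the combination is meaningless; here that choice is never offered in the first place.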
A further functionality access change is likely to occur on a macro level in the move from application oriented to document oriented systems. Traditional operating systems have been based on the notion of applications that were used by the user one at a time. Even window systems and other attempts at application integration typically forced the user to use one application at a time, even though other applications were running in the background. Also, any given document or data file was only operated on by one application at a time. Some systems allow the construction of pipelines connecting multiple applications, but even these systems still basically have the applications act sequentially on the data.
The application model is constraining to users who have integrated tasks that require multiple applications to solve. Approaches to alleviate this mismatch in the past have included integrated software and composite editors that could deal with multiple data types in a single document. No single program is likely to satisfy all computer users, however, no matter how tightly integrated it is, so other approaches have also been invented to break the application barrier. Cut and paste mechanisms have been available for several years to allow the inclusion of data from one application in a document belonging to another application. Recent systems, such as Microsoft's OLE technology, even allow live links back to the original application so that changes in the original data can be reflected in the copy in the new document. However, these mechanisms are still constrained by the basic application model that requires each document to belong to a specific application at any given time.
An alternative model is emerging in object oriented operating systems where the basic object of interest is the user's document. Any given document can contain sub objects of many different types, and the system will take care of activating the appropriate code to display, print, edit, or email these data types as required. The main difference is that the user no longer needs to think in terms of running applications, since the data knows how to integrate the available functionality in the system. In some sense, such an object oriented system is the ultimate composite editor, but the difference compared to traditional, tightly integrated multi-media editors is that the system is open and allows plug and play addition of new or upgraded functionality as the user desires without changing the rest of the system.
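One way to picture the document oriented model is as a registry that maps each data type and action to a piece of code, so that new capabilities can be plugged in without touching the rest of the system. The Python sketch below is purely illustrative; the registry, the part format and the handler functions are assumptions for the example and do not correspond to any actual operating system interface.

```python
# Hypothetical registry: (data type, action) -> function that performs it.
HANDLERS = {}


def register(data_type, action, handler):
    """Plug in (or upgrade) the code that handles one type of sub object."""
    HANDLERS[(data_type, action)] = handler


def perform(action, document):
    """Apply an action to every part of a compound document.

    The user asks to display or print the document as a whole; the system
    looks up the right code for each part.
    """
    results = []
    for part in document:
        handler = HANDLERS.get((part["type"], action))
        if handler is None:
            results.append(f"[no {action} handler for {part['type']}]")
        else:
            results.append(handler(part["data"]))
    return results
```

A document containing a text part and a chart part can be displayed as soon as a text handler exists; the chart shows a placeholder until a chart handler is registered, at which point it starts working with no change to the document or to the text handler.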
Even the document oriented systems may not have broken sufficiently with the past to achieve a sufficient match with the users' task requirements. It is possible that the very notion of files and a file system is outdated and should be replaced with a generalized notion of an information space with interlinked information objects in a hypertext manner. As personal computers get multi-gigabyte hard disks, and additional terabytes become available over the Internet, users will need to access hundreds of thousands or even millions of information objects. To cope with this mass of information, users will need to think of these objects in more flexible ways than simply as files, and information retrieval facilities need to be made available on several different levels of granularity to allow users to find and manipulate associations between their data. In addition to hypertext and information retrieval, research approaching this next generation data paradigm includes the concept of piles of loosely structured information objects, the information workspace with multiple levels of information storage connected by animated computer graphics to induce a feeling of continuity, personal information management systems where information is organized according to the time it was accessed by the individual user, and the integration of fisheye hierarchical views of an information space with feedback from user queries. Also, several commercial products are already available to add full text search capabilities to existing file systems, but these utility programs are typically not integrated with the general file user interface.
HOTVIDEO MULTIMEDIA INTERFACE
Information at the touch of a button. That’s what the Internet gives you. Going from Web site to home page to in depth information has been possible due to the use of hypertext, which indicates links to related screens. With the onslaught of videos and graphics on the Internet, hypervideo was the natural next step. Now, IBM Research has a new technology that takes the connection even further.
HotVideo is an innovative implementation of hypervideo. It extends the concept of hyperlinks from text and images to any dynamic object within a video. With HotVideo, sections of a digital video can be linked to various locations, such as home pages, other video clips, audio clips, images, text or executables. HotVideo transforms the two dimensional space of ordinary videos into a true multimedia information space.
The video source for HotVideo can reside on disks, DVDs or server files, or come from a web server over the Internet. When a video is played, HotVideo synchronises data that resides in different files. Since the information is stored separately, the video itself never changes. What appears is an indicator that highlights a hot link location. A green indicator means there is no link. When the indicator turns red, it means there is a hot link. HotVideo is not intrusive: the user can either connect to another location, or continue to view the video.
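The principle of keeping link information in a separate file that is synchronised with playback can be sketched as follows. The data layout (a time span, a rectangular region and a target) and the function names are assumptions invented for this example, as is the idea that the indicator follows the pointer position; the actual HotVideo file format is not described here. Only the colour convention, red for a hot link and green for none, comes from the description above.

```python
# Hypothetical link table, stored apart from the video file itself.
# Times are in seconds; regions are (left, top, width, height) in pixels.
HOT_LINKS = [
    {"start": 5.0, "end": 12.0, "region": (100, 50, 80, 60),
     "target": "hotel_homepage.html"},
    {"start": 10.0, "end": 20.0, "region": (300, 200, 50, 50),
     "target": "airline_clip.mpg"},
]


def link_at(links, time, x, y):
    """Return the target linked to point (x, y) at the given playback time."""
    for link in links:
        left, top, width, height = link["region"]
        in_time = link["start"] <= time <= link["end"]
        in_region = left <= x < left + width and top <= y < top + height
        if in_time and in_region:
            return link["target"]
    return None


def indicator_color(links, time, x, y):
    """Red when the pointer is over a hot link, green otherwise."""
    return "red" if link_at(links, time, x, y) is not None else "green"
```

Because the table is consulted only during playback, the same unmodified video can be shipped with different link files for different audiences.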
HotVideo includes a highly flexible authoring tool that enables users to customize the look of their hot links and even the way the interface works. For example, a colour or highlighting change can be set up so that it is revealed on demand when the user clicks the right mouse button. Another way to customize may even require that the user find the hot link, a feature which could be used in applications such as games.
With various options and the ability to use the authoring tool to customize, HotVideo may be used for virtually any application that contains a video. It can be easily adapted to other programs, which opens up endless possibilities. It is considered a platform for interactive videos. Naturally, it could enhance the depth and effectiveness of Web pages as well as presentation videos, DVDs or digital videos. TV networks could use HotVideo to explore new forms of programming, giving them the ability to learn about customer preferences or sell subsidiary services. For example, a travel program might sell HotVideo links to the sites of travel agencies, airlines, shops or other businesses that were featured in that program. HotVideo may have its roots in the Internet, but its possible uses go far beyond the World Wide Web.
Letizia is a user interface agent that assists a user browsing the World Wide Web. As the user operates a conventional Web browser such as Netscape or Internet Explorer, the agent tracks user behavior and attempts to anticipate items of interest by doing concurrent, autonomous exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best first search augmented by heuristics inferring user interest from browsing behavior.
The recent explosive growth of the World Wide Web and other online information sources has made critical the need for some sort of intelligent assistance to a user who is browsing for interesting information. Past solutions have included automated searching programs such as WAIS or Web crawlers that respond to explicit user queries. Such solutions have two problems: the user must explicitly decide to invoke them, interrupting the normal browsing process, and the user must remain idle while waiting for the search results.
The agent tracks the user's browsing behavior (following links, initiating searches, requesting help) and tries to anticipate what items may be of interest to the user. It uses a simple set of heuristics to model what the user's browsing behavior might be. Upon request, it can display a page containing its current recommendations, and the user can choose either to follow them or to return to conventional browsing.
Interleaving Browsing With Automated Search
The model adopted by Letizia is that the search for information is a cooperative venture between the human user and an intelligent software agent. Letizia and the user both browse the same search space of linked Web documents, looking for interesting documents. No goals are defined in advance. The difference between the user's search and Letizia's is that the user's search has a reliable static evaluation function, while Letizia can explore search alternatives faster than the user can. Letizia uses the past behavior of the user to form a rough approximation of the user's interests.
Critical to Letizia's design is its control structure, in which the user can manually browse documents and conduct searches without interruption from Letizia. Letizia's role during user interaction is merely to observe the user's actions and draw inferences from them that will be relevant to future requests. In parallel with the user's browsing, Letizia conducts a resource-limited search to anticipate the possible future needs of the user. At any time, the user may request a set of recommendations from Letizia based on the current state of the user's browsing and Letizia's search. Such recommendations are dynamically recomputed when anything changes or at the user's request. Letizia is in the tradition of behavior-based interface agents. Rather than rely on a preprogrammed knowledge representation structure to make decisions, it acquires knowledge about the domain incrementally, as a result of inferences from the user's concrete actions.
Letizia adopts a strategy that is midway between the conventional perspectives of information retrieval and information filtering. Information retrieval suggests the image of a user actively querying a base of mostly irrelevant knowledge in the hope of extracting a small amount of relevant material. Information filtering paints the user as the passive target of a stream of mostly relevant material, where the task is to remove or de-emphasize less relevant material. Letizia can interleave both retrieval and filtering behavior, initiated either by the user or by the agent.
Modeling The User's Browsing Process
The user's browsing process is typically to examine the current HTML document in the Web browser, decide which, if any, links to follow, or to return to a document previously encountered in the history, or to return to a document explicitly recorded in a hot list, or to add the current document to the hot list.
The goal of the Letizia agent is to automatically perform some of the exploration that the user would have done while the user is browsing these or other documents, and to evaluate the results from what it can determine to be the user's perspective. Upon request, Letizia provides recommendations for further action on the user's part, usually in the form of following links to other documents.
Letizia's leverage comes from overlapping search and evaluation with the idle time during which the user is reading a document. Since the user is almost always a better judge of the relevance of a document than the system, it is usually not worth making the user wait for the result of an automated retrieval if that would interrupt the browsing process. The best use of Letizia's recommendations is when the user is unsure of what to do next. Letizia never takes control of the user interface, but just provides suggestions.
Because Letizia can assume that it is operating in a situation where the user has invited its assistance, its simulation of the user's intent need not be extremely accurate to be useful. Its guesses need only be better than no guess at all, and so even weak heuristics can be employed.
Inferences from the User's Browsing Behavior
Observation of the user's browsing behavior can tell the system much about the user's interests. Each of the heuristics described below is weak by itself, but each can contribute to a judgment about the document's interest.
One of the strongest behaviors is for the user to save a reference to a document, explicitly indicating interest. Following a link can indicate one of several things. First, the decision to follow a link can indicate interest in the topic of the link. However, because the user does not know what is referenced by the link at the time the decision to follow it is made, that indication of interest is tentative at best. If the user returns immediately without having either saved the target document or followed further links, a lack of interest can be assumed. Letizia saves the user considerable time that would be wasted exploring those dead-end links.
Following a link is, however, a good indicator of interest in the document containing the link. Pages that contain lots of links that the user finds worth following are interesting. Repeatedly returning to a document also connotes interest, as would spending a lot of time browsing it relative to its length.
Since there is a tendency to browse links in a top-to-bottom, left-to-right manner, a link that has been passed over can be assumed to be less interesting. A link is passed over if it remains unchosen while the user chooses other links that appear later in the document. Later choice of that link can reverse the indication.
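The way several weak signals combine into one judgment can be sketched as a weighted sum. This is only an illustration: the event names and the numeric weights below are assumptions made for the example, since the article gives no values and does not describe how Letizia actually combines its heuristics.

```python
# Hypothetical weights; the article names the signals but gives no numbers.
WEIGHTS = {
    "saved": 3.0,               # saving a reference: the strongest signal
    "followed_link_from": 1.0,  # each link followed out of this page
    "revisit": 1.0,             # each return to the page
    "quick_return": -1.0,       # opened, then left at once without saving or following links
    "passed_over": -0.5,        # link skipped while later links on the page were chosen
}

def interest_score(events):
    """Sum the weak signals observed for one document.

    `events` maps an event name from WEIGHTS to the number of times it was seen.
    """
    return sum(WEIGHTS[name] * count for name, count in events.items())

bookmarked_hub = {"saved": 1, "followed_link_from": 4, "revisit": 2}
dead_end = {"quick_return": 1}

print(interest_score(bookmarked_hub))  # 9.0
print(interest_score(dead_end))        # -1.0
```

No single event decides the outcome here; a page only scores highly when several signals agree, which matches the idea that each heuristic is weak on its own.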
Letizia does not have natural language understanding capability, so its content model of a document is simply a list of keywords. Partial natural language capabilities that can extract some grammatical and semantic information quickly, even though they do not perform full natural language understanding, could greatly improve its accuracy.
Letizia uses an extensible object-oriented architecture to facilitate the incorporation of new heuristics to determine interest in a document, dependent on the user's actions, history, and the current interactive context as well as the content of the document.
An important aspect of Letizia's judgment of interest in a document is that it is not trying to determine some measure of how interesting the document is in the abstract, but instead a preference ordering of interest among a set of links. If almost every link is found to have high interest, then an agent that recommends them all isn't much help, and if very few links are interesting, then the agent's recommendation isn't of much consequence. At each moment, the primary problem the user faces in the browser interface is "which link should I choose next?", and so it is Letizia's job to recommend which of the several possibilities available is most likely to satisfy the user. Letizia sets as its goal to recommend a certain percentage (settable by the user) of the links currently available.
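Recommending a fixed share of the available links, ranked by relative interest, might look like the following sketch. The function name, the scores and the rounding rule (round up, and always recommend at least one link) are assumptions made for the example, not details taken from Letizia.

```python
import math

def recommend(link_scores, fraction):
    """Return the top `fraction` of links, best first.

    link_scores: dict mapping link -> interest score (hypothetical values).
    fraction: user-settable share of links to recommend, between 0 and 1.
    """
    ranked = sorted(link_scores, key=link_scores.get, reverse=True)
    if not ranked:
        return []
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]

scores = {"a.html": 0.2, "b.html": 0.9, "c.html": 0.5, "d.html": 0.1}
print(recommend(scores, 0.5))  # ['b.html', 'c.html']
```

Only the ordering of the scores matters, not their absolute size, which is the point of treating interest as a preference ordering.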
An Example
In the example, the user starts out by browsing home pages for various general topics such as artificial intelligence. The user is particularly interested in topics involving agents, so he or she zeros in on pages that treat that topic. Many pages will have the word agent in the name, and the user may search for the word agent in a search engine, so the system can infer an interest in the topic of agents from the browsing behavior.
At a later time, the user is browsing personal home pages, perhaps reached through an entirely different route. A personal home page for an author may contain a list of that author's publications. As the user is browsing through some of the publications, Letizia can concurrently be scanning the list of publications to find which ones may have relevance to a topic for which interest was previously inferred, in this case the topic Agents. Those papers in the publication list dealing with agents are suggested by Letizia.
Letizia can also explain why it has chosen that document. In many instances, this represents not the only reason for having chosen it, but it selects one of the stronger reasons to establish plausibility. In this case, it noticed a keyword from a previous exploration, and in the other case, a comparison was made to a document that also appeared in the list returned by the bibliography search.
Persistence Of Interest
One of the most compelling reasons to adopt a Letizia-like agent is the phenomenon of persistence of interest. When the user indicates interest by following a link or performing a search on a keyword, their interest in that topic rarely ends with the return of results for that particular search.
Although the user typically continues to be interested in the topic, he or she often cannot take the time to restate interest at every opportunity, when another link or search opportunity arises with the same or related subject. Thus the agent serves the role of remembering and looking out for interests that were expressed with past actions.
Persistence of interest is also valuable in capturing users' preferred personal strategies for finding information. Many Web nodes have both subject-oriented and person-oriented indices. The Web page for a university or company department typically contains links to the major topics of the department's activity, and also links to the home pages of the department's personnel. A particular piece of work may be linked to by both the subject and the author.
Some users may habitually prefer to trace through personal links rather than subject links, because they may already have friends in the organization or in the field, or just because they may be more socially oriented in general. An agent such as Letizia picks up such preferences, through references to links labeled as people, or through noticing particular names that may appear again and again in different, though related, contexts.
Indications of interest probably ought to decay over time, so that the agent does not get clogged with searching for interests that may have fallen from the user's attention. Some actions may have been highly dependent upon the local context, and should be forgotten unless they are reinforced by more recent action. Another heuristic for forgetting is to discount suggestions that were formulated very far from the present position, measured in number of Web links from the original point of discovery.
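The two forgetting heuristics, decay over time and discounting by link distance, can be sketched together. The exponential form, the 30-minute half-life and the per-link discount of 0.8 are assumptions chosen for the example; the article proposes the idea without giving a formula.

```python
def decayed_interest(initial, minutes_elapsed, links_away,
                     half_life_minutes=30.0, per_link_discount=0.8):
    """Discount an interest score by age and by distance in links.

    All parameter values are hypothetical; they only illustrate the idea.
    """
    # Interest halves every `half_life_minutes` unless it is reinforced.
    time_factor = 0.5 ** (minutes_elapsed / half_life_minutes)
    # Each link between the point of discovery and the present position
    # multiplies the score by `per_link_discount`.
    distance_factor = per_link_discount ** links_away
    return initial * time_factor * distance_factor

print(decayed_interest(1.0, 0, 0))   # 1.0
print(decayed_interest(1.0, 30, 0))  # 0.5
```

Reinforcement by a more recent action would simply reset `minutes_elapsed` and `links_away` to zero for that topic.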
Further, persistence of interest is important in uncovering serendipitous connections, which is a major goal of information browsing. While searching for one topic, one might accidentally uncover information of tremendous interest on another, seemingly unrelated, topic. This happens surprisingly often, partly because seemingly unrelated topics are often related through non-obvious connections. An important role for the agent is to be constantly available to notice such connections and bring them to the user's attention.
Search Strategies
The interface structure of many Web browsers encourages depth-first search, since every time one descends a level the choices at the next lower level are immediately displayed. One must return to the containing document to explore sibling links at the same level, a two-step process in the interface. When the user is exploring in a relatively undirected fashion, the tendency is to continue to explore downward links in a depth-first fashion. After a while, the user ends up very deep in a stack of previously chosen documents, and, especially in the absence of much visual representation of the context, this leads to a "lost in hyperspace" feeling.
The depth-first orientation is unfortunate, as much information of interest to users is typically embedded rather shallowly in the Web hierarchy. Letizia compensates for this by employing a breadth-first search. It achieves utility in part by reminding users of neighboring links that might escape notice. It makes user exploration more efficient by automatically hiding many of the dead-end links that waste a user's time.
The depth of Letizia's search is also limited in practice by the effects of user interaction. Web pages tend to be of relatively similar size in terms of amount of text and number of links per page, and users tend to move from one Web node to another at relatively constant intervals. Each user movement immediately refocuses the search, which prevents it from getting too far afield.
The search is still potentially combinatorially explosive, so a resource limitation is placed on search activity. This limit is expressed as a maximum number of accesses to non-local Web nodes per minute. After that number is reached, Letizia remains quiescent until the next user-initiated interaction.
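A breadth-first search with an access budget can be sketched as follows. This is a simplified illustration, not Letizia's code: the Web is replaced by a small hypothetical dictionary of links, every page visit counts as one access, and the budget is a plain count rather than a per-minute rate.

```python
from collections import deque

def bounded_breadth_first(start, links, max_accesses):
    """Visit pages breadth first from `start`, stopping after `max_accesses` visits.

    links: dict mapping page -> list of linked pages (a stand-in for the Web).
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_accesses:
        page = queue.popleft()
        visited.append(page)  # each visit uses up one unit of the budget
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

web = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}
print(bounded_breadth_first("home", web, 4))  # ['home', 'a', 'b', 'a1']
```

Both sibling links `a` and `b` are reached before anything deeper, so a small budget is spent on the shallow neighbors that a depth-first browser tends to skip.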
Letizia will not initiate further searches when it reaches a page that contains a search form, even though it could benefit enormously by doing so, in part because there is as yet no agreed-upon Web convention for time-bounding the search effort. Letizia will, however, recommend that a user go to a page containing a search form.
In practice, the pacing of user interaction and Letizia's internal processing time tend to keep resource consumption manageable. As with all autonomous Web-searching robots, there is the potential for overloading the net with robot-generated communication activity.
Related Work
Work on intelligent agents for information browsing is still in its infancy. Letizia differs in that it does not require the user to state a goal at the outset, instead trying to infer goals from the user's browsing behavior.
Automated Web crawlers have neither the knowledge based approach nor the interactive learning approach. They use more conventional search and indexing techniques. They tend to assume a more conventional question and answer interface mode, where the user delegates a task to the agent, and then waits for the result. They don't have any provision for making use of concurrent browsing activity or learning from the user's browsing behavior.
Machine Translation
Serious efforts to develop machine translation systems were under way soon after the ENIAC was built in 1946, and the first known public trial, the Georgetown-IBM experiment, took place in January 1954. We've made remarkable progress in the past fifty years, but machine translation involves so many complex tasks that current systems give only a rough idea or indication of the topic and content of the source document. These systems tend to reach a quality barrier beyond which they cannot go, and work best if the subject matter is specific or restricted, free of ambiguities and straightforward. The typical example of this type of text is the computer manual. We'll need more advanced systems to handle the ambiguities and inconsistencies of real-world languages. No wonder that translation, whether performed by machine or by human, is often regarded as an art and not as an exact science.
Today we have an entire range of translation methods with varying degrees of human involvement. The two extreme ends of the translation spectrum are fully automatic high-quality translation, which has no human involvement, and traditional human translation, which has no machine translation input. Human-aided machine translation and machine-aided human translation lie between these extremes: human-aided machine translation is primarily a machine translation system that requires human aid, whereas machine-aided human translation is a human translation method that uses machine translation as an aid or tool. The term Computer Assisted Translation is often used to represent both human-aided machine translation and machine-aided human translation.
So, what's so hard about machine translation? There's no such thing as a perfect translation, even when performed by a human expert. A variety of approaches to machine translation exist today, with direct translation being the oldest and most basic of them. It translates texts by replacing source language words with target language words, with the amount of analysis varying from system to system. Typically it contains a correspondence lexicon: lists of source language patterns and phrases, and mappings to their target language equivalents. The quality of the translated text varies depending on the size of the system's lexicon and on how smart the replacement strategies are. The main problem with this approach is its lack of contextual accuracy and its inability to capture the real meaning of the source text.
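Direct translation can be illustrated with a toy word-and-phrase replacer. The tiny English-to-Spanish lexicon below is invented for the example and stands in for the much larger correspondence lexicon of a real system; the replacement strategy (try a two-word phrase first, then a single word, else copy the word unchanged) is also an assumption.

```python
# Hypothetical correspondence lexicon: phrases are tried before single words.
PHRASES = {("thank", "you"): "gracias"}
WORDS = {"the": "el", "cat": "gato", "drinks": "bebe", "milk": "leche"}

def direct_translate(sentence):
    """Replace source words and phrases with target equivalents, left to right."""
    tokens = sentence.lower().split()
    output = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            output.append(PHRASES[pair])
            i += 2
        else:
            # Unknown words are copied through unchanged.
            output.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(output)

print(direct_translate("The cat drinks milk"))      # el gato bebe leche
print(direct_translate("The cat drinks the milk"))  # el gato bebe el leche
```

The second output shows the lack of contextual accuracy: "leche" is feminine and needs "la", but a word-for-word replacement has no way to know that.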
Going a step further, syntactic transfer systems use software parsers to analyze the source language sentences and apply linguistic and lexical rules (or transfer rules) to rewrite the original parse tree so that it obeys the syntax of the target language. Interlingual systems, on the other hand, translate texts using a central data representation notation called an interlingua. This representation is neutral with respect to the languages in the system and breaks the direct relationship that a bilingual dictionary approach would have. Statistical systems use standard methods for translation, but their correspondence lexicons are constructed automatically, using advanced alignment algorithms, from a large amount of text for each language, usually available in online databases.
Neural Networks
Artificial neural networks (ANNs) are computational paradigms which implement simplified models of their biological counterparts, biological neural networks. Biological neural networks are the local assemblages of neurons and their dendritic connections that form the (human) brain. Accordingly, artificial neural networks are characterized by:
Local processing in artificial neurons (processing elements)
Massively parallel processing, implemented by a rich connection pattern between processing elements
The ability to acquire knowledge via learning from experience
Knowledge storage in distributed memory, the synaptic connections between processing elements
The attempt to implement neural networks for brain-like computations such as pattern recognition, decision making, motor control and many others was made possible by the advent of large-scale computers in the late 1950s. Indeed, artificial neural networks can be viewed as a major new approach to computational methodology since the introduction of digital computers.
Although the initial intent of artificial neural networks was to explore and reproduce human information processing tasks such as speech, vision, and knowledge processing, artificial neural networks also demonstrated their superior capability for classification and function approximation problems. This gives them great potential for solving complex problems such as systems control, data compression, optimization problems, pattern recognition, and system identification.
Artificial neural networks were originally developed as tools for the exploration and reproduction of human information processing tasks such as speech, vision, touch, knowledge processing and motor control. Today, most research is directed towards the development of artificial neural networks for applications such as data compression, optimization, pattern matching, system modeling, function approximation, and control. One of the application areas to which artificial neural networks are applied is flight control. Artificial neural networks give control systems a variety of advanced capabilities.
Since artificial neural networks are highly parallel systems, conventional computers are unsuited for neural network algorithms. Special purpose computational hardware has been constructed to efficiently implement artificial neural networks. Accurate Automation has developed a Neural Network Processor. This hardware will allow us to run even the most complex neural networks in real time. The neural network processor is capable of multiprocessor operation in Multiple Instruction Multiple Data (MIMD) fashion. It is the most advanced digital neural network hardware in existence. Each neural network processor system is capable of implementing 8000 neurons with 32,000 interconnections per processor. The computational capability of a single processor is 140 million connections per second. An 8-processor system would therefore be capable of over one billion connections per second (8 x 140 million = 1.12 billion). The neural network processor architecture is extremely flexible, and any neuron is capable of interconnecting with any other neuron in the system.
Conventional computers rely on programs that solve a problem using a predetermined series of steps, called algorithms. These programs are controlled by a single, complex central processing unit, and store information at specific locations in memory. Artificial neural networks use highly distributed representations and transformations that operate in parallel, have distributed control through many highly interconnected neurons, and store their information in variable-strength connections called synapses.
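The idea that knowledge is stored in connection strengths and acquired by learning from experience can be illustrated with a single artificial neuron. The sketch below uses the classic perceptron learning rule to learn the logical AND function; the choice of task, learning rate and number of passes are assumptions made for the example, and a single neuron is far simpler than the networks discussed here.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its input is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    """Local processing: a weighted sum of the inputs followed by a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=10, rate=1.0):
    """Adjust connection strengths whenever the neuron's output is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Training examples for logical AND: (inputs, expected output).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, inputs) for inputs, _ in and_samples])  # [0, 0, 0, 1]
```

Nothing in the program states the AND rule explicitly; after training, the behavior exists only in the values of `weights` and `bias`, the counterpart of the synapses described above.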
There are many different ways in which people refer to the same type of neural network technology. Neural networks are described as connectionist systems, because of the connections between individual processing nodes. They are sometimes called adaptive systems, because the values of these connections can change so that the neural network performs more effectively. They are also sometimes called parallel distributed processing systems, a name which emphasises the way in which the many nodes or neurons in a neural network operate in parallel. The theory that inspires neural network systems is drawn from many disciplines, primarily from neuroscience, engineering, and computer science, but also from psychology, mathematics, physics, and linguistics. These sciences are working toward the common goal of building intelligent systems.
Scene Based Graphics Rendering
In the pursuit of photorealism in conventional polygon-based computer graphics, models have become so complex that most of the polygons are smaller than one pixel in the final image. At the same time, graphics hardware systems at the very high end are becoming capable of rendering, at interactive rates, nearly as many triangles per frame as there are pixels on the screen. Formerly, when models were simple and the triangle primitives were large, the ability to specify large, connected regions with only three points was a considerable efficiency in storage and computation. Now that models contain nearly as many primitives as pixels in the final image, we should rethink the use of geometric primitives to describe complex environments.
An alternative approach is being investigated that represents complex 3D environments with sets of images. These images include information describing the depth of each pixel along with the colour and other properties. Algorithms have been developed for processing these depth enhanced images to produce new images from viewpoints that were not included in the original image set. Thus, using a finite set of source images, it is now possible to produce new images from arbitrary viewpoints.
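How a pixel with known depth can be moved to a new viewpoint may be sketched with a simple pinhole camera model. This is an illustrative assumption, not the algorithm used by the researchers: the camera only translates (no rotation), and the focal length and image center are made-up values.

```python
def warp_pixel(u, v, depth, camera_shift, focal=100.0, center=(50.0, 50.0)):
    """Reproject one depth-enhanced pixel into a camera moved by `camera_shift`.

    (u, v): pixel position in the source image.
    depth: distance of the pixel's surface point along the viewing axis.
    camera_shift: (tx, ty, tz) translation of the new camera, same orientation.
    Returns the pixel position in the new image, or None if the point
    ends up behind the new camera.
    """
    cx, cy = center
    # Step 1: unproject the pixel to a 3D point using its depth.
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    z = depth
    # Step 2: express the point relative to the new camera position.
    tx, ty, tz = camera_shift
    x, y, z = x - tx, y - ty, z - tz
    if z <= 0:
        return None
    # Step 3: project the point into the new image.
    return (focal * x / z + cx, focal * y / z + cy)

shift = (0.5, 0.0, 0.0)  # move the camera half a unit to the right
print(warp_pixel(50, 50, 2.0, shift))   # (25.0, 50.0)
print(warp_pixel(50, 50, 10.0, shift))  # (45.0, 50.0)
```

The nearer point moves 25 pixels while the farther one moves only 5. This parallax is why per-pixel depth is needed: without it, every pixel would shift by the same amount and the new view would look flat.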
The potential impact of using images to represent complex 3D environments includes:
Naturally photo-realistic rendering, because the source data are photos. This will allow immersive 3D environments to be constructed for real places, enabling a new class of applications in entertainment, virtual tourism, telemedicine, telecollaboration, and teleoperation.
Computation proportional to the number of output pixels rather than to the number of geometric primitives as in conventional graphics. This should allow implementation of systems that produce high-quality 3D imagery with much less hardware than is used in current high-performance graphics systems.
A hybrid with a conventional graphics system. A process called post-rendering warping allows the rendering rate and latency to be decoupled from the user's changing viewpoint. Just as the frame buffer decoupled screen refresh from image update, post-rendering warping decouples image update from viewpoint update. This approach will enable immersive 3D systems to be implemented over long distance networks and broadcast media, using inexpensive image warpers to interface to the network and to increase interactivity.
Design of current graphics hardware has been driven entirely by the processing demands of conventional triangle based graphics. It is possible that very simple hardware may allow for real-time rendering using this new paradigm. It should be possible to write software only implementations that produce photo realistic renderings at much higher rates than with conventional graphics primitives.
The 3D Graphical User Interface
The interface environment is a fashionable phrase right now, but it's worth pointing out that we're really talking about two types of environments here: one is the metaphoric 3D space conjured up on the screen, with its avatars and texture mapping; the other is the real estate of real offices, populated by real people.
Is a VRML-style 3D interface likely to improve any day-to-day productivity applications? Aren't there cases where 3D metaphors confuse more than they clarify? If so, will future users feel comfortable hopping back and forth between 2D and 3D environments over the course of an average workday? Are DOOM and Quake the killer apps of the 3D interface, or is there something more enriching lurking around the corner?
Most new technologies start out declared as the answer to every problem. Then, people find out what pieces of the world's problem they actually solve and they quietly take their place among the other solutions. For example, microwave ovens were heralded as replacements for every appliance in your kitchen.
Instead, microwave ovens became yet another tool in the kitchen. Microwave ovens became indispensable as a minor adjunct to primary cooking processes taking place somewhere else.
Similarly, 3D, along with handwriting, gesture, and voice recognition, will take its place as one technology among many, there to perform whatever large or small measure of an overall activity may be needed.
An area of 3D that has not yet been fully appreciated is the illusion of depth that can be generated on a normal 2D screen. Xerox PARC has a cone tree that allows rapid document retrieval. The MIT Media Lab has interesting "infinite spaces within a finite boundary" 3D worlds for document retrieval.
3D on screens will not work well until we drastically improve resolution. Sun Microsystems has built a 3D camera that can feed onto a 2D screen. Users wearing shutter glasses would then see a 3D image that would move vantage point as the user moved in space before the screen.
The only way we can achieve enough depth on our display to enable users to explore the depths of a 3D image, with or without shutter glasses, is to drastically increase the resolution of our screens. 300 dpi is a start; 2400 to 4800 dpi is what is needed.
Which brings us to VRML: it doesn't work, and it won't work until the resolution is drastically improved. Sun Microsystems has built a VR room with three high-resolution projectors throwing millions of pixels on three screens that wrap around you. The resolution is approaching adequate. The resolution in head-mounted displays is very poor, and it will be for a long time to come.
People will flip back and forth between 2D and 3D graphic designs on 2D screens not over the course of an average working day, but over the course of an average working minute. People will also adopt and adapt to simple hand-held 3D devices, even if they must wear lightweight shutter glasses to use them. People will not walk around the office wearing head-mounted displays. Mainstream VRML is a long way away.
When does 3D fail us? When it's misused to represent documents. It's understandable that we'd first want to use 3D as a front end to our document spaces; after all, document spaces are the bedrock of the 2D GUIs. But try reading a textual document in a 3D space: it's almost impossible. You need to face it head on, get close enough, and probably pitch your head up and down to get all the text. That is the misuse of a powerful tool, trying to adapt it backward into a preexisting paradigm rather than jumping forward into a new one.
On the other hand, suppose you're trying to understand the relations between a set of documents, considering such metadata as age, size, creator, geographical region of interest, language and specific subject being covered. You can see that very soon we'd exhaust the limits of representability within a 2D GUI, and would have to resort to the pretty fancy tricks used by both the Windows 95 and Macintosh desktops to tame this chaotic sea of metadata complexity. Already the desktop metaphor is breaking down, precisely because it was designed around a document-centric view of the world.
But the World Wide Web isn't just a swirling sea of documents. It's an active space which conforms itself to requests made of it, a far cry from a database of documents. The Internet is not a plug-in. And the desktop is not reaching out to the Internet; rather, the Internet is reaching out to the desktop. But the desktop can't cope with the scope of this transformation, and it's unrealistic to think that it should. In fact the Internet confronts us with a riot of information, and right now 3D interfaces are the best hope for interfaces which can "tame" the information overload.
It is necessary to realize that the navigation of VRML worlds is moving us in a particular direction, based on a need for different points of view of solid spaces and objects. Cognitive research suggests that people use the extra z axis, forward and back, to remember where objects are placed around their work surfaces, as well as the x axis laterally and a limited range of y. As we walk around our office environments we use the z dimension to move into additional spaces, but there is little relative change in our y axis knowledge (height). What is critical is to limit the user's options somehow, to avoid the confusion of additional axes of control that is encountered in real flight with its six degrees of control. We still need to do some quality perceptual cognitive research in this domain. The ability to provide true 3D or stereo perspectives is still limited by computation and hardware.
We will also see more interfaces use concepts such as layering or overlapping transparency, which has been termed 2.5D, as seen in MIT's Spatial Data Management System in the 1970s. Current work at the Media Lab in the Visible Language Workshop has started to show great promise in applications of transparency and focus pull for navigating libraries of data sets. The smoothness and speed of response of these systems shows the hope for systems such as VRML, when we have more computational power available on set-top boxes. Clearly the use of 3D is much clearer when we know there is some inherent 3D-ness to the data. Exploring geographical data is much easier given the known 3D world metaphor, moving over body data is made clear when using a real 3D body, and then there is the presentation of 3D to n-D data across time. Animation is a useful tool to show data relationships, but additional axes of display representation can facilitate the understanding of their interrelationships too. Whether additional axes show any real benefit in performance or learning depends on the types of data being displayed and on the user's task.
Applications that make the best use of 3D need more time to develop. At the moment there is very limited special purpose uses only for 3D. Sometimes designers use 3D either for feature appeal, or as an excuse for poor design when they cannot think of a 2D solution. With a new generation of game playing kids, the controls over 3D spaces will possibly be needed to keep interest high in work applications. The limitation of the flat existing metaphors is somewhat constrained by the current I/O devices. The pervasiveness of the keyboard will be limited to work processing activities, and will change when we think of a computer as being more than a heavy box shaped screen that we sit at everyday. The keyboard does not have a place in the living room, which will make it take alternate form factors along with the navigational control over n dimensional custom spaces.
Manipulation of 3D spaces and 3D objects isn't easy with just a mouse and keyboard, and will need new hardware inputs to become commonplace before people can effortlessly control themselves and objects in 3D spaces.
Certainly, though, 3D offers a tremendous new opportunity for the creation of entirely new editorial content, and so it will become prevalent in the environment quickly. As the obvious follow-on to integrating linear textual works and planar graphical pieces of art, we'll create whole populated worlds. 3D objects will be widely used within the growing desktop metaphor. Given a great rendering engine, we can create super-rich entities that are cheap to download, because the author just provides the instructions to create them.
Electronic Digital Paper
Xerox Corporation has announced that it has selected 3M as the manufacturer to bring to market its Electronic Paper, a digital document display with the portability of a plain sheet of paper.
Developed at the Xerox Palo Alto Research Center (PARC), electronic paper represents a new kind of display, falling somewhere between the centuries-old technology of paper and a conventional computer screen. Like paper, it is user-friendly, thin, lightweight and flexible. But like a computer display, it is also dynamic and rewritable. This combination of properties makes it suitable for a wide range of potential applications, including:
Electronic paper newspapers offering breaking news, incoming sports scores, and up-to-the-minute stock quotes, even as the paper is being read.
Electronic paper magazines that continually update with breaking information and make use of animated images or moving pictures to bring stories to life.
Electronic paper textbooks, which could amalgamate a number of textbooks into one book, allowing students to thumb through the pages, scan the information and mark up pages as they would a regular book.
Electronic paper displays in the form of wall-size electronic whiteboards, billboards and portable, fold-up displays.
The technology, supported by a portfolio of Xerox patents, has been prototyped at PARC on a limited scale. Xerox's collaboration with 3M establishes a means by which the electronic paper material, essentially the paper pulp of the future, can be manufactured in the volumes necessary to meet market demands and make the development of a wide range of supporting applications commercially viable.
In moving from the research laboratory to licensed manufacturing, electronic paper is taking its first step to the commercial market. It will not be long before a single renewable sheet of electronic paper offers a never-ending parade of news and information.
How it works
Electronic paper utilises a new display technology called a gyricon, invented by Xerox. A gyricon sheet is a thin layer of transparent plastic in which millions of small beads, somewhat like toner particles, are randomly dispersed. The beads, each contained in an oil-filled cavity, are free to rotate within those cavities. The beads are bichromal, with hemispheres of contrasting colour (e.g. black and white), and charged so they exhibit an electrical dipole.
Under the influence of a voltage applied to the surface of the sheet, the beads rotate to present one coloured side or the other to the viewer. A pattern of voltages can be applied to the surface in a bitwise fashion to create images such as text and pictures. The image will persist until new voltage patterns are applied to create new images.
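The bitwise addressing and image persistence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names, the grid size, and the polarity convention (positive voltage shows the black side) are hypothetical and are not taken from Xerox's design.

```python
# Hypothetical model of a gyricon sheet: each bead shows black (1) or white (0).
def apply_voltages(sheet, voltages):
    """Return a new sheet after applying a voltage pattern.

    Assumed convention: +1 turns a bead black side up, -1 turns it
    white side up, and 0 (no field) leaves the bead as it was.
    """
    return [
        [1 if v > 0 else 0 if v < 0 else bead
         for bead, v in zip(row, vrow)]
        for row, vrow in zip(sheet, voltages)
    ]

def render(sheet):
    """Draw black beads as '#' and white beads as '.'."""
    return "\n".join("".join("#" if b else "." for b in row) for row in sheet)

blank = [[0] * 5 for _ in range(3)]
pattern = [
    [+1, -1, +1, -1, +1],
    [-1, +1, -1, +1, -1],
    [+1, -1, +1, -1, +1],
]
image = apply_voltages(blank, pattern)

# With no voltage applied, the beads do not move, so the image persists.
held = apply_voltages(image, [[0] * 5 for _ in range(3)])
print(render(held))
```

The zero-voltage case is what makes the sheet behave like paper: holding an image costs nothing, and only a change of content requires a new voltage pattern.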
There are many ways an image can be created in electronic paper. For example, sheets can be fed into printer-like devices that will erase old images and create new images. Used in these devices, the electronic paper behaves like an infinitely reusable paper substitute.
Although projected to cost somewhat more than a normal piece of paper, a sheet of electronic paper could be reused thousands of times. Printer-like devices can be made so compact and inexpensive that you can imagine carrying one in a purse or briefcase at all times. One such envisioned device, called a wand, can be pulled across a sheet of electronic paper by hand to create an image. With a built-in input scanner, this wand becomes a hand-operated multi-function device: a printer, copier, fax, and scanner all in one.
For applications requiring more rapid and direct electronic update, the gyricon material might be packaged with a simple electrode structure on the surface and used more like a traditional display. An electronic paper display could be very thin and flexible. A collection of these electronic paper displays could be bound into an electronic book. With the appropriate electronics stored in the spine of the book, pages could be updated at will to display different content.
For portable applications, an active matrix array may be used to rapidly update a partial or full page display, much like those used in today's portable devices. The lack of a backlight and the absence of any need to refresh the display (since it is bistable), along with improved brightness compared to today's reflective displays, will lead to utilisation in lightweight and lower-power applications.
Xerox has had significant activity in developing this technology for some time. Although not yet perfected, the technology is currently at a stage where it is suitable for development for the first set of applications. Xerox is currently engaging partners in both manufacturing and application areas and sees a bright future for this technology.
Solid State Storage Technologies
Omni Dimensional Systems plans to create a 2 Gigabyte solid state memory by integrating thin film transistors and diodes onto a substrate that is formed from the flexible foil used to store information optically on CD-ROM. The intent is to substitute thin film electronics for the slow and unreliable mechanical parts used in optical drives, enabling subsystems that promise virtually instantaneous access to very large databases.
The company is combining the solid state memory with a special encoding technique that it says can pack three times the normal amount of information into a given space. The company uses the basic data encoding and compression scheme, called autosophy, in its data communications products.
What they've done is marry two mature technologies to fill a need for cheap, large associative memories. Autosophy can do the same thing for rewritable optical memories, by using a secondary grid.
CD-ROM and similar rewritable optical media modify the surface of a thin sheet of foil with various aberrations, which are normally sensed by a photodiode that picks up light from a laser diode on the read head. In the Omni Dimensional approach, the read head is replaced with an array of integrated thin film transistors and diodes of the kind used in active matrix liquid crystal displays (LCD). Autosophy encodings simplify the reading electronics by ensuring that only one output row activates at a time.
The company believes the associative operation of the memory will enable autosophy theory to expand from the RAM based telecommunications code it is today to become a mainstream solid state memory technology.
Autosophy theory enables graphical data to be compressed but requires the use of associative memories to do real-time lookups of dictionary entries. The process is simpler for serial telecommunications, because the bit stream is artificially divided into 8-bit characters (plus start and stop bits), which can be kept in a dynamically maintained library in RAM.
For instance, autosophy as used in telecommunications only transmits a full set of information once. The second time, only the address of the information is transmitted. But with graphical data, which is two-dimensional and not neatly divided into characters, autosophy needs associative memories to perform real-time lookup in a dictionary of pieces of the image.
As in telecommunications, the first time an image is sent there are no savings, but the second time only the addresses of the tiles from which it is made need be sent. With premade associative ROMs installed in TVs, perfect error-corrected digital HDTV-sized images could be sent over ordinary TV channels.
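The idea of sending a tile in full once and only its address afterwards can be sketched as a simple shared dictionary. This is a minimal illustration, not the actual autosophy algorithm: the fixed square tiles, the token format, and the function names are all assumptions, and the image dimensions are assumed to be multiples of the tile size.

```python
def encode_image(image, tile_size, dictionary):
    """Encode a 2D image (list of equal-length strings) as tile tokens.

    A tile seen for the first time is sent in full as ("new", tile) and
    stored in the dictionary; a known tile is sent as ("addr", index).
    """
    tokens = []
    for top in range(0, len(image), tile_size):
        for left in range(0, len(image[0]), tile_size):
            tile = tuple(row[left:left + tile_size]
                         for row in image[top:top + tile_size])
            if tile in dictionary:
                tokens.append(("addr", dictionary[tile]))
            else:
                dictionary[tile] = len(dictionary)
                tokens.append(("new", tile))
    return tokens

def decode_image(tokens, tile_size, width, library):
    """Rebuild the image; library is the receiver's list of known tiles."""
    tiles = []
    for kind, value in tokens:
        if kind == "new":
            library.append(value)      # receiver learns the tile
            tiles.append(value)
        else:
            tiles.append(library[value])
    per_row = width // tile_size
    rows = []
    for start in range(0, len(tiles), per_row):
        band = tiles[start:start + per_row]
        for r in range(tile_size):
            rows.append("".join(t[r] for t in band))
    return rows

image = ["AABB", "AABB", "BBAA", "BBAA"]
sender, receiver = {}, []
first = encode_image(image, 2, sender)    # mixes full tiles and addresses
second = encode_image(image, 2, sender)   # addresses only
assert decode_image(first, 2, 4, receiver) == image
assert decode_image(second, 2, 4, receiver) == image
```

Because sender and receiver add tiles in the same order, their indices stay in step without any extra signalling. A premade ROM, as described above, would correspond to starting both sides with the same prefilled library.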
Autosophy permits you to build enormous systems. The larger the memory, the larger the compression with autosophy, even though every item can still be retrieved in nanoseconds. You can play it forward or backward, skip around or just let the memory act like a normal RAM.
The normal RAM mode divides the 64 inputs and outputs from the associative ROM into address and data lines. Then a 32-bit address can be input and 32-bit data retrieved from the ROM merely by ignoring the input data lines and the output address lines. Because of the associative operation, the reverse also works: data can be entered, and the memory will retrieve its address.
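The two ways of using the memory can be modelled in a few lines. This is a toy software model under stated assumptions, not a description of the hardware: the class name is hypothetical, the high 32 bits of each word are assumed to be the address half, and data values are assumed to be unique so that the reverse lookup is unambiguous.

```python
class AssociativeROM:
    """Toy model: each stored word is a 64-bit value, split into a
    32-bit address half (high bits) and a 32-bit data half (low bits)."""

    MASK32 = 0xFFFFFFFF

    def __init__(self, words):
        self._by_address = {}
        self._by_data = {}
        for word in words:
            address = (word >> 32) & self.MASK32
            data = word & self.MASK32
            self._by_address[address] = data
            self._by_data[data] = address

    def read(self, address):
        """Normal RAM mode: present an address, get the data."""
        return self._by_address[address]

    def find(self, data):
        """Associative mode: present data, get its address."""
        return self._by_data[data]

rom = AssociativeROM([(0x10 << 32) | 0xCAFE, (0x11 << 32) | 0xBEEF])
print(hex(rom.read(0x10)))    # data stored at address 0x10
print(hex(rom.find(0xBEEF)))  # address at which 0xBEEF is stored
```

In hardware both lookups would be performed by the same array of cells; the two Python dictionaries here only stand in for that behaviour.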
The autosophy algorithm also enables the memory technology to map automatically around defects in the foil, just as it error-corrects in telecommunications systems. The first level of the dictionary comprises the letters of the alphabet, but the company has expanded the entries in the dictionaries beyond serial data to parallel data such as live video.
HTTP - The Next Generation
The World Wide Web is a tremendous and growing success, and HTTP has been at the core of this success as the primary substrate for exchanging information on the Web. However, HTTP 1.1 is becoming strained in both modularity and performance, and those problems are to be addressed by HTTP-NG.
Modularity is an important kind of simplicity, and HTTP 1.x isn't very modular. If we look carefully at HTTP 1.x, we can see it addresses three layers of concerns, but in a way that does not cleanly separate those layers: message transport, general-purpose remote method invocation, and a particular set of methods historically focused on document processing (broadly construed to include things like forms processing and searching).
The lack of modularity makes the specification and evolution of HTTP more difficult than necessary and also causes problems for other applications. Applications are being layered on top of HTTP, and these applications are thus forced to include a lot of HTTP's design, whether this is technically ideal or not. Furthermore, to avoid some of the problems associated with layering on top of HTTP, other applications start by cloning a subset of HTTP and layering on top of that.
The HTTP-NG protocol is a new architecture for the Web infrastructure based on a layered approach where HTTP is split into layers as depicted in the diagram below:
Multiple data streams
The single HTTP-NG connection is divided up into multiple virtual sessions, each of which can carry a requested object in parallel. These are asynchronous, so a client can fire off multiple requests without waiting for each request to be acknowledged, let alone waiting for the object to be sent.
There is a dedicated control session, similar in application to the separate control connection in the FTP protocol, which is used to send all the requests and to receive meta information (author, copyright details, costs, or redirection requests to use a special external connection, for example for live video over an ATM link).
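How several objects might share one connection can be sketched with tagged frames. This is only an illustration under assumptions: the round-robin order, the `(session_id, chunk)` frame shape, and the function names are hypothetical and do not reproduce the HTTP-NG wire format.

```python
from itertools import zip_longest

def multiplex(objects, chunk_size):
    """Interleave several objects over one connection.

    objects maps a session id to the bytes of the requested object.
    Yields (session_id, chunk) frames in round-robin order.
    """
    streams = {
        sid: [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
        for sid, body in objects.items()
    }
    for round_ in zip_longest(*streams.values()):
        for sid, chunk in zip(streams, round_):
            if chunk is not None:      # this session has already finished
                yield sid, chunk

def demultiplex(frames):
    """Reassemble each session's object from interleaved frames."""
    received = {}
    for sid, chunk in frames:
        received[sid] = received.get(sid, b"") + chunk
    return received

objects = {1: b"<html>page</html>", 2: b"GIF89a-image-bytes", 3: b"tiny"}
frames = list(multiplex(objects, 4))
assert demultiplex(frames) == objects
```

The short object in session 3 completes after a single frame instead of waiting behind the larger ones, which is the practical benefit of carrying objects in parallel sessions.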
Binary protocol
HTTP is a text-based protocol. This makes it easy to debug. However, all those text headers mean that there is considerable overhead when transporting small objects around. The transport is 8-bit capable, so it could cope with binary data, and indeed does so for the object bodies. Only the headers are text based.
HTTP-NG uses a binary protocol, encoded in ASN.1 and made even more compact by using Packed Encoding Rules (PER). What this means is that a typical request is very small, but looks like random garbage until you decode it.
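The size difference between a text request and a packed binary one can be seen in a small comparison. This sketch does not implement ASN.1 or PER; it uses Python's `struct` module with a made-up layout, and the method codes and function names are assumptions chosen only for illustration.

```python
import struct

METHODS = {"GET": 1, "HEAD": 2, "POST": 3}          # hypothetical codes
METHOD_NAMES = {code: name for name, code in METHODS.items()}

def encode_text(method, path):
    """A small HTTP 1.0 style request with one header."""
    return f"{method} {path} HTTP/1.0\r\nAccept: */*\r\n\r\n".encode("ascii")

def encode_binary(method, path):
    """Pack as: 1-byte method code, 2-byte path length, path bytes."""
    raw = path.encode("ascii")
    return struct.pack("!BH", METHODS[method], len(raw)) + raw

def decode_binary(message):
    """Recover (method, path) from the packed form."""
    code, length = struct.unpack("!BH", message[:3])
    return METHOD_NAMES[code], message[3:3 + length].decode("ascii")

text = encode_text("GET", "/index.html")
binary = encode_binary("GET", "/index.html")
print(len(text), len(binary))
print(binary)   # compact, but not readable until decoded
assert decode_binary(binary) == ("GET", "/index.html")
```

The trade-off is the one described above: the packed request is much shorter, but it can no longer be read or typed by hand when debugging.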
Authentication and charging
Each individual message within an HTTP-NG connection can be authenticated. The security method is not part of the protocol, so any method that both the client and the server support can be used; individual messages can use different authentication schemes if needed. The encrypted data can be securely carried across untrusted intermediate proxies.
Related to the challenge-response model of authentication, a server can notify a client that a requested service will incur a fee, sending the cost and a list of acceptable payment methods.
All initial HTTP-NG requests are also valid HTTP 1.0 requests, but use an invalid access method. This means that an HTTP-NG enabled client can initiate a request with a server and continue it with HTTP-NG (if that is supported) or receive an error response and try again with HTTP 1.0 on the same port.
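The fallback behaviour described above can be sketched with a stand-in server. Everything here is hypothetical: the method name `NG-UPGRADE`, the exception, and the `fake_server` function are invented for illustration and are not taken from any HTTP-NG specification.

```python
class UnsupportedMethod(Exception):
    """Raised by the stand-in server for a method it does not know."""

def fake_server(request_line, supports_ng):
    """Stand-in for a real server; returns the protocol it answered in."""
    method = request_line.split()[0]
    if method == "NG-UPGRADE":          # hypothetical method name
        if supports_ng:
            return "HTTP-NG"
        raise UnsupportedMethod(method)
    return "HTTP/1.0"

def fetch(path, supports_ng):
    """Try HTTP-NG first, then retry with plain HTTP 1.0."""
    try:
        return fake_server(f"NG-UPGRADE {path} HTTP/1.0", supports_ng)
    except UnsupportedMethod:
        # An old server rejects the unknown method, so ask again the old way.
        return fake_server(f"GET {path} HTTP/1.0", supports_ng)

print(fetch("/index.html", supports_ng=True))
print(fetch("/index.html", supports_ng=False))
```

Because the first request is still a well-formed HTTP 1.0 request line, an older server can reject it cleanly instead of misreading it, and the client loses only one round trip.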
- COLLECTIVE CONTRIBUTION BY STUDENTS OF I-YR MSC.IT