Technology Behind Cofax
Last Updated: 2002/Jan/21
The Cofax Framework
Cofax has a modular, object-oriented application design. Each module is independent of the next and is made up of replaceable components, so you can apply a best-of-breed or customized approach as necessary.
Cofax gives you the freedom to pick and choose the modules that best fit your infrastructure and plans for the future, for example by using the best database for your organization.
The framework provides the building blocks and the core of its design. It has been built to be flexible, extensible, and modular so that you can change, add, or delete functionality as you require.
It comprises four main tiers:
The Cofax CFS - Cofax Feed System
Designed with flexibility and simplicity in mind, the Cofax Feed System uses a series of Java classes to import a deliberately simple XML format from newsroom systems into the Cofax Data Warehouse.
In a newsroom environment, all the information for an article is often not sent in one batch. The basic article, with basic metadata from the publishing system, may come from one source in a batch feed or in real time. The very simple article DTD is used for this, since most newsprint publishing systems have only this information.
Section/mapping metadata may come from yet another source at another time, for example from an automatic AI categorization engine such as Autonomy, which may take a while to process the feed. In many cases, we want the newspapers to post online before this processing is completed. This information can be added to the Cofax data store when it is ready.
In addition, other information may have been entered by editors the night before or later on. This information comes in another batch.
Cofax automatically versions an article if it is inserted into the system twice. Once Cofax has imported the data, it is immediately available for delivery to a variety of clients, ranging from web browsers via a web application server to export feeds to other systems.
This system is, by design, a fully separate tier/component and can be modified to accept different import sources quickly and easily without modifying the other tiers.
One requirement of this system is that data be pre-formatted into the simple XML format prior to import. Most newsrooms used Perl scripts to do this. The new second-generation content management system used by Knight Ridder uses a tool named XMultra to solve this problem.
The Cofax CDW - Cofax Data Warehouse
The Data Warehouse is a fully separate tier that exposes its own API to the other tiers. This allows a high degree of freedom for maintenance and extension without having to modify source code in the other tiers. This is largely achieved by defining 'packageTags' (data sets used by editors and designers) entirely within the database layer.
Currently there are two data warehouse implementations: Microsoft SQL Server and MySQL. By design, however, the CDW can be ported to any number of systems from almost any vendor: Oracle, Sybase, Object Store, or XML/flat files, for example. For the system administrator, switching between database implementations is as simple as modifying the configuration of the system.
While designed primarily to store and manage newspaper content, the schema of the CDW has proven simple yet flexible enough to handle the demanding needs of a variety of uses.
The Cofax CMS - Content Management System
Editorial tools can be built using any data-store-aware environment; ASP, JSP, and Servlets can all be employed. The editors' tools are server based and are accessed through a web browser.
The first versions of the editorial tools were actually deployed in ASP. The newest versions of the editorial toolset are developed in Java using Servlets and JSP, taking advantage of classes developed for the CDS.
The Cofax CDS - Content Display System
The CDS is a strict implementation of the Model-View-Controller design pattern. Design, logic, and content are separate, and each can be modified without needing to change the other two.
Design layouts are developed in the usual way: designers work with HTML templates that use a simple markup language requiring minimal training.
The controller servlet uses caching and database pooling to keep performance at acceptable levels.
The CDS is designed in a modular fashion, so switching to a different templating language, database pooling mechanism, or caching mechanism is easy and in most cases does not require changes to code.
The Cofax Java Framework
Across the tiers, a shared library of Java classes is used; together these classes make up the Cofax API.
The API allows any developer to add functionality to Cofax without having to modify the base source code. A vendor can write all sorts of plug-in modules to add new features to Cofax.
This API includes, but is not limited to, the following:
Template Loading Classes
Templates can be loaded from various sources. Because this process is abstracted out, a Template Loading Class can be implemented to load a template from a web server across the internet, from a SQL Server you have access to, or, in our default implementation, from your file system.
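As a rough illustration of this abstraction, the sketch below defines a loader interface with a file-system implementation. The names TemplateLoader and FileSystemTemplateLoader are hypothetical and are not taken from the Cofax source; the real classes may look quite different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical interface: the real Cofax class names and signatures may differ.
interface TemplateLoader {
    String load(String templateName) throws IOException;
}

// Loads templates from a directory on the local file system.
class FileSystemTemplateLoader implements TemplateLoader {
    private final Path root;

    FileSystemTemplateLoader(Path root) {
        this.root = root.normalize();
    }

    @Override
    public String load(String templateName) throws IOException {
        Path file = root.resolve(templateName).normalize();
        // Refuse names that would escape the template directory.
        if (!file.startsWith(root)) {
            throw new IOException("Template outside root: " + templateName);
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

A loader that reads from a web server or a database would implement the same interface, so the calling code would not need to change.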
Template Application Classes
These parse and apply templates to content. Different template classes can be used to work with different types of templates. For example, we currently use our WYSIWYG Templates, which are an easy way to use HTML files marked up with special comment tags. Using WYSIWYG Templates, a designer can "view source" on a live page, static or dynamic, make changes, and ensure that the comment tags are in place; the template is then immediately usable. Another class can use XSL templates instead. Other classes could easily be written to use other conventions or systems for templates.
Our WYSIWYG Template system also works with templates that are applied in a hierarchical structure. For example, a master template is applied to all Inquirer stories if no section-specific templates are found. If a story-specific template is found, it overrides the section template. This makes the system very flexible. A designer can either apply one template for everything or use specific templates for sections (e.g. sports) and even individual items. In addition, different hierarchies of templates can be applied to the same content set for different sites! (In Rajiv's biased opinion, no other system on the market has these features.)
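The hierarchical lookup can be sketched as a most-specific-first search. The class name TemplateResolver and the file naming scheme (site/section/story.html, site/section/section.html, site/master.html) are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical resolver: the naming scheme for template files is an assumption.
class TemplateResolver {
    private final Set<String> available;

    TemplateResolver(String... availableTemplates) {
        this.available = new HashSet<>(Arrays.asList(availableTemplates));
    }

    // Most specific template wins: story, then section, then the site master.
    String resolve(String site, String section, String storyId) {
        List<String> candidates = Arrays.asList(
                site + "/" + section + "/" + storyId + ".html",
                site + "/" + section + "/section.html",
                site + "/master.html");
        for (String candidate : candidates) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No template found for site " + site);
    }
}
```

Because the site name is part of every candidate, a second site can carry its own hierarchy over the same content.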
Data Store Handling Classes
These can load and parse data from various sources and create a content object. One class is able to load the content from XML files. Another class is able to load the content from Cofax's SQL database. Other custom classes could be written to load content directly from the back-end newsprint publishing system's export data or by directly querying the newsprint system's database.
One of the features of these classes is that they can be used to convert or transfer content from one source to another. Any pair of these classes serves as a conversion-filter program.
Another advantage is that the Cofax system is not tied to any one particular type of back-end system. It can already use SQL, ODI, or flat-file XML as the data store. Other classes can be written to work with other kinds of data stores. If computer resources permit, Cofax can even use some newsprint publishing systems as the data store.
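The conversion-filter idea follows directly from giving every data store class the same interface. The sketch below is a minimal illustration with hypothetical names (Article, ArticleStore, StoreConverter); an in-memory store stands in for real XML or SQL implementations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical content object; a real content object would carry more fields.
class Article {
    final String id;
    final String headline;

    Article(String id, String headline) {
        this.id = id;
        this.headline = headline;
    }
}

// Hypothetical common interface shared by all data store handling classes.
interface ArticleStore {
    List<Article> loadAll();
    void save(Article article);
}

// In-memory stand-in for a real store such as XML files or an SQL database.
class InMemoryArticleStore implements ArticleStore {
    private final Map<String, Article> articles = new LinkedHashMap<>();

    @Override
    public List<Article> loadAll() {
        return new ArrayList<>(articles.values());
    }

    @Override
    public void save(Article article) {
        articles.put(article.id, article);
    }
}

class StoreConverter {
    // Copies every article from source to target and returns the number copied.
    static int convert(ArticleStore source, ArticleStore target) {
        int copied = 0;
        for (Article article : source.loadAll()) {
            target.save(article);
            copied++;
        }
        return copied;
    }
}
```

Any two implementations of the interface can be passed to convert, which is what makes a pair of store classes act as a filter.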
One of the test modules developed for Cofax could load data from MediaStream's remote web server. Such a class, combined with remote content wrapping and caching technology, can even enable Cofax to use a remote web site as a data store.
The data store has been carefully designed. It has several architectural advantages. Here is an example:
Cofax stores where an article goes in the sections/mappings container (articles to sections/mapping table in case of an SQL data store) and not in the articles container (articles table in case of an SQL data store).
A sections/mapping entry describes which section/channel page the article will show up in (the mapping code), and where on that page the article will show (the position/ranking code), and from when to when the article will show up there (the start and end dates). Using this scheme, Cofax is able to make an article appear on different pages at different places at different times. Since the position/ranking code is specific to the section/mapping page, it can mean placement on the top cell on the home page or show at the bottom on another given page.
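To make the scheme concrete, here is a small in-memory model of the mapping container. The class and field names are hypothetical, and the convention that a lower rank shows first is an assumption; the actual Cofax schema is not reproduced here.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one sections/mapping entry; field names are assumptions.
class SectionMapping {
    final String articleId;
    final String mappingCode;  // section/channel page the article appears on
    final int rank;            // position on that page (assumed: lower shows first)
    final LocalDateTime start; // first moment the article shows there
    final LocalDateTime end;   // moment the article stops showing there

    SectionMapping(String articleId, String mappingCode, int rank,
                   LocalDateTime start, LocalDateTime end) {
        this.articleId = articleId;
        this.mappingCode = mappingCode;
        this.rank = rank;
        this.start = start;
        this.end = end;
    }

    boolean isActiveAt(LocalDateTime when) {
        return !when.isBefore(start) && when.isBefore(end);
    }
}

class MappingTable {
    private final List<SectionMapping> entries = new ArrayList<>();

    void add(SectionMapping entry) {
        entries.add(entry);
    }

    // Article ids to show on one page at a given time, ordered by rank.
    List<String> articlesFor(String mappingCode, LocalDateTime when) {
        List<SectionMapping> active = new ArrayList<>();
        for (SectionMapping entry : entries) {
            if (entry.mappingCode.equals(mappingCode) && entry.isActiveAt(when)) {
                active.add(entry);
            }
        }
        active.sort(Comparator.comparingInt((SectionMapping entry) -> entry.rank));
        List<String> ids = new ArrayList<>();
        for (SectionMapping entry : active) {
            ids.add(entry.articleId);
        }
        return ids;
    }
}
```

The article itself is stored once; adding a second SectionMapping for the same articleId is all it takes to place it on another page with a different position and time window.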
The presentation classes provide a wrapper around the Cofax functionality and the interface to the presentation system in use. We currently have two versions: Java Servlets that will work on any Servlet-enabled web server, and COM objects for use from ASP. Other classes could be written to work with other front ends.
(Diagram by Jeff Spicher Team Leader, Product Development InfiNet)
What are the technological advantages of Cofax?
"If you persuade, speak of interest, not reason." -- Benjamin Franklin
For the tech community, here is why Cofax is nice from a technological viewpoint.
Configurable URLs - Great for search engines and bookmarking!
Cofax was designed so that it can be implemented at any existing Web site without changing any existing URLs, thus preventing confusion and lost traffic. Once a reliable feed is established from the content source, using Cofax is easy. And because the documents appear to be flat HTML pages (even though they are not), Cofax works seamlessly with existing Web traffic usage software. The configurable URLs feature allows a Cofax-driven site to look, from the outside, just like a static HTML web site with folders and files.
Modular Design
Cofax was designed with a best-of-breed philosophy in mind. It is a framework in which, at every level, components can be added, modified, and replaced with little effort.
Development was done using open industry standards, which are widely known and used. The system is not based on proprietary technologies or dependent upon the original developers.
For example, Cofax, as implemented in Philadelphia, uses a SQL Data Store class that allows you to use any suitable database with a JDBC driver. But that is not where it ends. The SQL Data Store class itself can be replaced with an XML Data Store, ODI Data Store, or file-system-based Data Store. Alternate classes are specified in the configuration and dynamically loaded.
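Loading an implementation class named in the configuration is a standard Java reflection technique. The sketch below shows one way it could be done; the interface, the class names, and the property key "dataStore.class" are all hypothetical stand-ins and not the actual Cofax configuration.

```java
import java.util.Properties;

// Hypothetical interface standing in for the data store abstraction.
interface DataStore {
    String describe();
}

class SqlDataStore implements DataStore {
    public String describe() {
        return "SQL database";
    }
}

class XmlDataStore implements DataStore {
    public String describe() {
        return "XML files";
    }
}

class DataStoreFactory {
    // "dataStore.class" is an assumed property key, used only in this example.
    static DataStore fromConfig(Properties config) {
        String className = config.getProperty("dataStore.class");
        if (className == null) {
            throw new IllegalStateException("dataStore.class is not configured");
        }
        try {
            // Load the class by name and call its no-argument constructor.
            Class<?> type = Class.forName(className);
            return (DataStore) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load data store " + className, e);
        }
    }
}
```

With this arrangement, swapping the data store means changing one configuration value rather than recompiling the calling code.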
Scalable System
Adding system resources can enhance Cofax's performance. More storage allows content to be archived longer. Faster or multiple machines with more processors and memory serve content faster. Cofax can be run on one PC for a small site or distributed across a few dozen dedicated servers for a large, high-traffic site.
Portable
Cofax can be easily installed at any hosting provider and can be used on any server that supports Java. The preferred way to use the Cofax modules is via a Java Servlet, but they can also be called as ASP/COM components from Active Server Pages.
When an SQL database is used as the data store, a JDBC driver is required for that database; drivers are now available from various vendors for almost every database.
For KnightRidder.com sites hosted at InfiNet, Cofax can be easily installed at InfiNet.
Saves System Resources
Cofax saves system resources by storing articles in only one location. In previous static systems, articles had to be copied into multiple locations in order to be part of more than one section or special package.
A sections/mapping entry describes which section/channel page the article will show up in (the mapping code), and where on that page the article will show (the position/ranking code), and from when to when the article will show up there (the start and end dates). Using this scheme, Cofax is able to make an article appear on different pages at different places at different times! That's great flexibility.
Speed of Delivery
With the module that caches frequently requested pages in memory or on disk, the delivery of dynamic articles is as fast as that of static HTML pages.
Editor's tools in use at Philly.com integrate with the network security.
Cofax was developed with Java
Cofax's modular design is such that it could have been written in many object-oriented languages, such as Python. Click here to read why we chose Java.
CofaxServlet - a presentation class for web delivery
CofaxServlet utilizes the Cofax framework to present content. It can be replaced with programs in other environments that can take advantage of objects written in Java, for example, Microsoft's Active Server Pages.
Performs delivery of content, dictated by the request, from cache if available.
Objects accessed beyond configured counts are cached and refreshed in CofaxCache at configured intervals or when changed by the editors' tools. If a content object to return is cached, it is returned immediately and processing ends here.
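A policy of this kind can be sketched as a cache that only stores an object after it has been requested more than a configured number of times and rebuilds it once it is older than a configured interval. HitCountCache and its method names are hypothetical; this is not the CofaxCache API, and the caller passes in the current time to keep the example easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache illustrating the policy; this is not the CofaxCache API.
class HitCountCache {
    private static final class Entry {
        final String content;
        final long storedAt;

        Entry(String content, long storedAt) {
            this.content = content;
            this.storedAt = storedAt;
        }
    }

    private final int minRequests;    // cache only after more than this many requests
    private final long refreshMillis; // rebuild entries at least this old
    private final Map<String, Integer> requests = new HashMap<>();
    private final Map<String, Entry> entries = new HashMap<>();

    HitCountCache(int minRequests, long refreshMillis) {
        this.minRequests = minRequests;
        this.refreshMillis = refreshMillis;
    }

    synchronized String get(String key, long now, Supplier<String> assemble) {
        Entry cached = entries.get(key);
        if (cached != null && now - cached.storedAt < refreshMillis) {
            return cached.content; // served from cache; processing ends here
        }
        String content = assemble.get(); // build the object dynamically
        int count = requests.merge(key, 1, Integer::sum);
        if (count > minRequests) {
            entries.put(key, new Entry(content, now));
        }
        return content;
    }

    // Called when the editors' tools change the underlying object.
    synchronized void invalidate(String key) {
        entries.remove(key);
    }
}
```

Rarely requested objects never occupy cache space, while popular ones are assembled only once per refresh interval.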
Performs delivery of request dynamically.
When objects are first requested, force-refreshed, or expired from cache, they are assembled from the request and returned.
Builds and utilizes a glossary.
A glossary is a user-accessible hashtable of key/value pairs: a variable space. The glossary values are used throughout the course of content assembly and can be modified or added to within the template, either manually or through packageTag requests, via QueryString, Form values, and the requesting URL. Variable names are prefixed with a namespace designating their originator. For example, glossary values added specifically from the request are prefixed with "request:".
Some example glossary entries:
request:section = sports
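A namespaced glossary of this kind can be modeled as a map whose keys carry the originator prefix. In the sketch below only the "request:" prefix comes from the description above; the Glossary class and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical glossary; only the "request:" prefix comes from the text above.
class Glossary {
    private final Map<String, String> values = new LinkedHashMap<>();

    // Stores a value under "namespace:name".
    void put(String namespace, String name, String value) {
        values.put(namespace + ":" + name, value);
    }

    String get(String key) {
        return values.get(key);
    }

    // Copies request parameters into the glossary under the "request:" prefix.
    void addRequestParameters(Map<String, String> parameters) {
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            put("request", parameter.getKey(), parameter.getValue());
        }
    }

    Map<String, String> asMap() {
        return Collections.unmodifiableMap(values);
    }
}
```

Prefixing keeps a value supplied in the URL from silently colliding with a value of the same name that came from the data store.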
Picks a template from template cache or dynamically.
Instantiates a TemplateProcessor that contains the logic for template parsing and picking. Instantiates a DataStore that provides processing for packageTags.
Templates are cached in CofaxServlet when first used. Cached templates are refreshed if changed.
When loading a new template, TemplateProcessor is given the request data and determines which template to use and loads it.
Adds to or modifies the glossary from template packageTag requests.
TemplateProcessor collects packageTags from the template and, using DataStore, adds to or modifies glossary entries. packageTags are pre-defined DataStore requests that return a single hashtable of key/value pairs. If multiple rows are returned from the data store, all except the last row are ignored. packageTags are NOT defined within the DataStore object, but within the actual data store itself. This provides complete abstraction from the program. Someone familiar with the data store can add new packageTags quickly and easily, in the context of that data store. Glossary values can be used as parameters in packageTag requests. The action is sequential: an earlier packageTag request can pull data that a later packageTag request requires.
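The sequential behavior can be sketched as a loop over the tags in template order, with each result merged into the glossary before the next tag runs. PackageTagStore and GlossaryTagProcessor are hypothetical names; the real DataStore interface is not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data store interface: runs a named packageTag with the current
// glossary as parameters and returns zero or more rows of key/value pairs.
interface PackageTagStore {
    List<Map<String, String>> run(String tagName, Map<String, String> glossary);
}

class GlossaryTagProcessor {
    // Runs the tags in template order; each tag sees entries added by earlier ones.
    static Map<String, String> process(List<String> tagNames,
                                       Map<String, String> initial,
                                       PackageTagStore store) {
        Map<String, String> glossary = new LinkedHashMap<>(initial);
        for (String tagName : tagNames) {
            List<Map<String, String>> rows = store.run(tagName, glossary);
            if (!rows.isEmpty()) {
                // Only the last row is kept; earlier rows are ignored.
                glossary.putAll(rows.get(rows.size() - 1));
            }
        }
        return glossary;
    }
}
```

Because the tag names are just strings handed to the store, new packageTags can be defined in the data store without touching this code.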
The designer can override or add new glossary entries at this point by adding the tag name and value within a packageTag as a parameter.
Parses the template with glossary data.
In accordance with the patterns defined in the TemplateProcessor, glossary data is parsed into the template where the designer has put the proper tags.
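The substitution step can be illustrated with a small regular-expression pass over the template. The actual Cofax tag syntax is not given here, so the comment-style placeholder <!--$key--> used below is purely an assumption, as is the class name GlossaryParser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlossaryParser {
    // Assumed tag syntax: <!--$key-->, for example <!--$request:section-->
    private static final Pattern TAG = Pattern.compile("<!--\\$([\\w:.]+)-->");

    // Replaces every tag with its glossary value; unknown keys become empty.
    static String apply(String template, Map<String, String> glossary) {
        Matcher matcher = TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = glossary.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Using HTML comments as placeholders means an unprocessed template still renders cleanly in a browser, which fits the "view source" workflow for designers.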
Collects and processes packageTags from template requests.
TemplateProcessor collects packageTags from the template, and using DataStore, returns packageData. These packageTags return multiple rows and utilize a block of template designer defined formatting code for the output of those rows.
Parses the template with package data.
Utilizing the format code defined in the template's packageTag block, rows are parsed into the template in accordance with the patterns defined in the TemplateProcessor.
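Row formatting can be sketched as applying the designer's block once per returned row and concatenating the results. As before, the <!--$key--> placeholder syntax and the class name PackageBlockRenderer are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;

class PackageBlockRenderer {
    // Assumed placeholder syntax inside the block: <!--$key-->
    static String render(String block, List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> row : rows) {
            String rendered = block;
            for (Map.Entry<String, String> field : row.entrySet()) {
                // String.replace is a literal replacement, not a regex.
                rendered = rendered.replace("<!--$" + field.getKey() + "-->", field.getValue());
            }
            out.append(rendered);
        }
        return out.toString();
    }
}
```

The designer controls the markup of each row through the block, while the number of rows is decided by the data store.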
Caches the content if meeting configured requirements.
If the parsed content now meets the configured requirements, CofaxServlet uses CofaxCache to store the page for faster retrieval in the future.
Returns the content.