Monday, December 31, 2007

Resolution for 2008: Let's cut down on XML abuse

When XML started off, it was envisioned as a superset of HTML in which custom tags could be created and content in custom document structures could be circulated on the World Wide Web. I, however, believe that XML technology is being misused. It is popping up in the Semantic Web, SOA, and pure data modeling efforts. It is true that XML is human AND machine readable; however, processors pay a heavy price parsing and ingesting the content of complex XML documents. I recently attended an ontology conference in Maryland, and the biggest theme of the conference was that there isn't enough machine power to process all of the complex business logic needed to make content inferences. Even though XML is human and machine readable, it is extremely verbose, and it is not practical to have machines process all of the tags just to get the data out of an XML document. The next question to ask is: "When should you use XML?" The answer: when the XML documents are not overly complicated, that is, when they do not combine deeply nested tags with a large amount of data. If the data is quite big, then binary content should be passed around instead, in COM objects or Enterprise JavaBeans (EJBs). Here are some arguments regarding XML:
  1. XML is interoperable - True, XML is interoperable for cross-platform communication. However, please don't send massive XML documents which can bog the system down. XML may be interoperable but it is not performance friendly.
  2. XML is a good format for web service communication - Web Services (SOAP and REST) largely deal with XML-based content; however, Web Services are NOT reusable if the XML format is proprietary between the Service Provider and its clients. Please avoid sending large amounts of data over the wire. SOAP with Attachments is a good alternative for sending large chunks of data.
  3. XML is human readable - This is true; however, if a developer creates an XML format which only he understands, then the XML is unreadable until its definition is published. For example, a developer decides to create an XML document which looks like this: Person. This XML document is a complete waste since not every human understands it. However, if tag names are defined properly and their definitions captured, then the XML is human readable.
  4. XML is the future of the Semantic Web - Recently I came across OWL-S editor software and tested it out. The software generates OWL and RDF documents; however, these XML documents are so verbose that they cannot be processed practically. The Semantic Web should look at options other than XML-based technology.
In conclusion, XML is a great technology; however, people tend to misuse it and then tag it as a wasted technology. XML is not going to go away, so let's use it properly and take care of it.
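
When a large XML document cannot be avoided, one common way to soften the parsing cost described above is to stream through it instead of loading the whole tree into memory. The following is only a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. The `people`/`person`/`name` tag names and the generated sample document are made up for illustration and do not come from any real system.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: the tag names are invented for this sketch.
SAMPLE = b"<people>" + b"".join(
    b"<person><name>Person %d</name></person>" % i for i in range(1000)
) + b"</people>"


def count_people(stream):
    """Stream through the document and count <person> records."""
    count = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "person":
            count += 1
            # Drop the record's children and text so they can be freed.
            elem.clear()
    return count


people = count_people(io.BytesIO(SAMPLE))
print(people)
```

Streaming does not make XML any less verbose, so the advice about keeping documents small still applies. It only keeps memory use from growing with the size of every record.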

Saturday, December 15, 2007

Spock.com: Identity Manager

Do you ever wonder whether there are sites which manage the information about people that is found on the web? It is an interesting concept. If you type my name, Enoch Moses, into Google.com, you get Christian theological sites which discuss Enoch and Elijah. Google also displays a number of my blog entries, my LinkedIn profile, my MySpace page, and so on. Sites like Spock.com let you manage your information on the web, so that you can define and identify which content is yours, which is about you, and which involves you. It's a neat site. Here is my Spock page.

Enjoy!

Tuesday, December 11, 2007

Would I invest in a dot com startup?

Every Joe on the street thinks he can make a quick million or two by starting his own dot com company. Hey, Chad Hurley and Steve Chen, founders of YouTube.com, did it. Unfortunately, the web is saturated with sites which promise functionality but are dependent on your, the user's, data. The first question to ask is: why would I want to put my information on a third-party web site so that the web site's founder can make some money? It is the sad truth, but most of these companies will not last long. For example, let's look at various social networks:
  • MySpace.com - One of the first sites which actually picked up in popularity. Now it is mired in mediocrity and I don't see anything new and exciting happening on the site. Did I mention that they have added Facebook-like functionality?
  • Facebook.com - I have to say that I was a skeptic when I joined this site; however, it offers neat functionality which gives the user a reason to actually spend time on the site. I am a big fan of Facebook's Scrabble application.
  • High 5 - Started by an East Indian and marketed heavily in the East Indian community.
  • Orkut - I have to say this is probably one of the worst social networks I have come across. It is probably Google's worst purchase.
  • Kadoo - A new social network site whose UI looks promising; however, I still don't have an incentive to join. I am not into propagating my identity across the internet.
  • Linkedin.com - I like this website. It's a social network for your professional contacts.
  • Xing.com - A network similar to Linkedin; this one is popular in Europe.
  • PageFlakes.com - Someone emailed me asking me to experience this site. Once again I ask the question: why should I sign up on PageFlakes.com?
  • NING.com - This is the UBER social network where anyone can create their own social network. The UI is not that great and it is meant as a research application.
After going through all of these social networks, how would an investor invest in these kinds of businesses? Frankly, every one of these social networks offers similar if not identical functionality. I would probably want to invest in networks which have a lot of users and a niche, like Linkedin and Xing, which focus on professional networking.

Friday, December 7, 2007

Items of poor design...

For the last six months or so, I have been involved in a project where the system has been growing organically. It has been growing like fungus. Fungus is a unique living organism. It does not have a head, legs, hands or a body. It just exists, and it keeps evolving as long as there are enough nutrients and dampness. The system I have been working on is an IT fungus. It has evolved over the last four years or so. Developers come and go, but this system still exists. It has no requirements documents, no design documents, and no test plans. The software is poorly documented, and I wonder how this system still exists. Well, it does exist, and it seems to be growing larger and larger. The system is composed of subsystems which show evidence that developers have tried to improve the system but failed miserably. The failed implementations were never cleaned up, and they only further the system's evolution. This is a J2EE system, and you will notice the following things in it:
  • JSP pages of 1,500 to 7,000 lines, with scriptlets embedded in JavaScript which in turn invoke JDBC calls
  • A partially implemented Hibernate framework. This is evident from the *.hbm.xml files in the source code
  • Numerous properties files which are now neatly packaged in an Oracle database
  • Spring Web Flow - The developer who implemented this portion did a great job.
  • Prototype Ajax
  • JSP pages with scriptlets and JSTL tags
  • The same piece of code in numerous web apps, customized for each web app
  • Field level filtering in the database for each user (not role but user)
  • The user is authenticated again between each web-app even though each web-app is part of the bigger system.
  • Custom API integration for each COTS product.
  • Outdated stored procedures in the database which are no longer used.
A few weeks ago, I wanted to write JUnit unit tests for some of the "cleaned up" classes, and I failed miserably since all of the code is tightly coupled and follows the onion architecture. In an onion architecture, you never know what you are going to get under each peel. The point of this blog entry is to remind all IT personnel that systems are not organic beings; they are rather simple business logic processors. In the SDLC, a good design which has been reviewed and analyzed, a clean implementation, and robust test cases are the basic ingredients of a good system.
(PIC of a fungus called WitchButter)
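
To make the unit-testing complaint concrete, here is a minimal sketch of the opposite of tight coupling: business logic that is handed its data source instead of creating a database connection itself. It is written in Python rather than the system's Java purely for brevity. `GreetingService`, `InMemoryUserStore` and the sample data are hypothetical names invented for this illustration and are not taken from the system described above.

```python
class InMemoryUserStore:
    """Hypothetical stand-in for a database-backed store, used only in tests."""

    def __init__(self, users):
        self._users = dict(users)

    def find_email(self, user_id):
        # Returns None when the user is unknown.
        return self._users.get(user_id)


class GreetingService:
    """Business logic that receives its data source instead of building one."""

    def __init__(self, store):
        self._store = store

    def greeting_for(self, user_id):
        email = self._store.find_email(user_id)
        if email is None:
            return "Hello, guest"
        return f"Hello, {email}"


# In a test, the service is wired to the in-memory store; no database is needed.
service = GreetingService(InMemoryUserStore({42: "user@example.com"}))
print(service.greeting_for(42))
print(service.greeting_for(7))
```

Because the service only depends on an object with a `find_email` method, a real database-backed store could be swapped in for production while the tests stay fast and isolated.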

Monday, December 3, 2007

Book Review for Mike Daconta's new book

A couple of months ago, I agreed to review my esteemed colleague Michael (Mike) Daconta's new book Information As Product. I liked Mike Daconta's previous book, The Semantic Web. It was a great book: it laid out the history of the Semantic Web, described its benefits, and described its vision. That book was my passport into the world of ontologies, data modeling and understanding the importance of data. I have also had the privilege of working with Mr. Daconta on a couple of DHS projects, and I believe he is truly a visionary in data management.

The biggest knock against Mike Daconta in the industry is that he is a "dreamer" who has not been able to turn his dreams into substance. I believe Mike is a visionary and people like him are essential to IT innovation. He offers ideas which address actual business problems, and it is up to engineers to turn those ideas into reality.

His new book Information As Product follows this pattern. In the book, Mike offers a solution: producing information as a product which comes off an "information" factory line, so that the appropriate information is delivered to the appropriate person at the appropriate time. The book presents a general solution to a problem plaguing various enterprises: there is a temporal and semantic gap between information consumers and producers. The book does a great job of describing the concepts involved in this idea; however, no systems analyst or architect can decompose this book into the functional and non-functional requirements needed to build a system which will make its concepts a reality. Mike states that Information As Product is the first in a series of books which will engage readers in a dialogue on how information systems can be improved.

The things I liked in the book are:
  • easy to read
  • presenting the ideal information management system as a factory line where information can be packaged in a package
  • I loved the way he describes packaging up the information. I believe this idea can be implemented.
I wish he had expanded on these concepts in this book:
  • the importance of metadata, selecting the right metadata and consequences of poor metadata management
  • The DIKW (Data-Information-Knowledge-Wisdom) pyramid - It is only a conceptual model. I would love to see how metadata fits in this pyramid.
  • He lost me in a couple of parts; otherwise it is not a bad book.
In summary, if you want to build a system from this book, then I recommend that you don't buy it. If, on the other hand, you are looking at data management solutions for your enterprise, then this is a great book since enterprise-level functional requirements can be derived from it. I personally enjoyed reading the book; however, I was left with more questions than answers.
Links to buy the Information As Product book.

Thursday, November 29, 2007

Day 2 at the Ontology Conference

My second day at the ontology conference was quite good. I was impressed by the various teams which presented papers on how they were designing and implementing ontology-based systems. The biggest theme of the conference was that the current technology and the lack of defined ontology methodologies are the biggest drawbacks in this field. Most of the applications are prototypes, and they are extremely slow in processing decent-sized ontologies. I heard talks about reasoners, OWL, geospatial ontologies, multi-order logic processors, ontologies in graph databases, and so on. However, the applications which used these technologies were prototypes. The other problem was that if ontologies were not built correctly, then the results were hideously wrong. Everyone at the conference agreed that ontologies and their applications are still new in the field of IT; however, their promise is so great that large organizations keep funding R&D in the field. I personally enjoyed my time at the conference since it was good to see other data lovers and people who understand the value of data in any IT enterprise. I will probably go again next year. Here is a list of products which were mentioned in the seminar.
  • Knoodl.com - A semantic wiki. It creates ontologies from the wiki entries or uses uploaded ontologies in categorizing wiki entries.
  • VideoQuest - This product searches for entities in a video. For example, if the user typed in the query "white car in saint louis" then the result set would include videos which have a white car in Saint Louis. The backend of this product is based on ontologies.
  • Poised For Learning - An ontology product based on reasoners, from the Rensselaer Artificial Intelligence and Reasoning (RAIR) Laboratory at Rensselaer Polytechnic Institute. I was very impressed with this research.
As you can see there weren't many products since this area is still new. That's all for now.

Wednesday, November 28, 2007

Day 1 at the Ontology Conference

Today I went to an ontology conference which was sponsored by the National Center for Ontological Research (NCOR). I heard talks from various vendors, implementors and subject matter experts in the field of ontology. Before I get into what they talked about, let me first state what an ontology is. Wikipedia defines an ontology as, "... a study of conceptions of reality and the nature of being." What does that mean??!! Well, it is basically an exercise where ontologists (people who create and work with ontologies) model existing systems into categories which information systems can then use to infer relationships (relationships which are not obvious to a human). As you can see, it is a qualitative field: people will argue about how data entities should be modeled so that they accurately and precisely describe an existing entity. For example, Ontologist A says that every plane needs to have a pilot, while Ontologist B says this is not true: there are drones, which are small planes controlled by a computer. One of them may be right, or both of them can be right. The key for any ontology is that it needs to be scoped to a domain to avoid confusion. So in theory both Ontologist A and Ontologist B are correct if they work in different domains: Ontologist A is creating an ontology for the airline industry, while Ontologist B is creating an ontology for some country's military. Issues like this cause massive headaches for any organization's Chief Information Officer (CIO), since the data cannot easily be exchanged across organizations and across domains. If ontologies are created well, then they are quite powerful.
Currently ontology is wonderful in the realm of philosophy and theoretical computer science; however, there isn't much technology in the field. I have been working with ontologies off and on, and they are not the easiest thing to implement. I am, however, a strong believer that a good ontology can be used to validate data architectures which pertain to the ontology's domain.
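
The pilot-versus-drone disagreement, and the idea of validating data against a domain's ontology, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Plane` record, the two rule functions and the `DOMAIN_RULES` table are hypothetical names, and a real ontology would be expressed in a language such as OWL rather than in hand-written functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plane:
    """Hypothetical data record to be validated against a domain's rules."""
    name: str
    pilot: Optional[str] = None
    computer_controlled: bool = False


def valid_in_airline_domain(plane):
    # Ontologist A: every plane needs to have a pilot.
    return plane.pilot is not None


def valid_in_military_domain(plane):
    # Ontologist B: a plane has a pilot or is a computer-controlled drone.
    return plane.pilot is not None or plane.computer_controlled


# Each rule set only applies inside its own domain.
DOMAIN_RULES = {
    "airline": valid_in_airline_domain,
    "military": valid_in_military_domain,
}


def is_valid(plane, domain):
    """Validate a record against the rules of one named domain."""
    return DOMAIN_RULES[domain](plane)


airliner = Plane("airliner", pilot="captain")
drone = Plane("drone", computer_controlled=True)

print(is_valid(drone, "airline"))
print(is_valid(drone, "military"))
```

The same drone record fails in one domain and passes in the other, which is exactly why data is hard to exchange across domains unless the ontologies are reconciled.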

At the conference, I heard two great presentations on ontology. They are:
  1. Werner Ceusters - He gave a fascinating talk on what ontology is, which incorporated Basic Formal Ontology (BFO).
  2. Steven Robertshaw - He gave a talk on the lessons learned in implementing complex ontologies.
These people understood how to use ontologies and how not to use them. Both stated that the technology is lacking in this field. Werner and his team built an open source project which lets you play with ontologies. The project, a Java-based application, can be found at http://sourceforge.net/projects/rtsystem. To get a better understanding of how ontologies work, I recommend downloading Protege. It, too, is an open source product, maintained by Stanford University. Protege lets you build data models and, at the same time, validate your model with "instance" data. I became fascinated with this field a couple of years ago when I was asked to validate a data model for one of the US government agencies. If you are totally lost about what I am talking about, then please check out the following links:
Day one of the conference also included vendors displaying their products which are based on the principles of ontologies. In my next blog entry, I will give a more detailed summary of what happened at this conference. Stay tuned! (The image is a visual representation of an ontology)

Tuesday, November 27, 2007

Are RESTful web services ideal in a SOA?

Representational State Transfer (REST) web services are the latest buzz in the web services domain. What are RESTful web services? RESTful web services consist of HTTP clients which send requests as request parameters in the URL, and the response is an XML document which can be viewed in a browser. This is an ideal service for technologies like Asynchronous JavaScript and XML (AJAX), which take XML responses and apply XML transforms (XSLT) to generate a visually pleasing display of content. AJAX runs in the web browser, and its mode of transport is HTTP or HTTPS. Currently developers and vendors like Yahoo!, Google and Microsoft have harnessed AJAX technologies, RESTful services and RSS feeds to build data aggregators which allow users to find, aggregate, and filter data from various sources; however, these are merely point-to-point services. AJAX working with RESTful web services is the ideal model for the federated query, where one request is federated across multiple data sources and the responses from each data source are then aggregated to show the "full" picture.
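
The federated query idea can be sketched roughly as follows. To keep the example self-contained, the three data sources are plain Python functions standing in for remote RESTful services. Their names (`search_blogs`, `search_videos`, `search_news`) and their canned results are invented for this illustration, and a real implementation would issue HTTP requests instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote data sources; real ones would be HTTP calls.
def search_blogs(term):
    return [f"blog post about {term}"]


def search_videos(term):
    return [f"video about {term}", f"video review of {term}"]


def search_news(term):
    return []


SOURCES = {"blogs": search_blogs, "videos": search_videos, "news": search_news}


def federated_query(term, sources=SOURCES):
    """Send the same query to every source and aggregate the answers by source."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Submit all requests first so the sources are queried concurrently.
        futures = {name: pool.submit(fn, term) for name, fn in sources.items()}
        # Collect each source's response under its own name.
        return {name: future.result() for name, future in futures.items()}


results = federated_query("ontology")
print(results)
```

The aggregation step here simply groups responses by source. Ranking, de-duplication and error handling for slow or failing sources are left out of the sketch.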

Yes, RESTful web services are not that bulky since they don't have the SOAP wrapper; however, let us not get caught up in the idea that RESTful services are ideal in a SOA enterprise. The problem with REST is that it is designed for point-to-point services with minimal reuse. This is the same type of problem found in SOAP-based services which expose method calls requiring specific datatypes in their Web Services Description Language (WSDL). With point-to-point services, a lot of work is spent formatting, transforming and processing information to fit the specific datatypes of the non-reusable methods. To make a SOAP-based web service more SOA friendly, it is recommended that the service take XML documents as the parameters of its method calls. This way one XML document can be passed around, and its contents can be altered by various method calls. It is not practical for RESTful web services to pass a reference to an XML document as a request parameter.
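
The document-style approach recommended above, where one XML document is passed from call to call and each call alters it, might look something like this minimal sketch. The `order` document, its tags and the three step functions are hypothetical examples of my own, and the steps are local functions here rather than real web service operations.

```python
import xml.etree.ElementTree as ET


def add_customer(order, name):
    """First call: attach the customer to the document."""
    ET.SubElement(order, "customer").text = name
    return order


def add_item(order, sku, quantity):
    """Second call: add a line item to the same document."""
    item = ET.SubElement(order, "item", sku=sku)
    item.text = str(quantity)
    return order


def mark_approved(order):
    """Third call: record a status on the document root."""
    order.set("status", "approved")
    return order


# One document travels through every call; no call needs its own datatype.
order = ET.Element("order")
order = add_customer(order, "Acme")
order = add_item(order, "A-100", 2)
order = mark_approved(order)

document = ET.tostring(order, encoding="unicode")
print(document)
```

Each step accepts and returns the whole document, so new steps can be added without changing the signatures of the existing ones, which is the reuse argument made above.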

RESTful web services are great with AJAX; however, I wouldn't recommend them for services which are highly popular and reliable and have the potential to be reused a lot.

Sunday, November 18, 2007

Netvibes.com

A few days ago, I looked at the logs for this site in Google Analytics and noticed users were coming to my site via netvibes.com. I decided to research this site to see what it is all about. After signing up and researching its functionality, I have to say that it is a classic Ajax site. It shows you what the user experience should be when Ajax is implemented correctly. The UI controls were smooth and easy to use. Even though Netvibes.com uses Ajax, it IS NOT a Web 2.0 friendly web site. I don't see any change in the URL when I go to the site. I am automatically logged in since the site identifies me via a cookie. As far as the layout is concerned, Netvibes.com simply follows the iGoogle page layout. There is nothing original other than a "To do" widget. After spending some time with the site, I have to say that I am going to stick with iGoogle.

Monday, November 12, 2007

Web 2.0 business!!!

I would love to see businesses consider adopting Web 2.0 technologies in their day-to-day business processes. Here are some existing Web 2.0 technologies:
  • Outlook Calendar - A classic Web 2.0 application where users can share each other's calendars.
  • Portals - I am thinking of an application like Microsoft Sharepoint which is a great application where users can share documents and other digital information
  • Outlook Directory - A directory of company employees. Users can add their own entries to their directory.
  • Intranet Website - Employees can find useful information about their company processes, benefits, holidays and other important information.
Here are some other technologies which can be incorporated in an office.
  • Employee Blogs - These will contain an employee's daily status report: the tasks he or she is working on.
  • Project Wikis - A home for project artifacts like the project plan, project members' resumes, contact information, Configuration Management tasks and existing risks for the project.
  • Collaborative spaces - Where workers can concurrently work and share information and ideas.
  • Aggregated Search Engines which are customized for businesses.
  • Online BPM tools - to let employees monitor their work processes and how they can improve them.
The way business is done will change rapidly in the next two to five years.

Tuesday, October 23, 2007

R U ready 2 Flux?


This evening my father-in-law sent me an email regarding a website called "FLUX". Flux defines its functionality as:
"Flux enables you to add community tools to your website and push your content further - on your site - where you control and monetize it. The Flux Platform is built to meet the unique demands of major media brands and emerging influencer websites"

It sounded interesting so I signed up to enable this blog, Technology-Works, to be Flux enabled. Once I signed up, I got an email which stated:

"
Dear Enoch Moses,

Thank you for your interest in Flux - we're looking forward to adding community tools to your website - increasing traffic and page views. We will be contacting you shortly to follow up.

Check out the Flux blog (www.flux.com/blog) for the latest on what we're up to.

Sincerely, The Team at Social Project
"
After reading Flux's web site, this is what I think of Flux. Flux provides tools which allow a "Flux'ed" site to be marketed better within its own or other social networks. When your site becomes prominent, you, the owner of the site, can make money on the ads which show up on it. Flux takes a cut of that money. From an advertiser's point of view, Flux offers another channel for marketing.

This is a classic implementation of Web 2.0 where tools are provided which allow the user to market his site better. After all, Web 2.0 has been dubbed "Active Web" since it allows users to be active on the internet. I wonder what my friend, a renowned Internet Marketing analyst, would say about it. He and I have had discussions on the marketing power of Facebook and on why Microsoft bought a stake in the "now" social network. Anyway, I will give an update when I "Flux" this site.

I like the video which I placed on this entry since it does a great job defining a social network and the benefits of it.

Thursday, October 18, 2007

FriendFeed and Kadoo

Today I got my invite to join FriendFeed.com, a personal feed aggregator for various Web 2.0 sites like Blogger, YouTube, LinkedIn, Digg, and so on. It also lets you access your friends' personal feeds. This way you can get alerts on what your friends are up to. I personally don't think it will be popular since users don't want to sign up for multiple Web 2.0 sites. This website is built on top of other websites. I am not too impressed with it. Nevertheless, I will use it for a while since it is still quite new. Good luck to the FriendFeed.com site.

BUGS ON FRIENDFEED
  • I cannot add multiple blogger sites.
  • I cannot edit my facebook.com account which is mapped to my FriendFeed account
Things that worry me:
FriendFeed lets you log in to your account via any of your email addresses (if you register more than one address). This can be a security glitch, and the email addresses can also be used by advertisers.

I also came across another Web 2.0 site called Kadoo.com. They are currently building a Social Information Management System (SIMS). I signed up to be a beta tester for it. SIMS sounds interesting, but we have to wait and see if it is a cool thing or simply a bust!

Monday, October 15, 2007

Can Oracle learn from Google?

I just read an article in Java Developers Journal (JDJ) saying that Oracle is looking to buy BEA. This should be interesting. If Oracle does buy BEA, then Larry Ellison, Oracle's CEO, and his company will have the WebLogic stack, the AquaLogic stack and their supporting products. These products will be added to the Oracle collection, which consists of:
  • Oracle Application Server,
  • Oracle Database technologies,
  • Oracle Fusion Ware,
  • PeopleSoft,
  • Stellent, etc., etc.,
However, after working with Oracle's non-database technologies and hearing other voices in the developer community, I find that Oracle's non-database products are not that great. Unlike Microsoft's products, which emphasize user interaction and user experience, Oracle's products don't integrate well and are quite buggy. I don't know if Oracle needs to hire more effective product managers who can build better products. Like Google, it needs to promote innovation and creativity. There are reasons why Google is the best company to work for. Among other benefits, Google offers free gourmet meals and lets employees spend 20% of their work time on personal projects. As these projects evolve into product ideas, Google in turn takes these ideas and produces great products. Here are some products they created:
  • Gmail
  • Google Talk
  • Google Reader
  • Google Map and other APIs
  • Google Labs
  • Google Apps
  • Google checkout
Most of these ideas were built on technologies they acquired like:
  • YouTube
  • Blogger
  • Keyhole (Google Earth)
  • Orkut
To please its shareholders and client base, Oracle simply acquires companies and their products; however, it doesn't spend much time making those products "cool" or "niche" after it acquires them. Larry Ellison, if you are reading this, learn from Google's model. Software is cool and fun, and it meets business and user needs. If you forget cool and fun, then what you end up with is boring, expensive and bulky products like mainframe computers, which eventually joined the dinosaurs in extinction.

Friday, October 12, 2007

Microsoft Popfly and more

A few days ago I was granted access to Microsoft's new and dandy mashup editor called Popfly. Why did they call it "Popfly"? I have no idea. Popfly offers functionality like integration with Visual Studio and with the new Microsoft technology called the Silverlight plugin. I had high hopes for the Popfly technology since it is from Microsoft.

Unfortunately, tonight I was faced with the "Blue Screen of Death" of the mashup editor world. I was working with Popfly's Facebook block, which is like a Yahoo! Pipes widget, and I tried to get pictures of my friends from my network. Unfortunately it didn't work. Instead, I saw a default set of friends who looked like Microsoft employees. The only thing that worked was viewing my own information. The Popfly product has promise; however, it is still alpha. Unlike Google Mashup Editor and Yahoo! Pipes, Popfly offers a UI aggregation interface as well as a .NET API interface. The competition among the major vendors to provide mashup interfaces has just begun. I am waiting for Oracle to join the party.

As I had suspected, Microsoft is slowly but surely integrating Facebook into its platform. Popfly is a great example of it. It looks like Microsoft and Google see immediate short-term gains in going the Web 2.0 route and not the Semantic Web route, which seems to be the better approach in the long term.

Thursday, October 11, 2007

Semantic Web versus Web 2.0



Yesterday's entry was about Peter Patel-Schneider's talk about Knowledge Representation and Semantic Web. The Semantic Web and the Web 2.0 are two distinct approaches on how web technology should be in the future.

Definition of Semantic Web
  • According to W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). "
  • Wikipedia defines Semantic Web as, "...an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily.[1] It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange."
Definition of Web 2.0
O'Reilly defines Web 2.0 as the next generation of web technologies or methodologies. However, judging from the latest trends, Web 2.0 seems to be web technologies which allow the user to directly interact with the web (mashups) and with other members of the web (social networks). This shows the two distinct visions of how web technologies will evolve.
  • Semantic Web promises to provide intelligent data which will enable software to read and understand what the data is about. This in turn will allow software to understand what the user is looking for. Software will be able to use inference engines and other technologies to provide the user with a wider variety of pertinent information on the internet. Semantic Web gives the power of decision to the software which works with the users.
  • Web 2.0 promises to provide technologies which will allow the user to make decisions on how he or she wants to access, discover and process the information on the internet. It is assumed that the user has to take the initiative to look for information and Web 2.0 technologies will assist the user in finding it. Unlike in the Semantic Web's vision, software is only a proxy for what the user is looking for.
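
To give a feel for what an inference engine does with Semantic Web data, here is a toy sketch in plain Python. Facts are stored as (subject, predicate, object) triples in the spirit of RDF, and two rules are applied until nothing new can be derived. The facts themselves (`Drone`, `Plane`, `Vehicle`, `uav1`) and the short predicate names are assumptions made for this illustration; a real system would use RDF/OWL tooling and a proper reasoner.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("Drone", "subClassOf", "Plane"),
    ("Plane", "subClassOf", "Vehicle"),
    ("uav1", "type", "Drone"),
}


def infer(facts):
    """Derive new triples until a fixed point is reached.

    Rule 1: subClassOf is transitive.
    Rule 2: an instance of a class is also an instance of its superclasses.
    """
    known = set(facts)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                # Only chain through a subClassOf link that starts where
                # the first triple ends.
                if b != c or p2 != "subClassOf":
                    continue
                if p1 in ("subClassOf", "type"):
                    new.add((a, p1, d))
        if new <= known:
            return known
        known |= new


inferred = infer(FACTS)
print(("uav1", "type", "Vehicle") in inferred)
```

Nobody stated that `uav1` is a vehicle, yet the software can conclude it from the class hierarchy, which is the kind of decision-making the Semantic Web bullet hands to software.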
With these two distinct approaches in mind, let's compare their benefits and drawbacks.
  • Semantic Web
    • Thin clients - Clients are not expected to store any information. Built-in Semantic Web language processors will simply read and interpret the data.
    • Not UI friendly
    • Governance is a big issue since Semantic Web languages have to be regulated
    • Potential data discovery by the user since Semantic Web languages can show inferred information
    • Localization is an issue. I am wondering how RDF, RDFS and OWL will work with languages which have distinctly different rules
  • Web 2.0
    • Thick clients - Clients have to process technologies like AJAX
    • UI Friendly - Look at Yahoo! Pipes, Google Maps
    • Potential bad information is created - If users create aggregated data via mashups, users can interpret the data incorrectly.
    • Minimized privacy - With social networks, blogs and other personal information on the web, the user's privacy is minimized. We need to make Web 2.0 more secure!
Like the topic of Browsing versus Searching, the optimal answer lies between Semantic Web and Web 2.0 technologies. Semantic Web and Web 2.0 can be used in a complementary way, and I believe this will be the next wave of web technologies.

I put up a video for each approach. The videos are stylistically different, and I chose these two since the two approaches are so different.

Wednesday, October 10, 2007

Semantic Web


I came across a great talk on Semantic Web and Knowledge Representation on YouTube. The talk was given by Peter Patel-Schneider. In the presentation, he talks about:
  • the evolution of Semantic Web
  • various Semantic Web languages
  • benefits of Semantic Web
  • problems with the Semantic Web Vision.
After seeing the talk, I see the benefits and issues with Semantic Web theory; however, there are not yet widely used applications or tools which process Semantic Web languages. I believe Mashups, Representational State Transfer (REST) web services, and Web 2.0 implementations are a good step towards the next generation of internet technologies. Semantic Web takes the approach of adding intelligence at the data tier while Web 2.0 takes the approach of enabling the user to intimately control the data or the rendering of data. I highly recommend Peter Patel-Schneider's talk.

Tuesday, October 9, 2007

DITA - That's correct!

Today I was analyzing Java code for a refactoring effort. Before our team could refactor the code, we needed to write unit tests to be sure we didn't break any functionality in the software. Unfortunately, after looking at the code and asking other team members, I realized that the Java code was poorly designed. What does it mean that the Java code was poorly designed? Does that mean that the code didn't meet the software requirements or pass QA? No, it means that it will take a lot of work to introduce new functionality into it. The code is brittle, tightly coupled and not very reusable. It is always beneficial to design software before it is developed because during the design process the requirements and the initial design can be analyzed for potential bottlenecks, brittleness, tight coupling, etc. If the software is developed without any design then it is a linear process. The developer will write the code to meet the initial requirements; however, he or she does not take a step back to assess the developed code's reusability and flexibility.

This is also true in writing documents. In high school and college, students are taught that they need to create an outline of their paper, then a draft and then the final copy of the paper. The paper meets the instructor's requirements, such as:
  • Write an essay about the Civil War
  • Write a technical paper on how to build a lawn mower
  • Write a paper on why Roe v. Wade is beneficial for the United States.
After the paper is written, it is graded on how well the student expresses his or her thoughts about the requirements of the paper. However, if the instructor added another requirement, namely that he plans to write a paper composed only of content from his students' papers, then the paper writing process would be different. There would be discussions between the instructor and his students on what the instructor's paper is about and how he plans to write it. This would also alter the students' approach to their papers. They would emphasize ideas and support their ideas. This might restrict the flow and readability of the students' papers. To improve the flow of the paper, the students may choose to organize the information differently.

In the information age, every major organization is trying to mine data to give itself an edge over its competitors. Companies like Google, Autonomy, Fast, Vivisimo, etc. offer products which provide search capabilities for unstructured data which is linear, brittle and tightly coupled (just like this blog entry).

Yesterday I came across an XML standard which identifies the issue of linear writing and describes how to write reusable content. It is quite fascinating. The XML standard is called Darwin Information Typing Architecture (DITA). It uses ideas of inheritance and specialization. I don't know if DITA is the answer but it asks the right questions and it identifies the issues. Hope you enjoy the white paper (be warned that it is quite technical). If the white paper is too complicated then check out this PowerPoint presentation. Enjoy!

Learning from Mashups

Yesterday and today I spent time creating a Job Aggregator on Yahoo! Pipes and then building a frontend on Google Mashup Editor. After spending some time on Yahoo! Pipes editor, I was able to build a Job Aggregator which incorporated Google Base, Monster.com and Hotjobs. This is how it works:
  1. User inputs a job query in the textbox
  2. User hits submit
  3. The request is sent to the Yahoo! Pipes engine
  4. The Yahoo! Pipes engine in turn creates a copy of the request and submits it to Google Base, Monster.com and Hotjobs
  5. The responses from each job site are sent back to the Yahoo! Pipes engine.
  6. The Yahoo! Pipes engine in turn aggregates the responses and removes duplicates
  7. Yahoo! Pipes engine then presents the data to the user.
The Yahoo! Pipe which I created can be seen at http://pipes.yahoo.com/pipes/pipe.info?_id=gl_zggBz3BGfqGtH1vC6Jw
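
The steps above can be sketched in a few lines of Python. This is only an illustration of the fan-out, aggregate and de-duplicate flow, not how Yahoo! Pipes works internally. The three `fetch_*` functions and their canned results are hypothetical stand-ins for Google Base, Monster.com and Hotjobs, and treating two listings as duplicates when their title and company match is my own assumption.

```python
from typing import Callable, Dict, List

Job = Dict[str, str]

# Hypothetical stand-ins for the three job sites. A real version would
# query each site's feed; here they return canned data for illustration.
def fetch_google_base(query: str) -> List[Job]:
    return [{"title": "Java Developer", "company": "Acme", "source": "Google Base"}]

def fetch_monster(query: str) -> List[Job]:
    return [
        {"title": "Java Developer", "company": "Acme", "source": "Monster.com"},
        {"title": "XML Architect", "company": "Initech", "source": "Monster.com"},
    ]

def fetch_hotjobs(query: str) -> List[Job]:
    return [{"title": "java developer ", "company": "ACME", "source": "Hotjobs"}]

def aggregate_jobs(query: str, sources: List[Callable[[str], List[Job]]]) -> List[Job]:
    """Send the same query to every source, merge the responses, drop duplicates."""
    seen = set()
    merged: List[Job] = []
    for fetch in sources:
        for job in fetch(query):
            # Assumed duplicate rule: same title and company, ignoring case and spacing.
            key = (job["title"].strip().lower(), job["company"].strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append(job)
    return merged

SOURCES = [fetch_google_base, fetch_monster, fetch_hotjobs]

if __name__ == "__main__":
    for job in aggregate_jobs("java", SOURCES):
        print(job["title"], "-", job["company"], "(" + job["source"] + ")")
```

The first source to return a listing wins, so the order of `SOURCES` decides which copy of a duplicate is kept.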

Yahoo! Pipes also offers another service, which is to create an RSS feed for the pipe I just created.

I in turn took this RSS feed and rendered it in Google Mashup Editor. I have to say that Google Mashup Editor still needs a lot of work. For instance, I could not create URLs to extract information. I had to hard code the Yahoo! Pipes queries and just show the results. It was frustrating but for now it has to do. The front-end can be seen at:
GME Team: if you are reading this entry then please add the URL builder functionality. It will make your editor more robust. You might also think of adding GWT tags to promote your GWT product.

Data Aggregation with Pipes and more

Today I was working with Yahoo! Pipes and I created a simple module which aggregates entries from my wife's, father-in-law's, sister-in-law's and my own blog and sorts them from the latest to the oldest published. It is quite neat. The process is quite easy; however, I couldn't get Yahoo! Pipes to extract feeds from blogspot's RSS feed and my brother-in-law's RSS feed. I used blogspot's Atom feed and my sister-in-law's blog's RDF feed instead. Yahoo! Pipes could not read:
I liked the way I could combine the feeds into one feed and then sort the blog entries in the feed. I was not able to aggregate feeds with Google Mashup Editor. I am still learning that editor. For a quick workaround with Google Mashup Editor, first subscribe to the feeds via Google Reader and then create one feed which does all of the aggregation. Then manipulate the aggregated Google Reader feed via Google Mashup Editor. Yahoo! Pipes does not offer any way of adding presentation components like CSS, graphics or JavaScript. Here is the link to the Yahoo! Pipe which I created:

I like them both but neither one met all of my requirements.
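
The combine-and-sort step from this post can be sketched as follows. It is a minimal illustration, assuming each feed (RSS, Atom or RDF) has already been parsed into entries with a publication date. The `Entry` type, the post titles and the dates are invented for the example.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple

class Entry(NamedTuple):
    blog: str
    title: str
    published: datetime

def merge_feeds(feeds: Dict[str, List[Entry]]) -> List[Entry]:
    """Combine every feed into one list, newest entry first."""
    combined = [entry for entries in feeds.values() for entry in entries]
    return sorted(combined, key=lambda entry: entry.published, reverse=True)

# Hypothetical sample data standing in for the parsed family blogs.
FEEDS = {
    "my blog": [
        Entry("my blog", "Post A", datetime(2007, 10, 1)),
        Entry("my blog", "Post D", datetime(2007, 10, 8)),
    ],
    "wife's blog": [Entry("wife's blog", "Post B", datetime(2007, 10, 5))],
    "father-in-law's blog": [Entry("father-in-law's blog", "Post C", datetime(2007, 10, 7))],
}

if __name__ == "__main__":
    for entry in merge_feeds(FEEDS):
        print(entry.published.date(), entry.blog, "-", entry.title)
```

Once the entries share one shape, the feed format they came from no longer matters to the sort.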

Can Businesses act on Mashup'ed Data?

A few days ago I got invited to use Google Mashup Editor and currently I have been analyzing Google Mashup Editor and Yahoo! Pipes. I am waiting for an invite to try Microsoft's Popfly. This has me thinking. Currently mashups allow non-technical users to discover data sources, connect to the data sources, get data from the data sources and aggregate it with data from other data sources. This is great since it allows the users to get the important data from various data sources. It is implied that the users are using their natural processes to process information from the web. However the big question is:
"Can decision makers make decisions based on Mashup'ed data?"

If a decision maker makes a terrible decision based on Mashup'ed data then who is liable for the data? Can the decision maker trust the data enough to make an important decision? How fresh and reliable is the data? How reliable is the Mashup?

Mashups for now are cool but I feel they are gimmicky if there is no assurance on the mashup or its data. Can we start writing MLAs (Mashup Level Agreements)? Only time will tell if Mashups are successful in the corporate world.

Google Mashup Editor and Yahoo! Pipes

This evening I got an invite to use Google Mashup Editor. I have to say that I have been impressed. Unlike Yahoo! Pipes, this Google Mashup IDE is a WYSIWYG editor. I played with it for a few minutes and I really liked it. The samples that they provided are good enough for a running start and the tag documentation is quite good.

I am currently working with Lancaster County Web Computer Aided Dispatch (CAD aka 911) data which is provided as an RSS feed. I am trying to render it on Google Maps. I tried to do the exact same thing with Yahoo! Pipes and I felt it was less user friendly. It is true that moving widgets around and connecting them with "Pipes" can be fun but it can also be a frustrating experience. Yahoo! Pipes reminded me of the times I worked with the TIBCO BusinessWorks editor or the AquaLogic Fuego tool for BPM. It is great once you know it but the learning curve can be quite steep. I was and still am frustrated with the widgets that Yahoo! Pipes has since some widgets don't work with each other. This is typical of any UI friendly editor compared with any WYSIWYG editor.

With WYSIWYG editors, you can look at the XML code and decipher the programming logic. UI friendly editors may look cool but they can be a frustrating experience. For example, the Yahoo! Pipes editor had a widget for Yahoo Search but I didn't find one for Google Search. I wanted to run the query "children hate animals" against various search engines but it became a frustrating experience. If any Yahoo! Pipes lover is out there, here are a few requirements which might make the Pipes editor more pleasant.
  1. Sort the widgets out and tell the user which widgets work with what
  2. Ability to create custom widgets and save them in a library.
  3. Allow users to publish their custom widgets - this would allow a larger user base since users will promote their custom widgets
  4. Allow users to publish their Pipes on different websites (kinda like Google Maps)
  5. Have contests for the best Yahoo Pipe in each category. Publish the widgets and feeds for each contest.
I need to do more research on Google's Mashup Editor but right off the bat it would be nice to see:
  1. Ability to integrate other Google products like Google Talk, Gmail, etc.
  2. Index Google Mashups for higher visibility
  3. Ability to publish Google Mashups and put AdSense around it. Incentive for the publisher
  4. Tighter integration with Blogger
  5. More to come.

A Web 2.0 Problem

I just got done reading course notes for a Web 2.0 course. The notes were written by two Google employees, Joshua D. Mittlemen and Steffen Meschkat. The course notes were called Keeping the Web in Web 2.0: An HCI Approach to Designing Web Applications and the notes were well written.

The authors talk about various features involved in designing Web 2.0 web applications. They talk about the benefits of designing web apps which use client side processing rather than server side processing. They also talk about the differences in browser implementations and their implementations of JavaScript. Even though the authors talk about the advantages of Web 2.0, they do mention that a web page which has Web 2.0 functionality is dependent on: the browser implementation (Internet Explorer, Firefox, Opera or Safari), the browser version, and the JavaScript library versions. I strongly believe that moving the data processing to the client side will make the web application more unreliable and it could become a system maintainability issue. This issue could be eliminated if browser vendors like Microsoft and Mozilla agreed on the JavaScript implementation and not just on the JavaScript standard. This is a problem with various standards and a good example is the Java Message Service (JMS). Tibco's JMS implementation is not understood by other vendors like Sun, Sonic and BEA, and vice versa.

XML Based Framework Standards


After working with US government sponsored XML frameworks and standards for 2 years, here are some of my thoughts.

  1. After studying the Global Justice XML Data Model (GJXDM) and authoring all the datatypes in the InfrastructureProtection domain in National Information Exchange Model (NIEM) version 1.0, I have concluded that these standards will never be fully implemented in Gov. to Gov. (G2G) systems or Bus. to Gov. (B2G) systems. They are extremely bulky and may be potential bottlenecks in any system. Working with them can be extremely time consuming and sometimes redundant. The governance for these frameworks has not been fully resolved.

  2. I, however, strongly believe these frameworks and standards are key to any enterprise's data management efforts. The GJXDM and NIEM are among the best data models I have ever worked with. They are well designed and are extremely granular. Even though they cannot be fully implemented by any system, they should be used as a Rosetta Stone to evaluate any enterprise's data architecture. These data models should be referred to when SOA governance and data management issues are addressed.

  3. Use these frameworks to reverse engineer various domain models. The GJXDM and NIEM data models are extremely rich. They are a great place to get requirements.

In conclusion, although these frameworks aka standards aka data models cannot be used as they were envisioned to be used, they are extremely useful when designing an enterprise wide data management system.

Black Hole in the internet (A theory)

I like the PageRank theory which Google uses to rank the web pages which it indexes. As I understand it, a web page is given a higher rank when other pages (internal and external) link to that page. For example, Page A exists on web site alpha, and Page B, Page C and Page D link to Page A. Page C and Page D don't have any other references and they exist on web site gamma, while Page B resides on web site alpha. From this, it is inferred that Page A is more valuable than Page B, Page C and Page D. Page A has higher importance since it is referenced by Page B, Page C and Page D. This is ingenious because it may be assumed that if Page A did not exist then Page B, Page C and Page D could not validate their content through a reference (which in this case is Page A).

Therefore here is my theory: if there exists a web page called blackHole which:
  • has no links
  • is referenced by all web pages (directly or indirectly) as a link
  • is NEVER changed
  • is hosted on a server which is 100% reliable and it never slows down or goes down.
The implications of this phenomenon are huge since:
  • the page called blackHole will get a lot of hits from users and spiders
  • this could undermine the PageRank algorithm unless Google fudges it to address this issue.
I call this phenomenon the cyber black hole. I would like to hear from people about this idea.
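
To make the idea concrete, here is a small sketch of a textbook-style PageRank iteration on a toy web where every page links to a link-less page called blackHole. It is a simplification for illustration and not Google's actual algorithm. The damping factor of 0.85, the fixed number of iterations, the page names and the rule that a link-less page spreads its rank evenly over all pages are all assumptions of this sketch.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Assumed convention: rank held by pages with no outgoing links
        # is spread evenly over every page so that it does not vanish.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n
                    for page in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy web: every page links to blackHole, which links nowhere.
WEB = {
    "pageA": ["pageB", "blackHole"],
    "pageB": ["blackHole"],
    "pageC": ["blackHole"],
    "blackHole": [],
}

if __name__ == "__main__":
    ranks = pagerank(WEB)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 3))
```

In this toy model blackHole ends up with the highest score. The even redistribution of a link-less page's rank is one common way to keep the scores summing to one, which may be the kind of adjustment the theory above calls fudging.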

Search Engine Problem

Jargonaut, thank you for the link to CMCH Research. I did enjoy their Smart Search interface. It is very interesting but it also shows the underlying problem across various search engines. For example, in the CMCH Research Smart Search interface, I typed in "Why Children hate animals" and got these results:
  1. Google -

    animals politics law children peace hate - Animals - Newly ...

    Top Environmental News stories on global warming, wildlife, sustainable development, animals, nature, health, and more!
    www.care2.com/news/submitted/category/animals/animals+politics+law+children+peace+hate - 51k

    Mitchell, Last Dinosaur Book, excerpt

    Why Children Hate Dinosaurs. Although our history of the dinosaur is over, ... the dinosaur may be the most publicized animal in children's lives. ...
    www.press.uchicago.edu/Misc/Chicago/532046.html - 22k

    Fur Is Dead > Features > PETA’s New Comic for Kids - a Real-Life ...

    PETA will be there to greet any fur-clad moms and their children with their newest anti-fur leaflet-PETA Comics presents..."Your Mommy Kills Animals!" ...
    www.furisdead.com/momfur.html - 38k

    Wondertime: First Pets: Wild Thing - and More Joys of Parenting ...

    Family Pets for Children - Wondertime unveils what children can learn from their first ... "I don't hate animals," I said. It was the idea of having a puppy ...
    wondertime.go.com/life-at-home/article/0806-first-pets-wild-thing.html - 24k

    I hate Animal Planet! [Archive] - Pets Hub

    [Archive] I hate Animal Planet! Other Pets & Animals Forums. ... YOU SHOULDN'T HOLD CHILDREN THIS WAY, MUCH LESS WHEN TOSSING RAW MEAT TO A CROC 2.5 FEET ...
    www.petshub.com/forums/archive/index.php/t-36165.html - 46k
  2. Yahoo! -
  • Why Do We Hate?
    ... beings are animals, but we seem to be the only animals that hate enough to kill ... create nonjudgmental safe environments for children who so easily absorb and ...
    www.kuufnh.org/whyhate.htm - 21k
  • REDSKIN, A HATE WORD DEFINED page 1
    Why Indian mascots are considered hateful, and the shameful history of the Redskin label. ... A REDSKIN IS AN ANIMAL ... slaughtered our children as you would ...
    www.iwchildren.org/redskinhate.htm - 19k
  • SFist: Why We Hate Racists
    ... after running an opinion piece titled "Why I Hate Blacks," by a local 24-year ... You're an Asian male-white female couple with biracial children. ...
    sfist.com/2007/02/27/why_we_hate_racists.php - 88k
  • Why I hate vegetarians
    ... day made me realise why I hate vegetarians. ... They are putting the welfare of animals before that of their children. ... Back to Animal Rights / Home ...
    www.maninnature.com/Management/ARights/Rights1w.html - 9k
  • HATE CRIMES
    ... already faced the murders of, or assaults on, our children and loved ones. ... If we are willing to extend federal protections to animals, why not people? ...
    www.pflagupstatesc.org/hate_crimes.htm - 18k
  3. MSN -
  • Why I Hate Your Children 2

    In a previous badass article ( Why I Hate Your Children ) I completely demolished the bullshit ... 400 pound gorilla bitch and stayed thanks to your love of animals ...

  • Why I hate vegetarians

    Why I Hate Vegetarians People should not be bullied into giving up meat by humourless ... They are putting the welfare of animals before that of their children. Giving up meat and dairy has been linked ...

  • REDSKIN, A HATE WORD DEFINED page 1

    ... to you as one human to another, why the Native ... You slaughtered our children as you would slaughter the ... or to be free from your words of hate. We are treated like you treat the animals,

  • Why women hate sex.

    ... are Golden Toilets...Flush; Feminism Is Stupid; Pets Are Not Children ... lot of good work being done on homosexuality in animals. ... is mostly accurate, sure i have my own conclusions as to why women hate ...

  • Why Do We Hate?

    WHY DO WE HATE? Sermon given by Rev. Emily Burr on 5/7/06 ... Human beings are animals, but we seem to be the only animals that hate ... can help create nonjudgmental safe environments for children ...

  4. Ask Jeeves -
    Democrats Hate Children, But So Does Phil Parlock - Wonkette
    Democrats Hate Children, But So Does Phil Parlock ... [DU] Democrats Hate Children [Wonkette] ...
    www.wonkette.com/archives/democrats-hate-children-but-so-does-phil-parlock-021497.php

    Newsgroups: alt.food.vegan, rec.sport.football.college, alt.animals.ethics.vegetarian ... Why would killing animals for my computer give you ...
    groups.google.com/group/alt.animals.ethics.vegetarian/msg/d802d7375cf90061

    I have mixed feelings about children but hate isn't in the mix at all. ... why would people hate children, they were all once children, that ...
    surveycentral.org/survey/18886.html

    I love animals yet hate children. I dreamt again about stamping on the babies heads again last night, squashing them like watermelons and ...
    www.notproud.com/anger/anger743.php

    Do Palestinians Teach Their Children to Hate? ... staff blush: “We teach our children to respect life, while they teach that if you die ...
    www.ifamericansknew.org/stats/hate.html
  5. Vivisimo -
  6. Alta Vista -
I looked at these results and realized that these links are not the right links. I wanted to see why certain children want to hurt animals rather than play with them. Each reputable search engine gave me different answers. There was very little consistency. The results across the board were neither accurate nor precise. This makes me wonder if the user is really saving time. If I type the word "children" into various search engines, I get results related to children, so the results for that query are more accurate and precise. This shows that queries directly hit search engine indexes, and that the indexes and page rankings depend on each engine's independent search algorithms. This is a typical problem of text based search engines (including Google). I believe there is a way to address this issue. Email me and I will let you know.
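
One way to put a number on how much the engines agree is to measure the overlap between their result lists. The sketch below uses the Jaccard index (shared results divided by all distinct results) on the titles listed above for Google, Yahoo! and MSN. Comparing by title rather than by URL is an assumption I make because the MSN listing has no URLs, and truncated titles are kept truncated as they appear above.

```python
from itertools import combinations
from typing import Dict, Set

# Result titles copied from the listings in this post.
RESULTS: Dict[str, Set[str]] = {
    "Google": {
        "animals politics law children peace hate - Animals - Newly ...",
        "Mitchell, Last Dinosaur Book, excerpt",
        "Fur Is Dead > Features > PETA’s New Comic for Kids - a Real-Life ...",
        "Wondertime: First Pets: Wild Thing - and More Joys of Parenting ...",
        "I hate Animal Planet! [Archive] - Pets Hub",
    },
    "Yahoo!": {
        "Why Do We Hate?",
        "REDSKIN, A HATE WORD DEFINED page 1",
        "SFist: Why We Hate Racists",
        "Why I hate vegetarians",
        "HATE CRIMES",
    },
    "MSN": {
        "Why I Hate Your Children 2",
        "Why I hate vegetarians",
        "REDSKIN, A HATE WORD DEFINED page 1",
        "Why women hate sex.",
        "Why Do We Hate?",
    },
}

def normalize(title: str) -> str:
    """Compare titles without regard to case or surrounding spaces."""
    return title.strip().lower()

def jaccard(first: Set[str], second: Set[str]) -> float:
    """Shared titles divided by all distinct titles; 0.0 means no overlap."""
    a = {normalize(title) for title in first}
    b = {normalize(title) for title in second}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

if __name__ == "__main__":
    for left, right in combinations(RESULTS, 2):
        print(left, "vs", right, round(jaccard(RESULTS[left], RESULTS[right]), 2))
```

On these titles, Google shares nothing with the other two engines, while Yahoo! and MSN share three of their five results. None of this says anything about whether the shared results are relevant to the question I was asking.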