Monday, December 31, 2007

Resolution for 2008: Let's cut down on XML abuse

When XML started off, it was envisioned as a superset of HTML in which custom tags could be created and content in custom document structures could be circulated on the World Wide Web. I believe, however, that XML technology is being misused. It is popping up in the Semantic Web, in SOA, and in pure data modeling efforts. It is true that XML is both human and machine readable; however, processors pay a heavy price parsing and ingesting the content of complex XML documents. I recently attended an ontology conference in Maryland, and the biggest theme of the conference was that there isn't enough machine power to process all of the complex business logic needed to make content inferences. Even though XML is human and machine readable, it is extremely verbose, and it is not practical to have a machine process all of the tags just to get the data out of an XML document. The next question to ask is: "When should you use XML?" The answer is: when the XML documents are not overly complicated, meaning they avoid deeply nested tags and large amounts of data. If the data is quite big, then it should be passed around as binary content in COM objects or Enterprise JavaBeans (EJBs). Here are some arguments regarding XML:
  1. XML is interoperable - True, XML is interoperable for cross-platform communication. However, please don't send massive XML documents which can bog the system down. XML may be interoperable but it is not performance friendly.
  2. XML is a good format for web service communication - Web Services (SOAP and REST) largely deal with XML-based content; however, Web Services are NOT reusable if the XML format is proprietary between the Service Provider and its clients. Please avoid sending large amounts of data over the wire. SOAP with Attachments is a good alternative for sending large chunks of data.
  3. XML is human readable - This is true; however, if a developer creates an XML format which only he understands, then the XML is unreadable and its definition needs to be released. For example, a developer decides to create an XML document which looks like this: Person. This XML document is a complete waste since not every human understands it. However, if tag names are defined properly and their definitions are captured, then the XML is human readable.
  4. XML is the future in Semantic Web - Recently I came across OWL-S editor software and I tested it out. The software generates OWL and RDF documents; however, these XML documents are so verbose that they cannot be processed practically. The Semantic Web should look at options other than XML-based technology alone.
In conclusion, XML is a great technology; however, people tend to misuse it and then tag it as a wasted technology. XML is not going to go away, so let's use it properly and take care of it.
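To make the verbosity argument concrete, here is a minimal sketch in Python that serializes the same records once as XML and once as a fixed-width binary layout, then compares the sizes. Everything in it is an assumption made for illustration: the sample data, the tag names (`readings`, `reading`, `id`, `temperature`), and the use of `struct` as a stand-in for "binary content". It is not the COM or EJB mechanism mentioned above, only a rough way to see the overhead of tags.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical sample data: (id, temperature) pairs invented for this sketch.
readings = [(i, 20.0 + i * 0.5) for i in range(1000)]


def to_xml(rows):
    """Serialize rows as a self-describing but verbose XML document (bytes)."""
    root = ET.Element("readings")
    for rid, temp in rows:
        item = ET.SubElement(root, "reading")
        ET.SubElement(item, "id").text = str(rid)
        ET.SubElement(item, "temperature").text = str(temp)
    return ET.tostring(root, encoding="utf-8")


def from_xml(data):
    """Parse the XML document back into a list of (id, temperature) tuples."""
    root = ET.fromstring(data)
    return [(int(r.findtext("id")), float(r.findtext("temperature"))) for r in root]


def to_binary(rows):
    """Pack each row as a 4-byte unsigned int plus an 8-byte double (12 bytes)."""
    return b"".join(struct.pack("<Id", rid, temp) for rid, temp in rows)


def from_binary(data):
    """Unpack consecutive 12-byte records back into (id, temperature) tuples."""
    return [struct.unpack_from("<Id", data, off) for off in range(0, len(data), 12)]


if __name__ == "__main__":
    xml_bytes = to_xml(readings)
    bin_bytes = to_binary(readings)
    print("XML bytes:   ", len(xml_bytes))
    print("binary bytes:", len(bin_bytes))
```

The trade-off is the one argued above: the binary form is much smaller and cheaper to decode, but it is meaningless without out-of-band knowledge of the layout, whereas the XML form carries its own field names.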

Saturday, December 15, 2007

Spock.com: Identity Manager

Have you ever wondered whether there are sites that manage the information about people found on the web? It is an interesting concept. If you type my name, Enoch Moses, into Google.com, you get Christian theological sites which discuss Enoch and Elijah. You also get my entry in Linkedin.com. Beyond that, Google displays a number of my blog entries, my linkedin profile, my myspace page, and so on. Sites like Spock.com let you manage your information on the web so that you can define and identify which content is yours, which is about you, and which involves you. It's a neat site. Here is my Spock page.

Enjoy!

Tuesday, December 11, 2007

Would I invest in a dot com startup?

Every Joe on the street thinks he can make a quick million or two by starting his own dot com company. After all, Chad Hurley and Steve Chen, founders of YouTube.com, did it. Unfortunately, there is a saturation of web sites which promise functionality but are dependent on your, the user's, data. The first question to ask is why I would want to put my information on a third party web site so that the web site's founder can make some money. It is the sad truth, but most of these companies will not last long. For example, let's look at various social networks:
  • MySpace.com - One of the first sites which actually picked up in popularity. Now it is mired in mediocrity and I don't see anything new and exciting happening on the site. Did I mention that they have added Facebook-like functionality?
  • Facebook.com - I have to say that I was a skeptic when I joined this site; however, it offers neat functionality that gives the user a reason to actually spend time on the site. I am a big fan of Facebook's Scrabble application.
  • hi5 - Started by an East Indian and marketed heavily in the East Indian community.
  • Orkut - I have to say this is probably one of the worst social networks I have come across. It is probably Google's worst purchase.
  • Kadoo - A new social network site whose UI looks promising; however, I still don't have an incentive to join it. I am not into propagating my identity across the internet.
  • Linkedin.com - I like this website. It's a social network for your professional contacts.
  • Xing.com - A network similar to Linkedin; this one is popular in Europe.
  • PageFlakes.com - Someone emailed me asking me to experience this site. Once again I ask the question: why should I sign up on PageFlakes.com?
  • NING.com - This is the UBER social network where anyone can create their own social network. The UI is not that great and it seems to be meant as a research application.
Given all of these social networks, how would an investor invest in these kinds of businesses? Frankly, every one of these social networks offers similar if not identical functionality. I would probably want to invest in networks which have a lot of users and a niche, like Linkedin and Xing, which focus only on professional networking.

Friday, December 7, 2007

Items of poor design...

For the last six months or so, I have been involved in a project where the system has been growing organically. It has been growing like fungus. Fungus is a unique living organism. It does not have a head, legs, hands or a body. It just exists and it keeps evolving as long as there are enough nutrients and dampness. The system I have been working on is an IT fungus. The system has evolved over the last four years or so. Developers come and go but this system still exists. This system has no requirements documents, no design documents, and no test plans. The software is poorly documented and I wonder how this system still exists. Well, it does exist and it seems to be growing larger and larger. The system is composed of subsystems which show evidence that developers have tried to improve the system but have failed miserably. The failed implementations were not cleaned up and they tend to further this system's evolution. This system is a J2EE system and you will notice the following things in it:
  • 1500 to 7000 line JSP pages which have scriptlets embedded in JavaScript which in turn invoke JDBC calls
  • Partially implemented Hibernate framework. This is evident from the *.hbm.xml files in the source code
  • Numerous properties files which are now neatly packaged in an Oracle database
  • Spring Web Flow - The developer who implemented this portion did a great job.
  • Prototype Ajax
  • JSP pages with scriptlets and JSTL tags
  • Same piece of code in numerous web apps which have been customized for each web app
  • Field level filtering in the database for each user (not role but user)
  • User is authenticated between each web-app even though each web-app is part of the bigger system.
  • Custom API integration for each COTS product.
  • Outdated stored procedures in the database which are no longer used.
A few weeks ago, I wanted to write JUnit unit tests for some of the "cleaned up" classes and I failed miserably, since all of the code is tightly coupled and follows the onion architecture. In an onion architecture, you never know what you are going to get under each peel. The point of this blog entry is to remind all IT personnel that systems are not organic beings; they are rather simple business logic processors. In the SDLC, a good design which has been reviewed and analyzed, a clean implementation, and robust test cases are the basic ingredients for a good system.
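As a rough illustration of why tight coupling blocks unit testing, here is a minimal Python sketch (not code from the system described above) in which the business logic receives its data source as a parameter instead of making database calls itself. The class `OrderReport`, the callable `fetch_orders`, and the sample data are all hypothetical names invented for this example.

```python
class OrderReport:
    """Business logic that depends on an injected data source, not on a database."""

    def __init__(self, fetch_orders):
        # fetch_orders is any callable returning a list of (customer, amount)
        # tuples. In production it could wrap a database query; in a test it
        # can be a plain function returning canned data.
        self._fetch_orders = fetch_orders

    def totals_by_customer(self):
        """Sum the order amounts for each customer."""
        totals = {}
        for customer, amount in self._fetch_orders():
            totals[customer] = totals.get(customer, 0) + amount
        return totals


def fake_orders():
    """Canned data standing in for the database during a unit test."""
    return [("acme", 10), ("acme", 5), ("globex", 7)]


report = OrderReport(fake_orders)
print(report.totals_by_customer())
```

Because the logic never touches a connection or a page, it can be exercised in isolation. That is exactly what a scriptlet that mixes markup, JavaScript, and JDBC calls makes impossible.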
(PIC of a fungus called WitchButter)

Monday, December 3, 2007

Book Review for Mike Daconta's new book

A couple of months ago, I agreed to review my esteemed colleague Michael (Mike) Daconta's new book Information As Product. I liked Mike Daconta's previous book, The Semantic Web. It was a great book which laid out the history of the Semantic Web and described its benefits and its vision. This book was my passport into the world of ontologies, data modeling and understanding the importance of data. I have also had the privilege of working with Mr. Daconta on a couple of DHS projects and I believe he is truly a visionary in data management.

The biggest knock against Mike Daconta in the industry is that he is a "dreamer" who has not been able to turn his dreams into substance. I believe Mike is a visionary and people like him are essential to IT innovation. He offers ideas which address actual business problems, and it is up to engineers to turn those ideas into reality.

His new book Information As Product follows this pattern. In the book, Mike proposes producing information as a product which comes off an "Information" factory line: the appropriate information for the appropriate person, delivered at the appropriate time. The book presents a general solution to a problem plaguing various enterprises, namely that there is a temporal and semantic gap between information consumers and producers. The book does a great job of describing the concepts involved in this idea; however, no system analyst or architect can decompose this book into the functional and non-functional requirements needed to build a system which will make its concepts a reality. Mike states that Information As Product is the first in a series of books which will engage readers in a dialogue on how information systems can be improved.

The things I liked in the book are:
  • easy to read
  • presenting the ideal information management system as a factory line where information can be packaged in a package
  • I loved the way he describes packaging up the information. I believe this idea can be implemented.
I wish he had expanded on these concepts more in this book:
  • the importance of metadata, selecting the right metadata and consequences of poor metadata management
  • The DIKW (Data-Information-Knowledge-Wisdom) pyramid - It is only a conceptual model. I would love to see how metadata fits in this pyramid.
  • He lost me in a couple of parts; otherwise it is not a bad book.
In summary, if you want to build a system from this book, then I recommend that you don't buy it. If, on the other hand, you are looking at data management solutions for your enterprise, then this is a great book, since enterprise level functional requirements can be derived from it. I personally enjoyed reading the book; however, I was left with more questions than answers.
Links to buy the Information As Product book.