
August 9, 2007

Hadoop And The Opposite Of The Not-Invented-Here Syndrome

Microsoft is famous for having a really bad case of 'not-invented-here' syndrome. They don't like to accept any protocol or standard or even take a perfectly working piece of software and include it. It wasn't invented at Microsoft so it's automatically crap. They have to "fix it". Yahoo! appears to have turned that on its head.

Yahoo's biggest competitor is arguably Google. Google invented a data processing framework called MapReduce, which they use to process the terabytes and petabytes of data they grind through on a regular basis. They piggyback it on top of their storage system, GFS (the Google File System). Because Google published papers on all this software, there was enough description for people to start developing their own versions of the Google tools, even though Google doesn't make the software itself available.

Yahoo has now decided to both use and endorse the Hadoop toolset. Hadoop includes its own implementations of both GFS and MapReduce, so arguably Yahoo is now running software based on ideas from its direct competitor. They aren't shy about it either. Far from hiding it, they are telling the world that the software is good, that they like it, and that they intend to support it.

Bravo. I'm impressed.

July 3, 2007

A Follow Up On Sun's JavaOne Sessions

I mentioned the other day that Sun was making their JavaOne technical sessions available online. They have continued to upload more of the multimedia versions since then, so you can listen to the audio while watching the synchronized slides. They've also uploaded the instructions and materials from their hands-on labs. If you'd like in-depth, step-by-step tutorials on a variety of subjects, look through them and see if any appeal to you. I spotted at least three that would be worth my time.

June 22, 2007

JavaOne Sessions Available Free Online

Sun is now making all their technical sessions from JavaOne available online. You can get the PDF for any of them, and they are gradually putting up the full audio plus slides (all of the 2006 presentations and about 20-25% of the 2007 presentations are up in full audio/slides form now; the rest of the 2007 sessions are PDF only for the moment). You have to join the Sun Developer Network (SDN), but signup is free.


http://developers.sun.com/learning/javaoneonline/

 

The presentations are pretty awesome. I'm watching the one on the Java Persistence API from 2006 and there are several others I have my eye on for later. This is an excellent educational resource and if you develop in Java for a living you would be remiss not to go through what's available there.

February 22, 2007

Google Makes Their Internal Lectures Available To Anyone

Google has a regular internal lecture series on mostly technical topics with guest speakers and some of their own employees. This is their TechTalks Series and the best part of it is that you and I can see them too. They record and digitize the talks (close to 300 of them to date) and make them available on Google Video, using a consistent tag on each one so you can easily search for them and see if any interest you. The link above will do the search and show you all the videos so far.

Some of the titles that caught my eye:

Privacy Preserving DataMining

Turning Email Upside Down: RSS/Email and IM2000

Strike Up The Brand: How to Design for Branding

Ruby And Google Maps

Ruby Sig: How To Design A Domain Specific Language

Note: I'm not endorsing any of these, I haven't had a chance to view them yet. They just looked interesting to me.

December 18, 2006

How To: Overcome Being Regular Expression Challenged

I had never really learned regular expressions. Oh sure, I could use * and ? as well as the next guy, but throw [0-9]+ at me and I had no idea what it meant. That is, until the last year or so. Regular expressions can be very helpful in pattern matching against file names or user input and I dislike having gaps in the overall toolset of things I'm comfortable using.

So I set out to correct it. I'm still pretty ham-handed when it comes to typing in regular expressions, and I end up consulting cheatsheets to remember the syntax for a lot of things, but I discovered online tools and websites that helped me close the gap in my knowledge. Here are my favorites:

  • txt2re: headache relief for programmers - This lets you input a piece of text you want to match against using a regular expression and it displays different expressions you could use depending upon which parts you want to match against. It can give you a quick start on matching even if you aren't yet very regex savvy.
  • RegEx: online regular expression testing - This is the other handy part of the equation. Here you can enter several items of sample text and a regular expression you want to test. Hit the test button and it will show you which sample text matched, which parts were matched, etc. It's somewhat Java oriented but should be just as useful for any language with fairly standard regex syntax. I've found it very helpful for iterative development of complicated regular expressions.
  • Regular Expression Library - A library of already crafted regular expressions (with various levels of complexity and robustness) to validate things like email addresses and telephone numbers.
  • Regular Expression Tutorial - A tutorial capable of teaching you enough to be dangerous in a fairly short time. Good info and it's not intimidating.
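To see what [0-9]+ actually does, and to get a rough local version of the testing page described above, here is a small Java sketch. The class and method names are my own invention, not part of any of those sites. It runs one pattern against several sample strings and reports whether each whole sample matches (Matcher.matches()) and which pieces of it match (Matcher.find()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHarness {
    /** One report line per sample: does the whole sample match, and which substrings does find() pick out? */
    static List<String> report(String regex, List<String> samples) {
        Pattern pattern = Pattern.compile(regex);
        List<String> lines = new ArrayList<>();
        for (String sample : samples) {
            Matcher matcher = pattern.matcher(sample);
            boolean wholeMatch = matcher.matches(); // the entire sample must match
            matcher.reset();                        // start over before scanning for pieces
            List<String> found = new ArrayList<>();
            while (matcher.find()) {                // each matching substring, left to right
                found.add(matcher.group());
            }
            lines.add(sample + " -> matches=" + wholeMatch + " found=" + found);
        }
        return lines;
    }

    public static void main(String[] args) {
        // [0-9]+ means "one or more digits in a row"
        List<String> samples = List.of("2007", "build 1.6.5", "no digits");
        for (String line : report("[0-9]+", samples)) {
            System.out.println(line);
        }
    }
}
```

Running it prints `2007 -> matches=true found=[2007]`, `build 1.6.5 -> matches=false found=[1, 6, 5]` and `no digits -> matches=false found=[]`. Note how "build 1.6.5" fails a whole-string match but still yields three separate runs of digits.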

August 15, 2006

Sun Commits To Real Dates For Open Source Java

Why it isn't the biggest, boldest headline on every major tech website is beyond me, but Sun has apparently committed to open sourcing the Java compiler (the part that takes Java source code and spits out Java bytecode for VMs to run) and the HotSpot VM (an excellent virtual machine with really nice optimization) by the end of the year. They will also be open sourcing Java Micro Edition (popular on cell phones and other small devices) and almost all the rest of Java Standard Edition by the first half of next year.

Oh, there's been talk about doing something for ages, but real action?!? Real dates with real pieces of software attached to the dates? Real dates that are in the near future?!? Wow! Simply WOW!

I've long argued that Sun was a company that did not get open source. Even though they had projects like NetBeans and OpenOffice, they frequently showed off a real ignorance of what we as Java users and they as Java owners stood to gain by making it all one big group effort. But when Jonathan Schwartz was put in as CEO there was a lot of talk about the fact that he did get it. Clearly the people who said that were right.

Expect Java's already huge lead over .NET to become insurmountable; this will likely be the final nail in the coffin for that particular mistake. Also expect Java releases to come much more frequently as corporations with a vested interest in Java (e.g. IBM, BEA, Google) throw resources at it.

More details are available in this article: Sun expands open-source Java plan | Tech News on ZDNet

June 19, 2006

This Just In: Sun Sacrifices Puppies

This just in, Sun sacrifices puppies to a pagan idol in order to try and regain market share and they also bundled a database with the upcoming JDK.

Reaction: OH MY GOD! They bundled a database! Kill it. Kill it with fire!

May 19, 2006

Google Pulls Back The Curtain A Little More

Google used to be wrapped in mystery. Most developers knew little about how they do what they do. But over the last year or so they've been pulling back the curtain to reveal things like the Google File System, how they process enormous datasets, and in some cases even some details of what tools they are using to do things like Google Mail (Java, of course :) ).

They took that to the next level by releasing a beta version of a Java toolkit for Ajax work called the Google Web Toolkit. It offers a lot of the same kind of remote procedure calls and DHTML/JavaScript functionality we've seen before in nifty toolkits like DWR, except that this time I see a few things I haven't seen elsewhere. One of them is that when you build a web interface with multiple tabs using their toolkit, you can actually bookmark individual tabs and even use the browser's back and forward buttons correctly. That is broken on many websites that use Ajax today.

I haven't put it to use yet but I will be trying it out in the future and comparing it to DWR unless someone else beats me to it.

Showing Off Swing

I have liked Swing ever since I first used it. It makes far more sense than any of the Windows/GDI, X Windows, or Amiga UI work I had ever done in the past. It offers far more flexibility (often without a lot of pain) and I've never understood why anybody would be down on it other than because it has something of a learning curve associated with it. Nevertheless, it has gotten bashed because it is slow (I never thought so), ugly (can't agree), or just not cool enough.

Both of the latter two perceptions (I won't even call them problems) are being addressed by people like Romain Guy, who are working hard on technology demonstrations that show what Swing is really capable of and how cool and sexy it can be. If you want to see pictures of the latest demonstration, check out Aerith, a Very Cool Swing Demo. It features all kinds of 2D, 3D, and other effects to look like the latest thing from Apple.

Go there, check it out and maybe it'll influence you to try something in Swing that you might not have otherwise considered.

April 18, 2006

Holding Off On Eclipse 3.2 Upgrade

I know a new release of Eclipse is coming soon. Release Candidate 1 of Version 3.2 can be downloaded right now if you are interested. But I have to confess I didn't have much of a clue what I was supposed to get in this release that would be different and worth upgrading for. After looking through the documents below, which detail the new and noteworthy changes for each milestone release up to today, I'm probably going to stick with 3.1.x for the moment and wait for the final release in June. This isn't the kind of major release that made for an easy choice between learning Eclipse 2.0 or working with the milestone releases of 3.0; these are smaller, incremental improvements befitting the minor version number increment it's getting.

February 28, 2006

Tagged Code Snippet Repository

Peter Cooper has developed a simple source code snippet repository with tagging that I like. It's far from perfect: the syntax highlighting is often squirrelly, and if you enter a multi-word tag like "unit test", then when you go to edit the code you get your tag back as two separate words, so you always have to put the quotes back in place. Nevertheless, it works and it's a good start. Plus, people aren't being shy about putting code out there.

What we need now is a small army of Java developers putting their code snippets into this repository. In no time it could be the place to go for a quick routine to fix a problem.

Here are my snippets thus far:

They will even give you a feed for when I post new snippets. Unfortunately they don't have feeds for individual tags (e.g. Java) working yet. That's a shame and it's actually one of several glitches I've noticed. But even with the bugs this is still a more believable code snippet repository than many I've seen over the years. The tags go a long way towards making it really searchable and usable. But it will only be really useful if _everybody_ enters in some of their favorite utility functions and classes. So go sign up this minute and put in just three Java code snippets! It will take you less than 15 minutes and we will all benefit.

If there aren't any more improvements or he never releases the source code or it never gets full text search then maybe I'll move my snippets to another repository in the future. But it needs to be one with syntax highlighting, an easy framework for inserting the code in the first place (this is very very easy), and tagging.

February 24, 2006

Upgrading The Ant In Anthill

Like many, I use UrbanCode's Anthill for automated builds. It's easy to set up, easy to run, and it has the basic features I need even in its open source version. But recently I noticed that it was ignoring failed builds, and at first I couldn't figure out why. Everything else seemed to be working just fine, and even installing the new Anthill 1.8.1 release didn't fix the problem.

In case you run into the same problem: it happened because I installed a new version of Ant (1.6.5 in this case), and its ant.bat file, which is what Anthill runs under Windows XP, doesn't pass the error code through to Anthill. This message from their mailing list a few months ago includes a replacement ant.bat file that will fix the problem.
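Why does a swallowed error code hide failures? A build server generally decides pass or fail from the exit code of the process it launches, so a wrapper script that always exits with 0 makes every build look fine. This is not Anthill's code; it's a generic sketch of that mechanism, with a hypothetical BuildRunner class and the current JVM's java launcher standing in for ant.bat.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class BuildRunner {
    /**
     * Runs a build command and reports success purely from its exit code.
     * If a wrapper script swallows the real code and exits with 0, this returns true
     * even when the build inside it failed.
     */
    static boolean runBuild(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()              // show the build output on our console
                .start();
        int exitCode = process.waitFor(); // blocks until the command finishes
        return exitCode == 0;             // by convention, 0 means success
    }

    public static void main(String[] args) throws Exception {
        // Stand-in commands: the running JVM's own java launcher
        String javaLauncher = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        System.out.println("java -version succeeded: " + runBuild(List.of(javaLauncher, "-version")));
        System.out.println("missing main class succeeded: " + runBuild(List.of(javaLauncher, "NoSuchMainClass")));
    }
}
```

The first command exits with 0 and the second with a nonzero code, so the harness can tell them apart. Put a script in between that drops the code and both would look identical.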

February 14, 2006

Quit Hardcoding Your *#%^ Database Connections!

Over and over I encounter Java code that hardcodes database connection creation, or pulls all of the connection parameters from a properties file bundled into the .WAR or .JAR, or even from a resource bundle (yes, that's you, Kasai). QUIT IT!

There are these things in Java called data sources. A data source is an object you can ask for a database connection: it hands one over, you use it to perform a query or two, and you close it. That's it. If any connection pooling, testing of new connections, etc. takes place, it is hidden inside the data source and you don't have to think about it. In fact, because some data source implementations pool and test connections, they will probably work better than that five-line version of "how to create a JDBC connection" you found online and pasted into your code.

Just about any server or framework you might choose to run your application in makes it easy to create data sources and specify all the details of how they connect to a database; all you have to do is make a couple of calls to get one. In the majority of your code you don't even worry about how the data source was obtained; you just accept it as one of the parameters to your class or method and use it.
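Here's a rough sketch of what accepting a data source as a parameter can look like. EmployeeDao and the EMPLOYEE table are hypothetical; the class depends only on the standard javax.sql.DataSource interface, so it works unchanged whether the data source came from DBCP, JNDI, Spring, or a test stub.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Objects;
import javax.sql.DataSource;

/** Hypothetical data-access class: it never knows or cares how the DataSource was built. */
public class EmployeeDao {
    private final DataSource dataSource;

    public EmployeeDao(DataSource dataSource) {
        this.dataSource = Objects.requireNonNull(dataSource, "dataSource");
    }

    /** Counts rows in a hypothetical EMPLOYEE table. */
    public int countEmployees() throws SQLException {
        // try-with-resources closes the result set, statement and connection in reverse order;
        // with a pooling DataSource, closing the connection returns it to the pool.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement("SELECT COUNT(*) FROM EMPLOYEE");
             ResultSet rs = stmt.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```

Whoever deploys the application decides where the DataSource comes from; the DAO just receives it through its constructor.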

If your application isn't running inside a J2EE server, or Tomcat, or Spring, or some other place where a data source object is easy to come by, you can use the Jakarta Commons DBCP library to quickly create one of your own (hopefully configured using parameters that can be easily changed).

Here's an example of what I'm talking about that takes its configuration from system properties:

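import org.apache.commons.dbcp.BasicDataSource; // Commons DBCP 1.x; DBCP 2 uses org.apache.commons.dbcp2

// Each setting comes from a system property, e.g. java -Durl=jdbc:... -Dusername=... MyApp
BasicDataSource dataSource = new BasicDataSource();

dataSource.setDriverClassName(System.getProperty("driverClassName"));
dataSource.setUsername(System.getProperty("username"));
dataSource.setPassword(System.getProperty("password"));
dataSource.setUrl(System.getProperty("url"));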

There you go: a data source with connection pooling and all kinds of other features you can tweak if you need them. Once you've got it, you can call dataSource.getConnection() and out pops a connection ready to use. If you are running inside Tomcat or WebLogic or something like that, you have even less of an excuse. Tomcat, for example, provides a way to create data sources in its Administration interface: you just go to a web page, log in, and create them. Retrieving one of those data sources in your code is this simple:

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

// Obtain our environment naming context (lookup() can throw javax.naming.NamingException)
Context initCtx = new InitialContext();
Context envCtx = (Context) initCtx.lookup("java:comp/env");

// Look up our data source by the name we gave it when we created it. In this case that's "jdbc/EmployeeDB".
DataSource ds = (DataSource) envCtx.lookup("jdbc/EmployeeDB");

If you keep making everybody jump through hoops to set up your database connections when all we want to do is configure a data source, we'll just start rewriting your code. If you are writing a library to be used by other people, or an application that has to be configured by someone other than you, make it take a data source!

January 30, 2006

A Must Attend Conference

I usually don't get to attend conferences, but given the significance of Waterfall 2006 - International Conference on Sequential Development I'm positive I'll get to go.

Clearly their lineup of sessions, keynotes, and workshops eliminates the need for JavaOne, No Fluff, or any of the other big conferences.

January 26, 2006

Blogger Is A JSP Site's Friend Too

At work there was a desire to put news items onto an internal page of a web application our customers log into. After the request was made, there was talk of including a page supplied to us by our internal customer representatives, providing a form they could type news into, etc. Basically a variety of solutions that all involved us writing something and/or getting directly involved in posting news. Perhaps even teaching HTML to somebody who normally just dealt with numbers and people.

I felt that was just a waste of resources. Why not just use Blogger? It has a better interface than anything I'm going to throw together in a couple of days. It will generate a page and dump the result via FTP to another server (exactly what we needed). And it doesn't need my intervention to get the news up. Blogger is perfectly capable of generating a JSP page instead of an HTML page as its output, and if you dump it in a directory where the server (e.g. Tomcat) will notice that the original page has changed, the server can automatically recompile the page next time someone requests it and they will have updated news but any Java scriptlets on the page will still provide their dynamic output.

When you are setting up Blogger, just make sure it knows that the output file it generates has a JSP extension. The template for your page will be a normal JSP page with whatever scriptlets and custom tags you normally use; you just have to include the Blogger tags in the section of the page where you want the news to appear. Blogger will then replace its tags, and only its tags, with the HTML for the latest news items, produce a JSP page ready for processing, and publish it to its final destination via FTP.

Definitely an easier solution, and it can be set up in thirty minutes or less vs. trying to cobble together your own solution or set up software like Movable Type or WordPress.

Google Homepage Development: Everything Old Is New Again

Since Google started their search engine schtick, their front page has remained basically static: a logo, which might change with the season, mood, or to celebrate an occasion, a place to type your search query, and two buttons to either perform your search and return the results or, if you were feeling "lucky", take you directly to the best match Google could find. Yesterday that changed, as Google made a portal page they had been working on for some time the default for Google. If you already have a Gmail account you can log in and see your most recent messages, and you can get stock quotes, news, local film listings, etc. Pretty much what you can do on Yahoo's portal and others.

The thing is, it's like deja vu all over again. It reminds me a lot of the great Netscape Portal of five or six years ago, which had the same kind of box structure and similar content, but also something most of us hadn't seen up to that point: the ability to put news from any random site on the portal as long as the site supported this new "RSS" thing. It was Netscape's portal support for RSS that prompted me to add automatic RSS generation to the news system of GameDev.net, and then I could have GameDev's news on my Netscape portal just like we were some bigshot Associated Press or something.

Google has resurrected that portal from half a decade ago and thrown in some fancy JavaScript that lets you drag boxes from one place to another and have them update automatically. There's nothing amazing there, except that they have also provided the Google Homepage API, which allows you to build your own modules that pull XML from other sites, process that XML with JavaScript, and produce any HTML you need for the user's page. That's a far cry from the heavy restrictions Netscape imposed on their portal: they would fetch the RSS from a channel for you, parse it, store the results, and present them to you in just one format.

Here you can build widgets of almost any sort, provided you can find a way to represent the output in HTML form. They can also be interactive, requesting input, allowing choices, and altering their behavior based on that input. So modules can perform searches, display maps, do calculations, or a great deal more. Thus the widgets have more in common with Yahoo! Widget Engine (née Konfabulator) widgets than with the limited extensibility of My Yahoo!, which was, up till now, the most configurable of the big portals. MSN.com and Netscape.com both seem to remain firmly stuck in the distant past and allow very little choice for what is on "your" page.

Now the question is, who will use this ability to create some cool modules that will make the Google homepage an improvement over the other big portals?

January 17, 2006

When Not To Use StringBuffer

The old adage about only having a hammer and thinking everything is a nail can apply just as easily to programming. If you blindly apply the rule that appending strings in Java is way more expensive than using a StringBuffer you can end up with some strange and arguably wrong results.

Take, for example, the PreparedStatements in a large piece of code I'm charged with working on. The SQL queries in this code are quite large and would stretch out to several hundred characters if they were not wrapped in some way within the code. So they were broken up into multiple strings, but the original author apparently feared the horrific overhead that String concatenation could impose, and we ended up with a lot of code that looks like this:

StringBuffer selectStatement = new StringBuffer();
selectStatement.append("select blah, blah1, blah2, blah3, ");
selectStatement.append("blah4 from BLAH where ");
selectStatement.append("...");

I did a really short one as an example, but imagine this going on for 20+ lines because the individual field names are long, etc. And it's all completely pointless in this case. We aren't building a dynamic string that has changing elements in it. There aren't any other strings being passed in to be appended; they are all just static strings containing text and question marks where the parameters will later be put by the prepared statement. So it never changes!

In this case the StringBuffer is actually slower than simply using plus signs where the lines had to be broken and assigning the whole thing to a String. Any compiler worth its salt will notice that all the pieces are string literals and join them into a single string at compile time, without any .append() calls (for literals joined with +, the Java Language Specification requires exactly that). Plus, all the extra method calls just obscure the SQL query, which you would like to be easily readable in the code. Note: moving the queries out of the code completely, using something like iBatis, hasn't even been considered yet. This is straight-up JDBC.
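Here's a small demonstration you can run yourself; the table and column names are made up. Because the query is built only from string literals joined with +, it is a compile-time constant expression, so javac joins it into one literal. Compile-time constant strings are interned, which is why the == comparison below is true. The StringBuffer version produces equal text, but builds a brand new object at run time.

```java
public class ConstantFolding {
    // Only string literals joined with +, so this is a compile-time constant expression:
    // javac folds it into a single literal and no concatenation happens at run time.
    static final String SELECT_EMPLOYEES =
            "select id, first_name, last_name, hire_date "
            + "from EMPLOYEE "
            + "where department_id = ? and active = ?";

    public static void main(String[] args) {
        String literal = "select id, first_name, last_name, hire_date from EMPLOYEE where department_id = ? and active = ?";
        // Constant strings are interned, so the folded constant is the very same object as the literal.
        System.out.println(SELECT_EMPLOYEES == literal); // prints true

        // The StringBuffer version builds the same text at run time, in a new object every time.
        StringBuffer buffer = new StringBuffer();
        buffer.append("select id, first_name, last_name, hire_date ");
        buffer.append("from EMPLOYEE ");
        buffer.append("where department_id = ? and active = ?");
        System.out.println(buffer.toString().equals(literal)); // prints true
        System.out.println(buffer.toString() == literal);      // prints false
    }
}
```

(Comparing strings with == is only done here to prove the point about folding; real code should use equals().)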

Please, think before you blindly follow rules. Why is the rule there? When does it make sense to break the rule?

December 20, 2005

Data Processing On A Huge Scale: Google's Story

Years ago, I naively thought that Google somehow had amazing machines and software that managed to do most everything in real-time even though the huge amounts of data they process pretty much preclude doing any such thing if I had bothered to think about it rationally. I imagined that they were processing each site they crawled as soon as they found it and into the search engine it went. Each news item from RSS was similarly fed straight into an index and made available immediately and no batch processing of reams of data was done.

Fortunately, such magical thinking has not persisted. Google does not use elves in a hollow tree to produce their results; they use intelligent engineers and many of the same tools available to you and me. They have developed all kinds of innovative solutions to deal with the huge amounts of data they have. Those solutions include:

  • Building a truly enormous array of commodity PCs on which they run Linux to handle the computing needs for all of Google. When individual computers fail, their software simply shifts the workload to other functional machines. Supposedly, they buy large quantities of parts in bulk and make their purchases in a variety of ways to avoid being gouged by vendors.

  • They created a distributed file system that keeps three copies of every piece of every file, on hard drives in separate machines, to reduce the chance that a failure causes data loss.

  • They built software that makes it easy to handle machine failures, distribute computing tasks across a large number of CPUs, etc.
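The three-copies idea is easy to picture with a small sketch. This is a hypothetical, simplified placement scheme (plain round-robin over the machines), not GFS's actual policy, which also takes things like disk usage and rack location into account. Each chunk of a file goes to three different machines, so losing any one machine still leaves at least two copies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: place every chunk of a file on three different machines. */
public class ReplicaPlacement {
    static final int REPLICAS = 3;

    /**
     * Returns, for each chunk index, the machine indexes holding a copy of it.
     * Simple round-robin; real systems also weigh free space, load and racks.
     */
    static List<List<Integer>> place(int chunkCount, int machineCount) {
        if (machineCount < REPLICAS) {
            throw new IllegalArgumentException("need at least " + REPLICAS + " machines");
        }
        List<List<Integer>> placement = new ArrayList<>();
        for (int chunk = 0; chunk < chunkCount; chunk++) {
            List<Integer> machines = new ArrayList<>();
            for (int r = 0; r < REPLICAS; r++) {
                // consecutive machines (mod machineCount) are distinct because machineCount >= REPLICAS
                machines.add((chunk + r) % machineCount);
            }
            placement.add(machines);
        }
        return placement;
    }

    public static void main(String[] args) {
        // A file split into 4 chunks on a 5-machine cluster
        List<List<Integer>> placement = place(4, 5);
        for (int chunk = 0; chunk < placement.size(); chunk++) {
            System.out.println("chunk " + chunk + " -> machines " + placement.get(chunk));
        }
    }
}
```

With 4 chunks on 5 machines, chunk 3 lands on machines 3, 4 and 0, wrapping around the cluster.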

The best thing about all of this is that they haven't been particularly quiet about how they do a lot of it. For example, if you go to their Research Publications site you'll see papers about The Google File System and Web Search for a Planet: The Google Cluster Architecture.

Now, I'm not going to snow you on this: if you aren't of a technical bent, this stuff is going to be a hard, boring slog. Michael Chabon it's not. But if data analysis of truly ginormous data sets interests you, then you want to read their paper on MapReduce: Simplified Data Processing on Large Clusters [PDF].

It's all about how they split up many kinds of data analysis so that it is easy to write the algorithm that processes the data without spending time worrying about hardware failures, how many machines you might be allocated, or how to use those machines optimally to get the data processed in the least amount of time. Instead, MapReduce forms a kind of support system that reminded me of using the genetic programming package JGAP. I'll talk in a future entry about how JGAP can make it easy to find optimal or near-optimal solutions for problems that would be tedious or impossible for humans, but the important thing it did was make it easy for me to focus on the specifics of my problem and not on the mechanics of a framework. MapReduce is one of Google's means of achieving that same kind of focus, and I think it makes for a really interesting read.
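Here is a toy, single-machine sketch of the programming model using word counting, which is the running example in the paper itself. The class and method names are mine. The programmer writes only map() and reduce(); run() plays the part of the framework, grouping values by key in between. Everything that makes the real thing valuable (spreading the work over thousands of machines and surviving their failures) is left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-machine sketch of the MapReduce programming model, using word counting. */
public class WordCountMapReduce {
    // map: one input record -> zero or more (key, value) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: one key plus all of its values -> a single result
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // "shuffle": group every emitted value by its key (the framework's job in real MapReduce)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick brown fox", "the lazy dog", "The End")));
        // prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

Because map() only ever sees one line and reduce() only ever sees one key, either could run on any machine, which is exactly what lets the real framework spread them out.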

The Java Nutch project includes a Java version of MapReduce and a distributed file system that you could use as part of your own huge data set processing so reading these articles isn't just an academic exercise. You can actually put this to use if you have a project that needs it. Be sure to check out the wiki for the Nutch project for more helpful information.

Jive Messenger Becomes Wildfire Server And Gets A Speed Boost

I recently mentioned that we installed Jive Messenger at work to get a good instant messaging server that we could control and which didn't result in important conversations leaving the building to talk to distant servers. In the time since I wrote that, Jive Messenger has been renamed to Wildfire Server and it has had a dramatic speed improvement. Jive Software: Wildfire Optimization is an article briefly detailing the optimization Jive Software did for the new version of the server and might be instructive if you haven't done optimization on a Java project before.

December 1, 2005

Creating Reusable Components Requires Extensive Experience

I've rewritten this entry because I was told by multiple people not the least of whom was my wife (a person who can actually write sentences that make sense) that an instant message chatlog between myself and Don Thorp didn't make for a readable weblog entry.

It all started when Don pointed me to this entry about what should and should not go into Ruby on Rails: http://www.loudthinking.com/arc/000407.html. The way he put it, it gave him heartburn.

I read it, and I agree. The position of the author is that higher-level constructs have no place in Rails and that in general it's futile to try to construct higher-level software components for websites. I consider that to be a big mistake. What they are missing is that the underlying model of, for example, ZWNews and MovableType and WordPress is fundamentally the same. Most blogging software, forums, comment software, link counting, polls, etc. can be reduced to a basic subset of features found in just about any of the software that various sites are using. If there's that much commonality, then you can produce that subset of functionality, offer a few simple hooks into it so it can be extended in a couple of places, and it'll probably serve the needs of 80% of the people who are building sites.

I would argue that they make this mistake because:

  • They don't spend their time building one weblog after another, or five or six forums in succession, so they don't see the common elements which underlie all of them. This is similar to the fallacy of so many game developers who start off by building a "reusable" sprite library or 3D engine before they begin their first game. They either fail completely, or they succeed at building something but pronounce reuse impossible when they cannot then reuse it themselves for another project or persuade anyone else to reuse it. It's a poor fit because they didn't grow the API based on the needs of multiple projects; instead they attempted to divine the interface and capabilities based on what they thought it would need.

  • They imagine both a model and a UI which goes with it. That never works. You can have a UI which is a starting point or an example but the focus of the code has to be on the model and the administration for that model.

Kasai is an excellent example of this. Its design is a good one for an authentication and authorization system that will serve the needs of 80% of all web applications. I know what I've needed in the past, and I can look at it and assess whether it would have met the needs of a large number of sites I've worked on, and the answer is yes. In fact, it really could stand to be reduced or simplified in a few areas and it would still have served my needs.

I think what most people fail to see is what is really a reusable component in real life. In the world of IC components a digital micromirror is a reusable component. It's not a TV all by itself. It has no tuner, memory, light source, etc. etc., yet it's a reusable piece. It's going into all manner of TVs today, all different in subtle or large ways, and for all I know somebody is building a huge array of them for a high resolution video wall.

So when I switch to the world of software, specifically web applications, I should be able to identify reusable pieces that occur over and over again with variations. In the world of websites you frequently have comment systems with the following characteristics. There are a large number of unique conversations. The conversations are not linked to each other. Each one is a straight linear series of comments. Each comment needs to be attributed to an individual who could easily be referenced via a unique ID. Each comment may have some numeric rating associated with it or a pointer to another set of properties. That is a reusable component. I can build it, and you could drop it into the user ratings at the bottom of Amazon products or the file comments at Stock.Xchng or Fark or half a dozen other places and you wouldn't notice that it had changed. Even if the first generation of the component isn't a great fit and needs work (like Kasai), the second or third will be, because it was adapted to the real needs of a lot of users.
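To show how small that reusable core really is, here is a hypothetical Java sketch of the model just described, with no UI at all. The names (CommentService, Comment, the conversation IDs) are illustrative assumptions, not from any existing library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model-only comment component: no UI, just the data and its rules. */
public class CommentService {
    /** One comment: who wrote it, what it says, an optional rating and an optional pointer to extra properties. */
    public static final class Comment {
        public final long authorId;        // unique ID of the person who wrote it
        public final String text;
        public final Integer rating;       // null when the site doesn't rate comments
        public final String propertiesRef; // null, or a key into some other property store

        public Comment(long authorId, String text, Integer rating, String propertiesRef) {
            this.authorId = authorId;
            this.text = text;
            this.rating = rating;
            this.propertiesRef = propertiesRef;
        }
    }

    // Each conversation is an independent, linear list keyed by an opaque ID
    // (a product number, a file ID, a story URL...). Conversations never reference each other.
    private final Map<String, List<Comment>> conversations = new HashMap<>();

    public void addComment(String conversationId, Comment comment) {
        conversations.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(comment);
    }

    /** The comments in posting order, read-only so callers can't reorder the thread. */
    public List<Comment> thread(String conversationId) {
        return Collections.unmodifiableList(
                conversations.getOrDefault(conversationId, Collections.emptyList()));
    }

    /** Average rating over the comments that have one, or NaN if none do. */
    public double averageRating(String conversationId) {
        return thread(conversationId).stream()
                .filter(c -> c.rating != null)
                .mapToInt(c -> c.rating)
                .average()
                .orElse(Double.NaN);
    }
}
```

A site would call addComment("product-123", ...) from whatever UI it likes; an Amazon-style page might show averageRating(), while a Fark-style page would just render thread().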

It's good that there is a major emphasis on making the infrastructure solid in Rails, but saying that there shouldn't be a set of libraries to go with it, providing reusable components for counting clicked links, comments, authentication/authorization, etc., is like saying that Java would have been better off being just like C++, a language without a huge supplementary library. That library, a well-designed one that provides pieces we use every day for things like collections, XML parsing, and regex, did a good job of identifying which "high-level" components could be widely reused. Rails doing the same would only strengthen the framework, not weaken it.

November 18, 2005

An Easy To Setup Instant Messenger Server

We were communicating confidential information back and forth using an instant messenger that regularly sent messages outside the building to servers we didn't control. To make it internal-only again, I set up Jive Messenger, Jive Software's XMPP (Jabber) server, and had everybody connect to it with Jabber-compatible clients like Gaim, Psi, and Trillian.

Setup was a breeze. I had it up and running in 30 minutes and was connecting our first machines to it. It has a nice browser-based admin console which I've barely had to use so far. It's open source, it's written in Java, it's easy to set up; what's not to like?

If you need instant messaging and you need to control your own server, I think you will be happy with this.

October 11, 2005

Role != Permission

We use Kasai for security on some applications at work now. I was responsible for the choice of Kasai and I recognize that it has some serious problems, in fact, I have a kind of love/hate relationship with it because of the way it is written and maintained. However, before I do an entry complaining about it, let me talk about why I love it and why most people don't seem to understand some aspects of security.

Let me say it simply. Roles are not equal to permissions. Way too many systems (like Tomcat) and way too many people I've worked with treat roles as though they were a valid form of permission control rather than a way to simplify permission grouping. Kasai does this correctly. It treats permissions as the low-level access controls they should be. Everything you want to control access to, perhaps down to the page or function call level, can be a separate permission. Because that many permissions can quickly become unwieldy, Kasai allows permission groups and roles to group permissions at higher and higher levels of abstraction.
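To make that layering concrete, here is a hypothetical sketch in Java. It is not Kasai's API; it only assumes that a leaf permission is a string naming one controllable thing (such as "forum.post.review") and that a group may contain permissions and other groups, so that flattening a high-level group yields every low-level permission it grants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical permission group: holds leaf permissions and nested groups. */
final class PermissionGroup {
    final String name;
    private final Set<String> permissions = new LinkedHashSet<>();
    private final List<PermissionGroup> subgroups = new ArrayList<>();

    PermissionGroup(String name, String... permissions) {
        this.name = name;
        this.permissions.addAll(Arrays.asList(permissions));
    }

    /** Nests another group inside this one; returns this for chaining. */
    PermissionGroup include(PermissionGroup group) {
        subgroups.add(group);
        return this;
    }

    /** Every leaf permission reachable from this group, however deeply nested. */
    Set<String> flatten() {
        Set<String> out = new LinkedHashSet<>();
        collect(this, out, Collections.newSetFromMap(new IdentityHashMap<PermissionGroup, Boolean>()));
        return out;
    }

    private static void collect(PermissionGroup g, Set<String> out, Set<PermissionGroup> visited) {
        if (!visited.add(g)) return; // already expanded; also guards against cycles
        out.addAll(g.permissions);
        for (PermissionGroup sub : g.subgroups) collect(sub, out, visited);
    }
}
```

The checks in application code stay at the leaf level; the groups only exist to make handing out permissions manageable.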

Let's take a real-world analogy. Say you had a business with a lot of cabinets to which you needed to control access. If you did things the way most people do, you'd try to do that with as few keys as possible: one key handles the first five cabinets, the blue one is only for special cabinet 'A', etc. Then, if you suddenly have to shuffle around the contents of the cabinets, or add somebody new who should only open some subset of cabinets for which no key combination exists, you are in a world of hurt, because you may end up having to change the locks on each cabinet and get a whole new set of keys.

That's what most people do when it comes to using roles with security. They try to boil security down to a couple of keys which open a lot of locks. They create Admin, Editor, and User roles and "hope" that it will all work out. Then they code their application (web or otherwise) to check for those roles and, in effect, hard-code the security into the application. If somebody comes along who needs to cross over a couple of roles (e.g. James is just a regular user in most respects, but we trust him to review newly added forum posts and filter out the junk, so he's like an editor in just that one respect), you end up creating an all-new role just for that one flavor of user and modifying all the affected pages. Carried far enough, for enough users, roles become finer and finer grained until they approach being individual permissions again. Except now you won't have any roles or other higher-level abstraction to group those permissions for easy application to the majority of users, the ones who aren't exceptions and fall nicely into the easy partitions you wanted in the first place. Every user will need 20 different "roles" just to function.

What you would ideally do is put a different lock on each of your cabinets and have a different key for each lock. Then, even if you rearrange the contents of the cabinets you can just collect all the keys and hand them back out again in the new combinations and your security is restored. Your cabinets didn't have to be modified and the only problem, at least in real life, is the proliferation of tons of keys to deal with. And in the computer world, we can use abstractions like roles and groups to gather together the most common arrangements of "keys" (permissions) that we will be applying to the majority of users.

Get yourself a real security system which has at least permissions and groups. Use permissions in a fine grained way to control access to individual functions. Use the higher level abstraction(s) to group the fine grained permissions into easy to apply units because most of your users will fall into easy to label categories. If you don't do this right though, it's the exceptions which will eat your lunch.
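Here is roughly what that advice might look like in code, using James from the forum example. This is a minimal sketch with hypothetical class and permission names, not any particular library's API: roles exist only as named bundles of permissions, a one-off exception is handled by granting a single extra permission, and page code only ever asks about a permission.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical user record: a few coarse roles plus optional one-off permission grants. */
final class AccessUser {
    final Set<String> roles = new HashSet<>();
    final Set<String> extraPermissions = new HashSet<>();

    AccessUser withRoles(String... r) { roles.addAll(Arrays.asList(r)); return this; }
    AccessUser grant(String... p) { extraPermissions.addAll(Arrays.asList(p)); return this; }
}

final class AccessPolicy {
    // Roles live only here, as named bundles of fine-grained permissions.
    private final Map<String, Set<String>> rolePermissions = new HashMap<>();

    void defineRole(String role, String... permissions) {
        rolePermissions.put(role, new HashSet<>(Arrays.asList(permissions)));
    }

    /**
     * The only question page code asks. The review page calls
     * permits(user, "forum.post.review") instead of checking for an "Editor" role.
     */
    boolean permits(AccessUser user, String permission) {
        if (user.extraPermissions.contains(permission)) return true;
        for (String role : user.roles) {
            Set<String> perms = rolePermissions.get(role);
            if (perms != null && perms.contains(permission)) return true;
        }
        return false;
    }
}
```

With this shape, James's exception touches one user record (`new AccessUser().withRoles("User").grant("forum.post.review")`) rather than every page that used to check for the Editor role.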

October 7, 2005

NetBeans 5.0 Looks Like A Big Improvement

I watched this Flash-based presentation on the NetBeans 5.0 Beta over at JavaLobby and I was very pleased. I used NetBeans for many years as my IDE for Java development, and I only moved to Eclipse because it had things like refactoring that I just wasn't getting from NetBeans.

NetBeans didn't just go stagnant in the face of a superior product though. Quite the contrary. They've been frantically adding new features and refactoring to improve speed and usability. Overall, what I saw in the above presentation and the last one on NetBeans 4.0 (where they shifted to Ant for all project management, hear that Eclipse?) is very very encouraging. With this kind of serious competition between two free IDEs I feel very lucky to be doing Java work.

Without a doubt, between Java and all the open source libraries and servers and tools available for it, I am more productive within a given unit of time than I have ever been in the 18 years I've been doing software development.

September 13, 2005

Java Eats Its Dead

I have the misfortune of working periodically on a legacy Java application which uses Java 1.1.8 (!!!) and BEA WebLogic 4.5. It was decided more than a year ago that trying to move this mess forward to a later version of Java and WebLogic was going to be as much trouble as simply rewriting it, and trust me, it really needs to be written from scratch. The first attempt was a train wreck. Needless to say, this combination is a spectacular pain to work with because it lacks such niceties as the collection classes (a Java 1.2 thing), and the application doesn't use Log4J, Ant, or pretty much any of the things a Java developer might consider part and parcel of any application today.
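For anyone who never had to work without the collections framework, here is a rough idea of what Java 1.1-era code looks like: Hashtable and Vector instead of Map and List, Enumeration instead of Iterator, StringBuffer, no generics, and a cast on every lookup. The order-index class is a made-up example, not code from the application described above.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

/** Hypothetical example restricted to classes that already existed in Java 1.1. */
final class LegacyOrderIndex {
    // customerId (String) -> Vector of orderId (String); no generics to say so
    private final Hashtable byCustomer = new Hashtable();

    void addOrder(String customerId, String orderId) {
        Vector orders = (Vector) byCustomer.get(customerId); // cast on every lookup
        if (orders == null) {
            orders = new Vector();
            byCustomer.put(customerId, orders);
        }
        orders.addElement(orderId);
    }

    int countOrders(String customerId) {
        Vector orders = (Vector) byCustomer.get(customerId);
        return orders == null ? 0 : orders.size();
    }

    /** Joins a customer's order ids with commas by walking an Enumeration. */
    String describe(String customerId) {
        Vector orders = (Vector) byCustomer.get(customerId);
        if (orders == null) return "";
        StringBuffer sb = new StringBuffer(); // StringBuilder only arrived in Java 5
        for (Enumeration e = orders.elements(); e.hasMoreElements();) {
            if (sb.length() > 0) sb.append(',');
            sb.append((String) e.nextElement());
        }
        return sb.toString();
    }
}
```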

I was working on some new code for this mess last week and again today, and I wanted to write some unit tests for the code I was about to write (good little "test-first" developer that I am). I realized that while it was easy to get the latest JUnit, when I looked around the JUnit website I couldn't even tell where to get the older versions, nor would I know which ones were compatible with older versions of Java if I did find them. In fact, scratching around various places, I found that most Java library websites seem to forget about older versions of Java soon after those versions are retired. That doesn't mean the old library releases are really gone, but they might as well be.

July 3, 2005

Screencasts of New Eclipse Features

One thing I've not seen mentioned elsewhere is that there are some screencasts covering new features that go with the new release. They aren't necessarily on the very first page you'll look at so here's a link to the page that has them: Eclipse 3.1 Releases

April 6, 2005

Podcasting Is The New Desktop Publishing

Desktop publishing meant that anybody could create books, magazines, etc. and if they had the talent they could make it look as good as a major publisher. The World Wide Web was much the same thing again except that it went beyond just the creation of material, it included a distribution method as well and we've all gotten to enjoy the results of that great experiment.

Podcasting has given us narrowcast radio for any topic, with stars and shows that are so far removed from "radio" that I could almost dance for joy. Of course, part of that might come from the fact that I live in a radio market with at least four Clear Channel stations in dominant positions on the dial and another station that prides itself on how different it is, but in truth it's just a station sans DJ run from a hard drive. It's like borrowing someone else's iPod except that they added commercials to their playlists.

I'm going to jabber on at length about podcasting in the future, but I just wanted to put in a quick plug for Tim Shadel's Zdot podcast. It's all about professional development using Java and other tools, like Subversion, which are language agnostic. If you do development for a living it's well worth dipping into the archives for a listen. My only complaint is that sometimes he's not working from a tight set of notes or watching the clock, so you could probably cut 5-10 minutes out of every show without any real loss of material. But that's easy for me to say, I'm not doing a show myself :)

"Je n'ai fait celle-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte." I did this one (letter) longer only because I didn't have the time to make it shorter. - Blaise Pascal

January 6, 2005

Manning Has A Clue, Then Loses It

I was buying a book online. This one in fact: Lucene in Action and I thought that while I was at it I would check out what other titles Manning might be offering. I purchased an ebook of a Struts title from them a while back because I could get it immediately and because it was a simple PDF. No stupid site I have to access it through (raise your hand Safari/O'Reilly) and no *&#*ing "digital rights management" (DRM). DRM is a code-word for, "All the rights for us, none for you. But you can still pay us the same or more. OK?"

Unfortunately, the first book I looked at was another Struts title from Manning and it included this wonderful tidbit: "Ebook edition not available. (Due to excessive piracy of this type of book, we will release it as an ebook when a new ebook protection mechanism becomes available in the coming months.)"

I have a better idea, Manning. Don't bother. I cancelled my Safari subscription and I don't think I'm likely to start it up again. I'm not going to download any ebooks from you if they are crippled in any way. So you can save yourself the time of looking for DRM technology and simply cease putting out books in ebook form at all.

November 30, 2004

A New Automated Build Engine For Java?

This is at least new to me. luntbuild appears to be a new entry into the field of automated build software for Java projects. I've only used the free version of AntHill but CruiseControl is a popular alternative.

Could anyone comment on luntbuild? It looks very nice, and its support for Subversion and for more control over tagging builds is appreciated. Is anybody out there familiar with several of these who could give a comparative review?

October 15, 2004

Simple But Clever Java Server Trick

i-Technology Viewpoint: Laziness Sometimes Pays (SYS-CON) As the author of this piece says, it is very very common for server applications to write out the same file over and over again. If a given page already exists in exactly the form you are about to write, then why write it again, change the date on the file, and cause the user to download it again? So he substitutes a specialized OutputStream that compares the data you are writing against the existing file, if there is one. As soon as it notices a difference it begins to change the file, but if the data never differs, it leaves the file untouched so browsers on remote machines may skip downloading it if they have a cached copy.
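The article's own code isn't reproduced here, so what follows is my own minimal sketch of the idea, assuming the stream is handed a java.io.File for the target. It reads the old contents in step with the new bytes, opens the file for writing only at the first difference, and truncates if the new data turns out shorter. The class name LazyFileOutputStream is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;

/** Leaves the target file (and its timestamp) alone unless the new bytes differ. */
final class LazyFileOutputStream extends OutputStream {
    private final File file;
    private InputStream existing;   // old contents, read in step with the new writes
    private RandomAccessFile out;   // opened only once the contents differ
    private long position;          // bytes written or matched so far
    private boolean diverged;
    private boolean closed;

    LazyFileOutputStream(File file) throws IOException {
        this.file = file;
        if (file.exists()) {
            existing = new BufferedInputStream(new FileInputStream(file));
        } else {
            diverge(); // nothing to compare against: write everything
        }
    }

    /** Stops comparing and starts overwriting the file at the current position. */
    private void diverge() throws IOException {
        diverged = true;
        if (existing != null) {
            existing.close();
            existing = null;
        }
        out = new RandomAccessFile(file, "rw");
        out.seek(position);
    }

    @Override
    public void write(int b) throws IOException {
        if (!diverged) {
            if (existing.read() == (b & 0xFF)) {
                position++; // identical byte: nothing to write
                return;
            }
            diverge();
        }
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        int i = 0;
        while (!diverged && i < len) {
            write(buf[off + i]); // compare byte by byte until the first difference
            i++;
        }
        if (i < len) {
            out.write(buf, off + i, len - i); // after divergence, write the rest in bulk
            position += len - i;
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        if (!diverged && existing.read() != -1) {
            diverge(); // new data is a strict prefix of the old file, so it must be truncated
        }
        if (existing != null) existing.close();
        if (out != null) {
            out.setLength(position); // drop any stale tail of the old contents
            out.close();
        }
    }

    /** True if the file on disk was created or changed. */
    boolean modified() {
        return diverged;
    }
}
```

It can be wrapped like any other stream, for example in an OutputStreamWriter. One trade-off of updating in place: a failure partway through leaves a half-updated file, so writing to a temporary file and renaming it would be safer where that matters.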

Simple, but clever.

October 14, 2004

Easier Development Environment Setup

Back in April 2003 I mentioned a piece of software then called Out-of-the-Box. It installed a wide variety of open source development software with a particular emphasis on Java tools like Ant, JBoss, etc. The name has changed but OpenLogic is still selling updated versions of Out-of-the-Box, now rechristened BlueGlue for $200(US)/year. You can get a one month trial for free to see if it appeals to your company.

I still think it's neat software, but the price tag is going to put me off using it personally, and I wouldn't be likely to recommend it to most (not all, but most) companies looking for easy ways to quickly build development environments and keep them up to date.

There is a new competitor though. MyJavaPack Home is open source software trying to fill basically the same niche. It installs lots of Java development tools and a few common open source tools that aren't just for Java (e.g. MySQL). It can't install as many different tools, nor does it offer to install example projects which use subsets of the other tools to confirm that everything installed correctly or to give you a quick starting point for your own work. But even without those, its $0 price tag and open source license could make it a popular choice for people who want a quick and dirty way to set up a development environment (and there are more IT people and team leads in that camp than you may think).

I hope future versions of both packages emphasize installation of groups of software based on common sets you see in work. Ant, Log4J, etc. would always be installed but there could be a group for web applications that would include Tomcat and/or JBoss plus Spring, a web service group could have Axis and/or Apache XML-RPC in it, a graphical UI one could install the JGoodies Forms and L2FProd.com's Common Components. Toss in some sample apps or even better, some templates for applications using Megg and you've got a hell of a starter kit.

September 24, 2004

Make Some Noise!

As with many things that "lots of people say", the adage that Java is not or cannot be successful on the desktop is bullshit. Don't believe me? Take a gander at the top ten most downloaded programs on download.com in any given week. LimeWire is always in the top ten and usually in the top five. Azureus used to accompany it in the top ten but the version that was being distributed over there had had adware added to it by a third party so Azureus now recommends everyone get it from SourceForge instead. What's that I hear you say? A couple of titles in the top ten most downloaded applications doesn't make a desktop presence. Well, you know, you are right!

That's why Sun's decision to acquire the rights to the Mac software Watson and make a Java version of it was so welcome: it was another app that could easily reach the top ten or twenty. But according to this weblog entry, and this one from another person who was working on it, it might not become available. Arrrgggh!

Fine. I'm no idiot. Sun has to prioritize and we know they can't do everything. They have to pick and choose what is the highest priority. But if they can't find the time to do this one themselves, ASK FOR HELP! Open source the thing and get some help finishing it. It has the potential to be another top twenty program when .NET has, well, none...

So, if what I'm saying has some resonance with you. If it makes sense. Talk about it. Write about it in your weblogs. Make enough noise to get some brief attention from Sun to the idea of giving it out for further development rather than letting it rot on a shelf.

September 12, 2004

Java Comic Readers Begin To Appear

To my way of thinking, Java is a natural for building a comic book reader. It's just viewing images and lots of people are going to want to do the same thing and use the same viewer on multiple platforms. So I started working on one a while back called FourColor. I got it to a point where you can actually read .CBZ and .CBR comic book files with it and in some ways (but only some) I like it better than CDisplay or Comical but I never released the code. I think it's definitely time to do so even if I don't do much more work on it for a while. Three other Java based readers have appeared in as many weeks and maybe someone so inspired can put bits and pieces of the four available together to come up with one really great reader.

Asparagino's Comic Viewer | java-gnome viewer for zipped comic scans is a little different because it uses GNOME for its UI. Apparently Swing wasn't good enough for something that only has a handful of controls on the screen, it was much better to pick a GUI that had limited availability.

Jomic is neat and despite the suggestion that you need Mac OS X on the front page (another person apparently missing the whole "cross platform" part of Java) I was able to run it successfully on Linux. I've not yet tried it on Windows though. It's nice that it handles two pages at once, it's not so nice that you have to do the installation by hand and that you have to install Java Advanced Imaging (JAI) just to run it.

CBViewer tries to outdo Jomic in strange requirements by requiring the not yet released Java 5 rather than the plain old mainstream Java everybody is likely to have on their machines. It supposedly works with Java 2 as well but after downloading it and trying it, it seems clear to me that you would have to recompile it to get it to work with Java 2. The provided binaries were compiled under Java 5.

I have no idea why you need either JAI or Java 5 simply to load some JPG images and display them. FourColor seems to do just fine without either. What all of these readers suffer from is a problem that pretty much any Java program is going to face: the stupid, proprietary .RAR format has been used to compress many, many comics. That's where the R in .CBR comes from; .CBZ files use .ZIP compression. Because there is no library to handle .RAR files directly under Java, you have to have the UNRAR command installed in your path, whether you run Linux, Windows, or Mac OS X, for FourColor to work. How Jomic avoids the need for UNRAR on Mac OS X is something I haven't looked into yet. It's this requirement that keeps FourColor from being all it can be. Otherwise, its simple Java Web Start installation would make it one of the simplest ways to read a comic book file.
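To back up the point about JAI: the standard library alone can open a .CBZ, since it's an ordinary ZIP archive, and decode JPEG, PNG, or GIF pages with javax.imageio.ImageIO. This is a minimal sketch, not FourColor's actual code, and it assumes pages are named so that plain alphabetical order is reading order. .CBR files still need the external UNRAR described above.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.imageio.ImageIO;

final class CbzReader {
    /** Names of the image entries in a .cbz archive, sorted alphabetically. */
    static List<String> pageNames(ZipFile zip) {
        List<String> names = new ArrayList<>();
        for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements();) {
            ZipEntry entry = e.nextElement();
            String lower = entry.getName().toLowerCase(Locale.ROOT);
            boolean image = lower.endsWith(".jpg") || lower.endsWith(".jpeg")
                    || lower.endsWith(".png") || lower.endsWith(".gif");
            if (!entry.isDirectory() && image) {
                names.add(entry.getName()); // skip text files, thumbnails dirs, etc.
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Decodes one page with the stock ImageIO readers; no JAI required. */
    static BufferedImage readPage(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) throw new IOException("No such page: " + name);
        try (InputStream in = zip.getInputStream(entry)) {
            return ImageIO.read(in);
        }
    }
}
```

Alphabetical sorting puts page10 before page2 unless the page numbers are zero-padded, so a real reader would want a natural-order sort.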

Anyway, since I haven't filled in anything on my project page for FourColor yet, here is a plain old .ZIP file with the source code for FourColor. Don't imagine that just because I have criticisms of the other three Java comic readers that that means I think mine is perfect. Far from it, just click the "more" link to read about what I think is wrong with FourColor and a multitude of features I think it needs to become a better reader. If you'd like to give it a quick try here is a Java Web Start link to try out the latest version.

Continue reading "Java Comic Readers Begin To Appear" »

September 9, 2004

A Possible Liberal RSS Parser And A Request

I haven't tried out Rome yet as an RSS parser; I'm still using Informa to handle all my parsing in HotSheet and my new project. I've been considering it though, and if they add a liberal parser like the one suggested here at P@ Sunglasses, I think that would be a great feature.

You know what other feature would benefit both the Informa and Rome parsers (and anyone else who wanted to use it)? An ultra-liberal RSS channel locator like the one described here. Anyone needing to pull RSS could then just say, "Type in the URL of the website and I'll look for RSS feeds." The function would go out and find the one or more RSS channels associated with that site and present them to the user for subscription. It doesn't really do any parsing, so it would be agnostic to whichever RSS parsing library it was paired with.
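The core of such a locator can be quite small. The sketch below handles only the most common case, HTML `<link rel="alternate">` autodiscovery tags with an RSS, RDF, or Atom MIME type, scanned liberally with regular expressions rather than a strict parser. A truly ultra-liberal locator could also fall back to guessing from ordinary `<a href>` links; that part is left out. The class and method names are hypothetical, and fetching the page is left to the caller.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FeedLocator {
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // name="value", name='value', or name=value
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Returns absolute feed URLs advertised by the page, in document order, without duplicates. */
    static List<URI> locate(URI pageUrl, String html) {
        List<URI> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new HashMap<>();
            Matcher a = ATTR.matcher(tag.group(1));
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2)
                        : a.group(3) != null ? a.group(3) : a.group(4);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            String rel = attrs.getOrDefault("rel", "").toLowerCase(Locale.ROOT);
            String type = attrs.getOrDefault("type", "").trim().toLowerCase(Locale.ROOT);
            String href = attrs.get("href");
            boolean feedType = type.equals("application/rss+xml")
                    || type.equals("application/atom+xml")
                    || type.equals("application/rdf+xml");
            if (rel.contains("alternate") && feedType && href != null) {
                try {
                    URI resolved = pageUrl.resolve(href.trim()); // relative hrefs are common
                    if (!feeds.contains(resolved)) feeds.add(resolved);
                } catch (IllegalArgumentException badHref) {
                    // liberal: skip malformed URLs instead of failing the whole page
                }
            }
        }
        return feeds;
    }
}
```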

September 2, 2004

Time For Some Praise

It isn't immediately obvious if all you look at is the released files directory but jdic: JDIC - JDesktop Integration Components is chugging steadily along. I put the very first bug in for the software, which offers various packages to make it easy to integrate browsers, display tray icons, etc. and now there must be close to a hundred issues which have been put in for it. The great thing is that they have fixed a little more than half of them.

It still won't bring up a browser inside a window for me on Linux, and that's very important. But all those fixed bugs mean that it's not going to turn into another abandoned project, and that when it is released we can count on support and on people striving to make it into a solid library of functionality that I would argue is much needed.

August 25, 2004

Java Web Start Improvements

I ran across this list a while back but I didn't remember all of them: Enhancements to Java Web Start Technology in J2SE 1.5.0

I really look forward to those improvements. Especially things like being able to associate my apps with specific data files and shortcuts for Web Start apps on Linux desktops.

August 24, 2004

Why Is It That eclipse Won't Print For Me?

Oh, that's right, because SWT just roxxors. Except when it doesn't as with this little glitch: Bug 24796 - DCR - No printing on Linux GTK

I develop on Linux and SWT doesn't have printing support under GTK. It kind of sucks when you want to print out a file and can't ever do it. However, my Swing apps continue to print just fine. Since the most recent update to this bug indicated that there was still quite a bit of work to go, I doubt I'll see my eclipse "Print..." menu item ungrayed before this bug hits its second birthday in October.

I know this sounds a little snarky, I'm just a little irritated.

July 24, 2004

Tonic Look And Feel

Here's a kind of nice replacement look and feel you can use in a Java application instead of Metal. It's called Tonic, and there are lots of screenshots on the site to show you what it looks like, as well as the ubiquitous demo you can launch via Java Web Start to see it in action.
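Switching to a third-party look and feel like this is normally a single UIManager.setLookAndFeel call made before any components are created. The sketch below wraps that call so that a missing JAR falls back gracefully instead of crashing. The Tonic class name in the usage comment is a placeholder I made up; check the Tonic documentation for the real one.

```java
import javax.swing.UIManager;
import javax.swing.UnsupportedLookAndFeelException;

final class LookAndFeelInstaller {
    /**
     * Tries to install the named look and feel. Returns false, leaving the current
     * one in place, if the class isn't on the classpath or can't be used here.
     */
    static boolean tryInstall(String lookAndFeelClassName) {
        try {
            UIManager.setLookAndFeel(lookAndFeelClassName);
            return true;
        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException
                | UnsupportedLookAndFeelException | ClassCastException e) {
            return false;
        }
    }

    // Usage, early in main() and before any Swing components exist
    // (the class name below is hypothetical):
    //   if (!LookAndFeelInstaller.tryInstall("com.example.tonic.TonicLookAndFeel")) {
    //       LookAndFeelInstaller.tryInstall(UIManager.getCrossPlatformLookAndFeelClassName());
    //   }
}
```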

My (Wiki) Toolbox

Michael Gloegl has taken my toolbox entry from a few weeks ago and turned it into a big wiki page. I'll be adding a permanent link to it from my resources page and updating the wiki with my own additions and changes rather than continuing to keep an outline that is only of use to me.

July 16, 2004

Show Me Your Skeleton And I'll Show You Mine

Normally when I'm about to start a new Java project I go and get my skeleton project and make a copy of that to the correct directory name to get started. Maybe you call your skeleton something else, a template, a prototype, whatever but I'm curious if you have one. Mine consists of an already created directory structure, a build.xml file that serves as a good starting point, a handful of libraries that appear in 100% of all my apps (logging, unit testing, etc.), the shell of a Re