Nirmal Fernando's Blog: December 2010

Bad smells in code ?? :O

I think it is really important to identify the "bad smells" in the code you have written, simply because they are not good to have.

There are a number of such "bad smells" that have been recognized, and programmers are expected to be aware of them. You can check most of them from here.

I searched the internet for an Eclipse plug-in that identifies these so-called "bad smells". Luckily, I found a wonderful plug-in called "JDeodorant".

JDeodorant is capable of recognizing four main types of "bad smells" that can be found in your code, namely God Class, Long Method, Type Checking and Feature Envy. I have created a screencast on "how to use JDeodorant?" and here it is.

I hope this post drew your attention to possible "bad smells" in your code, and to correcting them (oh, I forgot to mention: using JDeodorant you can correct most of these bad smells as well).
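To make one of these smells concrete, here is a small sketch of the "Type Checking" smell and the usual remedy of replacing the conditional with polymorphism. All class names are hypothetical and chosen only for illustration; this is not output produced by JDeodorant.

```java
// Hypothetical example: every name below is illustrative only.

// Before: a type code drives a conditional (the "Type Checking" smell).
class ShapeWithTypeCode {
    static final int CIRCLE = 0;
    static final int SQUARE = 1;

    final int type;
    final double size;

    ShapeWithTypeCode(int type, double size) {
        this.type = type;
        this.size = size;
    }

    double area() {
        switch (type) {
            case CIRCLE: return Math.PI * size * size;
            case SQUARE: return size * size;
            default: throw new IllegalStateException("Unknown type: " + type);
        }
    }
}

// After: each variant is its own class, so the switch disappears.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class TypeCheckingDemo {
    public static void main(String[] args) {
        // Both versions compute the same area for a square of side 2.
        System.out.println(new ShapeWithTypeCode(ShapeWithTypeCode.SQUARE, 2.0).area());
        System.out.println(new Square(2.0).area());
    }
}
```

With the second version, adding a new shape means adding a new class instead of editing every switch statement that inspects the type code.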

Happy Coding!! :)

Work done in the period of November 20th – December 3rd

Within this period I managed to serialize the whole Drools rules file as a Knowledge Base object and observed a significant improvement (approximately 10s), although it still falls far short of the execution time of the current RelEx2Frame. The serialization also needed the JVM stack size to be increased to 2MB. We still felt that the performance was not up to the requirement, so we decided to split the Drools rules file into files of 100 rules each, and Nisansa did that task. Danaja came up with a design focused on applying concurrency and parallelism to the RelEx2Frame system. All the members of the team accepted it as the basic design, which will be altered and improved after further analysis.
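As a rough illustration of the two ideas above (batches of 100 rules, and processing them concurrently), here is a minimal sketch using only the Java standard library. It is not our actual design: `RuleBatching`, `partition`, `processBatch` and `processAll` are hypothetical names, rules are represented as plain strings, and `processBatch` is a placeholder for loading and evaluating one rules file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from RelEx2Frame or Drools.
class RuleBatching {

    // Split a list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Placeholder for "load and evaluate one rules file"; it only counts rules.
    static int processBatch(List<String> batch) {
        return batch.size();
    }

    // Submit one task per batch to a fixed thread pool and sum the results.
    static int processAll(List<String> rules, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : partition(rules, batchSize)) {
                Callable<Integer> task = () -> processBatch(batch);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rules = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rules.add("rule-" + i);
        }
        System.out.println(partition(rules, 100).size() + " batches");
        System.out.println(processAll(rules, 100, 4) + " rules processed");
    }
}
```

Whether rule batches can really be evaluated independently like this depends on how the rules interact, which is exactly what the further analysis of the design has to settle.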

The current RelEx2Frame is significantly limited in the concepts or words that it detects. Statistical learning methods can be used to reduce this limitation. One approach is to use an existing application, and the other is to implement a statistical learner. Google Sets [1] is one of the existing applications that we are considering. During this period I implemented an application which accesses Google Sets and generates a new set of words for a given combination of words (fewer than 4). I used an existing library called ‘XGoogle’ [2], written in the Python programming language, which provides an interface to access Google Sets. Since I was not familiar with Python, I had to learn it, which I managed to do successfully. We will keep the results from this application and compare them with the results from our statistical learner to choose the most appropriate set of words.

Preparing the design document was the major work that we did during this period, since it was due on 3rd December. All of us contributed to the design document in several ways, and I contributed by writing the design constraints, the design decisions and the design of the rule learning component. The design constraints part basically involved three subsections, namely Operating Environment, End-user Environment and Performance Requirements. The design decisions part consisted of the major decisions, some of which had already been taken while others were yet to be taken. Programming language selection, rule engine selection, caching knowledge bases, statistical learning of concepts and selecting the best-suited data mining algorithm were the main design considerations discussed there.

Designing the rule learner was the most challenging task for me. I read many documents [3-5] on existing rule learners, existing rule induction algorithms, data mining techniques, etc. After a considerable literature survey, I came up with an architecture for the statistical rule learner using data mining techniques, which will be altered and improved as required. Chamilka reviewed it and made a few suggestions.

We successfully submitted the design document on 3rd December.

[1] “Google sets labs,” [Online]. Available: http://labs.google.com/sets

[2] “XGoogle,” [Online]. Available: http://www.catonmat.net/blog/python-library-for-google-sets

[3] K. Mhashilkar, “Data Mining Technology,” [Online]. Available: http://www.executionmih.com/data-mining/technology-architecture-application-frontend.php

[4] J. Grzymala-Busse, “Three strategies to rule induction from data with numerical attributes,” presented at the International Workshop on Rough Sets in Knowledge Discovery (RSKD 2003), associated with the European Joint Conferences on Theory and Practice of Software 2003, Warsaw, Poland, April 5–13, 2003.

[5] “Rule Learner,” [Online]. Available: http://openrules.com/RuleLearner.htm

Work done in the period of 6th to 19th of November

During this period my main task was to integrate the converted Drools rules into the RelEx source code. For that, I first created a Drools stateful knowledge session, which creates the knowledge base from the converted Drools rules file. The first time I debugged it, I got a huge number of errors, which was somewhat expected since we had not debugged the generated Drools rules before that moment. So I started to look into each and every error thrown. A few of the hundreds of errors are noted below.

The import statement was missing from the Drools rules file: I altered the RuleConverter code so that it adds it.

There were a lot of typos in the hand-written RelEx2Frame rule file, which misled our RuleConverter into generating malformed rules: I debugged and edited the hand-written rule file as needed.

Once I got Drools to create the KnowledgeBase successfully, I started to debug the methods used inside the Drools rules, which were implemented by Danaja and Nisansa. They had no way to debug their methods earlier, so we expected a few bugs in those methods. I found a few bugs in one of the methods (the tricky one), fixed them, and got the rules to work.

Then I found out that we were outputting only the rule and not the output related to the given sentence. So I went ahead and implemented that functionality. It needed a few changes in the Drools rules file (i.e. RuleConverter) and also in a few methods in RelEx2Frame. While doing this I observed that the 'then' part of a rule is not executed immediately after the rule is activated. I had a discussion about this behaviour on the Rules Users List mailing list and got to know the following:

“In the rete algorithm the agenda is a list of activated rules who's actions are eligible to fire. The "first" one on the list is selected, it's action is fired, and the agenda might change as a result. "First" is in quotes because the agenda list is sorted by conflict resolution rules.”

So I altered the code a bit so that it solves this issue as well.
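The behaviour described in that reply can be pictured with a toy agenda. The sketch below is a simplification written for this post, not Drools code: `AgendaSketch` and `Activation` are hypothetical names, and ordering by a numeric priority (called `salience` here) with activation order as a tie-breaker is only an assumed stand-in for the engine's real conflict resolution rules.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of a rule agenda; not the Drools implementation.
class AgendaSketch {

    // One activated rule waiting on the agenda.
    static final class Activation {
        final String ruleName;
        final int salience;   // assumed priority used for conflict resolution
        final long order;     // activation sequence number, used as tie-breaker
        final Runnable action;

        Activation(String ruleName, int salience, long order, Runnable action) {
            this.ruleName = ruleName;
            this.salience = salience;
            this.order = order;
            this.action = action;
        }
    }

    // Highest salience first; among equals, the earliest activation first.
    private final PriorityQueue<Activation> agenda = new PriorityQueue<>(
            Comparator.comparingInt((Activation a) -> a.salience)
                    .reversed()
                    .thenComparingLong(a -> a.order));

    private long nextOrder = 0;
    final List<String> fired = new ArrayList<>();

    // Matching a rule only places it on the agenda; its action does not run yet.
    void activate(String ruleName, int salience, Runnable action) {
        agenda.add(new Activation(ruleName, salience, nextOrder++, action));
    }

    // Fire activations one at a time; an action may change the agenda.
    void fireAll() {
        Activation next;
        while ((next = agenda.poll()) != null) {
            fired.add(next.ruleName);
            next.action.run();
        }
    }

    public static void main(String[] args) {
        AgendaSketch agenda = new AgendaSketch();
        agenda.activate("low", 0, () -> { });
        agenda.activate("high", 10, () -> agenda.activate("chained", 5, () -> { }));
        agenda.fireAll();
        System.out.println(agenda.fired); // prints [high, chained, low]
    }
}
```

In this model "low" is activated first but fires last, which mirrors what I observed: activation and execution of the 'then' part are two separate steps.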

I did a few tests with the new rule engine and found that it takes ages (10-15 mins) to display the resulting frame outputs. I raised this issue with my group mates and suggested caching as a possible approach, which I had seen in a few mailing list discussions.
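One general way such caching can work is to build the expensive object once, serialize it to disk, and load the serialized copy on later runs. The sketch below shows that idea with plain Java serialization. It deliberately does not use the Drools API: `KnowledgeBaseCache`, `CompiledRules` and `loadOrBuild` are hypothetical names, and `CompiledRules` is just a stand-in for a serializable knowledge base.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a serialize-to-disk cache; not Drools code.
class KnowledgeBaseCache {

    // Stand-in for an expensive-to-build, serializable knowledge base.
    static final class CompiledRules implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<String> rules;

        CompiledRules(List<String> rules) {
            this.rules = new ArrayList<>(rules);
        }
    }

    // Return the cached object if it can be read; otherwise build and cache it.
    static CompiledRules loadOrBuild(File cacheFile, Supplier<CompiledRules> builder) {
        if (cacheFile.isFile()) {
            try (ObjectInputStream in =
                         new ObjectInputStream(new FileInputStream(cacheFile))) {
                return (CompiledRules) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                // Unreadable or incompatible cache: fall through and rebuild.
            }
        }
        CompiledRules built = builder.get();
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(cacheFile))) {
            out.writeObject(built);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return built;
    }

    public static void main(String[] args) {
        File cache = new File(System.getProperty("java.io.tmpdir"),
                "kb-cache-demo-" + System.nanoTime() + ".bin");
        Supplier<CompiledRules> slowBuild = () -> {
            System.out.println("building from the rules file (slow path)");
            return new CompiledRules(Arrays.asList("rule-1", "rule-2"));
        };
        loadOrBuild(cache, slowBuild); // first call builds and writes the cache
        CompiledRules cached = loadOrBuild(cache, slowBuild); // second call reads it
        System.out.println(cached.rules.size() + " rules loaded");
        cache.delete();
    }
}
```

A real cache would also need to be invalidated whenever the rules file changes, for example by comparing timestamps; that part is left out here.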

Further, I edited the existing RelEx build.xml file so that it adds the necessary Drools-related executables to the classpath before compiling the source code.

We had a few discussions on preparing the design document as well. Again Chamilka took the lead, cooperated with the others and divided the separate parts among the four of us.


Tuesday, December 21, 2010
