Nirmal Fernando's Blog: RelEx


Tuesday, December 21, 2010

Work done in the period of November 20th – December 3rd

Within this period I managed to serialize the whole Drools rules file as a Knowledge Base object and observed a significant improvement (approximately 10s), but it was still far slower than the current RelEx2Frame. The serialization also required the JVM stack size to be increased to 2MB. Since we still felt that the performance did not meet the requirement, we decided to split the Drools rules file into files of 100 rules each, and Nisansa did that task. Danaja came up with a design focused on applying concurrency and parallelism to the RelEx2Frame system, which all the team members accepted as the basic design, to be altered and improved after further analysis.
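The 100-rules-per-file split can be sketched roughly as below. This is a minimal Java illustration, not the code Nisansa actually wrote: the class RuleFileSplitter and its split method are hypothetical, and the sketch assumes each rule starts on a line beginning with "rule " and ends on a line containing only "end". Whatever appears before the first rule (package, import and global declarations) is copied into every chunk so that each file can be compiled on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one large DRL file into chunks of at most N rules.
// Assumes every rule starts on a line beginning with "rule " and ends on a line
// containing only "end"; everything before the first rule is a shared header.
final class RuleFileSplitter {

    static List<String> split(String drl, int rulesPerChunk) {
        if (rulesPerChunk <= 0) {
            throw new IllegalArgumentException("rulesPerChunk must be positive");
        }
        StringBuilder header = new StringBuilder();
        List<String> rules = new ArrayList<>();
        StringBuilder currentRule = null;
        boolean seenFirstRule = false;

        for (String line : drl.split("\n", -1)) {
            String trimmed = line.trim();
            if (currentRule == null && trimmed.startsWith("rule ")) {
                currentRule = new StringBuilder();
                seenFirstRule = true;
            }
            if (currentRule != null) {
                currentRule.append(line).append('\n');
                if (trimmed.equals("end")) {
                    rules.add(currentRule.toString());
                    currentRule = null;
                }
            } else if (!seenFirstRule) {
                header.append(line).append('\n'); // package, imports, globals
            }
            // Lines between rules (blank lines, comments) are dropped.
        }
        if (currentRule != null) {
            throw new IllegalArgumentException("last rule has no closing 'end'");
        }

        // Every chunk gets its own copy of the header plus up to N rules.
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < rules.size(); start += rulesPerChunk) {
            StringBuilder chunk = new StringBuilder(header);
            int stop = Math.min(start + rulesPerChunk, rules.size());
            for (String rule : rules.subList(start, stop)) {
                chunk.append(rule).append('\n');
            }
            chunks.add(chunk.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical input: 250 empty rules under a made-up package name.
        StringBuilder drl = new StringBuilder("package example;\n\n");
        for (int i = 0; i < 250; i++) {
            drl.append("rule \"r").append(i).append("\"\nwhen\nthen\nend\n\n");
        }
        List<String> chunks = split(drl.toString(), 100);
        System.out.println(chunks.size() + " chunks"); // 250 rules -> 3 chunks
    }
}
```

Copying the header into every chunk costs a few duplicated lines, but it keeps each chunk an independent DRL file, which would let each file be compiled into its own knowledge base, for example in a parallel design.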

The current RelEx2Frame has a significant limitation in the concepts, or words, that it detects. Statistical learning methods can be used to reduce this limitation. One approach is to use an existing application; the other is to implement a statistical learner. Google Sets [1] is one of the existing applications that we are considering. During this period I implemented an application which accesses Google Sets and generates a new set of words for a given combination of words (fewer than four). I used an existing library called ‘XGoogle’ [2], written in Python, which provides an interface to Google Sets. Since I was not familiar with Python, I had to learn it, which I successfully managed to do. We will keep the results from this application and compare them with the results from our statistical learner to choose the most appropriate set of words.

Preparing the design document was the major work we did during this period, since it was due on 3rd December. All of us contributed to the design document in several ways; I contributed by writing the design constraints and design decisions, and by designing the rule learning component. The design constraints part basically involved three subsections, namely Operating Environment, End-user Environment and Performance Requirements. The design decisions part consisted of major decisions, some of which were already taken and others yet to be taken. Programming language selection, rule engine selection, caching knowledge bases, statistical learning of concepts and selecting the best-suited data mining algorithm were the main design considerations discussed there.

Designing the rule learner was the most challenging task for me. I read many documents [3-5] on existing rule learners, rule induction algorithms, data mining techniques, etc. After a considerable literature survey I came up with an architecture for the statistical rule learner using data mining techniques, which will be altered and improved as required. Chamilka reviewed it and made a few suggestions.

We successfully submitted the design document on 3rd December.

[1] “Google Sets,” Google Labs. [Online]. Available: http://labs.google.com/sets

[2] “XGoogle,” [Online]. Available: http://www.catonmat.net/blog/python-library-for-google-sets

[3] K. Mhashilkar, “Data Mining Technology,” [Online]. Available: http://www.executionmih.com/data-mining/technology-architecture-application-frontend.php

[4] J. Grzymala-Busse, “Three strategies to rule induction from data with numerical attributes,” presented at the International Workshop on Rough Sets in Knowledge Discovery (RSKD 2003), associated with the European Joint Conferences on Theory and Practice of Software 2003, Warsaw, Poland, April 5–13, 2003.

[5] “Rule Learner,” [Online]. Available: http://openrules.com/RuleLearner.htm

Work done in the period of 6th to 19th of November

During this period my main task was to integrate the converted Drools rules into the RelEx source code. For that, I first created a Drools stateful knowledge session, which creates the knowledge base from the converted Drools rules file. At the first debugging attempt I got a hell of a lot of errors, which was somewhat expected since we had not debugged the generated Drools rules before that moment. So I started to look into each and every error thrown. A few of the hundreds of errors are noted below.

An import statement was missing from the Drools rules file: I altered the RuleConverter code so that it adds it.

There were a lot of typos in the hand-written RelEx2Frame rule file, which misled our RuleConverter into generating malformed rules: I debugged and edited the hand-written rule file as needed.

Once I got Drools to create the KnowledgeBase successfully, I started to debug the methods used inside the Drools rules, which were implemented by Danaja and Nisansa. They had no way to debug their methods earlier, so we expected a few bugs in them. I found a few bugs in one of the methods (the tricky one), fixed them successfully, and got the rules to work.

Then I found out that we were currently outputting only the rule, not the output related to the given sentence. So I went ahead and implemented that functionality. It needed a few changes in the Drools rules file (i.e. in the RuleConverter) and in a few methods in RelEx2Frame. While doing this I observed that the 'then' part of a rule does not get executed immediately after the rule is activated. I had a discussion about this behaviour on the Rules Users mailing list and learned the following:

“In the rete algorithm the agenda is a list of activated rules who's actions are eligible to fire. The "first" one on the list is selected, it's action is fired, and the agenda might change as a result. "First" is in quotes because the agenda list is sorted by conflict resolution rules.”

So I altered the code a bit so that it solves this issue as well.
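To see why a rule's 'then' part does not run the moment the rule is activated, the following toy agenda in Java may help. It is a simplified model written for this post, not Drools' implementation: the ToyAgenda class, its methods and the salience-then-recency ordering are assumptions (I believe this is close to Drools' default conflict resolution, but the Drools documentation is the authority). Activation only queues the action; fireAllRules() then repeatedly picks the "first" activation, runs it, and re-sorts, because the fired action may add or cancel activations.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Toy model of a rule-engine agenda (NOT Drools' implementation).
// Activating a rule only queues its 'then' part; actions run later, one at a
// time, in the order chosen by a conflict-resolution strategy.
final class ToyAgenda {

    /** A rule whose conditions matched; its action is the rule's 'then' part. */
    private static final class Activation {
        final String ruleName;
        final int salience;
        final long sequence; // higher value = activated more recently
        final Consumer<ToyAgenda> action;

        Activation(String ruleName, int salience, long sequence, Consumer<ToyAgenda> action) {
            this.ruleName = ruleName;
            this.salience = salience;
            this.sequence = sequence;
            this.action = action;
        }
    }

    // Assumed strategy: higher salience first, then the most recent activation.
    private static final Comparator<Activation> CONFLICT_RESOLUTION =
            Comparator.comparingInt((Activation a) -> a.salience).reversed()
                    .thenComparing(Comparator.comparingLong((Activation a) -> a.sequence).reversed());

    private final List<Activation> agenda = new ArrayList<>();
    private long nextSequence = 0;

    /** Called when a rule's conditions match: queues the action, does not run it. */
    void activate(String ruleName, int salience, Consumer<ToyAgenda> action) {
        agenda.add(new Activation(ruleName, salience, nextSequence++, action));
    }

    /** Removes pending activations, e.g. because a fired action changed the facts. */
    void cancel(String ruleName) {
        agenda.removeIf(a -> a.ruleName.equals(ruleName));
    }

    /** Repeatedly fires the "first" activation until the agenda is empty. */
    List<String> fireAllRules() {
        List<String> fired = new ArrayList<>();
        while (!agenda.isEmpty()) {
            agenda.sort(CONFLICT_RESOLUTION); // re-sort: the agenda may have changed
            Activation first = agenda.remove(0);
            fired.add(first.ruleName);
            first.action.accept(this); // may add or cancel activations
        }
        return fired;
    }

    public static void main(String[] args) {
        ToyAgenda agenda = new ToyAgenda();
        agenda.activate("A", 0, ag -> { });
        agenda.activate("B", 0, ag -> ag.cancel("A"));
        agenda.activate("C", 5, ag -> ag.activate("E", 10, x -> { }));
        // C fires first (highest salience) and activates E, which then outranks B;
        // B cancels A, so A was activated but its 'then' part never runs.
        System.out.println(agenda.fireAllRules()); // [C, E, B]
    }
}
```

The point of the example is the gap between activation and firing: code that expects a rule's consequence to have run as soon as its conditions match will see stale results, which matches the behaviour I observed.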

I did a few tests with the new rule engine and found that it takes ages (10-15 minutes) to display the resulting frame outputs. I raised this issue with my group mates and suggested a possible caching approach, which I had seen in a few mailing list discussions.
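The caching idea amounts to building the expensive object once and reusing it afterwards. Below is a minimal Java sketch, assuming the compiled rule base can be written with standard Java serialization; whether a Drools knowledge base can be serialized this way, or needs Drools-specific streams, should be checked in the Drools documentation. SerializedCache and loadOrBuild are hypothetical names, and the list of strings in main merely stands in for the knowledge base.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Supplier;

// Sketch of a build-once, load-afterwards cache. The cached value stands in for
// the compiled knowledge base; any Serializable value works here.
final class SerializedCache {

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T loadOrBuild(Path cacheFile, Supplier<T> builder) {
        try {
            if (Files.exists(cacheFile)) {
                // Fast path: reuse the object serialized on an earlier run.
                try (ObjectInputStream in = new ObjectInputStream(
                        new BufferedInputStream(Files.newInputStream(cacheFile)))) {
                    return (T) in.readObject();
                }
            }
            // Slow path: build (e.g. compile the rules) once and store the result.
            T built = builder.get();
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(cacheFile)))) {
                out.writeObject(built);
            }
            return built;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("cached class is no longer available", e);
        }
    }

    public static void main(String[] args) {
        Path cache = Paths.get(System.getProperty("java.io.tmpdir"), "kb-cache-demo.ser");
        cache.toFile().delete(); // start from an empty cache for the demo
        int[] builds = {0};
        Supplier<ArrayList<String>> expensiveBuild = () -> {
            builds[0]++; // count how often the slow path runs
            return new ArrayList<>(Arrays.asList("rule-1", "rule-2"));
        };
        loadOrBuild(cache, expensiveBuild);
        loadOrBuild(cache, expensiveBuild);
        System.out.println("built " + builds[0] + " time(s)"); // built 1 time(s)
        cache.toFile().delete();
    }
}
```

The first call takes the slow path and writes the cache file; later calls, including those in a new JVM, only deserialize it. A real cache would also need to be invalidated whenever the rule files change.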

I also edited the existing RelEx build.xml file so that it adds the necessary Drools JAR files to the classpath before compiling the source code.

We had a few discussions on preparing the design document as well; again Chamilka took the lead, coordinated with the others and divided the separate parts among the four of us.

Tuesday, November 23, 2010

Work done in the period of October 23rd – November 5th

I updated our external supervisor, Dr. Ben Goertzel, on our current progress on 25th October.

I communicated with my group mates and completed the initial version of the SRS on 26th October. As requested by the course coordinator, I uploaded that version to Moodle on 29th October. After the project meeting with the course coordinator, we decided to alter our SRS a bit, and agreed to review the parts each of us had written in the SRS and come up with an improved version by 1st November. As discussed, I made a few small changes to my parts of the SRS and sent them to Chamilka for formatting on 31st October.

Meanwhile I had a few chats with the OpenCog developer community, mainly with Dr. Joel Pitt, Jerad and Linas, on the #opencog IRC channel, to better understand our requirements. Since our project idea came from Dr. Ben Goertzel, I thought I should chat with him to clarify a few problems. I had a Google chat with him on 1st November, and the points discussed were the following.

Why is the existing RelEx2Frame code kind of hacky?
What if the introduction of a standard rule engine degrades the performance?
Is there a corpus that can be used for testing purposes?
For final presentation purposes is it possible to use the virtual dogs developed by OpenCog?

That discussion was really worthwhile, and I shared the chat log on our mailing list.

I did a bit of research on the Drools rules generated by converting the hand-written rules with the RuleConverter developed by Danaja and Nisansa. To clarify a few things I came across in those converted rules, I contacted the Drools community through their mailing list on 31st October. There I had a discussion with a Drools developer, Wolfgang Laun, and learned a few important facts that we should consider.

In a Drools rule, the eval() function is the least efficient way of formulating a condition; none of the optimizations work with it.
eval() cannot be used on the RHS, after 'then'. It is a wrapper for general boolean expressions, to be used as a constraint in a pattern.

I shared these with my group mates and made the necessary modifications.

In this period I also took up another task: integrating RelEx with Drools using the minimum number of dependencies needed to get Drools to work. I added a test class provided by Drools to the successfully configured RelEx Eclipse project, built it, found the missing libraries (JAR files), and added them to the Java Build Path of the RelEx project. The following were the minimum JAR files needed from the Drools 'bin' folder for a successful build.

drools-core-5.1.1.jar
drools-compiler-5.1.1.jar
drools-api-5.1.1.jar
lib/antlr-runtime-3.1.3.jar
lib/ecj-3.5.1.jar
lib/mvel2-2.0.16.jar
lib/xstream-1.3.1.jar
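A small sanity check can confirm that these JARs are present before adding them to the build path. This is a hypothetical Java helper (DroolsJarCheck is not part of Drools or RelEx) that assumes the paths above are relative to the Drools 'bin' folder passed as its first argument.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check that the minimum Drools 5.1.1 JARs listed in this post
// exist under a Drools 'bin' folder before they go on the RelEx build path.
final class DroolsJarCheck {

    // Paths relative to the Drools 'bin' folder, as listed in this post.
    static final List<String> REQUIRED_JARS = List.of(
            "drools-core-5.1.1.jar",
            "drools-compiler-5.1.1.jar",
            "drools-api-5.1.1.jar",
            "lib/antlr-runtime-3.1.3.jar",
            "lib/ecj-3.5.1.jar",
            "lib/mvel2-2.0.16.jar",
            "lib/xstream-1.3.1.jar");

    /** Returns the required JARs that are not regular files under droolsBin. */
    static List<String> missingJars(Path droolsBin) {
        List<String> missing = new ArrayList<>();
        for (String jar : REQUIRED_JARS) {
            if (!Files.isRegularFile(droolsBin.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the Drools 'bin' folder as the first argument (defaults to ".").
        Path droolsBin = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingJars(droolsBin);
        if (missing.isEmpty()) {
            System.out.println("All " + REQUIRED_JARS.size() + " Drools JARs found.");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```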

Sunday, October 31, 2010

SeMap - final year project (9th - 22nd October)

On 9th October I created a new private mailing list, nldex-devs, in our SourceForge project account, since it would be a much easier platform for our project discussions. I added all my group members to the mailing list, made them aware of it, and asked them to always use it for project-related discussions.

Around 13th October we got a few comments on our project proposal from our internal supervisor, Dr. Shehan Perera. We altered our proposal according to his suggestions.

On 15th October I found a few sample SRSs from our seniors, looked at them and discussed them with the group members. We figured out the necessary parts for our project’s requirements specification and divided them among us. I took up the sections on Operating Environment, User Documentation, and Assumptions and Dependencies, and went through a few resources to find out what I needed to write. I put up a basic draft of my parts on our mailing list on 17th October so that all my group members could review them. We planned to finish the SRS on 26th October.

Throughout this period I continued to read the Drools documentation. I downloaded the Drools example projects, ran them in the Eclipse IDE and played around with them to get familiar with Drools' behaviour. I also looked at a rule-authoring approach in Drools called Domain Specific Language (DSL), and will continue to look at it in the coming days.

I tried to set up RelEx in the Eclipse IDE, since that will make our lives easier when dealing with the RelEx code, and I successfully set up the RelEx source code in Eclipse. The following are the steps I followed.

· File --> New --> Java Project --> Create project from existing source --> specified the path to relex folder

· An Eclipse project will now be created based on the existing source

· Go to the project root in the Package Explorer in the IDE

· Right click on it --> properties

· Select "Java Build Path" --> Libraries --> Add External JARs, add the following JAR files and press OK.

· gate.jar

-download from http://sourceforge.net/projects/gate/files/gate/4.0/gate-4.0-build2752-BIN.zip/download

-gate.jar can be found inside the bin folder

· jwnl.jar

-download from http://sourceforge.net/projects/jwordnet/files/jwnl/JWNL%201.4/jwnl14-rc2.zip/download

-jar file can be found inside the extracted folder

· linkgrammar-4.7.0.jar

-download from http://www.abisource.com/projects/link-grammar/#download

-jar file can be found inside the extracted folder

· opennlp-tools-1.4.3.jar

-download from http://sourceforge.net/projects/opennlp/files/OpenNLP%20Tools/1.4.3/opennlp-tools-1.4.3.tgz/download

-Open a terminal in the extracted folder and run the "ant" command

-That will build the opennlp-tools-1.4.3.jar for you inside the "output" folder of the same directory.

· In the IDE's menu bar, open Project and untick "Build Automatically".

· Then go to the project root in the Package Explorer in the IDE

· Right click on it --> Build Project

· This will build the RelEx source without any errors.

Tuesday, October 12, 2010

SeMap - final year project (25th Sept. - 8th Oct.)

I started from where I left off the previous week, basically setting up the environment for us to use RelEx. RelEx has a lot of dependencies, which in turn need other dependencies. I tried to set up RelEx in a Windows environment using MinGW, a minimalist development environment for native Microsoft Windows applications. The RelEx developers preferred that we use a Linux environment, but they asked me to give Windows a try (since most of us wanted to use Windows), as they had no experience of building RelEx on it. In an IRC discussion I told them that if I succeeded, I would write a “how-to” document to help anyone planning to use OpenCog on Windows, and they appreciated that.

The Link Grammar Parser is one of the dependencies of RelEx, and building it in the MinGW environment was a nightmare, since some of the libraries it uses are not available on the Windows platform. I tried to manually download some of the missing libraries after debugging the build, but was unsuccessful. Then Nisansa started to build Link Grammar using Visual C++, but he could not get it to build without errors either (I think he will continue his investigation). Rather than wasting more time just setting up the environment, we decided to move to a Linux-based OS. I already had some experience with Ubuntu, so I just rebooted into Ubuntu. Amazingly, within 10 minutes I was able to build RelEx successfully! Still, I have no regrets about trying to install RelEx on Windows.

I came up with a set of steps to follow in order to get RelEx installed on Ubuntu. Here they are.

Download link-grammar from http://www.abisource.com/projects/link-grammar/#download
Extract link-grammar
Issue the following commands in a terminal opened in the folder where you extracted link-grammar (e.g. /media/OS/OtherNirmal/L4-S1/Project/link-grammar-4.7.0):

*sudo apt-get install build-essential

*./configure

*make

*sudo make install

*sudo ldconfig

*cd link-grammar

*./link-parser (this runs LGP so that you can verify it is working)

*sudo cp linkgrammar-4.7.0.jar /usr/share/java

Install the libgetopt-java package:

*sudo apt-get install libgetopt-java

Install WordNet:

Download JWNL from http://sourceforge.net/projects/jwordnet/files/jwnl/JWNL%201.4/jwnl14-rc2.zip/download and extract it.

Go to the folder containing jwnl.jar (in JWNL 1.4 it is inside extractedPlace/jwnl14-rc2) and issue the following command to copy the JAR into /usr/share/java:

*sudo cp jwnl.jar /usr/share/java

Go to http://wiki.bazaar.canonical.com/Download and download Bazaar, the VCS used by RelEx, or simply open a terminal and type:

sudo apt-get install bzr

Go to the folder where you want the RelEx trunk to be and issue the following:

bzr branch lp:relex

(this will check out the RelEx source code)

There is a build.xml file in the RelEx source code. Open it and change the following:

line 20:

<pathelement location="${PREFIX}/linkgrammar-4.7.0.jar"/>

Save it!

Finally, go to the folder containing build.xml (relex) and issue:

*ant (this will build the code; check that the output reports "BUILD SUCCESSFUL")

*ant run (to run RelEx)

*ant test (to test RelEx)

The project proposal was due on 8th October, so we started work on it on 30th September. We discussed and distributed parts of the project proposal among us. I took the introduction and methodology parts. Danaja and Nisansa proofread the separate parts of the proposal, and Chamilka did most of the formatting. I helped them in whatever ways I could. I should note here that I was down with a contagious eye disease from 6th to 9th October, during which I was instructed to rest my eyes. But I participated in group chats during that period as well, to keep in touch with the project. On 5th October we submitted our proposal to Dr. Ben Goertzel, our external supervisor, and to Dr. Shehan, our internal supervisor, to get their approval. Dr. Ben reviewed our proposal within 24 hours and sent us a few suggestions; we all agreed to alter our proposal according to his amendments and sent the revised proposal to Dr. Shehan. Unfortunately Dr. Shehan was attending a conference in the US, so he was unable to send us a reply.

Dr. Shantha had asked the project groups to appoint a leader, and my project members appointed me as the group leader. I will try my best as the group leader and am determined to make our project a success with contributions from all my passionate and highly talented group mates.

Saturday, October 2, 2010

Inception of an exciting time!

OK, now I'm in my final year as an undergraduate, and this post is all about my 10-credit final year project. We formed our group when we were in training, and my group mates are Nisansa de Silva, Chamilka Wijeratne and Danaja Maldeniya. This post also covers what I did in the first two or so weeks since the start of the semester.

During this period we went through an exciting time selecting a project idea and analysing the feasibility of ideas for a final year project in the area we are most interested in, i.e. Artificial Intelligence. We had a few interesting ideas; some were posted on Moodle, and some came from my group mates. We met the lecturers and learned about their expectations for those ideas. But we decided to explore a few ideas which were more interesting to us.

While I was thinking about a project idea, I suddenly remembered an AI-related open source project I had become familiar with during GSoC 2010: OpenCog. OpenCog is an open source Artificial General Intelligence framework, intended to one day express general intelligence at the human level and beyond. I browsed around OpenCog to get more familiar with it. Meanwhile I sent an email to the co-founder of OpenCog, Dr. Ben Goertzel (CEO of Novamente LLC and Biomind LLC, CTO of Genescient Corp., Chairman of Humanity+, Advisor in Singularity University and Singularity Institute, Adjunct Professor of Cognitive Science, Xiamen University, China), and to Dr. Joel Pitt, a developer in OpenCog, mentioning our interest in working under OpenCog and asking for possible project ideas with good research value. I got a really quick response from Dr. Ben with four possible categories, namely NLP, machine learning, virtual embodiment and cognition, including brief descriptions. I shared Dr. Ben’s reply with my project members, and we went through a process of selecting the two most interesting and doable categories. We selected NLP and machine learning and asked Dr. Ben for more detailed descriptions; at the same time I introduced the other members of my team to Dr. Ben.

Dr. Ben was on a trip for a couple of days, so we were asked to wait until he was back home. Meanwhile I did some reading on OpenCog, NLGen and RelEx. On 26th I had a gTalk chat with Dr. Ben and learned that the machine learning work involves pretty hard-core C++ programming, including lots of templates and use of the STL and Boost; since most of us were not familiar with C++, he suggested that we do a project in NLP, which is more Java-based. Dr. Ben suggested three possible project ideas, each with a brief description, and all were really interesting. After discussing with Vishaka Madam and Dr. Shehan, we all agreed to do the project which involves improving the RelEx2Frame rule engine, which is used to identify semantic relationships in English sentences. Dr. Shehan extended his support as our internal supervisor, while Dr. Ben Goertzel will support us as an external supervisor.

Dr. Ben provided us with a list of tasks that could be done under the selected project, which will help us come up with a comprehensive project proposal. Over the last few days I looked into RelEx and RelEx2Frame and tried to set up the environment to use RelEx. Also, today I had a useful discussion on the #opencog IRC channel with two OpenCog developers, including Dr. Joel Pitt, and got very useful information about the project. They asked me for details of our group, our university, etc. so that they could mention us in the OpenCog Recap, a fortnightly summary of what is happening in the OpenCog community.

This is how things went since the inception of our project group.