Monday, October 29, 2007

Revolutionary step towards IR using Java – 28th October 2007

I wrote Java code to clean the text, detect errors, extract word frequencies, and generate a distinct word list, file by file, for my corpus of editorial articles. It was quite interesting to see the results.
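To give an idea of what this kind of processing looks like, here is a minimal sketch in Java. It is not my actual code: the class name `WordFrequency`, the method names, and the cleaning rule (keep only letters, combining marks and the zero-width joiner, so digits and punctuation are dropped) are assumptions made for illustration, and the files are assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {

    // Assumed cleaning rule: anything that is not a letter, a combining mark,
    // the zero-width joiner (used in Sinhala conjuncts) or whitespace
    // is replaced by a space.
    static String clean(String text) {
        return text.replaceAll("[^\\p{L}\\p{M}\\u200D\\s]", " ").trim();
    }

    // Counts how often each word occurs in the cleaned text.
    // The key set of the returned map is the distinct word list.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        String cleaned = clean(text);
        if (cleaned.isEmpty()) {
            return freq;
        }
        for (String token : cleaned.split("\\s+")) {
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    // Processes the corpus file by file: each command-line argument is a file path.
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
            Map<String, Integer> freq = countWords(text);
            System.out.println(path + ": " + freq.size() + " distinct words");
            for (Map.Entry<String, Integer> e : freq.entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Running it as `java WordFrequency article1.txt article2.txt` (hypothetical file names) would print one frequency table per file. A `TreeMap` is used only so that the word list comes out sorted.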

New trend of Data Collection – Editorial Articles – 24th October 2007

I had collected some texts from the online Sunday Lankadeepa, especially the weekly political review and the weekly security status columns, as data for my MPhil research. But today I realized that this data is not suitable for such work, because it is too narrow, being confined to specific topics. So I decided to collect editorial articles from online newspapers instead, and started downloading Lankadeepa editorial articles from their online archives.

New Turn towards IR – 23rd October 2007

Today my research colleague Mr Dulip (from Cambridge) sent me a brief outline of the steps of the Information Retrieval process. I was excited by it and decided to follow those steps soon with the texts I have collected. I replied to him with many thanks and promised to send him some results soon.

Tuesday, September 18, 2007

Met the Supervisor and Changed the Direction – 18th September 2007

Today I met my supervisor, Dr Ruvan Weerasinghe, and we discussed the possible paths for carrying out my research. We discussed the possibility of implementing Banko’s approach, which he described in his paper “Headline Generation Based on Statistical Translation” (Banko, M., Mittal, V.O., and Witbrock, M.J. 2000).
Then, due to the lack of pre-summarized text for training, we decided to model an extractor that can extract summaries from a source text. It was decided to use basic surface-level approaches to weight the sentences and rank them. The summary will then be evaluated using a question-answering approach. So first I need to find a set of data and have humans make questions from it. The summaries can then be evaluated using that data.
Dr Ruvan gave me the book “Text Mining Application Programming” by Manu Konchady, and I read two chapters of it, which explained the methods of information extraction and summarization.
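As a rough illustration of what a surface-level sentence extractor could look like, here is a small sketch in Java. It is only one possible scheme, not the method we settled on: scoring each sentence by the average document frequency of its words is my own assumption, as are the names `SurfaceExtractor`, `tokenize` and `summarize`. It also assumes the text has already been split into sentences, and it does not remove stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SurfaceExtractor {

    // Assumed tokenizer: keep letters, combining marks and the zero-width joiner,
    // then split on whitespace.
    static String[] tokenize(String sentence) {
        String cleaned = sentence.replaceAll("[^\\p{L}\\p{M}\\u200D\\s]", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Returns the n highest-scoring sentences, in their original order.
    static List<String> summarize(List<String> sentences, int n) {
        // Word frequencies over the whole document.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : tokenize(s)) {
                freq.merge(w, 1, Integer::sum);
            }
        }

        // Sentence weight (illustrative choice): average frequency of its words.
        double[] score = new double[sentences.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            String[] words = tokenize(sentences.get(i));
            double sum = 0;
            for (String w : words) {
                sum += freq.get(w);
            }
            score[i] = words.length == 0 ? 0 : sum / words.length;
            order.add(i);
        }

        // Rank by descending score; the sort is stable, so ties keep document order.
        order.sort(Comparator.comparingDouble((Integer i) -> -score[i]));
        int k = Math.max(0, Math.min(n, order.size()));
        List<Integer> chosen = new ArrayList<>(order.subList(0, k));
        chosen.sort(Comparator.naturalOrder());

        List<String> summary = new ArrayList<>();
        for (int i : chosen) {
            summary.add(sentences.get(i));
        }
        return summary;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("cat sat", "dog ran far", "cat cat dog");
        System.out.println(summarize(doc, 2));
    }
}
```

Other surface-level features, such as sentence position or overlap with the title, could be added to the score in the same place.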

Friday, June 29, 2007

Submitted the half year progress report – 29th June 2007

Today I filled in the Half Yearly Progress Report for my MPhil research, including the current status of my research, got it signed by Dr Ruvan (the supervisor), and submitted it to the administration division.

Under the “Work plan for the next six months period” category, I mentioned the following:
“It is hope to apply the Headline Generation techniques which was discussed at “Headline Generation Based on Statistical Translation”, Banko, M., Mittal, V.O., and Witbrock, M.J. 2000, to Sinhala and evaluate the results accordingly.”

I changed the topic to “Head Line Generator for Sinhala”, since Automatic Text Summarization is a more general concept, and gave the expected date of submission as November 2007.

My supervisor asked me to work on some publications, since they are very important for my future career.

Sunday, May 13, 2007

Prologue - 14th May 2007

Hi all,
I am doing research on “Automatic Text Summarization for Sinhala” in the field of Natural Language Processing. I have registered as an MPhil student at the University of Colombo School of Computing, Colombo, Sri Lanka.
I created this blog today to note down my ideas and any problems that may occur while I am reading for my research!