Posts Tagged Sinhala

Sinhala Spell Checker for Firefox

Update: I’ve changed the addon’s word list to the “UCSC/LTRL Sinhala Corpus Beta”, which is much more accurate. The updated version is on the addons site. I’ll combine both word lists in the next version.

The Sinhala language has been used on computers for a long time. In the beginning, this meant simple ASCII fonts that replaced the English glyphs with Sinhala letters. Sinhala Unicode came into play around 2004. Around that time, we built a search engine that converted ASCII text to Unicode, so that it could search Sinhala text written in any font. Actually, that’s how Paradox Software started. There were a few roadblocks with rendering the Unicode fonts, and people weren’t exactly using them. However, those problems have been solved in newer releases, and Sinhala Unicode is used extensively today.

You might have seen that there is an English-to-Sinhala dictionary developed by the UCSC Language Lab. It’s released under the GPL as a Firefox addon. Today, I extracted the words from the addon’s database and built a spell checker for Firefox.

The following Python code extracts the words from the SQLite database:

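#!/usr/bin/env python3
import re
import sqlite3

# Open the dictionary database taken from the addon.
conn = sqlite3.connect('en-si.db')
c = conn.cursor()
c.execute("select * from dict")
# Write one word per line, encoded as UTF-8.
with open("words", "w", encoding="utf8") as out:
    for row in c:
        # row[1] holds several words separated by spaces or '|'.
        for word in re.split(r"[ |]", row[1]):
            if word:  # skip empty strings left by consecutive separators
                out.write(word + "\n")
conn.close()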

After that, simply running “sort words | uniq > words.sorted” produced a sorted list of unique words (the sort has to come first, because uniq only removes adjacent duplicates). The “affixcompress” tool that comes with hunspell generated the affix rules file, and I’ve added some rules to handle some common mistakes.
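The deduplicate-and-sort step can also be sketched in Python. This is only an illustration, not what the addon build actually used: the function names and the output file name `si.dic` are assumptions. The one format detail it relies on is that a hunspell .dic file starts with a word-count line followed by one word per line.

```python
def unique_sorted(words):
    """Return the distinct non-empty words, sorted by code point."""
    return sorted({w.strip() for w in words if w.strip()})


def to_dic_lines(words):
    """Build hunspell .dic content: a count line, then one word per line."""
    unique = unique_sorted(words)
    return [str(len(unique))] + unique


def write_dic(src="words", dst="si.dic"):
    # Both file names are hypothetical; adjust them to your own layout.
    with open(src, encoding="utf8") as f:
        lines = to_dic_lines(f)
    with open(dst, "w", encoding="utf8") as out:
        out.write("\n".join(lines) + "\n")
```

Calling `write_dic()` after the extraction script has produced `words` would write the word list in one pass, without the intermediate `words.sorted` file.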

Sinhala Spell Checker Screenshot 1 Sinhala Spell Checker Screenshot 2

Install the addon from here. Once you have installed it, you can right-click a textbox, enable spell checking, and select Sinhala as the language.

(Don’t ask me how සොක්කා got recommended for මඤ්ඤොක්කා)

Download the final source here


SCIM – Installing Wijesekara Sinhala Input in Ubuntu Edgy

The guys at LKLUG have done a great job on Sinhala support for Linux. Not only did they design a new open source font, they have also coded the rendering engine and an input method. To enable Sinhala support, just follow the guidelines at http://sinhala.linux.lk/.

However, the default input method is a phonetic one, which is really good if you have no experience in Sinhala typing. But since I’m pretty familiar with the Wijesekara keyboard, I wanted to go for that. SCIM itself doesn’t have a Wijesekara input driver. The first thing I had to do was install m17n. I just installed the package with apt-get and it worked. After I logged back in, there were plenty of languages in SCIM, so I chose m17n-si-wijesekara. It was fine, but there were quite a lot of errors. So I downloaded the development version of the keyboard from http://cvs.m17n.org/viewc…..b/si-wijesekera-preedit.mim. Then I copied that file to /usr/share/m17n/, and restarting Xorg got me the nice Wijesekara keyboard.

Here are some screenshots :

Chat with Nuwan

Chat with Chathu.. lol..


Google.lk Sinhala Fonts

Yesterday, I was looking at my access logs (using webalizer) and found that there are many hits from people searching for a Sinhala font for google.lk. Google Sri Lanka is based on Sinhala Unicode fonts. For Windows users, fonts can be downloaded from www.fonts.lk. Last year, they weren’t fully supported: ‘rakaransaya’ and ‘yansaya’ didn’t render properly in Firefox. I wonder whether they have made any progress since then. For Linux users, get the fonts from www.linux.lk. The Sinhala support has been debianized, so if you are in a Debian environment, it’s just a matter of adding the repositories and using apt-get or synaptic. It works really nicely.


Live.com, Google and Sinhala

Since the beginning of this year, the most prestigious IT companies in the world have focused their attention on one large, profitable business: SEARCHING. A lot of companies were trying to be the best, but there was no doubt that Google was the best. A few years ago, Microsoft tried to overtake Google with MSN Search but wasn’t able to make it a success.

So now they are trying to start over using the popular AJAX technology. Around two weeks ago, Microsoft launched their AJAX-based search engine, live.com, which is a bit of a threat to Google. Live.com isn’t only a search engine but also a web portal like MSN, and the most important thing is that you can customize your interface. Everything is AJAX-based, which makes it feel really fast. Even though it’s a portal, the interface is not overloaded with too much information as in MSN. With the power of AJAX they have made it really decent and professional. The most amazing thing is that it is not only for IE; it works fine in Firefox under Linux :-)

But when we look at the search content and the quality of the search, Google is still ahead. As most people think, the search content might be a matter of time, but we have to remember that Microsoft is a large IT company that focuses mainly on its software, not on search, whereas search is the heart of Google, and they were among the first to launch a popular AJAX-based service, Gmail. So their experience there is, of course, greater than Microsoft’s. Even though most people don’t use it, Google itself supports a customizable interface (Google News, etc.).

As Sri Lankans, we saw a big step from Google: they launched google.lk with an interface in Sinhala Unicode. Earlier, it used English letters. But Google still doesn’t support searching within the Sinhala Unicode range. Live.com, on the other hand, has no Sinhala interface, but the important thing is that you can search using Sinhala Unicode characters.
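For reference, the “Sinhala Unicode range” is the Unicode block U+0D80 to U+0DFF. A minimal sketch of checking whether a query contains characters from that block follows; the function name is made up for illustration, and it says nothing about how either search engine works internally.

```python
# The Sinhala block in Unicode spans U+0D80..U+0DFF.
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF


def contains_sinhala(text):
    """Return True if any character of text lies in the Sinhala block."""
    return any(SINHALA_START <= ord(ch) <= SINHALA_END for ch in text)
```

For example, `contains_sinhala("සිංහල")` is true, while `contains_sinhala("sinhala")` is false.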

I wonder why Google doesn’t support Sinhala Unicode characters when it is not hard to implement. They support almost every other Unicode character set.

But still, neither of those giants supports searching ASCII Sinhala text using Unicode, so sinhalasearch.com still rocks!!!


Google.lk and Sinhalasearch.com

Last week, when I was trying to access google.com, there was a link to Google Sri Lanka, which is of course google.lk. They have bought the .lk domain and integrated the Sinhala interface into it. The earlier Sinhala interface was a bit old, and it seems they have now managed to translate a lot of words into Sinhala, in Sinhala letters. There are still a few words left to be converted to Sinhala letters (they are using English letters for those words). After all that, they still don’t support Sinhala searching. I think their indexer is ignoring Sinhala Unicode characters, so querying in Sinhala Unicode characters won’t work.

When google.lk started, one of my friends told me that Google would support Sinhala searching soon, and that sinhalasearch would be useless after that. No! It won’t. Let’s suppose Google supports Sinhala searching. Then people will be able to search in Sinhala Unicode. But the problem is that it only lets you search pages written in a Unicode font. Sinhala Unicode is still not popular. A lot of Sinhala sites on the internet use fonts like ‘kapuradotcom’, ‘aKandyNew’, etc. So people will get only very few results.
So, at sinhalasearch.com we convert text in all those fonts into Unicode and then index the pages. A query in Sinhala Unicode on sinhalasearch will therefore give you many more results.
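The convert-then-index idea can be sketched roughly as follows. This is not the sinhalasearch.com code: the font name `examplefont` and its three-entry mapping table are hypothetical placeholders, since the real glyph assignments of fonts like ‘kapuradotcom’ are not given here.

```python
from collections import defaultdict

# HYPOTHETICAL mapping: one table per legacy font, ASCII character -> Unicode.
# The entries below are placeholders, not a real font's layout.
LEGACY_MAPS = {
    "examplefont": {"w": "අ", "l": "ක", "u": "ම"},
}


def to_unicode(text, font):
    """Convert legacy-font text to Unicode using that font's table."""
    table = LEGACY_MAPS.get(font)
    if table is None:
        return text  # unknown font: assume the text is already Unicode
    return "".join(table.get(ch, ch) for ch in text)


def build_index(pages):
    """pages: iterable of (url, font, text). Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, font, text in pages:
        for word in to_unicode(text, font).split():
            index[word].add(url)
    return index
```

Because every page is normalized before indexing, a single Unicode query can match pages written in different fonts. A real converter would likely need more than a per-character table, for example multi-character sequences and reordering of vowel signs.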

Apart from that, a lot of people complain that when they browse Sinhala pages, they have to download different fonts for different sites. For example, if you are reading the Lakbima newspaper, you have to download their font. And of course when you’re reading Divaina, you have to download their font. With sinhalasearch you can easily overcome that. Since we convert all those pages into one common encoding, Sinhala Unicode, people can read everything without bothering to download the fonts. They only have to download the Unicode font.

Try sinhalasearch : www.sinhalasearch.com

