Archive for the ‘Google’ Category

Google Charts

Thursday, January 31st, 2008

There are a handful of feasible Flash-based charting solutions, and they work really well, with interaction. Unfortunately, most of them are neither open source nor free. A few months ago (maybe it has been there for longer?), Google released a chart API which can generate a PNG image based on your data.

The Chart API is very easy to use, and everything is encoded in the URL. So you can either download the image and host it on your own servers, or use the URL directly in the web page without any trouble at all.
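To make "everything is encoded in the URL" concrete, here is a minimal sketch. The helper name chart_url() is made up for illustration, and it assumes the values are already scaled to the 0-100 range; the cht, chs and chd=t: parameters are the same ones used in the scripts in this post.

```php
<?php
// Hypothetical helper (not part of any library): builds a Chart API URL
// from values that are assumed to be already scaled to 0-100.
function chart_url(array $values, $type = 'bvg', $size = '500x300') {
	return 'http://chart.apis.google.com/chart'
		. '?cht=' . urlencode($type)          // chart type, e.g. bvg or p
		. '&chs=' . urlencode($size)          // image size in pixels
		. '&chd=t:' . implode(',', $values);  // data in text encoding
}

$url = chart_url(array(10, 45.5, 80));

// The URL can be used directly as the source of an image tag
echo '<img src="' . htmlspecialchars($url) . '" alt="chart" />' . "\n";
```

Since the whole chart is just a URL, saving a copy for your own server is nothing more than downloading that URL.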

Yesterday, I gave it a try. I took some of my Apache logs, parsed them, and used the Google API to draw the charts. The results look good.

Bar Chart

The log files are too big to parse quickly and contain a lot of junk data. So I wrote the code so that it only counts “200 OK” responses and ignores requests for js, css, png, jpg and gif files. Here is the PHP code.

	// Extensions of static files that should not be counted
	$ignored = array('css','js','png','jpg','gif');

	$vals = array();

	$f = fopen("/var/log/apache2/access.log","r");
	while (($line = fgets($f)) !== false) {
		// If it's not a GET request answered with 200 OK - just ignore it
		if (!preg_match("/(.*) - - \[(.*)\] \"GET (.*) HTTP\/1\.1\" 200/",$line,$matches))
			continue;
		$path = pathinfo($matches[3]);
		if (isset($path["extension"]) && in_array($path["extension"],$ignored))
			continue;
		$time = strtotime($matches[2]);
		$month = date("M/y",$time);
		// Count the hit under its month
		if (!isset($vals[$month]))
			$vals[$month] = 0;
		$vals[$month]++;
	}
	fclose($f);

	// Scale each value to a percentage of $max for the text encoding
	function text_encode($vals,$max) {
		$tvals = array();
		foreach($vals as $val) {
			$tvals[] = round($val/$max*100,1);
		}
		return implode(",",$tvals);
	}

	// Build six evenly spaced y-axis labels, from 0 up to the padded maximum
	function get_y_axis($vals) {
		$max = max($vals)+3000;
		$step = $max/5.0;
		$temp = array();
		for($i=0;$i<5;$i++)
			$temp[] = round($i*$step);
		$temp[] = $max;
		return implode("|",$temp);
	}

	echo "http://chart.apis.google.com/chart?chs=500x300&chd=t:"
			.urlencode(text_encode($vals,max($vals)+3000)).
			"&cht=bvg&chbh=50&chxt=x,y&chxl=0:|".
			urlencode(implode('|',array_keys($vals))).
			"|1:|".urlencode(get_y_axis($vals))."\n";

Even though this code skips the unrelated requests, it takes too much time. So you can use a pipeline to create a smaller log file first, which is reasonably faster than running the script over the whole log file. For example, I use this command to create a smaller version of the log file.

cat sandaru1_log | grep 'HTTP/1.1" 200' | grep -v '.css HTTP' \
	| grep -v '.png HTTP' | grep -v '.js HTTP' \
	| grep -v '.jpg HTTP' | grep -v '.gif HTTP' > log

Google Charts can also generate pie charts. A pie chart is a good fit for showing browser percentages.

Pie Chart

Here is the code:

	$vals = array();

	function parseUserAgent($ua)
	{
		$userAgent = array();
		$agent = $ua;
		$products = array();

		// Matches one "product/version [xx] (comment)" token at the start
		// of the string; "#" is the delimiter as the pattern contains "/"
		$pattern  = "#^([^/[:space:]]*)" . "(/([^[:space:]]*))?"
		."([[:space:]]*\[[a-zA-Z][a-zA-Z]\])?" . "[[:space:]]*"
		."(\\((([^()]|(\\([^()]*\\)))*)\\))?" . "[[:space:]]*#";

		while( strlen($agent) > 0 )
		{
			if (preg_match($pattern, $agent, $a))
			{
				// product, version, comment
				array_push($products, array($a[1],                        // Product
                                        isset($a[3]) ? $a[3] : "",    // Version
                                        isset($a[6]) ? $a[6] : ""));  // Comment
				// Always consume at least one character so the loop ends
				$agent = substr($agent, max(strlen($a[0]), 1));
			}
			else
			{
				$agent = "";
			}
		}

		// Nothing could be parsed (empty user agent string)
		if (count($products) == 0)
			return "";

		// Directly catch these
		foreach($products as $product)
		{
			switch($product[0])
			{
				case 'Firefox':
				case 'Netscape':
				case 'Safari':
				case 'Camino':
				case 'Mosaic':
				case 'Galeon':
				case 'Opera':
					$userAgent[0] = $product[0];
					$userAgent[1] = $product[1];
					break;
			}
		}

		if (count($userAgent) == 0)
		{
			// Mozilla compatible (MSIE, konqueror, etc)
			if ($products[0][0] == 'Mozilla' &&
            	!strncmp($products[0][2], 'compatible;', 11))
			{
				$userAgent = array();
				if (preg_match("#compatible; ([^ ]*)[ /]([^;]*).*#",
                           $products[0][2], $ca))
				{
					$userAgent[0] = $ca[1];
					$userAgent[1] = $ca[2];
				}
				else
				{
					$userAgent[0] = $products[0][0];
					$userAgent[1] = $products[0][1];
				}
			}
			else
			{
				$userAgent = array();
				$userAgent[0] = $products[0][0];
				$userAgent[1] = $products[0][1];
			}
		}

		// Some bots put a URL where the version should be
		if (strstr($userAgent[1],"http:/"))
			$userAgent[1] = "";

		return $userAgent[0]
			.($userAgent[0]==""||$userAgent[1]==""?"":" ")
			.$userAgent[1];
	}

	$f = fopen("log","r");
	while (($line = fgets($f)) !== false) {
		// Skip lines that the pattern cannot parse
		if (!preg_match("/([\d.]+).* [^ ] [^ ] \[(.*?)\] (.*?) (.*) (.*)"
				." (\d+) ([^ ]+) (.*?) \"(.*?)\"/",
					$line,$matches))
			continue;
		$bot = parseUserAgent($matches[9]);
		// Group all minor versions of Firefox together
		$bot = preg_replace("/Firefox 2.*/","Firefox 2",$bot);
		$bot = preg_replace("/Firefox 1.*/","Firefox 1",$bot);
		if (!isset($vals[$bot]))
			$vals[$bot] = 0;
		$vals[$bot]++;
	}
	fclose($f);

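	$others = 0;

	// Fold every browser with fewer than 500 hits into "Others"
	foreach($vals as $key => $val)
		if ($val<500) {
			$others += $val;
			unset($vals[$key]);
		}

	$vals['Others'] = $others;
	// Requests without a user agent show up as "-" in the log
	if (isset($vals['-'])) {
		$vals['Unknown'] = $vals['-'];
		unset($vals['-']);
	}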

	$labels = array();
	// Convert the counts to percentages and build a label for each slice
	function text_encode($vals,$sum) {
		global $labels;
		$tvals = array();
		foreach($vals as $key => $val) {
			$tvals[] = round($val/$sum*100,1);
			$labels[] = $key." (".round($val/$sum*100,2)." %)";
		}
		return implode(",",$tvals);
	}

	echo "http://chart.apis.google.com/chart?cht=p&chd=t:".
			urlencode(text_encode($vals,array_sum($vals))).
			"&chs=700x400&chl=".
			urlencode(implode("|",$labels))."\n";

This code extracts the user agent from each log line, parses it, and generates a pie chart. The user agent parsing function is from dotvoid.com.
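The step that keeps the pie readable is folding the rarely seen browsers into a single slice. Here is that step on its own, as a small sketch: group_small_slices() is a name I made up for illustration, and the counts are invented sample numbers, not my real statistics.

```php
<?php
// Hypothetical helper: folds every slice smaller than $threshold
// into a single "Others" slice and keeps the rest unchanged.
function group_small_slices(array $counts, $threshold) {
	$others = 0;
	$kept = array();
	foreach ($counts as $label => $count) {
		if ($count < $threshold)
			$others += $count;
		else
			$kept[$label] = $count;
	}
	if ($others > 0)
		$kept['Others'] = $others;
	return $kept;
}

// Invented sample counts, for illustration only
$counts = array('Firefox 2' => 4200, 'MSIE 7.0' => 3100,
                'Opera 9.25' => 120, 'Galeon' => 15);
print_r(group_small_slices($counts, 500));
```

With a threshold of 500, the two small entries end up as one "Others" slice of 135 hits.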

Create an automated bot to crawl the Gmail inbox

Wednesday, September 19th, 2007

There are certain scenarios where you might want to write a script to access a Gmail inbox automatically. One possible way to do that is to use this great project, gmail-lite.

But if you only want to access the inbox, did you know there is an Atom feed which can do this? This method is simpler than using an automated bot. You can authenticate using HTTP authentication and grab the contents of the inbox. The feed URL is https://mail.google.com/mail/feed/atom.

If you are using a scripting language like PHP, you can use the curl extension. curl has bindings for several other languages too. Another possible way to do this is to execute an external application (in the background, of course) such as wget.
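With the curl extension, the idea could look roughly like the sketch below. The function names fetch_inbox_feed() and list_entry_titles() are made up for illustration, and the parsing part assumes the feed has the usual Atom layout of entry elements with a title inside; I have not checked every detail of what the feed returns.

```php
<?php
// Sketch only: both helper names below are hypothetical.

// Fetches the inbox feed using HTTP basic authentication.
// Returns the raw XML as a string, or false on failure.
function fetch_inbox_feed($user, $password) {
	$ch = curl_init('https://mail.google.com/mail/feed/atom');
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return the body instead of printing it
	curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
	curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);
	return curl_exec($ch);
}

// Extracts the title of every entry; assumes a typical Atom layout.
function list_entry_titles($xml) {
	$feed = simplexml_load_string($xml);
	if ($feed === false)
		return array();
	$titles = array();
	foreach ($feed->entry as $entry)
		$titles[] = (string) $entry->title;
	return $titles;
}

// Usage (needs real credentials, so it is left commented out):
// $xml = fetch_inbox_feed('someone@gmail.com', 'secret');
// if ($xml !== false)
// 	print_r(list_entry_titles($xml));
```

Keeping the download and the parsing separate also means the parsing part works the same if you fetch the feed with wget instead.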

GMail Video

Wednesday, August 29th, 2007

As you might have already noticed, Google has been making a video clip showing that a lot of people around the world are using Gmail. Even though the video seems like a pure marketing idea, it shows how many people love Gmail. And after all, the whole video idea is pretty creative.

Google Apps - Email

Tuesday, July 24th, 2007

I have been using the paradox server to handle my blog (www.sandaru1.com) and emails (AT gunathilake.com) for more than a year now. However, due to several reasons, our server went offline from time to time. The problem is fixed now, but the xmail configuration is not perfect either.

So I decided to go for Google Apps. I have used it for about a week now, and so far the only problem I have had is that, since the mail account is new, some important mails were marked as spam. However, Google really learns fast (or it has some hidden filters for each user), and clicking the “spam” and “not spam” buttons fixed the problem.

So, overall the system seems pretty good and runs without any errors. Another big advantage is that it has the superb Gmail interface.

How to find free mp3s using Google

Wednesday, July 4th, 2007

Most probably, you are using file sharing apps (Limewire, Bearshare, even BitTorrent) to download music. But did you know there are thousands of free mp3s hosted on the internet, and that those are directly accessible with a normal browser?

Sometimes, people upload mp3 files to their web servers thinking that no one will find them. But when someone enters the URL of the folder which contains the music, the web server uses directory listing to show the files in that folder (directory listing can be turned off).

Basically, if you can find some web servers with mp3 files, you can download those. The problem is finding them. You can use Google advanced queries for that. The title of an Apache directory listing starts with the phrase “Index of”. So you can use Google to search for pages with “Index of” in the title. Then you need “mp3” files, so just append mp3 to the query. Let's say you want the Beatles. Then append Beatles. Here is an example: intitle:"Index of" mp3 Beatles
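If you want to build such queries from a script, a minimal sketch could look like this. The helper name index_query() is made up for illustration; the only thing it does is put the pieces described above together and URL-encode the result for a search URL.

```php
<?php
// Hypothetical helper: builds the "Index of" query described above.
function index_query($keywords, $type = 'mp3') {
	return 'intitle:"Index of" ' . $type . ' ' . $keywords;
}

$query = index_query('Beatles');
echo $query . "\n";

// The query has to be URL-encoded before it goes into a search URL
$search_url = 'http://www.google.com/search?q=' . urlencode($query);
echo $search_url . "\n";
```

The second line of output is what you would paste into the browser; the quotes and the colon are encoded, and the spaces become plus signs.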

There is a possibility that some of those URLs might not work. But keep on searching; there are lots of working URLs.

When there are a lot of subdirectories and files, you might want to get a list of URLs. So I wrote a simple PHP script. You can execute this script from the command line. I have put in some sample URLs. The sample pages given there will generate more than 1000 direct links to mp3s.

<?php
	$stderr = fopen('php://stderr', 'w');
	set_time_limit(0);
	error_reporting(0);

	$urls[] = "http://www.mcgees.org/mp3/pearl_jam/";
	$urls[] = "http://www.xieish.net/Collective%20Soul/";
	$urls[] = "http://www.asilentflute.com/mp3/";
	$urls[] = "http://www.semret.org/music/";
	$urls[] = "http://www.koreangirlssuck.com/emotion/mp3/";
	$urls[] = "http://www.vrees.net/mp3/";
	$urls[] = "http://www.webpiri.net/Mp3/";
	$urls[] = "http://pierre33200.free.fr/Music/";

	$done = array();

	for($i=0;$i<count($urls);$i++) {
		$url = $urls[$i];
		$done[strtolower($url)] = true;
		$temp = parse_url($url);
		$path = pathinfo($temp['path']);
		$domain = $temp['host'];

		// URLs with a file extension are files, not directories
		if (isset($path['extension']) && $path['extension']!="") {
			if (strtolower($path['extension'])=="mp3"
				|| strtolower($path['extension'])=="wma") {
				echo $url."\n";
				continue;
			} else {
				fwrite($stderr,"Skipping $url : Not mp3\n");
				continue;
			}
		}

		fwrite($stderr,"Processing $url\n");

		$html = file_get_contents($url);
		if ($html === false) {
			fwrite($stderr,"Error : Could not fetch the page\n");
			continue;
		}

		$direct = preg_match("/index of/i",$html);
		if ($direct==false) {
			fwrite($stderr,"Error : Not a directory index\n");
			continue;
		}

		$count = preg_match_all("/<a href=\"(.*?)\">.*?<\/a>/i",
				$html,$matches);
		foreach ($matches[1] as $match) {
			// Ignore empty links and the links back to the same page
			if ($match == "" || $match[0] == "?")
				continue;
			if (substr($match,0,7)=="http://")
				$cur = $match;
			else if ($match[0] == "/")
				$cur = "http://".$domain.$match;
			else
				$cur = $url.$match;
			// Queue each URL only once
			if (!isset($done[strtolower($cur)])) {
				$done[strtolower($cur)] = true;
				$urls[]=$cur;
			}
		}
	}
	fclose($stderr);
?>

Save the above script (“mp3.php”), then execute it (“php mp3.php > urls.txt”). It will show you what it is doing, and all the URLs will be written to “urls.txt”. If you are on Linux, you can use “wget -i urls.txt” to download the songs. If you are on Windows, download Free Download Manager and use File -> Import List of Downloads.

You can also download the list of generated links.