August 16, 2007

Yahoo Search Suggestions for MySQL

Posted by peter

I noticed today that if you go to Yahoo.com and type MySQL into the search field, it offers the following suggestions:

* loading javascript arrays with mysql data
* mysql performance blog
* mysql download
* mysql administrator download
* mysql front download

These may not be the best search suggestions, but it is cool to see our blog name make it near the top of the list.
Too bad few geeks use Yahoo for search, so most of our search engine traffic comes from Google.

August 11, 2007

SpyLOG was sold the other day, time to look back

Posted by peter

Friends are pointing me to an article saying that SpyLOG, the startup I co-founded back in 1999, was sold the other day to MasterHost. The amount is not disclosed but is estimated at $3M - an amount not worth mentioning in the USA market but quite a decent one for the Russian Internet market.

So I guess that, after all, this project with its difficult destiny can be called successful.

This is especially interesting because last month we announced our own startup project, ClickAider, which operates in a related market. This deal reassures us there is money to be earned in this area.

In general, looking back at my years with SpyLOG (1999-2002), I find them to have been stressful but very rewarding in terms of knowledge and experience. We had a great development team, and our team members later went on to play important roles in projects like Begun.RU and Mamba.RU, which are leaders in their respective areas of the Russian market. Konstantin Osipov, who worked for SpyLOG for a number of years, is now a Dev Lead at MySQL, and Dmitry Lenev, who was CTO after my departure, is also an engineer at MySQL.

It was at SpyLOG that I learned a great deal about MySQL, LAMP and building scalable architectures in general. We were among the first users of Innodb, and we started using it at TB scale right away. We found a massive number of MySQL and Innodb bugs, and at that time Monty was reading the bugs list himself, so we had those fixed quickly.

Looking back at the SpyLOG architecture, there are many things I would not do the same way again. True, LAMP technologies are now much better developed and hardware is much faster, but some things were done the way they were simply due to lack of experience. I often favored overly complex solutions, partially due to my formal Computer Science education, which demanded 100% correct results while there were good-enough alternatives. I also probably micromanaged too much and tried to get too many projects going at the same time. A lot of these projects never saw the light of day or were closed after beta testing (sometimes for political rather than technical reasons).

I also was too hard on people many times, fired a few guys I should not have fired and dealt with a few others in ways I’m not proud of now. I also should have shared more company details with our employees, though not doing so was consistent with Russian traditions of that time. It was a great contrast for me to end up at MySQL with Tom Basil as my boss. It was a great experience to be in an employee’s skin and to see how you can be treated and how you would like to be treated.

The other interesting discovery for me was how much you can do by sales and marketing alone. In the 5 years after my departure SpyLOG made no significant changes to the architecture and added only a few simple functionality extensions (some of which were in the works when I left), but still, through sales, marketing and partnerships, SpyLOG was able to get more revenue than any other company in the same market.

It also changed my opinion on software survival. Even though we did not have the best docs ever, and all the original software authors and original admins are long gone, the system could still be kept more or less operational all this time. So software authors are not as irreplaceable as it may seem. It costs money to replace them, but it can be done.

What also interests me is why MasterHost would buy SpyLOG. Perhaps I should ask. The official explanation, to provide an extra service for hosting customers as a differentiator, does not sound like enough to me. I would expect that what they are after is SpyLOG’s massive coverage, which makes it possible to gather a lot of intelligence on the sites and market areas out there. This information can be used for many things, including global visitor tracking and selling targeted advertisements based on visitor profiles.

July 10, 2007

Silicon Valley onsite consulting, anyone?

Posted by peter

The last time I was in Silicon Valley was in April, after the MySQL Users Conference. This time I’m planning to spend July 30 - August 2 in Silicon Valley after OSCON, visiting friends and customers. If you’re located in Silicon Valley or the San Francisco area and are interested in some onsite MySQL consulting, I can offer full-day and half-day visits during these days at no added cost (meaning you just pay for consulting time, and hotel and travel are on me).

This time could be well spent on a mini-training on MySQL high performance design, operations or development practices, on reviewing your application architecture against growth requirements, or simply on looking into performance challenges you might have.

July 9, 2007

ClickAider - Track Adsense Clicks and much more

Posted by peter

Let me announce ClickAider - another project we have been working on in stealth mode for the last several months.

ClickAider is a hosted web statistics system, but it tracks clicks rather than page views as most web counters do. And by clicks I mean not just clicks on URLs and images but clicks on many sophisticated advertisement systems - Google Adsense, Yahoo Publishers Network, AdBrite, AuctionAds and a few others. ClickAider can also track form submissions.

The click tracking is done in a non-intrusive way, without modifying the advertisers’ JavaScript, so it is typically compatible with the advertisers’ terms of service.

Why did we decide to do it? Most PPC advertisers we worked with provide very poor statistics about clicks, and we wanted to know much more. We wanted to know which search engine keywords bring traffic that clicks, which countries the clicks come from, and where people go from your web site (maybe you’re sending traffic to your worst competitor and just have not noticed it).

ClickAider can do all of this and much more - you can drill down to all the details available about every click as well as filter all reports by any field. For example, you can view information about clicks from users who came to you from a particular referring domain, or see which countries the people who clicked on a given page URL come from.

This filtering flexibility of course comes at a cost: all reports have to be generated dynamically.
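To illustrate what "filter all reports by any field" implies for the backend, here is a minimal sketch of a dynamically generated report. It is not ClickAider code: the `clicks` table, its column names and the `click_report` function are all made up for illustration, and SQLite is used as a stand-in for MySQL only so the example runs on its own.

```python
import sqlite3

# Hypothetical schema: one row per tracked click. The column names are
# illustrative assumptions, not ClickAider's real schema.
ALLOWED_FIELDS = {"country", "referrer_domain", "page_url", "ad_network"}


def click_report(conn, group_by, filters=None):
    """Count clicks grouped by one field, filtered by any other fields."""
    filters = filters or {}
    # Field names are interpolated into the SQL text, so they are checked
    # against a whitelist; filter values are bound as parameters.
    for field in (group_by, *filters):
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
    where = " AND ".join(f"{field} = ?" for field in filters)
    sql = f"SELECT {group_by}, COUNT(*) FROM clicks"
    if where:
        sql += f" WHERE {where}"
    sql += f" GROUP BY {group_by} ORDER BY COUNT(*) DESC, {group_by}"
    return conn.execute(sql, list(filters.values())).fetchall()


def demo_connection():
    """Build a tiny in-memory data set for experimenting."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clicks (country TEXT, referrer_domain TEXT, "
        "page_url TEXT, ad_network TEXT)"
    )
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?, ?, ?)",
        [
            ("US", "google.com", "/a", "adsense"),
            ("US", "google.com", "/b", "adsense"),
            ("DE", "google.com", "/a", "ypn"),
            ("RU", "yandex.ru", "/a", "adsense"),
        ],
    )
    return conn
```

Calling `click_report(demo_connection(), "country", {"referrer_domain": "google.com"})` counts clicks per country for visitors who arrived from one referring domain. Since every combination of filters produces a different query, nothing can be pre-aggregated, which is the cost mentioned above.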

Because we’re working with clicks rather than page views, and there are far fewer of them, this approach can work for reasonably large sites, and we’re working on technologies which should allow us to raise this limit much higher.

The fact that hardware has improved over recent years is also quite helpful. When I was designing another web statistics system, SpyLOG, back in 1999, you could hardly do much real-time aggregation and we had to store aggregated data for all stats.

Compared to SpyLOG we also significantly simplified the architecture, both based on experience gained over the years and simply because, not having so many resources, we have to innovate and keep things simple but powerful.

If you’re interested in what is under the hood - it is a typical LAMP application at this point. We use a scale-out MySQL architecture based on MySQL 5.1 with some partitioning and mostly the Innodb storage engine. Initially we tried to use PBXT for log storage, but it had some stability and performance gotchas, so we decided to give it some time to settle.

We use ClickAider on a bunch of our own projects, and most of the people we invited as beta testers during our closed beta testing period think it is quite cool, even though it still has some rough edges.

Now we’re finally open and pleased to invite you to try it out, tell your friends to try it out and report to us any bugs and suggestions you may have.

March 15, 2007

A box for some tests, anyone?

Posted by peter

We’d like to test a few things regarding MySQL and Innodb scalability with multiple CPUs, but we seem to be short of boxes right now. Everything we have access to is in production, which makes it unsuitable for benchmarks.

Could anyone lend us access to a box with at least 4 cores running Linux or Solaris?

March 13, 2007

Mail clients and Databases

Posted by peter

I get a lot of mail and I prefer to store it for a long time, if not forever. With modern hard disk sizes this should not be a problem at all, but because of how mail programs are written it causes a lot of problems.

I’ve tried a lot of programs - Kmail, Evolution and Thunderbird on Linux, Outlook and The Bat! on Windows - and they all seem to have the same problem: they assume that the mail messages, or at least some portion of them, will fit in memory.

At this point, for example, I got tired of Thunderbird handling my 1GB inbox (in fact my Inbox holds fewer than 1000 emails, the rest are “Deleted” but Thunderbird still keeps them in the same file), so I decided to move some 70,000 messages to a specially created “archive” folder. This makes Thunderbird consume about 2GB of memory, and I’m not sure it will be able to complete the operation at all, as it is already running low on virtual memory.

This is not my only problem with these systems. The second one is crash recovery - in case of corruption due to a power failure or lack of disk space I see an index rebuild being done, which is far from enjoyable on large data sizes.

So what has always been interesting to me is why these mainstream solutions do not use some form of database, which would handle both the recovery problem and the memory consumption problem, as databases are usually designed to handle large data sizes with a limited amount of memory. MySQL in its embedded version could be cool, but if not, there are a bunch of others such as BDB, SQLite, or even JET if we count Microsoft solutions.
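As a rough sketch of the idea, here is what a database-backed mail store could look like using SQLite through Python's standard library. This is only an illustration under my own assumptions: the `messages` table layout and the function names are invented for the example, and no existing mail client is claimed to work this way.

```python
import sqlite3
from email import message_from_string


def open_store(path=":memory:"):
    """Open (or create) the hypothetical mail store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, folder TEXT, sender TEXT, "
        "subject TEXT, date TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_messages_folder ON messages (folder)"
    )
    return conn


def store_message(conn, folder, raw):
    """Parse a raw (non-multipart) message and store it in one transaction."""
    msg = message_from_string(raw)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (folder, sender, subject, date, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (folder, msg["From"], msg["Subject"], msg["Date"], msg.get_payload()),
        )


def list_folder(conn, folder):
    """Return only the header fields needed to display a message list."""
    return conn.execute(
        "SELECT id, sender, subject FROM messages WHERE folder = ? ORDER BY id",
        (folder,),
    ).fetchall()


def move_folder(conn, src, dst):
    """Move every message from one folder to another in a single transaction."""
    with conn:
        conn.execute("UPDATE messages SET folder = ? WHERE folder = ?", (dst, src))
```

With a layout like this, moving tens of thousands of messages to an archive folder is one `UPDATE` that never loads message bodies into the client's memory, and recovery after a crash is left to the database's own transaction handling instead of a full index rebuild.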

Seriously, the only parts you really need to have in memory to quickly show a list of messages, sort them and so on are the message subject, author and a few more fields from the header. That is no more than 200 bytes per message, which should allow handling folders with 1,000,000 messages in something like 200MB of memory.
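The estimate is simple multiplication, spelled out below. The 200 bytes per message figure is the rough upper bound from the paragraph above, not a measurement, and the function name is made up for the example.

```python
BYTES_PER_MESSAGE = 200  # rough upper bound for subject, author and a few header fields


def header_memory_bytes(messages, bytes_per_message=BYTES_PER_MESSAGE):
    """Estimate memory needed to keep only the header summaries in memory."""
    return messages * bytes_per_message


for count in (70_000, 1_000_000):
    total = header_memory_bytes(count)
    print(f"{count} messages: {total / 10**6:.0f} MB")
```

By the same estimate, the 70,000 messages I was moving to the archive folder would need about 14MB of header data, which is a long way from the 2GB Thunderbird consumed.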

Interestingly enough, if we look at hosted solutions, there are some with a database backend, such as Zimbra or DBMail.

January 16, 2007

Some of our articles are translated to Russian

Posted by peter

A friend pointed out to me that a number of our articles were translated and included in PHPInside.RU - an electronic magazine about PHP and surrounding technologies, which of course include MySQL. You can download the PDF for free right here.

We love our articles being translated and/or republished, as long as they are available for free, same as the original articles, and as long as you give us credit as the authors.

January 11, 2007

Looking for someone with Chinese knowledge

Posted by peter

We’re looking to implement CJK support in the Open Source full text search engine Sphinx.
Initially we’re thinking of basing search on bi-gram indexing to keep it simple, especially as according to research papers it offers decent quality in most cases. This is not that complex to implement, however there is no way we can test it, as we have zero knowledge of Chinese or Japanese.
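For readers not familiar with bi-gram indexing, here is a minimal sketch of the tokenization step. It is not Sphinx code: the function names are made up, and the character ranges cover only a few common Unicode blocks, which is a simplifying assumption. Runs of CJK characters are split into overlapping two-character tokens, so no dictionary or word segmentation is needed, while other text is split into ordinary words.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough check covering only a few common CJK blocks (an assumption)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul Syllables
    )


def char_class(ch):
    """Classify a character as CJK, part of a regular word, or a separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def bigram_tokens(text):
    """Emit overlapping bigrams for CJK runs and whole words for other text."""
    tokens = []
    for kind, group in groupby(text, key=char_class):
        run = "".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                tokens.append(run)  # a lone character is indexed as-is
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens
```

For example, `bigram_tokens("MySQL 全文検索")` yields the word `mysql` followed by the bigrams `全文`, `文検` and `検索`. If documents and queries are tokenized the same way, a query for `検索` matches any document containing that pair of characters. Whether the matches are actually relevant is exactly the part we cannot judge ourselves.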

If you know Chinese, Japanese or Korean and would like to help us test Sphinx support for these languages, let us know. No special development skills are required. If you’re reading this blog you should be technical enough.

November 29, 2006

BoardReader - Forum Search Engine

Posted by peter

You may have noticed we have not been blogging much recently. This is because we were quite busy, mainly building BoardReader.com - a search engine which indexes tens of thousands of forums from all over the world. We built this as a consulting project, so unfortunately we do not own it completely, but we’re still quite excited that it is live now. We did not work on the crawler in this project, only on the database backend and the full text search engine implementation. In this part it is a standard LAMPS application. I guess you know what LAMP is, and the S stands for Sphinx - the full text search engine which we love to use where large scale search is needed. At this point we have over 300 million posts indexed with only 3 search servers, and still counting. I guess we’ll have half a billion forum posts soon.

November 9, 2006

Back from OpenSource Database Conference

Posted by peter

I’m just back from the OpenSource Database Conference and the PHP International Conference, which took place in Frankfurt.

I’ve uploaded the slides for the two talks I gave, which you might want to check out.

In general, the database portion of the conference was a bit boring. Maybe because it was not widely announced, or maybe for some other reason. There were a number of talks about MySQL by Arjen Lentz, me and Giuseppe Maxia. There were also talks about Firebird, Apache Derby, Ingres and db4o. There were, however, no talks about PostgreSQL, which is probably the second most popular OpenSource database, or about any others.

There were a number of nice talks in the PHP section - I especially enjoyed the talks about PHP6 localization and DateTime handling in PHP 5.2+. I wish there had been some hard core performance optimization sessions for PHP applications, but there were none.

Attending Arjen’s session about typical MySQL/PHP mistakes, I was surprised how few people were able to catch mistakes that were rather simple in my mind. It is encouraging, as it means there will be enough work for consultants like us, but it is also frustrating, as it explains why it is hard to hire knowledgeable people to work with you.

In general, it was worth the visit. Too bad the hotel was in the middle of nowhere, so I could not see Frankfurt itself.