July 31, 2014

Alternatives to PHP?

While loading gigabytes of XML into a MySQL database with a PHP script, I started thinking about alternatives to PHP. Why do I need one?
1. PHP is slow – I’m speaking about area of data processing and implementation of algorithms
2. No good cli debugger – I’m just tired of debugging with ‘echo’ and ‘var_dump’
3. Unpredictable memory consumption – it’s easy in processing of big files to eat all available memory
4. Need something new – I’ve been using PHP for almost 10 years, so I want to try something else to refresh my mind.

List of my requirements:
- Stable MySQL binding; support for the new protocol with prepared statements is a must
- Good XML handling
- Fast in terms of both performance and development
- Multiplatform: at least Linux, Windows and Solaris
- Under active development, with a wide community
- Web binding
- Not Java (I don’t like it, like most PHP guys, I believe)
- Has something that impresses me

So I went to The Computer Language Shootout Benchmarks and walked through the long list of languages there: D, OCaml, Haskell, Erlang, Python, Ruby, Lisp, Scheme, Lua, Eiffel, C#, SML, Perl, Tcl and the rest.
Unfortunately, my first requirement, MySQL support, narrows the list to Perl, Ruby, Java, C and C++; even Python supports only the old MySQL protocol (am I wrong here?).
Excluding Java (see my requirements) and C/C++ (I don’t consider them seriously for web development), I’m left with Perl and Ruby. One problem with these languages is that, if the Shootout benchmarks are to be believed, they are about as fast as PHP, so I’m not sure I should replace PHP with Perl or Ruby.
However, given Ruby’s increasing popularity, maybe I’ll take a closer look at it.

Speaking of impressive features, I’d look at OCaml and Haskell as functional languages.
I also enjoyed the syntax of the J and K languages, e.g. a program to count the words in a file and a quicksort algorithm:

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. S says:

    Hi. Do you speak Russian? :)

  2. Hi,
    before ruling out Perl, please have a look at this discussion about the benchmarks you are mentioning.
    http://www.perlmonks.org/?node_id=392696
    Most of the benchmarks were made by people with little experience of Perl. While Perl is not really designed to wrestle with gcc on efficiency, a judicious usage can achieve surprisingly good results.
    If you write Perl code by mimicking PHP/Java/C/C++ code, then it’s likely that you will end up with highly inefficient programs. If you want to code for efficiency, you should learn the Perl way of doing things. This may not suit you, but you should at least know that most efficient results come from using the appropriate idioms.

    Cheers

    Giuseppe

  3. jim says:

    you’re right, python’s connector does not support server-side prepared statements yet. neither does the pure ruby connector (ruby/mysql), but the one built on top of libmysql (mysql/ruby) does.

    jim

  4. sigsegv says:

    Wow, nice chunk of FUD there buddy. Let’s examine your points one by one:

    1. PHP is slow – I’m speaking about area of data processing and implementation of algorithms

    Really? What are you comparing it to? A compiled language like C or C++, perhaps yeah… that’s a fair comparison. I loathe the day you do car reviews, I’ll bet you’d be comparing Toyota Corolla to Williams/BMW Formula 1 car.

    For complex algorithmic tasks like counting words, for example PHP offers native function written in C, called str_word_count(). And over 60 other string functions in addition to that (not to mention PCRE regex support).

    2. No good cli debugger – I’m just tired of debugging with ‘echo’ and ‘var_dump’

    Before publishing a research article please educate yourself with a search engine, I hear Google is very user friendly. There is a free, open source debugger for PHP called Xdebug that works quite well on CLI and offers all the same capabilities you’d expect from something like GDB, with a few PHP specific features.

    3. Unpredictable memory consumption – it’s easy in processing of big files to eat all available memory

    Ok, well if you do malloc(large_random_value) it’d be hard to predict your memory usage as well. If your code is not designed to be careful in use of memory, PHP offers memory_limit that allows you to restrict PHP’s memory utilization to a given value. PHP memory usage is actually quite linear and very easy to predict in just about all cases, with exception of instances when memory is allocated outside of PHP by a 3rd party library such as libxml. This however, would be an issue in any scripting or programming language. In the interest of science why don’t you write some C code that uses libxml and see that crash, then you can publish an article on how horrible and slow (and memory inefficient) C is.

    Also, if you are planning to store gigabytes of XML, use something equally elephantine like Oracle or IBM’s DB2. If you need something smaller, try Berkeley DB4. All of the previously mentioned solutions have internal XML data representation ability and are far more efficient for this use.

    4. Need something new – I’ve been using PHP for almost 10 years, so I want to try something else to refresh my mind.

    This perhaps is the only point that makes sense. If you really want speed and processing efficiency native ASM is the only true way, custom XML parser in ASM will surely be an entertaining hack that will only run one of CPU reeeealy fast, make sure to use CPU’s native vector instructions to really push all you can out of it!. Or if you are totally lame you can stoop down to using C. And then you can make a php module (which is how it is supposed to be done anyway). Thereby making your work useful to the community as opposed to this particular article.

    P.S.
    Rasmus? Is this you? Dude….. You wrote php 10 years ago and now you’re bored? If by some remote chance this is not Rasmus, please indicate when you are using “dog years” to count time, as PHP has only been out for 10 years and for the 1st year has pretty much had a 3 digit user base.

  5. Isotopp says:

    Nice code example. What is this? sendmail? :)

  6. Vadim says:

    To S:
    Yes, I speak Russian. If you have personal questions, contact me at apachephp at gmail

  7. Vadim says:

    Hi Giuseppe,

    I used Perl a bit, but something stopped me from using it more widely. I’ll put Perl on my list for the next try :)
    [code]
    cat \"test... test... test...\" | perl -e \'$??s:;s:s;;$?::s;;=]=>%-{< -|}<&|{;;y; -/:-@[-{-};`-{/\" -;;s;;$_;see\'
    [/code]

  8. Vadim says:

    Jim,

    Thank you for confirming.
    Do you know when Python’s connector will support the new features?

  9. Vadim says:

    Sigsegv,

    Take it easy – I’m sorry it hurt you so deeply. You are right – this is a Sunday piece of flame. I’m still using PHP for the next bunch of XMLs.

    1) Speed.
    I’m speaking not only about C/C++ but also about Python, Perl, Lua, Java, Haskell, Ocaml,
    e.g. http://www.timestretch.com/FractalBenchmark.html
    Btw, which programming language is the Toyota Corolla and which is the Williams/BMW Formula 1 car, from your point of view?
    2) Thank you for pointing me to Xdebug as a CLI debugger. As far as I can see, it is still in beta and isn’t under active development.
    Have you used it for debugging CLI PHP scripts?
    3) Memory:
    I’ve posted a bug report about memory consumption in libxml: http://bugs.php.net/bug.php?id=38604. I’ve also seen this kind of problem several times in other areas. Well, it may be a problem in third-party libraries, but that fact does not make my life easier.

    Regarding your sarcasm about Rasmus – (un)fortunately I’m not Rasmus, but IIRC I started to use PHP in 1997, when I made my first website for my company. That was an early version of PHP 3.

  10. Vadim says:

    Isotopp,

    That isn’t sendmail. J programming language:
    http://en.wikipedia.org/wiki/J_programming_language
    which is declared to be “very terse and powerful, and is often found to be useful for mathematical and statistical programming, especially when performing operations on matrices”.

  11. Xaprb says:

    I think Perl is a good alternative. I agree with Giuseppe. I have been programming Perl for 8 years and PHP for 6, and know both quite well. If you learn the intricate details of Perl, it can be very efficient indeed, especially if you use SAX to parse your XML. I’ve converted some scripts at work to parse large files with SAX and had good results.

    By the way I have deep expertise with XML and related technologies; I think you will find Perl’s support for XML so good and so varied that you can achieve better results by choosing one of its many tools for XML, just for your specific need, rather than for example .NET’s XML support which only offers you two or three ways to do something. (.NET’s XML Parser takes laughable amounts of memory too, if you use it badly).

    I only say that to point out that if you use the wrong thing in Perl, it will suck, just as if you use the wrong thing in .NET it will suck, but you can do the right thing and it will fly.

    I too have never found a good CLI debugger for PHP. There are ones but it can be very frustrating to get them to work. I’ve never found it satisfactory. On the other hand, perl -d is fantastic :-)

  12. Pierre says:

    This little line is certainly only there to reproduce the leak/bug, but it is somewhat representative of how to do things wrong (and I seriously hope you don’t read 1G with this line ;-):
    simplexml_load_string(file_get_contents($filename));
    http://www.php.net/simplexml_load_file

    Also, as long as you use P*, Ruby or Mono, you will hardly see huge performance improvements, as they all use libxml (unless you use Sablotron or other alternatives).

    If you would like to keep your memory usage constant, I recommend xmlreader (available for C# and Perl too; no idea if Python/Ruby have it). It is as fast as the other APIs but has a very low and “constant” memory footprint.
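
    A minimal sketch of that streaming approach with PHP's XMLReader extension. The element name `item`, the `id` attribute and the temporary sample file are hypothetical stand-ins for the real data, and a real loader would write each row to the database instead of collecting rows in an array.

    ```php
    <?php
    // Hypothetical sample data: a small file stands in for a very large export.
    $file = tempnam(sys_get_temp_dir(), 'xml');
    file_put_contents($file, '<items><item id="1">foo</item><item id="2">bar</item></items>');

    $rows = [];
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    // read() advances one node at a time, so the whole document is never loaded at once.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
            // A real loader would insert this row into the database here.
            $rows[] = [$reader->getAttribute('id'), $reader->readString()];
        }
    }

    $reader->close();
    unlink($file);
    print_r($rows);
    ```

    Because only the current node is inspected, memory use should depend on the size of a single record rather than on the size of the file.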

    But the point is clearly about knowing what you do and how you should do it (which extension, function or API fits best); it does not matter whether you have been developing with PHP (or any other language) for N years or 2 months.

  13. Dmitri Mikhailov says:

    There is always a way to speed up programs/algorithms, for instance by rewriting them in assembly language; in that case, however, the speed comes bundled with a portability nightmare.

    Perl syntax is a little too loose; mixing programming styles will most likely make the code unreadable and unmaintainable (a few more pieces of advice on coding style can be found at http://thc.segfault.net/root/phun/unmaintain.html). Personally, when writing Perl code I use a subset of Perl syntax that resembles C.

    The J language, in my view, belongs to the esoteric programming language category. An approximate list of products of a bored mind is at http://en.wikipedia.org/wiki/List_of_esoteric_programming_languages

  14. peter says:

    First, I should note that you should not take this as an attempt to start a flame. I think anyone who has been developing long enough in any language (or with any piece of technology in general) can get upset with it on a specific project. This is great motivation to try out other things.

    Speaking about the benchmarks mentioned, I agree this is not really the point – if you develop in PHP, all heavyweight processing is normally done in modules written in C/C++: XML processing, regexp matching, sorting and even the MySQL client. If you need something else which is very CPU hungry and can’t be mapped efficiently onto existing routines, you should consider implementing it as an extension.

    With the CLI debugger the keyword is “GOOD” – meaning one that allows you to debug applications effectively in CLI mode. Why is an IDE not enough? Because we have to work with remote servers which might not even have X Windows. Not to mention that working with a remote X Windows client can be pretty slow.

    Now memory consumption – what Vadim is describing is memory leaks. I’ve run into this with a number of extensions (XML in particular), and unfortunately the developers do not seem to care. For web applications it is not critical, as long as memory is freed after each request is processed. It is, however, a big problem for batch jobs and permanently running scripts. A number of customers I have worked with used workarounds along the lines of: fork, process 100 files and exit, to get around the memory leaks.

    There also do not seem to be many memory allocation tracking tools for PHP. I mean something which could tell you where memory was allocated, for which objects, etc., and could help pinpoint exactly where a leak happens.

    My other concern with PHP, which Vadim does not mention, is the lack of error handling. Of course there are exceptions, but there does not seem to be a way to intercept a fatal error so that you could display a nice error message instead of a partially rendered page. Yes, these are often caused by development errors, for example passing false instead of an object and trying to call a method on it. This kind of error should also be catchable.

  15. I don’t want to go off topic, but if you want to intercept fatal errors Peter, you can set a custom error handler with set_error_handler(). Actually, you can even convert almost all runtime errors into exceptions by using your own error handler.
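
    A minimal sketch of that conversion, assuming a reasonably recent PHP version. It follows the common pattern of throwing the built-in ErrorException from the handler; the missing file path is a hypothetical trigger chosen only to raise a warning. It covers runtime errors such as warnings and notices.

    ```php
    <?php
    // Turn runtime errors (warnings, notices, ...) into ErrorException instances.
    set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
        if (!(error_reporting() & $severity)) {
            return false; // respect the current error_reporting level and the @ operator
        }
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    $caught = null;
    try {
        // Reading a missing file raises E_WARNING, which the handler rethrows as an exception.
        file_get_contents('/no/such/file/' . uniqid());
    } catch (ErrorException $e) {
        $caught = $e;
    }
    restore_error_handler();

    echo $caught === null ? "no error\n" : 'caught: ' . get_class($caught) . "\n";
    ```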

    Sorry for the digression, back to the topic :)

  16. Peter,

    I’d like to know why you don’t like Java. I suspect I know the answer, but I’d like to hear it. Putting that aside, have you actually considered JSTL, the JavaServer Pages Standard Tag Library?

    While this is built on Java, it’s very PHP/ASP-like in syntax. Indeed, I use it for quite rapid prototyping. The clear benefit is that it’s interpreted, so you need only an editor and a browser to code away. You can, however, revert to straight Java code at any time if there is some functionality not supported (I find this rare).

    I’ve started my own open source project to publish my framework for others. I’m some way from cleaning everything up, but you can get a clear picture of it to date at http://htmltags.sourceforge.net and http://htmltags.arabx.com.au

    You can get more blurb at http://java.sun.com/products/jsp/jstl/. Handling XML for example is a piece of cake. Check out the XML functions at the API docs http://java.sun.com/products/jsp/jstl/1.1/docs/tlddocs/index.html

  17. peter says:

    Java…. It is just a whole other world. I’m not saying it is bad though.

    Things I find inconvenient (I may well be wrong):

    - Need to compile stuff. For scripts I prefer to be able to run them right after making changes.
    - Class names. They are smart and standardized, which does not make them pretty for my taste.
    - Too much standardization. For example, Connector/J has to be JDBC compliant, which means it has to do many smart things required by the specs.
    - Love of complication. Java applications are typically designed “right”, which makes them complicated. Take a look at any stack trace posted for Connector/J bugs… it is rarely less than 20 levels deep.
    - Too many third-party extensions, many of which are commercial.
    - Not overly convenient for working with strings.
    - Product of a large company. With Perl and PHP it is relatively easy to reach the developers.
    - Fully OO. I like objects, but not for 2-line scripts.

    I guess most of these are just lame excuses; the real reason is that I just do not feel like it. I enjoy playing with something which allows me to do things quick and dirty. Most of my tasks are far from a plane autopilot in their complexity and reliability requirements.

    Java is probably good for the enterprise world, but I do not expect it to get much traction in the Web world, which wants applications to be developed quickly by students implementing prototypes of their ideas.

  18. peter says:

    Hubert,

    I knew someone would write about set_error_handler. Unfortunately it does not work.

    Here is what documentation says:

    The following error types cannot be handled with a user defined function: E_ERROR, E_PARSE, E_CORE_ERROR, E_CORE_WARNING, E_COMPILE_ERROR, E_COMPILE_WARNING, and most of E_STRICT raised in the file where set_error_handler() is called.

    Here is little example with problem I’ve mentioned:

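    <?php

    // User-defined handler; PHP calls it only for non-fatal errors such as warnings and notices.
    function myErrorHandler($errno, $errstr, $errfile, $errline)
    {
        echo("Error !");
    }

    set_error_handler("myErrorHandler");

    $a = false;

    // Fatal error: method call on a non-object; myErrorHandler() is never invoked.
    $a->method();
    ?>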

    Error handler does not help in this case.
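
    For comparison, a hedged sketch of the same failure under PHP 7 or later, which is newer than this discussion. There, calling a method on a non-object throws an `Error` that an ordinary try/catch can intercept; the variable names are illustrative only.

    ```php
    <?php
    $a = false;
    $message = null;

    try {
        $a->method(); // method call on false
    } catch (Error $e) {
        // PHP 7+ raises an Error here instead of halting the script.
        $message = $e->getMessage();
    }

    echo "Recovered from: $message\n";
    ```

    This assumes PHP 7+; on the PHP 5 releases discussed here the script would still stop with a fatal error.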

  19. Possible memory leaks were exactly the reason why I never used PHP for anything else but short running scripts (like web apps). As soon as I got the task to write some long running daemons in a scripting language for the first time some three years ago, I turned to Perl or even Bash (for simpler stuff), even if I never hit a real problem with memory leaks in PHP before.

    But as soon as I started to write the first few lines of the daemon in PHP, I started to get some bad feelings about it (even though it would have fit the task at hand and I was using PHP almost exclusively at that time). I just realized that nobody was using PHP for anything that runs longer than a few seconds – and I just didn’t want to be the first one to hit the possible bugs.

  20. stepz says:

    If you change your mind and still want to use your PHP expertise, then XMLReader is really nice for huge XML -> database transformations. If you don’t need to do complicated processing of the data then it’s quite nice and if you take care to avoid circular references (not that hard with the XMLReader processing model), doesn’t leak memory. I haven’t had the “pleasure” of working with gigabyte sized XML files, but hundreds of MB’s aren’t a problem. As I have 6 years of experience using PHP, the development goes really fast too. Not that Python, et al are bad for this kind of thing, but experience really counts towards development time.

  21. peter says:

    It is actually pretty interesting. Even in Web applications you need to write some scripts every so often – when you make database structure changes, load data, remove old data, etc.

    Using another language may mean you will have to duplicate a certain portion of the application’s functionality, leading to maintenance problems.

  22. Harry Fuecks says:

    As it goes PHP isn’t too bad at looping over lists of things, if that’s what you’re doing. But if your XML file is really that big, you’ll have problems no matter what language you use.

    Some random thoughts;

    Stream the XML – using something like a SAX API (not DOM which loads the entire document)

    If you need to do some kind of calculation based on the entire document i.e. you need to see all the data to be able to produce the final result, you might consider storing intermediate values to file. At the far end of this way of thinking is Google’s MapReduce (http://en.wikipedia.org/wiki/MapReduce) – distribute the processing. The livejournal guys have something heading towards a Perl implementation of the same – see links here: http://del.icio.us/harryf/gearman . Another angle would be to consider an MMap based solution – in Perl check out http://cpan.uwinnipeg.ca/htdocs/Cache-FastMmap/Cache/FastMmap.html for example

    One advantage of Perl is that it has attracted the type of developers who like to solve this kind of problem (unlike PHP). Browse CPAN – Perl’s XML tools are also good. And demand Unicode support (some, like Perl, have it… other languages don’t) ;)


  23. peter says:

    Harry,

    No, it is not one 1GB XML file but more like 100 10MB XML files, which would not be a problem if PHP did not leak memory.

  24. Adam says:

    C# has the MySQL Connector/NET and is reasonably fast and predictable. The Mono project has a page on the topic with a quick example – http://mono-project.com/MySQL. There are also the native System.Xml classes for handling XML internally.

  25. Rene L. says:

    Learn how easy it is to use a SAX parser (PEAR::XML_Parser), which lets you quickly process large XML documents without loading them into RAM
    http://www.schst.net/index.php?__path=articles.xml-parser
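
    A rough sketch of the same event-driven idea. It uses PHP's built-in `xml_parser_*` functions rather than the PEAR::XML_Parser class recommended above, so the calls differ from that package's API. The `entry` element and the temporary sample file are hypothetical.

    ```php
    <?php
    // Hypothetical sample data standing in for a large document.
    $file = tempnam(sys_get_temp_dir(), 'sax');
    file_put_contents($file, '<log><entry>a</entry><entry>b</entry><other/></log>');

    $count = 0;
    $parser = xml_parser_create();
    // Case folding upper-cases element names by default; switch it off.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$count): void {
            if ($name === 'entry') {
                $count++; // called once for every opening <entry> tag
            }
        },
        function ($parser, string $name): void {
            // closing-tag callback; nothing to do in this sketch
        }
    );

    // Feed the parser in fixed-size chunks so the file is never held in memory as a whole.
    $handle = fopen($file, 'rb');
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if (!xml_parse($parser, $chunk, feof($handle))) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    fclose($handle);
    unlink($file);

    echo "entries: $count\n";
    ```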

  26. NTDOC says:

    @Harry Fuecks

    Perl is great, but it is sad to see so many dead links to what used to be booming sites. Perhaps it is a bad perception on my part, but it seems to me that a lot of the open source projects lack the wealth of community support they had years ago.

  27. Isaiah says:

    Have you tried Euphoria? I’ve been looking into it for some time. I just can’t stand PHP anymore; it’s slow, messy, and hard to develop in. From what I understand it’s easy to install: just put the interpreter in cgi-bin, as it’s written in C. According to a benchmark I found, it’s apparently 35 times faster than Perl and 31 times faster than Python. http://www.rapideuphoria.com/bench.txt
    It has an ODBC library allowing it to connect to MySQL http://www.usingeuphoria.com/?page=bestofeu

    I also see it has a Euphoria-to-C translator that makes it 3.7 times faster.
    I think I’m going to download it and play around with it. I’ll let you know what I think

  28. Olivier says:

    You could try WebDNA: it is lighter than PHP and MySQL and is also much faster (resilient in-memory database). It is 100% compatible with any browser (it does not need anything client-side) and you can hack virtually anything the server sends to the browser. I run it with heavy in-memory databases, but you can use MySQL if you want to.
