Why Joe Watkins is wrong about pthreads in PHP

(Written by Lawrence Krubner; indented passages are usually quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

Over at Reddit Joe Watkins wrote about pthreads in PHP.

Someone asks:

Is there a facility to use thread-local storage?

and Joe Watkins replies:

The static scope of a class entry can be considered thread local, in a way. Complex members (objects and resources) are nullified when creating new threads, but simple members (arrays/strings/numbers/mixture of any of the above) are copied, so in the static scope can be class::$config which contains connection info to whatever and class::$connection can be the connection itself, when class::getConnection() is called self::$connection should be created where null, thus providing all threads with a local copy of the same resource using common connection info.

Does anyone really think that is the right behavior?
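For comparison, here is a minimal sketch of the same idea (shared connection info, one lazily created connection per thread) written in Java rather than PHP. It is only an illustration: FakeConnection, ConnectionRegistry, CONFIG_DSN and connectionIdsAcross are hypothetical names, the connection string is made up, and nothing here is taken from the pthreads source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a real connection (for example a database handle).
final class FakeConnection {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.incrementAndGet();
    final String dsn;

    FakeConnection(String dsn) {
        this.dsn = dsn;
    }
}

final class ConnectionRegistry {
    // Simple, immutable connection info that every thread can read safely.
    // The value is an assumption made up for this sketch.
    static final String CONFIG_DSN = "db://example-host/app";

    // Each thread builds its own connection the first time it asks for one.
    private static final ThreadLocal<FakeConnection> CONNECTION =
            ThreadLocal.withInitial(() -> new FakeConnection(CONFIG_DSN));

    static FakeConnection getConnection() {
        return CONNECTION.get();
    }

    // Calls getConnection() on threadCount fresh threads and returns
    // the distinct connection ids those threads observed.
    static Set<Integer> connectionIdsAcross(int threadCount) {
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> ids.add(getConnection().id));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ids;
    }
}
```

In this sketch the per-thread behavior is stated in the declaration itself: the field is a ThreadLocal, so a reader can see that each thread gets its own connection. In the pthreads description quoted above, the same outcome depends on the extension nullifying some static members and copying others when a thread is created.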

(Below I’m going to criticize concurrency in PHP, but, to be clear, I am specifically talking about concurrency that uses objects. Other kinds of concurrency can be made to work in PHP.)

I have the impression that Joe Watkins is arguing that pthreads are safe in PHP because data is always copied by value rather than shared by reference. That is what I get from:

The fact is that even if you pass data to a function that in turn uses that data in a non-reentrant way, it will make absolutely no difference because the data you pass is always a copy; pthreads utilizes copy on read and copy on write to maintain the shared nothing architecture and keep sane the executor.

and also:

Operations being implicitly atomic and cor and cow, I kinda just threw that out there. In the real world this means any time you $this->anything you are reading a copy of the data stored at [anything] which is made under the supervision of a lock that ensures nobody can change [anything] while the copy is made. Anytime you assign $this->anything the lock I just mentioned is acquired and the data you assign is copied to the pthreads object, the original data that was assigned does not have it’s refcount changed and Zend is able to free it if no more references exist. This is what is means by copy on read and copy on write, and implcitly atomic
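To make "copy on read and copy on write" concrete, here is a rough sketch of the behavior as the quote describes it, written in Java. CopyingSlot and its methods are hypothetical names invented for this illustration, and the sketch assumes pthreads behaves the way the quote says; it is not taken from the pthreads implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder that mimics copy on read and copy on write
// for a single member, with every access guarded by one lock.
final class CopyingSlot {
    private final Object lock = new Object();
    private List<String> stored = new ArrayList<>();

    // Copy on write: the slot keeps its own copy, not the caller's list.
    void write(List<String> value) {
        synchronized (lock) {
            stored = new ArrayList<>(value);
        }
    }

    // Copy on read: the caller receives a private copy made under the lock.
    List<String> read() {
        synchronized (lock) {
            return new ArrayList<>(stored);
        }
    }
}
```

Under these assumptions each read and each write is atomic on its own, but a call such as slot.read().add("x") changes only a temporary copy and leaves the stored value untouched. A sequence of read, modify, write is also not atomic as a whole, so two threads doing it at the same time can still lose an update.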

Joshua Bloch, in his book Effective Java, writes “Favor composition over inheritance”. This is a rule that I have also tried to follow in all of my OO PHP work, which means that most of my objects hold other objects. So what do I end up with if I use pthreads:

Complex members (objects and resources) are nullified when creating new threads, but simple members (arrays/strings/numbers/mixture of any of the above) are copied

I cannot imagine how I am supposed to work with that. It would sort of work if every object inside my object were also copied, but then memory consumption would be immense.

As before, I find myself favoring Clojure for any work I need to do concurrently. Clojure was designed, from its beginning, as a language for concurrency. Concurrency is simply too complex to bolt onto a language after the fact, as Joe Watkins is trying to do.

Joe Watkins justifies the existence of pthreads in his README on GitHub:

PHP is awesomely powerful, but the simple fact of the matter is, the number of extensions or features a language has doesn’t matter one bit. What matters is how many features or extensions you can utilize in your latest and greatest project. We only have about one or two seconds to send a page to a user, in practice we end up picking and choosing which of PHP’s features we will use because time is always a factor. Enterprising applications usually have to look to Java or the .NET framework if they are designed to do heavy lifting, aggregation, mathematics or the like.

No man is an island: today’s websites have to interact with several sources of data – from reference databases, to social networking API’s and content feeds … and everything inbetween … they have to use and reuse caches, update those caches and then, log all about it, they have to do this several hundred million times a week, if your startup is successful. PHP excells at all of those tasks; but having to execute them synchronously will often mean that when you do start getting the traffic you want to your new project, things are a bit shaky, and from that moment on you’re looking to replace the perfectly good code that you “made it” with, or even worse you’re looking for features to remove ! Bringing threads to PHP stretches your two seconds as far as it will go; and I believe allow you to design your applications to do more than you would if Threads were not available; and allow you to develop much faster than you can in Java or .NET, or any other language ( perhaps ), and as a result, you will be a happier human being, as will your boss, and your projects have virtually no limits

The key idea here seems to be:

allow you to develop much faster than you can in Java or .NET

Is that true? What happens if you make a mistake while writing a multi-threaded PHP app? The ecosystem lacks the tools for debugging this sort of problem (among the toughest problems in computer programming), especially when compared to the world of Java. What happens if you write code with race conditions, or a deadlock? Would you rather debug that in Java, or PHP?

Joe Watkins has made a tremendous effort here, but still, I cannot imagine the day when a sane person would write a multi-threaded PHP app. This is the year 2014. Even Java now seems hopelessly out of date, since it forces you to manually manage a bunch of locks. If we want to take advantage of multi-CPU machines and write seriously concurrent software, then it's time to move on to languages that have been designed from scratch to deal with concurrency, which means using a language like Clojure. As Tim Bray said:

Clojure’s Concurrency Features Are Awesome · They do what they say they’re going to do, they require amazingly little ceremony, and, near as I can tell, their design mostly frees you from having to worry about deadlocks and race conditions.
Rich Hickey has planted a flag on high ground, and from here on in I think anyone who wants to make any strong claims about doing concurrency had better explain clearly how their primitives are distinguished from, or better than, Clojure’s.

Post external references

  1. http://www.reddit.com/r/PHP/comments/1jo517/multithreading_in_php_with_pthreads/
  2. http://www.amazon.com/Effective-Java-Edition-Joshua-Bloch/dp/0321356683
  3. https://github.com/krakjoe/pthreads
  4. http://www.tbray.org/ongoing/When/200x/2009/12/01/Clojure-Theses