Doing Web Design in Botswana!

I’ll be spending the month of May in Botswana, doing web development.

Follow my trip on UBC’s Leave for Change blog.

What’s this all about? I’ll be spending the month of May working in Botswana, as part of the “Leave for Change” program funded through UBC, Uniterra, and the World University Service of Canada. I’ll be working for an HIV/AIDS teen outreach organization.

My job will be to create a website and promotional strategy that will let them take online donations, help showcase their programs, and share upcoming events with local teens. I’ll also train local staff in doing the same, building local tech capacity at the same time.

Botswana? Botswana is a young, peaceful and democratic country, and it’s one of the fastest-growing economies in Africa. About 25% of Batswana have HIV, and 1 in 5 kids have lost a parent, which is slowing that growth down a lot.

Geographically, it’s land-locked, located north of South Africa and between Namibia and Zimbabwe, and 70% of the country is Kalahari desert; 10% is the lush, verdant Okavango Delta. I’ll be living in neither of those places, though: I’ll be in the capital, Gaborone, far to the south-east, in an apartment in the centre of the city. Besides working and hanging out with the people I meet, I plan on doing touristy stuff like visiting museums, game parks, and historical sites.

Mind: blown (or: How to use Drupal’s post teasers properly)

This past week I was lucky enough to take a Drupal module development course taught by Gord Christmas. Gord showed the class a lot of really cool stuff, and helped solidify my understanding of some heretofore-considered black Drupal magic (it’s now white magic) - and also demonstrated what rapid prototyping can accomplish in a classroom setting.

But the coolest thing by far that happened was a little aside he travelled down in response to a (slightly snide) comment by yours truly: I said something like “Pshaw. Teasers? Who even uses those things anyway?”

And then Gord said “Well, actually…” And then he totally blew my mind.

I had always thought that teasers were an outmoded display method, a legacy from the old Drupal 5 days. I’d always considered them sort of old-fashioned – nobody ever uses the “river of news” front-page display anymore, and teasers seemed to belong to a quaint era before Views offered its lovely “truncate to X words and add ellipses” option.

How wrong I was.

“Only use fields if you have tabular data to display. Otherwise, use Full Node.” The idea here is that you configure teasers in Display settings, and then set up your templates to be aware if it’s a teaser or not, so you can theme it (the full node’s view mode lets you exclude links and whatnot typically shown on full posts, so this might not even be necessary for most things). Then you use Node view in your Views, and what pops out is consistent, every time.

Mind? Blown.

Using teasers for fun and profit

  • In any list-type view, instead of adding fields each time you set up a similar view, use Node View and set the Build Mode to teaser.
  • Actually configure teasers in your content-type settings – because they’re separately configurable you can show/hide, change display format, and outright exclude totally different things from teasers than from the full node. Teaser length is configurable per content-type if you download a little module for Drupal 6; it’s rolled into Drupal 7.
  • Create per-content-type node.tpl.php files. In each template, test whether you’re in teaser mode and output accordingly (because chances are different types will have different layout needs).
  • In your css, theme *once*, instead of adding CSS classes to your views and tacking on different view-types to .views-row and .views-field-xxx-xxx…
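
The one-template, two-modes idea can be sketched outside Drupal. The Python below is only an illustration, not Drupal code: render_node, the teaser flag, the dict keys, and the 30-word default are all made-up stand-ins for what a per-content-type template does when it checks whether it is rendering a teaser.

```python
from html import escape


def render_node(node, teaser=False, teaser_words=30):
    """Render one node dict as HTML, in teaser or full mode.

    Hypothetical stand-in for a per-content-type template: every list
    that asks for a teaser gets identical markup from this one place.
    """
    title = escape(node["title"])
    if teaser:
        # Teaser mode: linked title plus the first few words of the body.
        words = node["body"].split()
        summary = " ".join(words[:teaser_words])
        if len(words) > teaser_words:
            summary += "..."
        return (
            '<div class="node node-teaser">'
            f'<h2><a href="/node/{node["nid"]}">{title}</a></h2>'
            f"<p>{escape(summary)}</p>"
            "</div>"
        )
    # Full mode: the whole body, no truncation.
    return (
        '<div class="node">'
        f"<h1>{title}</h1>"
        f'<div class="content">{escape(node["body"])}</div>'
        "</div>"
    )
```

Every list on the site would call the teaser branch, so the markup (and therefore the CSS) only has to be written once.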

And voila. The payoff is consistent Views displays and no fiddling with fields, consistent single-point-of-configuration, very consistent look across views, much less CSS and easier theming, and a bit of database/caching ease, too.

The trade-off is “extras” – throwing an extra field into a single view will have the same overhead as it did before: you’ll have to add all the fields, and theme for it, etc, etc. But mostly we use views to display lists, and for the most part the only time we change how things look in those lists is for different content-types – and then they appear that way all over the site. We rarely add “just one extra field” unless we’re doing something very View-y like complex relationships using different views-hooked-module-field-output. Because we can still leverage the power of filters, arguments, and relationships, however, those use-cases shrink to something fairly small. And it’s the same amount of work to deal with those anyway.

Now, I haven’t tried this myself. I’m sure as shooting going to try it for the next site I develop. I’ll let you know how it goes – it sounds like a bit more work up front, but thinking about it for half a second reveals that it might actually save tonnes of time, and cut down my swear-words-per-hour rate to something close to manageable.

What are your thoughts? Does this look totally mind-blowing, or more trouble than it’s worth (you know where I stand! =) ?

UPDATE: Was recently reading some performancy-things, and found this entry: avoid calls to node_load. They explicitly recommend using fields in views. Hmm. Mind still blown, but wondering… :) If it’s a small performance trade-off then with caching and whatnot it shouldn’t make a huge difference, but it will depend on the size of the site and your priorities. Benchmark away.

Stop wasting my time! (or: a primer on using Drupal properly)

I think I can safely say that I speak for 95.6% of all frustrated Drupal professionals out there when I tell new developers this:

STOP creating custom modules and proprietary solutions.

Just… stop it. For every proprietary project that goes and duplicates 96% of the work another proprietary project already created, there’s an open-source developer like me crying into her beer because of the horrible waste of human capital and interoperability this represents.

This isn’t just a Drupal problem, it’s an open-source problem, but I see it here because this is my world.

The problem? Programmers.

Don’t get me wrong: I work with programmers. I love programmers! But programmers love to program. That’s what they do. When you hire a programmer, nine times out of ten they do not carefully research possible solutions, find one that pretty-much-fits, implement it, and customize it using the tools the solution itself provides. That’s not what they trained to do in their university Computer Science classes! Nooo. They were trained to program. So they whip out their favourite text editor, they get up to their elbows in API, and they start writing code.

And then I get to spend a not-insignificant number of billable hours, after they’ve gone on their merry way, ripping out all that lovely hand-written custom module logic, exporting custom database tables, and re-implementing everything in basic out-of-the-box Drupal just so that a customer’s site can be upgraded and properly maintained.

I hate doing this. I really, really do.

One developer friend I work with joked “The client looked at the site and said ‘It looks exactly the same…’ and I was like ‘YES! ISN’T THAT AWESOME?’” The difference was invisible, but crucial: the first version (which she was hired to ‘maintain’) was hideously insecure and totally un-upgradable; the second version was done properly, and could be maintained going forward using standard Drupal tools and best-practices.

Unfortunately, clients do not care what goes on under the hood, they just want it to work, so this puts the new developer they hire to fix whatever goes wrong in a huge bind. Clients don’t really understand the fundamental difference between solutions — they just see line-items on an invoice, and they don’t get the open-source mentality of security-by-lots-of-eyes, or contributing-work-back-to-the-community. They just understand functionality. If it works, they pay.

This just encourages programmers who should really know better, and screws the people who have to clean up after them.

But all is not lost: clients are willing to listen. They don’t want to have to go back and re-do everything when they get hacked, or when they try to upgrade versions and find they can’t. But if we don’t educate them, they will naturally blame the software, not the developer. We can’t expect them to magically understand the amount of work that needs to go on just to make their site look and function identically to their old site because their programmer loved to program. They just get a bill, and blame Drupal.

Drupal doesn’t need this.

Now, sometimes, I realize, it’s necessary to implement proprietary stuff. Sometimes you’re working with weird, antiquated and cantankerous systems that just need a good smacking-down in the form of a custom module. Oh, yes, I have met those. But this is rare. I have seen so many custom implementations of modules already well-maintained on Drupal.org that I want to cry. My first question is always “Can you write a rule for that?” “Did you create an action?” “TRIGGERS!” “This is a CCK field, right?”

I once helped rip out an entire web store done with its own database tables; we were able to re-implement the whole thing using just CCK, Views and Ubercart. And I will never get those hours of my life back.

A more egregious example is this: one site I worked on had a lovely custom-LDAP module set up, to add users from the company’s LDAP directory and give them pre-defined Drupal roles. WAIT A MINUTE. THAT SOUNDS FAMILIAR.

Well, of course it does: there’s already an LDAP module! And it works out of the box. It’s not perfect (it needs some work on provisioning, for example, because of the way PHP pages results) but it is a Drupal-community-maintained module. If the developer had the time to create a whole new custom module, writing thousands (yes) of lines of code to do exactly the same thing as the LDAP module, why couldn’t she or he have just implemented the LDAP module and helped fix it up where it didn’t fit their needs — and contribute it back?

Well, he or she probably didn’t know about it, because they probably just cracked open a text editor and started coding.

In all cases where this happens, the open-source community loses the chance to have a great programmer working on a module or project that can use some TLC (not to talk smack about any particular module — all Drupal modules need TLC, all the time, and they all need people willing to test and patch, always! Because this is software we’re talking about.)

So, all new Drupal developers? DO NOT REINVENT THE WHEEL.

Feel free to add some sparkles, feel free to maybe put on a nice hub-cap or two, but stop making things that already almost exist. Search! Use some of those analytical skills to assess existing solutions! And, if you do create something and it seems useful, stop keeping it to yourself: this is a community, and we need your contributions. Try to generalize it, and then give it back. Use your amazing developer skills to do it the Drupal way.

The community will thank you, and so will I – the fewer sites I have to fix, the better.

How to handle taxonomies – CCK Fields or default?

I’ve recently come to the conclusion that it’s far better to add vocabularies via CCK than it is to use the default Taxonomy setting.

Why? Oh, so many reasons. Here are three:

  1. Content display
  2. Tokens
  3. Views

1. Content Display

Main Drawback of Drupal’s Taxonomy: You can’t separate out a specific vocabulary for special theming. You also can’t choose to display only one vocabulary, and have another be hidden.

The default Drupal implementation of taxonomies is all-or-nothing. If you have several vocabularies assigned to a content type – like, say, a site vocab and an audience vocab, and a free-tagging vocab, then Drupal just treats them as “terms” and prints them all in a block at the bottom of your post (for the programmers in the audience: they’re all lumped together in $node->taxonomy).

With CCK fields in place of the default taxonomy, though, you can set what shows up on the field’s “Display” settings page, and can choose to hide vocabularies there. You can also theme them separately on the theme layer.

2. Tokens

Main Drawback of Drupal’s Taxonomy: You can’t separate out a specific vocabulary for use in tokens.

The default Drupal implementation of taxonomies is also all-or-nothing in tokens. For example: if you’re using Pathauto and want to create “Category” paths, you can only tokenize the term that happens to come first in the first vocabulary assigned to your node. What?!

Using separate CCK fields, on the other hand, lets you choose the vocabulary you want to use – it’s available as a token, and since they are separate, well, there you go.

3. Views

Main Drawback of Drupal’s Taxonomy: Again, the vocabularies aren’t separated out. CCK taxonomy fields, by virtue of being separate, allow you to use individual vocabs in views. This allows you to sort much better when displaying fields.

Main Drawbacks of using CCK Fields: Well, this is great for View fields, however it’s not so great for View arguments, because they’re specially written to leverage Drupal’s main site taxonomy… as are a lot of things.

OK, so can we do both?

YES! Views’ “Arguments” implementation is written to mesh really well with Taxonomies, allowing you to, for example, take the page terms as a default argument into a block (by specifying “Provide Default Argument”) and filter on them. If you are using CCK Fields, of course, this becomes a problem. Unless…

Workarounds: Always always always select “Save terms to core taxonomy tables” when adding a CCK Taxonomy Field. You have to have the core Drupal Taxonomies set up in advance anyway – CCK Fields can just be used as a better way to leverage them. So make sure you’re still saving all those terms onto your nodes as if they were using core Taxonomy terms. That way you can have the best of both worlds. Really.

Drupal Performance (Post 1 of 32492)

Note: This post is going to get re-written shortly to better fit the new format of the blog, i.e. pulling out key points and making them separate posts, with tools and caveats for each one. Leaving this here for historicity’s sake :)

In the course of revamping the UBC Computer Science department’s website, I came face-to-face once again with the fact that Drupal, in terms of performance, is like a bunch of drunken monkeys on an oil tanker in a narrow channel during a monsoon: fun to play with, as long as you don’t have to try to get them to actually do anything coordinated or useful, and as long as your environmental disaster insurance is paid up.

Replace “monkeys” with “modules”.

Inevitably the coast guard shows up, and tries to make all those drunken monkeys walk in a straight line… and, well, it doesn’t work. (Again, replace “coast guard” with “users”. And the ship with a server. And replace the captain with our sysadmin. Imagine the captain sobbing. There you go.)

Work, monkeys!

There are two aspects to site performance: load times, and server performance. Load time is, roughly, how long it takes the client to download a page and all its parts; server performance is how long it takes to generate that page to get sent in the first place – and this metric will determine how well the server stands up to getting hammered by high load, as well.

Load-times

Some of the problems I encountered were basic best-practices things you have to tackle when taking a site from development to production, just in terms of page load-times.

  • too much CSS, and none of it squished/optimized
  • too many enormous background graphics
  • images not optimized or sprited
  • IE-specific CSS not removed to a separate file
  • PNGs where true background alpha transparency really isn’t needed
  • JavaScript not minified
  • nothing validating as W3C compliant

Fixing this stuff took the first page load-time from 14 seconds to 7-ish seconds (I use both YSlow and Google’s PageSpeed Firefox plugins for measuring this).

Not bad – 50% reduction.

But 7 seconds? That’s awful. There’s much more that can be done. On the one hand, much overhead is just delivering the files to the client, and we had lots of big pretty background CSS images eating up a lot of download time. Not much we can do about that beyond what I listed above. But getting all the CSS and JS to the client can be optimized. Zip ‘em up.

Gzipping

Turn on gzipping on the server via mod_deflate.

Modern browsers, all of them, can handle gzipped files. Drupal itself has a page compression setting. However, I did this through the server, rather than through Drupal, basically as a way of spreading out the work – letting Drupal do it means it is done through PHP functions. Apache does it after that point, using its Apache resources. (Note: if you have a proxy, do it there instead – although there are SSL issues that come with that.)

If you do this in Apache, Drupal’s Page Compression should be turned off. But if you do it in Apache, non-Drupal pages on the server will get included too. That’s good for us.
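
To get a feel for why this helps so much with CSS and JS, here is a rough sketch in Python. It is not part of our setup: the stylesheet is made up, and Python’s gzip module stands in for mod_deflate (both use the same DEFLATE algorithm, but the exact savings on a real site will differ).

```python
import gzip

# Made-up stylesheet: real CSS is similarly repetitive, which is why it compresses well.
css = "\n".join(
    f".views-row-{i} .views-field-title {{ margin: 0 0 1em 0; padding: 0; color: #333; }}"
    for i in range(500)
).encode("utf-8")

compressed = gzip.compress(css)

# Report how much smaller the payload sent over the wire would be.
saved = 100 * (1 - len(compressed) / len(css))
print(f"{len(css)} bytes -> {len(compressed)} bytes ({saved:.0f}% smaller)")
```

Already-compressed formats like JPEG or PNG gain almost nothing from this, which is why configurations like the one in this post skip them.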

First page load went from 7 seconds to 5 just by doing this. Add something like this to your httpd.conf:

<Location />
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems…
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# Don’t compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
# Don’t compress compressed files or executables
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
# Don’t compress PDFs or other pre-compressed media
SetEnvIfNoCase Request_URI \.(?:pdf|avi|mp3|mp4|mov|rm)$ no-gzip dont-vary
#Make sure proxies don’t deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location>
#This is pretty close to Apache’s suggested configuration; you don’t have to include the Netscape stuff, but
#we have a very varied user-base, so I left it in.

Ship shape

Not only were the pages slow to get to the client: the pages were causing the server to collapse into a little heap. The first load-tests of our setup basically put our server on its knees after 30 concurrent requests. Not good. So, database tweaks and server tweaks were needed, along with some stuff in Drupal itself.

Database Tweaks

  • Storage engine: use InnoDB as your storage engine, not MyISAM. It’s more overhead, but – especially if you allow user logins – is more robust and is the better choice for Drupal. InnoDB will be the default for Drupal 7, also.
  • Make sure your InnoDB log buffer and log file sizes are bigger than the default. Read here.
  • Use a separate database server if you can.

Drupal Tweaks

There’s lots you can do within Drupal. Once we launch I plan to go back and do a full post on this, but for now…:

  • Install the Drupal Tweaks module. It’s like it was made for this or something!
  • Install the Block Cache module (I’m using the “with node grants” patch because we’ve got lots of access control stuff on the site.) Drupal evaluates all blocks on each page load, so this helps by telling it what situations the block should be looked at – per page, per role, etc. You decide. Because I’m using Context, this module isn’t so important, but it still helps because all the page menus are part of the main Block loading cycle, even if the rest of the blocks are disabled. I have them set to cache “per page”, so that active menu items get set.
  • Panels/Context improvements. I ended up doing a custom panels/context hybrid; I’m not happy with it but it’s pretty OK. I didn’t benchmark this, because I can’t change our setup at this point anyway, but I did do some things like removing lots of custom panels and putting them into selection criteria in the master node/%node that will I think improve performance in the future (I’ll do another post on that later, too. I PROMISE.)
  • Set each View to cache at whatever interval is reasonable.
  • Use Panels caching, if you’re using Panels.

Server Tweaks

  • Memory. You need lots of it. Up your PHP.ini memory to be 64MB. At least. We have it higher, but we do lots of cron importing and job scheduling. Raise it in increments and stop when you stop getting whitescreened. Depending on your setup, you can do it:
    • in PHP.ini: memory_limit = 64M
    • in settings.php: ini_set('memory_limit', '64M');
    • in .htaccess: php_value memory_limit 64M
  • If you have a dedicated server and can do so, increase your RAM. If you’re on a shared server with ssh access, you can check how much RAM your server has by typing free -t -m. Rent more. For example (rough numbers here) you’ll want around 4GB for a site that has the common Drupal modules and gets >10,000 hits a day.

Opcode Cache

Install an Opcode Cache like APC. This stores chunks of oft-compiled code in memory for quick retrieval. All PHP-powered CMSes like opcode caching. Just do it. It’s like hot coffee for the monkey. However, there are some Drupal-specific gotchas. The apc section of our PHP.ini looks like this:

[apc]

apc.enabled=1
apc.cache_by_default=1
apc.shm_segments=1
apc.filters="-(.*\.tpl\.php)|(.*/job_scheduler/*)|(.*/feeds/*)"
apc.shm_size=128M
apc.ttl=7200
apc.user_ttl=7200
apc.gc_ttl=3600
apc.num_files_hint=1000
apc.enable_cli=0
apc.stat=0
apc.mmap_file_mask=/tmp/apc.XXXXXX

Some of this stuff is already the default, but it’s here to enable tweaking and testing. TTL is important, because it means that if your cache fills up it won’t get dumped: older entries will get timed out. gc_ttl is for garbage collection, and is more important if you’re using apc-specific functions.

Note the apc.filters directive.

When I turned APC on, I got a lot of stuff that looked like this:

include(): apc failed to locate ./sites/all/modules/context/theme/context-block-browser-item.tpl.php – bailing in /cs/web/www.cs.ubc.ca/docs/drupal/includes/theme.inc on line 1066.

Hmm. It turns out that Drupal’s PHP Template engine is doing weird things with relative paths that were causing APC to throw up all over the place. Drunken monkeys, anybody?

Turning on or off the apc.include_once_override directive doesn’t fix the problem – it’s buggy right now, and using it has performance problems of its own. Since I’m using PHP5.3 anyway, it doesn’t matter so much, since PHP5.3 has its own includes optimizations.

Instead, I simply filtered out the problem modules until I can get into the code and see what’s going on. The filter is a regular expression, and as you can see I filtered out a few other places I don’t want APC going (the feeds module is making it choke as well, so I just excluded the folder, since it only runs once a day on cron anyway).

You can explicitly exclude or include files in the filter by putting - or + before the expression. If you’re using + filters, make sure apc.cache_by_default=0.
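
If you want to sanity-check a filter before restarting Apache, you can try the expression against a few paths. This is only a sketch: APC uses POSIX extended regular expressions rather than Python’s re module, but as far as I can tell this particular pattern uses only syntax the two share. The feeds path below is a guess at a typical location, not something copied from our server.

```python
import re

# The filter from the php.ini above, minus the leading "-" (which marks it as an exclusion).
EXCLUDE = re.compile(r"(.*\.tpl\.php)|(.*/job_scheduler/*)|(.*/feeds/*)")


def is_excluded(path):
    """True if the pattern matches anywhere in the path, i.e. the file would be skipped."""
    return EXCLUDE.search(path) is not None


for path in (
    "./sites/all/modules/context/theme/context-block-browser-item.tpl.php",
    "./sites/all/modules/feeds/feeds.module",  # hypothetical path
    "./includes/theme.inc",
):
    print(path, "->", "not cached" if is_excluded(path) else "cached")
```

Note that the pattern is unanchored, so anything with /feeds or /job_scheduler anywhere in its path gets excluded, not just files directly inside those folders.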

Load test that sucker

It doesn’t matter what you use, it really doesn’t. Apache Benchmark is nice if you have server access. There are dozens of online services that’ll let you do a few tests free so you can get out of your own building and see what somebody on the actual internerd is seeing, and they give you pretty graphs too. Whatever. You can google yourself up lots of different tutorials on load testing, but basically remember that concurrent connections are as key as the number of requests.
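
If you’d rather roll your own, the core of a load test is small. The sketch below is not the tool I used; it’s a minimal Python illustration where load_test, fetch_url, and timed are made-up names, and the numbers you feed it (requests, concurrency, URL) are yours to choose. It fires a fixed number of requests at a chosen level of concurrency and reports failures plus median and worst response times.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def fetch_url(url, timeout=30):
    """Fetch one page and discard the body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def timed(fetch):
    """Call fetch() once and return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def load_test(fetch, total_requests, concurrency):
    """Run fetch() total_requests times, at most `concurrency` at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(fetch), range(total_requests)))
    durations = [seconds for ok, seconds in results if ok]
    return {
        "requests": total_requests,
        "concurrency": concurrency,
        "failed": sum(1 for ok, _ in results if not ok),
        "median_s": median(durations) if durations else None,
        "max_s": max(durations) if durations else None,
    }


# Example use against a test server you own (the URL is a placeholder):
#   for concurrency in (10, 30, 100):
#       print(load_test(lambda: fetch_url("http://localhost/"), 200, concurrency))
```

Stepping the concurrency up while keeping the request count fixed is what exposes the point where response times stop creeping and start climbing.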

Ultimately, with all this stuff tweaked and working well, our server went from groaning at a mere 30 concurrent connections with basically a J-shaped load profile (running ‘top’ on the server showed that the edge of the vertical was the point at which it started swapping to disk), to something that looked more like a sideways hockey stick and was ticking along fine at 100 and 150 and beyond.

Sober little monkeys

Right now, our page load times are sitting around 3-4 seconds for a first load on a cable connection, and that’s not good enough yet – next steps are to go through and turn on Views and Panels caching (as I recommended above) once we’re finished development and have all the views and content in there and know what sort of page profile we’re looking at.

This is a very brief overview, but now I have a site to configure. Later, monkeys.

Excluding the current post from a view

Excluding the current post from a sidebar view is something that I sometimes like to do.

Case: It’s nice to have a view of, say, “related content” that takes the page’s terms as an argument, and shows all of the nodes tagged with the same terms as the node we’re on. We build a block view to do this*, put it in a sidebar, and activate it.

Problem: A list of related nodes on each page, including the current page we’re already on.

Desired Outcome: Wouldn’t it be nice not to have the current post we’re on included in the resulting list?

It’s one of those little details that I think really makes a subtle difference. Showing the node we’re already on is an unnecessary duplication, and it’s a bit ugly. Likely if this tweak is done, it’s going to go unnoticed by the user — but when it’s not done, it is noticed. We can say the same of all the best UIX tweaks, can’t we?

Luckily, it’s simple to do. Simply add the node as an argument, and then exclude it.

Step by step:

1. Add a Node: Nid argument.

2. Select “Provide Default Argument.”

3. Choose “Node ID from URL”

4. Select “Exclude this argument.”

5. ???

6. PROFIT!
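
If it helps to see the logic without the Views UI, here’s roughly what the finished view is doing, as a Python sketch. None of this is Drupal code: related_nodes and the sample nids and terms are made up for illustration.

```python
def related_nodes(nodes, current_nid):
    """Return nodes sharing at least one term with the current node, minus the current node."""
    current = next(node for node in nodes if node["nid"] == current_nid)
    current_terms = set(current["terms"])
    return [
        node
        for node in nodes
        if node["nid"] != current_nid            # the excluded Nid argument
        and current_terms & set(node["terms"])   # the terms argument
    ]


# Hypothetical data: nids and terms are invented for this example.
nodes = [
    {"nid": 1, "terms": ["drupal", "views"]},
    {"nid": 2, "terms": ["views"]},
    {"nid": 3, "terms": ["botswana"]},
]
print([node["nid"] for node in related_nodes(nodes, current_nid=1)])
```

The result depends on which node you’re looking at, which is exactly why the caching caveat below exists.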

Caveats: There is one caveat with doing this, however: caching. If you need a heavily cached site, then regenerating a view per page, which is what you’ll force it to do, can add a bit of overhead.

Basically, you won’t be able to set the view to cache at all using the native per-View cache; however, if you use it in a block (or with Panels Context) then you can set the block to cache per page if you use the CacheRouter module (which I recommend everybody use, just ’cause it’s awesome) and/or cache per page in your per-pane Panels cache.

* …using “provide default argument if none exists”, if it’s a block — because blocks can’t take arguments. It will snag the $node object on the current page automatically.