Stop Grepping Your Database: Implementing Elasticsearch 0.90 on KVM
If your application's search logic relies on SELECT * FROM table WHERE column LIKE '%keyword%', you are the bottleneck. A leading wildcard defeats any B-tree index, so every query degenerates into a full table scan. It works for 100 rows. At 100,000 rows, your database locks up, your I/O wait spikes, and your users leave.
It is 2013. We don't grep databases anymore. We index them.
Elasticsearch (currently version 0.90.1) is rapidly replacing Solr as the go-to search engine for agile teams. It's JSON-native, RESTful, and scales horizontally. But it is also a resource hog that will eat your server's RAM for breakfast if misconfigured. I've seen too many developers throw a jar file onto a cheap OpenVZ container and wonder why it crashes every 24 hours.
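To show what "JSON-native and RESTful" means in practice, here is a minimal index-and-search round trip against a local 0.90 node. The products index, type, and document body are made up for illustration; the node is assumed to be listening on the default port 9200:

```
# Index a document (creates the "products" index on the fly)
curl -XPUT 'http://localhost:9200/products/product/1' -d '
{"name": "wool sweater", "price": 499}'

# Full-text search it back
curl -XGET 'http://localhost:9200/products/_search?q=name:wool&pretty'
```

No schema migration, no XML configuration. That developer experience is a big part of why teams are moving off Solr.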
Here is how to architect a search cluster that actually stays up, specifically for the Norwegian market where latency and data sovereignty matter.
The Java Heap Trap
Elasticsearch runs on the JVM. Java 7 is much better at memory management than its predecessors, but it still requires strict boundaries. The most common mistake I see is leaving the heap size to default.
On a dedicated VPS with 4GB RAM, you shouldn't give Elasticsearch all of it. Half goes to the Heap, half goes to the OS for Lucene's file system cache. If you starve the OS, I/O performance tanks.
Edit your startup environment:

```
export ES_HEAP_SIZE=2g
```

Then, you must force the JVM to lock that memory. If the OS swaps your Java process to disk, garbage collection (GC) pauses will skyrocket from milliseconds to seconds. In your elasticsearch.yml, set this:

```
bootstrap.mlockall: true
```
Note: This usually fails on shared hosting or oversold VPS providers because they don't allow memory locking (mlock). This is why we use KVM virtualization at CoolVDS. You get a dedicated kernel and reserved RAM. If you pay for 4GB, you get 4GB. No noisy neighbors stealing your pages.
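To make sure locking actually works on your own box, raise the memlock limit for the elasticsearch user and watch the startup log. The user name and log path below are typical for a Debian/Ubuntu package install, not universal; adjust for your layout:

```
# Let the elasticsearch user lock unlimited memory
cat >> /etc/security/limits.conf <<'EOF'
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
EOF

# After restarting Elasticsearch, this grep should come back empty.
# If mlock failed, 0.90 logs a warning along these lines at startup.
grep -i "unable to lock JVM memory" /var/log/elasticsearch/elasticsearch.log
```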
Storage I/O: The Silent Killer
Search is I/O intensive. Elasticsearch uses Apache Lucene under the hood, which writes immutable segments to disk and periodically merges them. This merging process destroys mechanical hard drives (HDDs).
I recently audited a Magento store struggling with 5-second search times. Their provider had them on 7.2k RPM SATA drives. During a re-index, the disk queue length hit 50+. The CPU was idle, waiting for the disk.
We migrated them to a CoolVDS instance backed by Enterprise SSDs (RAID-10). The re-index time dropped from 4 hours to 20 minutes. Latency dropped to sub-50ms.
Pro Tip: Check your I/O wait times with iostat -x 1. If %util is consistently near 100%, your disk is the problem. No amount of code optimization will fix bad physics.
The "Split Brain" Nightmare
If you are running a cluster (even just 2 nodes), you risk a "split brain" scenario where both nodes elect themselves master and accept conflicting writes, so your data diverges. It is messy to clean up.
Always configure the discovery.zen.minimum_master_nodes setting. The formula is (N / 2) + 1. For a 3-node cluster, set it to 2.
```
discovery.zen.minimum_master_nodes: 2
```
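The formula is just integer division, which means it maps directly onto shell arithmetic. A quick sanity check for a few cluster sizes:

```
# minimum_master_nodes must be a strict majority: floor(N / 2) + 1.
# $(( )) does integer division, so the formula translates literally.
for N in 2 3 5; do
  echo "$N nodes -> minimum_master_nodes: $(( N / 2 + 1 ))"
done
```

Note the 2-node case: the quorum is 2, meaning the cluster stops electing a master if either node drops. That is why serious clusters start at 3 nodes.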
Do not skip this. I have spent too many nights manually reconciling JSON documents because a network blip caused two servers to divorce.
Latency and The Norwegian Context
Physics dictates speed. If your customers are in Oslo, Bergen, or Trondheim, hosting your search index in Virginia (US-East) adds 100ms+ of unnecessary round-trip time (RTT). For a search-as-you-type interface, that delay feels sluggish.
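You can put a number on that round trip from any client box with curl's timing write-out variables. The URL below is a placeholder; point it at your actual cluster:

```
# time_total = full request time as the end user experiences it
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  'http://localhost:9200/products/_search?q=name:wool'
```

Run it from a machine on a Norwegian ISP against a US-East host and then against a local one; the difference is the latency tax your users pay on every keystroke.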
Hosting locally isn't just about speed; it's about Datatilsynet. With the strict interpretation of the Personal Data Act (Personopplysningsloven), keeping user data within Norwegian borders simplifies compliance significantly compared to navigating the US Safe Harbor framework.
CoolVDS peers directly at NIX (Norwegian Internet Exchange). Pings from major Norwegian ISPs (Telenor, Altibox) are often under 5ms. That snappy feel builds trust with your users.
Architecture Summary
| Component | Recommendation (2013) | Why? |
|---|---|---|
| OS | Ubuntu 12.04 LTS | Stable, long support cycle. |
| Java | OpenJDK 7 or Oracle Java 7 | Better GC (G1GC is experimental but promising). |
| Virtualization | KVM (CoolVDS) | OpenVZ fails with mlockall and Java heap. |
| Storage | SSD | Lucene segment merging kills spinning disks. |
Final Thoughts
Elasticsearch 0.90 is powerful, but it assumes you have the hardware to back it up. Don't put a Ferrari engine in a go-kart. Use dedicated resources, fast storage, and keep your data close to your users.
Need to test your cluster performance? Spin up a CoolVDS SSD instance in Oslo. You can have a KVM node ready for indexing in under 60 seconds.