I’ve been testing Redis 2.0.0-rc3 in the hopes of upgrading our clusters very soon. I really want to take advantage of hashes and various tweaks and enhancements that are in the 2.0 tree. I was also curious about the per-key memory overhead and wanted to get a sense of how many keys we’d be able to store in our ten-machine cluster. I assumed (well, hoped) that we’d be able to handle 1 billion keys, so I decided to put it to the test.
I installed redis-2.0.0-rc3 (reported as the 1.3.16 development version) on two hosts: host1 (master) and host2 (slave).
Then I ran two instances of a simple Perl script on host1:
#!/usr/bin/perl -w
$|++;

use strict;
use Redis;

my $r = Redis->new(server => 'localhost:63790') or die "$!";

for my $key (1..100_000_000) {
    my $val = int(rand($key));
    $r->set("$$:$key", $val) or die "$!";
}

exit;

__END__
Basically that creates 100,000,000 keys with randomly chosen integer values. The keys are “$pid:$num”, where $pid is the process id (so I could run multiple copies); in Perl the variable $$ is the process id. Before running the script, I created a “foo” key with the value “bar” to check that replication was working. Once everything looked good, I fired up two copies of the script and watched.
I didn’t time the execution, but I’m pretty sure it took a bit longer than 1 hour and definitely less than 2 hours. The final memory usage on both hosts was right about 24GB.
Here’s the output of INFO from both:
Master:
redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:10164
uptime_in_seconds:10701
uptime_in_days:0
connected_clients:1
connected_slaves:1
blocked_clients:0
used_memory:26063394000
used_memory_human:24.27G
changes_since_last_save:79080423
bgsave_in_progress:0
last_save_time:1279930909
bgrewriteaof_in_progress:0
total_connections_received:19
total_commands_processed:216343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
db0:keys=200000001,expires=0
Slave:
redis_version:1.3.16
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:5983
uptime_in_seconds:7928
uptime_in_days:0
connected_clients:2
connected_slaves:0
blocked_clients:0
used_memory:26063393872
used_memory_human:24.27G
changes_since_last_save:78688774
bgsave_in_progress:0
last_save_time:1279930921
bgrewriteaof_in_progress:0
total_connections_received:11
total_commands_processed:214343823
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:slave
master_host:host1
master_port:63790
master_link_status:up
master_last_io_seconds_ago:512
db0:keys=200000001,expires=0
This tells me that on a 32GB box, it’s not unreasonable to host 200,000,000 keys (if their values are sufficiently small). Since I was hoping for 100,000,000 with likely larger values, I think this looks very promising. With a 10-machine cluster, that easily gives us 1,000,000,000 keys.
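Doing the quick division on the INFO numbers above: 26,063,394,000 bytes of used_memory across 200,000,001 keys works out to roughly 130 bytes per key/value pair, including Redis’s own bookkeeping.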
In case you’re wondering, the redis.conf on both machines looked like this.
daemonize yes
pidfile /var/run/redis-0.pid
port 63790
timeout 300
save 900 10000
save 300 1000
dbfilename dump-0.rdb
dir /u/redis/data/
loglevel notice
logfile /u/redis/log/redis-0.log
databases 64
glueoutputbuf yes
The resulting dump file (dump-0.rdb) was 1.8GB in size.
I’m looking forward to the official 2.0.0 release.
Thanks!
Great post, wish you had recorded the run time
Have you looked into using virtual memory?
I’m curious to see the performance impact when values are pulled from disk back into memory. Having rarely-used values stored on disk seems more cost effective, especially when reaching the ONE BILLION DO… keys (with more realistic/larger values).
Would you be up for another go?
Thanks again!
Sean,
VM is really only appropriate for situations where you have fairly large values behind your keys. I suspect that I could make it “work” but I’d get maybe 500,000,000 on a single node at best (since the keys still need to be in RAM).
I suppose it’d be fun to try on one of our Fusion-io equipped machines!
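For reference, turning VM on is just a few redis.conf directives, roughly like this (a sketch only; the swap file path and limits are placeholders I haven’t actually tuned):

# placeholder values: keep roughly 20GB of values in RAM, swap the rest to disk
vm-enabled yes
vm-swap-file /u/redis/data/redis-0.swap
vm-max-memory 21474836480
vm-page-size 32
vm-pages 134217728
vm-max-threads 4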
How large a value are you realistically looking to store?
It will be very interesting to see how it works when you store, say, 10K of XML per key.
You see, right now each value fits in a single machine word; realistic performance will only become apparent with larger text values.
A little off topic: what is ‘$|++’ (2nd line of your script)? Hard to find an explanation on Google.
If you set $| to true ($| = 1, or simply $|++), it makes filehandle output unbuffered (the buffer is flushed immediately).
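For example, a tiny script like this (the sleep is only there to make the buffering visible) prints “working…” right away with $|++, but without it the text sits in the buffer until the newline at the end:

#!/usr/bin/perl
use strict;
use warnings;

$|++;                  # same effect as $| = 1: autoflush STDOUT
print "working...";    # appears immediately with autoflush on
sleep 5;
print " done\n";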
Hey, can I ask what you’re using this for? We’re researching doing something similar and we probably need to store 300-500M keys, so we’re trying to decide if we should just use one nice machine or a few decent machines in a cluster setup, using some type of internal sharding to know where to fetch the data from.
What about read/write performance? Was it still linear when you had that many keys in Redis?
You can use sharding with keys (PHP example; a Perl version of the same idea follows at the end of this comment):

$servers = array("192.168.0.1", "192.168.0.2");
$host = crc32(md5($key)) % count($servers);
Then connect and store/read the key on that host.
However, in this setup you will not be able to do set operations between the servers
(e.g. intersect/union sets that live on different servers).
I tried client-side intersects with 1M+ member sets on 2 different servers, and the wait time is huge: 3-4 seconds for long sets,
since all the data needs to be sent to the client.
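Here is the same sharding idea as a Perl sketch, in the spirit of the script from the post (the host list and key are made up, and String::CRC32 is just one convenient way to get a crc32() in Perl):

use strict;
use warnings;
use Redis;
use Digest::MD5 qw(md5_hex);
use String::CRC32;                  # crc32(); any stable hash function would do

# one connection per shard (example hosts only)
my @servers = ('192.168.0.1:63790', '192.168.0.2:63790');
my @conns   = map { Redis->new(server => $_) or die "$!" } @servers;

# pick the shard for a given key, same hashing scheme as the PHP example above
sub shard_for {
    my ($key) = @_;
    return $conns[ crc32(md5_hex($key)) % scalar(@conns) ];
}

shard_for('user:42')->set('user:42', 'some value');
my $val = shard_for('user:42')->get('user:42');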