Friday, May 25, 2007

Acts As Fast But Very Inaccurate Counter

Introduction

If you have chosen MySQL's InnoDB engine over MyISAM for its support of transactions, foreign keys, and other niceties, you might be aware of its limitations, such as a much slower count(*). Our DBAs are on a constant lookout for slow queries in production and for ways to keep the DBs happy, so they recommended that we try to fix count(). They suggested checking SHOW TABLE STATUS for an approximate count of rows in a table. This morning I wrote acts_as_fast_counter, which proved that the speed is indeed improved but the accuracy might not be acceptable. The rest of the post records the details of the exercise.

The approach

I created a model per engine and seeded each with 100K records. Then I ran count on each model a thousand times and measured the results.

The code:

module ActiveRecord; module Acts; end; end

module ActiveRecord::Acts::ActsAsFastCounter

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods

    def acts_as_fast_counter
      self.extend(FastCounterOverrides)
    end

    module FastCounterOverrides

      def count(*args)
        if args.empty?
          connection.select_one("SHOW TABLE STATUS LIKE '#{ table_name }'")['Rows'].to_i
        else
          super(*args)
        end
      end

    end

  end

end

ActiveRecord::Base.send(:include, ActiveRecord::Acts::ActsAsFastCounter)
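With the module included into ActiveRecord::Base, enabling the override for a model is a one-liner in the class body (this mirrors what the benchmark below does via send):

```ruby
class Innodb < ActiveRecord::Base
  acts_as_fast_counter  # Innodb.count now reads SHOW TABLE STATUS (approximate)
end
```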

# create_table :myisams, :options => 'engine=MyISAM' do |t|
#   t.column :name, :string
# end
# 100_000.times { Myisam.create(:name => Time.now.to_s) }
#
# create_table :innodbs, :options => 'engine=InnoDB' do |t|
#   t.column :name, :string
# end
# 100_000.times { Innodb.create(:name => Time.now.to_s) }

class Bench

  require 'benchmark'
  require 'acts_as_fast_counter'

  def self.run
    measure
    show_count
    convert_to_fast_counter
    show_count
    add_records
    show_count
    destroy_records
    show_count
    measure
  end

  def self.measure
    puts "* Benchmarks:"
    n = 1_000
    Benchmark.bm(12) do |x|
      x.report('MyISAM') { n.times { Myisam.count } }
      x.report('InnoDB') { n.times { Innodb.count } }
    end
  end

  def self.convert_to_fast_counter
    Innodb.send(:acts_as_fast_counter)
    puts "* Converted Innodb to fast counter"
  end

  def self.add_records
    @myisam = Myisam.create(:name => 'One more')
    @innodb = Innodb.create(:name => 'One more')
    puts "* Added records"
  end

  def self.destroy_records
    @myisam.destroy
    @innodb.destroy
    puts "* Destroyed records"
  end

  def self.show_count
    puts "* Record count:"
    puts "  MyISAM: #{ Myisam.count }"
    puts "  InnoDB: #{ Innodb.count }"
  end

end


The results:

* Benchmarks:
                   user     system      total        real
MyISAM         0.180000   0.040000   0.220000 (  0.289983)
InnoDB         0.430000   0.070000   0.500000 ( 35.102496)
* Record count:
  MyISAM: 100000
  InnoDB: 100000
* Converted Innodb to fast counter
* Record count:
  MyISAM: 100000
  InnoDB: 100345
* Added records
* Record count:
  MyISAM: 100001
  InnoDB: 100345
* Destroyed records
* Record count:
  MyISAM: 100000
  InnoDB: 100345
* Benchmarks:
                   user     system      total        real
MyISAM         0.250000   0.030000   0.280000 (  0.350673)
InnoDB         0.250000   0.040000   0.290000 (  0.977711)


Final thoughts

The MySQL manual has a clear warning about the inaccuracy of the row count in the SHOW TABLE STATUS results:

Rows - The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.


The test confirms it by showing 345 more records than expected, which makes the approach useful only in edge cases where an approximate count is acceptable. If you know a way to improve the speed of count() on InnoDB with some other approach beyond using a counter table, please share.

6 comments:

Bernardo Santos said...

Have you considered counter caching, a built-in Rails feature? You use the option :counter_cache => true in the belongs_to declaration in the child model, then define a column in the parent's table named "child_name_count", and that value will be automatically maintained by Rails (as long as you always use Rails to update the table). It is described in the AWDwR book, pg. 362 (never tried it myself).
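For reference, the built-in feature Bernardo describes looks roughly like this (the Post/Comment names and column are illustrative, not from the post):

```ruby
# Migration: the cache column lives on the parent table, e.g.
#   add_column :posts, :comments_count, :integer, :default => 0

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post, :counter_cache => true  # Rails keeps posts.comments_count current
end

# some_post.comments_count  -- reads the cached value, no COUNT(*) issued
```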

To be able to keep a "cached" count for whatever model you want, you could also use the before_create and before_destroy model callbacks to update such a cache whenever a record is created or deleted. This could easily be made into a plugin, keeping the count cache values in a separate table and overriding the count() method, as you did in your code.
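A minimal plain-Ruby sketch of that count-cache idea (not tied to Rails): CountCache stands in for the separate cache table, and the hypothetical Widget model bumps it where the before_create / before_destroy callbacks would.

```ruby
# In-memory stand-in for a "count cache" table keyed by table name.
class CountCache
  @@counts = Hash.new(0)

  def self.increment(table)
    @@counts[table] += 1
  end

  def self.decrement(table)
    @@counts[table] -= 1
  end

  def self.count(table)
    @@counts[table]
  end
end

# Hypothetical model; in Rails these updates would live in callbacks.
class Widget
  def self.create
    CountCache.increment(:widgets)  # would be a before_create callback
    new
  end

  def destroy
    CountCache.decrement(:widgets)  # would be a before_destroy callback
  end

  def self.fast_count
    CountCache.count(:widgets)      # O(1) read instead of COUNT(*)
  end
end

3.times { Widget.create }
Widget.create.destroy
puts Widget.fast_count  # => 3
```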

Perhaps, if this solution doesn't suit you (you update your DB from other sources besides Rails), you could still use the same "count cache table" approach and run a periodic query (a cron job or something similar) to fill and update it.

Of course, there would always be a lag in the count value you got, and you would have a small overhead from running the query all the time (you would also have an overhead from using the callbacks). But if, for instance, you have thousands of users running counts against your database and a small update lag (say 10 minutes) is acceptable, you should see a huge performance gain.

If MySQL supports it (I don't know), you could maybe use triggers instead of the periodic query and keep the count cache always up to date.

Val Aleksenko said...

Bernardo: Yeah, we thought about that approach (the phrase "If you know a way to improve the speed of count() on InnoDB with some other approach beyond using a counter table, please share." from the original post kind of hints at that ;-) ). So far it seems the only way to handle it.

Bernardo Santos said...

Oops, sorry, I didn't quite pay attention to the last line... Anyway, good luck in your quest for a faster counter. : )

clemens said...

It may be a little weird, but if performance is important, try using the acts_as_paranoid plugin and then, instead of COUNT(*), use SELECT MAX(id) FROM table.
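A plain-Ruby illustration of the trade-off clemens describes (hypothetical data, not Rails): with acts_as_paranoid, deleted rows stay in the table flagged with a deleted_at timestamp, so MAX(id) keeps growing while the live-row count does not.

```ruby
# 100 hypothetical rows; soft-delete one the way acts_as_paranoid would.
rows = (1..100).map { |i| { :id => i, :deleted_at => nil } }
rows[41][:deleted_at] = Time.now

max_id     = rows.map { |r| r[:id] }.max               # the cheap "count" stand-in
live_count = rows.select { |r| r[:deleted_at].nil? }.size

puts max_id      # => 100
puts live_count  # => 99
```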

Val Aleksenko said...

clemens: that only works if you don't care whether records have been deleted, which is rarely the case for count(*) usage. If only an approximate count is needed, then the counter described in the article works just fine, or MAX(id) if deletion of records is a rare event.

Chad said...

I like to use count(id), since id has an index on it and I don't really need MySQL to look through the entire row, which I believe it does.