Discover Top Posts Tagged with #active-record

Fix MySql group_by after upgrading mysql

MySql version 5 contains breaking changes among which default SQL mode is one of them. The new default SQL mode includes these modes: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES NO_ZERO_IN_DATE NO_ZERO_DATE ERROR_FOR_DIVISION_BY_ZERO NO_AUTO_CREATE_USER NO_ENGINE_SUBSTITUTION. Brief on new changed modes ONLY_FULL_GROUP_BY and STRICT_TRANS_TABLES modes were added in MySQL 5.7.5 NO_AUTO_CREATE_USER…

View On WordPress

#active-record #database #devops #ei-capitan #mac #mysql #osx #rails #ruby #server #software-development

Day 15 (W3D5-7): Recreating Active Record

Thank the gods it's Friday Friday was a short day because there was an event going on in our usual room. I left around two thinking I would continue working on it at home. Later, that night I realized I would have to force myself. This is one place where pair programming comes in handy. Unlike with Checkers, recreating Active Record is taking a lot of willpower to accomplish. Saturday Doing it today wasn't so bad. I mean there were many distract things around, but I kept at it (with breaks). Apparently putting it on Github gives me that accountability that I need to finished it and/or make better versions of it. Sunday Recreating Active Record is interesting in its own way. The victories were still sweet but they were slower in coming. One of my biggest problems was not knowing the syntax (especially with heredocs, unknown number and names of variables, and SQL queries). I would know what I wanted it to do, but how do I express it properly? Another one was not always knowing what self was. Sometimes, I was expecting it to be a custom object, but it was really the class. And then visa versa.

#Today I Learned #ruby #active-record #App-Academy

Speeding up CSV imports with rails

I was recently working on a project that required frequent importing of S3-hosted CSV files containing hundreds of thousands of users. My first pass at the import was fairly standard:

require 'rubygems' require 'fog' require 'csv' start_time = Time.now counter = 0 tenant = Tenant.first connection = Fog::Storage.new({ :provider => 'AWS', :aws_access_key_id => 'xxx', :aws_secret_access_key => 'xxx' }) directory = connection.directories.get("xxx") file = directory.files.get('imports/test-import-medium.txt') body = file.body CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user = User.new( first_name: row_hash["FirstName"], last_name: row_hash["LastName"], address: row_hash["Address1"], address2: row_hash["Address2"], city: row_hash["City"], state: row_hash["State"], zip: row_hash["ZipCode"], email: row_hash["Email"], gender: row_hash["Gender"], ) user.set_random_password user.memberships.build(tenant: tenant, status: Membership::STATUSES[:created]) user.save! counter += 1 end end_time = Time.now puts "#{counter} users imported #{((end_time - start_time) / 60).round(2)} minutes (#{( counter / (end_time - start_time)).round(2)} users/second)"

I ran that version against a test file containing 25,000 records. The result was -- well, I'm not sure what the result was because I stopped it over an hour in when it had only imported 5,000 records.

After scratching my head for a minute and doing a little debugging, I discovered that the slowest part of the script was the "user.save!." Sure, connecting to s3 and streaming down the file wasn't lightning-fast, but the real culprit was saving a user. It was extraordinarily slow for some reason.

Then it hit me, the reason I was setting a password via my custom "set_random_password" method was because the User model used has_secure_password. My workflow didn't even require the user to have a password at this point, but has_secure_password requires a password on creation, so I was passing in a dummy password to pass the validation. The generating the password was fast, but encrypting the passwords was where things were slowing down. I was taking a significant hit for something that I didn't even need.

So, my first step was to rip out has_secure_password and write my own encryption/authentication methods. That way, I had control over the validation which meant I no longer needed to pass in a password during this import process. That alone was a huge gain. After that change, I re-ran the import and the result was:

25000 users imported 12.25 minutes (34.01 users/second)

12 minutes was a significant improvement from the first pass, but at that rate, it would still take 3-5 hours to process a file containing 200k rows (assuming the rate slowed with larger files). So, I began looking for big optimization gains.

After some quick googling, I found https://github.com/zdennis/activerecord-import. This looked extremely promising, so I cranked out some code to test it out

start_time = Time.now counter = 0 tenant = Tenant.first users = [] ... CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user = User.new( first_name: row_hash["FirstName"], last_name: row_hash["LastName"], address: row_hash["Address1"], address2: row_hash["Address2"], city: row_hash["City"], state: row_hash["State"], zip: row_hash["ZipCode"], email: row_hash["Email"], gender: row_hash["Gender"] ) user.memberships.build(tenant: tenant, status: Membership::STATUSES[:created]) users << user counter += 1 end User.import users end_time = Time.now puts "#{counter} users imported #{((end_time - start_time) / 60).round(2)} minutes (#{( counter / (end_time - start_time)).round(2)} users/second)"

The good news: it was FAST

25000 users imported 1.57 minutes (265.99 users/second)

The bad news, it completely ignored the build() association and didn't save the membership records. This was because activerecord-import can't handle associations. When it inserts records, nothing is returned, so building an associated model just wouldn't work. As frustrating as that was though, I wasn't willing to abandon the activerecord-import path -- 265 records/second was a huge impremovent over 34.

My next pass is where things got a little less attractive, but I was willing to sacrafice some elegance for the sake of speed. I decided to import the users, then loop through the CSV again, get the User.id for each freshly inserted record, manually build a Membership model, then import an array of Memberships. My theory was that doing two loops and a lookup of every user to get their id would still be faster than a non activerecord-import solution.

require 'rubygems' require 'fog' require 'csv' start_time = Time.now counter = 0 tenant = Tenant.first users = [] memberships = [] ... CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user = User.new( first_name: row_hash["FirstName"], last_name: row_hash["LastName"], address: row_hash["Address1"], address2: row_hash["Address2"], city: row_hash["City"], state: row_hash["State"], zip: row_hash["ZipCode"], email: row_hash["Email"], gender: row_hash["Gender"] ) users << user counter += 1 end User.import users counter = 0 CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user = User.where("email = ?",row_hash["Email"]).first membership = Membership.new( tenant: tenant, status: Membership::STATUSES[:created], user: user ) memberships << membership counter += 1 end Membership.import memberships

I ran it again and the results were:

25000 users imported 2.72 minutes (153.3 users/second)

OK, so not as fast as before, but still plenty fast. Even with 200k records, this would certainly finish within 30-45 minutes. I felt good about the path I was on, but began looking for other improvements. First, I was getting an entire user record in the second loop, when I really only needed the User.id:

user = User.where("email = ?",row_hash["Email"]).first

So, I replaced that line with the following:

user_id = User.where("email = ?",email).select(:id).first.id

I was also making unecessary calls within the loop to get the value of Membership::STATUSES[:created], so I moved that out of the loop and set a "status" variable once. I was also passing in an entire Tenant object, when again, all I needed was the Tenant.id, so I fixed that too.

My final version looked like this

... start_time = Time.now users = [] memberships = [] counter = 0 tenant_id = Tenant.first.id status = Membership::STATUSES[:created] ... CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user = User.new( first_name: row_hash["FirstName"], last_name: row_hash["LastName"], address: row_hash["Address1"], address2: row_hash["Address2"], city: row_hash["City"], state: row_hash["State"], zip: row_hash["ZipCode"], email: row_hash["Email"], gender: row_hash["Gender"] ) users << user counter += 1 end User.import users counter = 0 CSV.parse(body, col_sep: "|", headers: true) do |row| row_hash = row.to_hash user_id = User.where("email = ?",row_hash["Email"]).select(:id).first.id membership = Membership.new( tenant_id: tenant_id, status: status, user_id: user_id ) memberships << membership counter += 1 end Membership.import memberships end_time = Time.now puts "#{counter} users imported in #{((end_time - start_time) / 60).round(2)} minutes."

And when I ran it, I got the following

25000 users imported 2.4 minutes (173.71 users/second)

Those last little optimizations had a 20 records per second improvement. From where I started, the import process went from 5-10 hours to under thirty minutes for 200k records. That's a huge improvement and the time savings will yield huge rewards for my client. Not a bad day's work.

#rails #active-record #optimization #CSV #activerecord-import

基于stack的运行时环境

完全静态运行时环境：

最简单的运行时环境，不支持函数递归调用，如Fortran77

所有数据都是静态的，不论全局或局部变量，都是静态分配的，可以通过固定的地址直接访问

在程序执行期间内存中的数据都是固定的，没有指针和动态内存分配

基于栈的运行时环境：

栈用来保存函数/过程调用中产生的活跃记录

允许递归调用且在递归调用过程中要重新分配局部变量，不能静态分配活动记录

栈是向下增长（向地址空间减少的方向），堆是向上增长（向地址空间增加的方向）

当一个函数被调用时，被调用函数的活跃记录将会被压入栈中，被调用函数在执行过程中，使用栈顶的内存，之后弹出活跃记录，栈顶为调用函数的活跃记录，恢复调用函数的现场，继续执行

活跃记录（Activate Record）：

control link 指向调用者的活跃记录

access link 被调用者可能需要的其他值，比如在其他活跃记录中保存的变量

一个函数调用中各个参数的压栈顺序是：从右向左压栈，只是个实现问题

活动记录就是函数调用时被压入栈的数据总称

实参一般称argument，形参一般称parameter

涉及函数调用还有一个指向当前AR的指针，称为框架指针(frame pointer)或fp，通常fp保存在寄存器中

#stack #运行时环境 #active-record #AR #栈