Write a Darlingtonia importer

Bess Sadler edited this page Feb 6, 2019 · 11 revisions

Darlingtonia Importer Goals:

  • Write a CSV importer using the darlingtonia Ruby gem
  • Be able to point to the parts of the importer

Setup

(OPTIONAL) Save your current changes

If you have uncommitted changes in your current branch (check with git status), save them before starting this lesson, which uses a separate branch:

  • git checkout -b your_branch_name
  • git add .
  • git commit -m 'checkpoint before beginning darlingtonia importer'

Check out working branch

git checkout simple_importer

NOTE: If you make experimental changes and want to get back to the minimal code state necessary to run this lesson, you can check the starting code out again using:
git checkout simple_importer

1. Write a test for the darlingtonia importer

Since we want to achieve the same goals for our second importer, the test is going to look pretty familiar. Make a file in the spec/importers folder called modular_importer_spec.rb and paste the following content into it:

# frozen_string_literal: true

require 'rails_helper'
require 'active_fedora/cleaner'

RSpec.describe ModularImporter, :clean do
  let(:modular_csv) { 'spec/fixtures/csv_files/modular_input.csv' }
  let(:user) { ::User.batch_user }

  before do
    ENV['IMPORT_PATH'] = File.expand_path('../fixtures/images', File.dirname(__FILE__))
    DatabaseCleaner.clean
    ActiveFedora::Cleaner.clean!
  end

  it "imports a csv" do
    expect { ModularImporter.new(modular_csv).import }.to change { Image.count }.by 3
  end

  it "puts the title into the title field" do
    ModularImporter.new(modular_csv).import
    expect(Image.where(title: 'A Cute Dog').count).to eq 1
  end

  it "puts the url into the source field" do
    ModularImporter.new(modular_csv).import
    expect(Image.where(source: 'https://www.pexels.com/photo/animal-blur-canine-close-up-551628/').count).to eq 1
  end

  it "creates publicly visible objects" do
    ModularImporter.new(modular_csv).import
    imported_work = Image.first
    expect(imported_work.visibility).to eq 'open'
  end

  it "attaches files" do
    allow(AttachFilesToWorkJob).to receive(:perform_later)
    ModularImporter.new(modular_csv).import
    expect(AttachFilesToWorkJob).to have_received(:perform_later).exactly(3).times
  end
end

Run this test and you should again see an error saying it can't find the expected class:

NameError:
  uninitialized constant ModularImporter

2. Make the ModularImporter class

Make a file called app/importers/modular_importer.rb that contains just enough of an importer class that your test can run and give a meaningful error:

  class ModularImporter
    def initialize(csv_file)
      @csv_file = csv_file
      raise "Cannot find expected input file #{csv_file}" unless File.exist?(csv_file)
    end

    def import
    end
  end
Run your test:
bundle exec rspec spec/importers/modular_importer_spec.rb

It should fail with a message like:

expected `Image.count` to have changed by 3, but was changed by 0

So, at this point, your test is running, but the importer isn't yet creating any records.

3. Get the test passing

  1. Add the darlingtonia gem to your Gemfile and run bundle install:
gem 'darlingtonia', '~> 2.0'
  2. Edit app/importers/modular_importer.rb so it looks like this:
  require 'darlingtonia'

  class ModularImporter
    def initialize(csv_file)
      @csv_file = csv_file
      raise "Cannot find expected input file #{csv_file}" unless File.exist?(csv_file)
    end

    def import
      # The block form of File.open closes the file for us, even if the import raises.
      File.open(@csv_file) do |file|
        parser = Darlingtonia::CsvParser.new(file: file)
        record_importer = Darlingtonia::HyraxRecordImporter.new
        Darlingtonia::Importer.new(parser: parser, record_importer: record_importer).import
      end
    end
  end
  3. Now your test should pass with output something like this:
ModularImporter
Creating record: ["A Cute Dog"].Record created at: jw827b648Record created at: jw827b648Creating record: ["An Interesting Cat"].Record created at: 3n203z084Record created at: 3n203z084Creating record: ["A Flock of Birds"].Record created at: wm117n96bRecord created at: wm117n96b  imports a csv

Finished in 7.56 seconds (files took 9.06 seconds to load)
1 example, 0 failures
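
The import method above hands a parser and a record importer to a top-level importer. To make that division of labor easier to see, here is a minimal sketch that uses only the Ruby standard library. The classes SketchParser, SketchRecordImporter, and SketchImporter are hypothetical stand-ins invented for illustration, and the sample rows are made up. This is not how the darlingtonia gem is implemented; it only mirrors the shape of the call.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for a parser: turns CSV input into one hash per row.
class SketchParser
  def initialize(file:)
    @file = file
  end

  def records
    CSV.new(@file, headers: true).map(&:to_h)
  end
end

# Hypothetical stand-in for a record importer: handles one record at a time.
# A real one would create a repository object; this one only remembers titles.
class SketchRecordImporter
  attr_reader :created

  def initialize
    @created = []
  end

  def import(record:)
    @created << record['title']
  end
end

# Hypothetical stand-in for the top-level importer: feeds every parsed
# record to the record importer.
class SketchImporter
  def initialize(parser:, record_importer:)
    @parser = parser
    @record_importer = record_importer
  end

  def import
    @parser.records.each { |record| @record_importer.import(record: record) }
  end
end

# Made-up sample data standing in for a CSV file on disk.
csv = StringIO.new("title,source\nA Cute Dog,https://example.org/dog\nAn Interesting Cat,https://example.org/cat\n")
record_importer = SketchRecordImporter.new
SketchImporter.new(parser: SketchParser.new(file: csv), record_importer: record_importer).import
puts record_importer.created.inspect
```

Because each part is passed in as an argument, any one of them can be swapped out (for example, a different parser for a different input format) without touching the others.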

4. Write a rake task and run this importer in your development environment

Make a file called lib/tasks/darlingtonia_import.rake and paste the following code into it:

# frozen_string_literal: true
namespace :csv_import do
  desc "Load sample CSV"
  task darlingtonia_import: :environment do
    ENV['IMPORT_PATH'] = Rails.root.join('spec', 'fixtures', 'images').to_s
    Rake::Task["hyrax:default_admin_set:create"].invoke
    Rake::Task["hyrax:default_collection_types:create"].invoke
    Rake::Task["hyrax:workflow:load"].invoke
    load_csv_sample
  end

  def load_csv_sample
    csv_sample = Rails.root.join('spec', 'fixtures', 'csv_files', 'modular_input.csv')
    ModularImporter.new(csv_sample).import
  end
end

Run the rake task (rake csv_import:darlingtonia_import) and visit localhost:3000/catalog to see the imported objects.
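
Both the spec and the rake task set ENV['IMPORT_PATH'] to the directory of fixture images before importing. As a rough illustration of why that matters, the sketch below assumes that file names listed in the CSV are looked up somewhere under that directory. The helper resolve_import_file is hypothetical and written only for this example; the gem's actual lookup logic may differ, so check its source before relying on this behavior.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: find a file named in the CSV somewhere under IMPORT_PATH.
def resolve_import_file(filename, import_path: ENV.fetch('IMPORT_PATH'))
  match = Dir.glob(File.join(import_path, '**', filename)).first
  raise "Cannot find #{filename} under #{import_path}" unless match

  match
end

# Use a throwaway directory in place of spec/fixtures/images.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'dog.jpg'))
  ENV['IMPORT_PATH'] = dir
  puts resolve_import_file('dog.jpg')
end
```

Reading the directory from the environment keeps machine-specific paths out of both the CSV and the importer code.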

Note: You can see the changes we made in this section on GitHub.

For discussion:

  1. How do you attach more than one file to an object in this importer?
  2. How do you specify where the files are on disk?
  3. What happens if a future version of our CSV file has the headings in a different order?
  4. What do you need to do if you want to add another of the core Hyrax metadata fields to the data?
  5. Can you identify the parts of an importer we talked about? Where is the:
  • top level kickoff?
  • parser?
  • mapper?
  • record importer?
  • logger?
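
As a starting point for question 3, the sketch below uses Ruby's standard csv library to show what happens when rows are read by header name rather than by column position. The sample data is made up, and whether Darlingtonia's parser and mapper behave the same way is something to confirm against the gem itself.

```ruby
require 'csv'

# The same record with the columns in two different orders (made-up data).
ordered   = "title,source\nA Cute Dog,dog.example\n"
reordered = "source,title\ndog.example,A Cute Dog\n"

# Look values up by header name, not by column index.
def titles(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row['title'] }
end

puts titles(ordered) == titles(reordered)
```

When fields are addressed by name, reordering columns does not change the result, but renaming a heading would.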