Remove picture from data #1

Open
mensfeld opened this issue Feb 8, 2014 · 3 comments

Comments

@mensfeld

mensfeld commented Feb 8, 2014

Hey, is there a way to remove a file once it has been indexed?

@mrkamel
Owner

mrkamel commented Feb 9, 2014

Hi, not yet, but I'll add the feature.
Thanks for proposing it.

@mensfeld
Author

mensfeld commented Feb 9, 2014

Hey - well, I "fixed" that on my side by replacing the proper image (same ID, same keywords) with a small 1x1 GIF ;) That way the original file is removed. I also wrote a simple wrapper around RestClient for Similarity. You can take a look ;)

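require 'base64'
require 'ostruct'
require 'tempfile'
require 'rest-client'
require 'active_support/core_ext/hash/conversions' # provides Hash.from_xml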

# A simple wrapper for Similarity library
# @see https://github.com/mrkamel/similarity
class Similarity

  # Raised when the response differs from what we expect
  class InvalidResponse < StandardError; end
  # Raised when we try to process a non-existing file
  class NonExistingFile < StandardError; end

  # We can't remove data directly from Lire, so instead we replace the
  # given record with a 1x1 GIF file. This is its Base64-encoded content
  NON_EXISTING_FILE = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'

  HOST      = Settings.similarity.host
  PORT      = Settings.similarity.port
  LIMIT     = Settings.similarity.limit
  NAMESPACE = Settings.similarity.namespace
  SOURCE    = "http://#{HOST}:#{PORT}"

  # Adds a new record about image to similarity
  # @param file [String, Tempfile] path to a file or a ready tempfile
  # @param file_id [Integer] id of this file
  # @param namespace [String] a namespace/keywords describing image - thanks
  #   to this value, we can narrow search to a given scope/namespace instead
  #   of querying all the images
  # @return [true]
  # @raise [InvalidResponse] InvalidResponse if something bad happened
  # @example Create a new record in Similarity
  #   Similarity.create('/path/to/file.jpg', 123, 'Artworks')
  def self.create(file, file_id, namespace = NAMESPACE)
    if file.is_a?(String)
      self.ensure(file)

      file = File.new(file)
    end

    response = self.post(
      "#{SOURCE}/uploads", 
      file: file, 
      id: file_id.to_s, 
      text: namespace
    )

    raise InvalidResponse, response unless response == ""

    true
  end

  # Removes information about a file by overwriting its record with a blank 1x1 GIF
  # @param file_id [Integer] id of this file
  # @param namespace [String] a namespace/keywords of image that we want to remove
  # @return [true]
  # @raise [InvalidResponse] InvalidResponse if something bad happened
  # @example Remove information about given file
  #   Similarity.destroy(123, 'Artworks')
  def self.destroy(file_id, namespace = NAMESPACE)
    self.create(blank_file, file_id, namespace)
  end

  # Searches for similar images
  # @param file [String] path to a file that we want to check for similarity
  # @param namespace [String] a namespace/keywords used to narrow the search
  # @param limit [Integer] maximum number of similar images
  # @return [Array<OpenStruct>] Array of similar images
  # @example Search for similar images
  #   Similarity.search('path/to/file.jpg') #=> [#<OpenStruct id="12">]
  def self.search(file, namespace = NAMESPACE, limit = LIMIT)
    self.ensure(file)
    file = File.new(file)

    response = self.post(
      "#{SOURCE}/search", 
      file: file,
      q: namespace, 
      start: 0, 
      limit: limit
    )

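    # A single result comes back as a Hash and several as an Array, so we
    # normalize to a flat Array; compact drops the nil left by an empty response
    data = [Hash.from_xml(response)['response']['result']].flatten.compact
    data.collect { |r| OpenStruct.new(r) }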
  end

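  # NOTE: a bare `private` does not apply to methods defined with `def self.`,
  # so the helpers below stay public unless hidden with `private_class_method`
  private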

  # Performs a RestClient post request and handles errors
  # @param url [String] url where we should send post
  # @param params [Hash] request params
  # @example
  #   post('http://localhost:8984/search', id: 1)
  def self.post(url, params = {})
    RestClient.post(url, params)
  rescue RestClient::InternalServerError => e
    raise InvalidResponse, e.to_s
  end

  # Ensures that we won't work on non-existing file
  # @param file [String] path to a file
  # @raise [NonExistingFile] raised when given file doesn't exist
  def self.ensure(file)
    raise NonExistingFile, file unless File.exist?(file)
  end

  # @return [Tempfile] tempfile with 1x1 GIF image
  def self.blank_file
    @blank_file ||= Tempfile.new('').tap do |tmp|
      tmp.binmode
      tmp << Base64.decode64(NON_EXISTING_FILE)
    end
    # Rewind on every call so the memoized tempfile is read from the start again
    @blank_file.tap(&:rewind)
  end

end
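
As a quick sanity check of the replacement trick, here is a minimal sketch that decodes the same Base64 string and reads the dimensions from the GIF header. The `gif_dimensions` helper and the `BLANK_GIF_BASE64` name are made up for illustration and are not part of the wrapper or of Similarity.

```ruby
require 'base64'

# Same Base64 string as NON_EXISTING_FILE in the wrapper above.
BLANK_GIF_BASE64 = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'

# Hypothetical helper: reads the GIF header, which is a 6-byte signature
# followed by width and height as little-endian unsigned 16-bit integers.
def gif_dimensions(bytes)
  signature = bytes[0, 6]
  raise ArgumentError, 'not a GIF' unless %w[GIF87a GIF89a].include?(signature)

  bytes[6, 4].unpack('v2')
end

bytes = Base64.decode64(BLANK_GIF_BASE64)
width, height = gif_dimensions(bytes)
puts "#{width}x#{height}" # prints "1x1"
```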

Although in the end I won't be using Similarity (I don't have many records and a plain Hamming distance is enough) - I love your solution!
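
For reference, a plain Hamming distance comparison could look roughly like the sketch below. It assumes every image has already been reduced to an integer fingerprint (how that fingerprint is computed is not shown here); `similar_ids`, the sample fingerprints, and the threshold are made up for illustration.

```ruby
# Counts the bits that differ between two non-negative integer fingerprints.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count('1')
end

# Hypothetical helper: keeps the ids whose fingerprint is within `threshold`
# bits of the query fingerprint, closest first.
def similar_ids(query, fingerprints, threshold = 10)
  fingerprints
    .map { |id, fingerprint| [id, hamming_distance(query, fingerprint)] }
    .select { |_, distance| distance <= threshold }
    .sort_by { |_, distance| distance }
    .map(&:first)
end

# Made-up 8-bit fingerprints keyed by record id.
fingerprints = { 1 => 0b1111_0000, 2 => 0b1111_0001, 3 => 0b0000_1111 }
similar_ids(0b1111_0000, fingerprints, 2) # => [1, 2]
```

With only a handful of records a linear scan like this is cheap, which is why a separate index is not needed in that case.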

Ah, and one more thing - there's a new Lire version that works great with transparent PNG files (the one that you're using is quite old). It would be awesome if you could update it in your spare time :)
Great job, btw!

@mrkamel
Owner

mrkamel commented Feb 9, 2014

Thanks a lot! I'll update Lire asap as well.
