public by Ricardo Silva created Jan 27, 2016

Extract contents of a .webarchive file.

unwebarchive.rb
#!/usr/bin/env ruby
#
# A Mac OS X webarchive is a plist file stored in binary format. You can extract the contents manually:
#  1. convert the plist file into XML with "plutil -convert xml1 file.webarchive"
#  2. parse the resulting XML file with an XML parser
#  3. decode each "WebResourceData" value with Base64.decode64(data)
#  4. save the decoded content into the file named by the matching "WebResourceURL"
# Thankfully, the plist library takes care of the annoying steps 2 and 3.
#
# Preparation:
#  % gem install plist
#
# Usage:
#  % unwebarchive.rb filename.webarchive
#
# Result:
#  You'll find the extracted contents under the 'filename/' directory.
#

require 'rubygems'
require 'fileutils'
require 'plist'

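webarchive = ARGV.shift
# Stop early with a usage hint instead of failing on a nil filename.
abort "Usage: #{File.basename($0)} filename.webarchive" unless webarchive
exportdir = File.basename(webarchive, ".webarchive")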

class UnWebarchive

  def initialize(webarchive, exportdir)
    @file = webarchive
    @dir  = exportdir

    prepare_exportdir
    parse_webarchive
  end

  def prepare_exportdir
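    # File.exists? was removed in Ruby 3.2; File.exist? works on all versions.
    if File.exist?(@dir)
      print "Overwrite existing export directory '#{@dir}' [Yes/No]? "
      # Read from $stdin explicitly and treat end-of-input as "no".
      exit 1 unless $stdin.gets.to_s.chomp[/^y(es)?$/i]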
    end
    FileUtils.mkdir_p(@dir)
    FileUtils.cp(@file, @dir)
  end

  def parse_webarchive
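    # prepare_exportdir copied the archive into @dir, so refer to the copy by
    # its base name; @file may be a path relative to the original directory.
    archive = File.basename(@file)
    FileUtils.cd(@dir) do
      # Pass arguments separately so names with spaces or shell characters work.
      system("plutil", "-convert", "xml1", archive) or abort "plutil failed on '#{archive}'"
      plist = Plist.parse_xml(File.read(archive))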
      file = plist["WebMainResource"]["WebResourceURL"]
      data = plist["WebMainResource"]["WebResourceData"].read
      data.gsub!(/file:\/\/\//, './')
      export(file, data)
      # Archives of a single page may have no "WebSubresources" key at all.
      Array(plist["WebSubresources"]).each do |res|
        file = res["WebResourceURL"]
        data = res["WebResourceData"].read
        export(file, data)
      end
    end
  end

  def export(resource_uri, resource_data)
    if resource_uri[/^file:/]
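      name = resource_uri.sub('file:///', '')
      puts "Writing '#{@dir}/#{name}' ..."
      # Resources may live in subdirectories; create them before writing.
      FileUtils.mkdir_p(File.dirname(name))
      # Binary mode keeps images and other non-text data intact.
      File.open(name, "wb") do |file|
        file.print resource_data
      end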
    end
  end
end

UnWebarchive.new(webarchive, exportdir)
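
Manual steps 3 and 4 from the header comment can also be sketched without the plist gem. This is a minimal illustration, not part of the original script: decode_resource and the sample entry are hypothetical, and the sketch assumes the entry holds the raw Base64 text as it appears in the XML plist, before any library has decoded it. It uses String#unpack1("m"), core Ruby's Base64 decoder, in place of Base64.decode64 so that no extra require is needed.

```ruby
# Hypothetical helper, not part of unwebarchive.rb.
# Assumes `entry` is a Hash with a "WebResourceURL" string and a
# Base64-encoded "WebResourceData" string (the raw XML plist values).
# Returns [relative_path, decoded_bytes], or nil for non-file URLs,
# mirroring the "file:" filter used by UnWebarchive#export.
def decode_resource(entry)
  url = entry.fetch("WebResourceURL")
  return nil unless url.start_with?("file:///")

  relative_path = url.sub("file:///", "")
  # unpack1("m") decodes Base64 and returns a binary string.
  bytes = entry.fetch("WebResourceData").unpack1("m")
  [relative_path, bytes]
end

# Made-up sample entry, for illustration only.
sample = {
  "WebResourceURL"  => "file:///index.html",
  "WebResourceData" => ["<html><body>hi</body></html>"].pack("m0")
}

path, bytes = decode_resource(sample)
puts "#{path}: #{bytes.bytesize} bytes"
```

Returning the path and bytes instead of writing the file directly keeps the decoding step easy to check on its own; the caller can then write the bytes in binary mode.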

