omz:forum

    Download "plain text" (HTML) document and save content as text

    • Wizardofozzie

      I'll use this FAQ as the example document: http://www.gamefaqs.com/ps3/959558-fallout-new-vegas/faqs/61226.

      Appending ?print=1 as a parameter, i.e. http://www.gamefaqs.com/ps3/959558-fallout-new-vegas/faqs/61226?print=1, simplifies the document for printing, so that in a browser the content appears to be plain text (of course it's not; it's still HTML).
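For illustration, here is a minimal sketch of adding that parameter in code. It assumes Python 3's `urllib.parse`; the helper name `printable_url` is hypothetical and not part of any library. Unlike plain string concatenation, it keeps any query parameters the URL already has.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def printable_url(url):
    """Return url with print=1 set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any earlier print= value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "print"]
    query.append(("print", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(printable_url("http://www.gamefaqs.com/ps3/959558-fallout-new-vegas/faqs/61226"))
```

For a URL without a query string this is equivalent to appending `?print=1` by hand.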

      How can I save the document with Pythonista as actual plaintext?

      • omz

        You could try using the html2text module (included in Pythonista, though not listed in the documentation).
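A rough sketch of what that suggestion might look like, using `html2text.html2text()` to turn an HTML string into Markdown-flavoured plain text. The sample HTML and the `to_plain_text` name are made up for illustration. The standard-library fallback is an assumption added so the snippet also runs where html2text is not installed; it only strips tags and is much cruder than html2text.

```python
from html.parser import HTMLParser

try:
    import html2text  # third-party; bundled with Pythonista per the reply above
except ImportError:
    html2text = None


class _TextOnly(HTMLParser):
    """Crude stdlib fallback: collects text nodes and discards all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(html):
    """Convert an HTML string to text (hypothetical helper)."""
    if html2text is not None:
        return html2text.html2text(html)
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = "<h1>Guide</h1><p>Walk <b>north</b> to the gate.</p>"
text = to_plain_text(sample)
print(text)
```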

        • Wizardofozzie

          # coding: utf-8
          ## Download a GameFaqs.com FAQ in printable text format
          ## eg http://www.gamefaqs.com/ps3/959558-fallout-new-vegas/faqs/61226
          ## v0.1
          
          import io, os, sys, re, random
          import appex, console, clipboard, dialogs  # Pythonista modules
          import html2text, requests
          
          
          # Accepts http(s)://[www.]gamefaqs.com/<...>/faqs/<3-8 digit id>
          RE_URL = re.compile(r'^http(s)?://(www\.)?gamefaqs\.com/.*/faqs/[0-9]{3,8}$', re.IGNORECASE)
          
          
          def main():
              if appex.is_running_extension():
                  # get_url() returns None when no URL was shared
                  url = appex.get_url() or ""
              else:
                  url = clipboard.get().strip()
                  if not RE_URL.match(url):
                      try:
                          url = console.input_alert("Enter gamefaqs URL", "", "https://www.gamefaqs.com/").strip()
                      except KeyboardInterrupt:
                          sys.exit(0)
              
              if not RE_URL.match(url):
                  console.hud_alert("Not a GameFAQs FAQ URL", "error")
                  return
              
              newurl = "{0}?print=1".format(url)
              r = requests.get(
                  url=newurl,
                  headers={"User-agent": "Mozilla/5.0{0:06}".format(random.randrange(999999))}
              )
              # r.text is already decoded text, so no .decode() call is needed
              rendered_content = html2text.html2text(r.text)
              # e.g. ".../ps3/959558-fallout-new-vegas/faqs/61226" -> "959558-fallout-new-vegas.txt"
              filename = url.partition("gamefaqs.com/")[-1].partition("/")[-1].partition("/faqs")[0]+".txt"
              filepath = os.path.join(os.path.expanduser("~/Documents"), filename)
              
              # io.open with an explicit encoding copes with non-ASCII text on Python 2 and 3
              with io.open(filepath, "w", encoding="utf-8") as fo:
                  fo.write(rendered_content)
              
              console.hud_alert("Success! Saved to '~/Documents/{0}'".format(filename), "success")
              # filepath is absolute, so "file://" + filepath gives file:///...
              dialogs.share_url("file://" + filepath)
          
          if __name__ == '__main__':
              main()
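The URL check and the filename derivation in the script above can be tried out without any network access. The following is only a sketch: the function name `faq_filename` is hypothetical, and the pattern is stricter than the one in the script because it assumes the path always has the form /<platform>/<game-slug>/faqs/<id>, as in the example URL.

```python
import re

# Assumed layout: http(s)://[www.]gamefaqs.com/<platform>/<game-slug>/faqs/<id>
FAQ_URL = re.compile(
    r"^https?://(?:www\.)?gamefaqs\.com/[^/]+/(?P<slug>[^/]+)/faqs/[0-9]{3,8}$",
    re.IGNORECASE,
)


def faq_filename(url):
    """Return '<game-slug>.txt' for a FAQ URL, or None if the URL does not match."""
    match = FAQ_URL.match(url)
    if match is None:
        return None
    return match.group("slug") + ".txt"


print(faq_filename("http://www.gamefaqs.com/ps3/959558-fallout-new-vegas/faqs/61226"))
```

Capturing the slug with a named group keeps the validation and the filename logic in one place, instead of chaining several `partition()` calls.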
          
          • Wizardofozzie

            @omz Yep, see code above, it works!
