omz:forum


    Welcome!

    This is the community forum for my apps Pythonista and Editorial.

    For individual support questions, you can also send an email. If you have a very short question or just want to say hello, I'm @olemoritz on Twitter.


    Dropbox file names containing accented characters

    Pythonista
    • omz

      I think this would be the correct way to decode the URL:

      # Unquote first (giving a UTF-8 byte string), then decode that to unicode:
      url = unicode(urllib.unquote(url), 'utf-8')
      

      or alternatively (but more confusing):

      url = urllib.unquote(url).decode('utf-8')
      

      Edit: Looks like @dgelessus was faster than me...
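For comparison, here is a minimal Python 3 sketch of the same two-step recipe: undo the percent-escapes to get raw bytes, then decode those bytes as UTF-8. It uses `urllib.parse.unquote_to_bytes` from the standard library; the shortened file name is just an example taken from the Dropbox URL later in this thread.

```python
from urllib.parse import unquote, unquote_to_bytes

# Percent-encoded file name (shortened from the Dropbox URL in this thread).
quoted = 'La%20vie%20tr%C3%A8s%20priv%C3%A9e'

# Step 1: turn each %XX escape into the single byte it stands for.
raw = unquote_to_bytes(quoted)   # b'La vie tr\xc3\xa8s priv\xc3\xa9e'

# Step 2: decode those bytes as UTF-8 to get text.
name = raw.decode('utf-8')

# Python 3's unquote() does both steps in one call (UTF-8 is its default).
assert name == unquote(quoted)
print(name)
```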

      • cvp

        Thanks, champions! My code had a misplaced closing parenthesis, which gave the wrong result.
        Once again, shame on me.

        • cvp

          Sorry, but I still have a problem with this.

          Try this short script: if the URL is passed in via appex, the result is wrong; if the URL is set as a plain string for testing, it's correct.

          # coding: utf-8
          import urllib
          import appex
          
          # Hard-coded test URL (this works):
          #url = 'https://www.dropbox.com/s/5mmxh7h7vu2lwnp/La%20vie%20tr%C3%A8s%20priv%C3%A9e%20de%20Monsieur%20Sim.png?dl=0'
          # URL received from the share sheet (this does not work):
          url = appex.get_url()
          print url
          print urllib.unquote(url).decode('utf-8')
          
          • omz

            appex.get_url() returns a unicode string, so you need an extra step there: encode it to a UTF-8 byte string before passing it to urllib.unquote, then decode the result.

            import urllib
            
            # This is a unicode string literal (note the 'u' before the quotes), to simulate the behavior of appex.get_url():
            url = u'https://www.dropbox.com/s/5mmxh7h7vu2lwnp/La%20vie%20tr%C3%A8s%20priv%C3%A9e%20de%20Monsieur%20Sim.png'
            print urllib.unquote(url.encode('utf-8')).decode('utf-8')
            

            And no, you're not the only one who finds this very confusing. ;)
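To see why the extra encode matters, the Python 3 sketch below imitates what goes wrong. If each %XX escape is turned into one character on its own (effectively Latin-1, which is roughly what Python 2's `urllib.unquote` does when handed a `unicode` string), every accented letter becomes two wrong characters. Decoding the same bytes as UTF-8 gives the intended text. The variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

quoted = 'tr%C3%A8s%20priv%C3%A9e'
raw = unquote_to_bytes(quoted)  # b'tr\xc3\xa8s priv\xc3\xa9e'

# Wrong: one character per byte (Latin-1), so each two-byte UTF-8
# sequence turns into two unrelated characters ("mojibake").
wrong = raw.decode('latin-1')

# Right: the bytes are UTF-8, so decode them as UTF-8.
right = raw.decode('utf-8')

print(wrong)  # trÃ¨s privÃ©e
print(right)  # très privée
```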

            • cvp

              My god! (Not you, but almost)

              • omz

                The good news is that this kind of thing is generally easier in Python 3, because nearly every string there is unicode, and urllib.parse.unquote (the Python 3 equivalent of urllib.unquote) handles unicode correctly. In Python 3 it would simply be urllib.parse.unquote(url), whether url was defined as a normal string literal or returned by appex.get_url().
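A minimal Python 3 sketch along those lines, which also pulls the file name out of the Dropbox link. The URL is hard-coded so the snippet runs outside Pythonista; in a share-sheet script it would come from `appex.get_url()` instead (not called here). Stripping the `?dl=0` query with `urllib.parse.urlsplit`, taking the last path segment with `posixpath.basename`, and the helper name `dropbox_file_name` are my own assumptions, not part of the original posts.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hard-coded for testing; in a Pythonista share-sheet script this would be
# url = appex.get_url()
url = ('https://www.dropbox.com/s/5mmxh7h7vu2lwnp/'
       'La%20vie%20tr%C3%A8s%20priv%C3%A9e%20de%20Monsieur%20Sim.png?dl=0')


def dropbox_file_name(url):
    """Return the decoded file name from a shared Dropbox link (hypothetical helper)."""
    path = urlsplit(url).path  # path only, without the '?dl=0' query string
    # Take the last segment first, then unquote it (UTF-8 by default),
    # so an encoded '/' (%2F) inside the name cannot split the path.
    return unquote(posixpath.basename(path))


print(dropbox_file_name(url))  # La vie très privée de Monsieur Sim.png
```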

                • dgelessus

                  Well, this is confusing. Here the issue seems to be with urllib.unquote itself: it appears to be designed for str (byte) strings and gets confused by unicode strings. Python 3 handles this much better (as always): there the unquoted bytes are decoded as UTF-8 by default, and you can choose a different encoding if necessary.
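In Python 3 the encoding is chosen with the `encoding` (and `errors`) keyword arguments of `urllib.parse.unquote`, which default to `'utf-8'` and `'replace'`. A small sketch, using a made-up string whose escape was produced with Latin-1 rather than UTF-8:

```python
from urllib.parse import unquote

latin1_quoted = 'tr%E8s'  # 'è' percent-encoded as the single Latin-1 byte E8

# Defaults are encoding='utf-8', errors='replace': the lone byte E8 is not
# valid UTF-8, so it becomes the replacement character U+FFFD.
default = unquote(latin1_quoted)  # equals 'tr\ufffds'

# Telling unquote the real encoding gives the intended text.
fixed = unquote(latin1_quoted, encoding='latin-1')

print(fixed)  # très
```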

                  • cvp

                    I'm still a beginner in Python. Of course I'll buy the next version, but I hope you'll explain how to convert my scripts for it once it becomes available.

                    • dgelessus

                      @cvp There is the 2to3 tool, which can do most of the dumb work for you (e.g. putting parentheses around your print calls). I'm not sure how well it fixes the bytes/str/unicode mess you need in Python 2. Probably not very well, as it's hard to guess whether an encode or decode is actually necessary or just a compatibility hack.

                      • cvp

                        OK, I'll keep that in mind when I switch to Python 3. Thanks!
