    dropbox.put_file() chopping off bytes

    • roosterboy197

      I don't understand what's going on here so hopefully one of you can enlighten me.

      I subscribe to the RSS feed for http://kupu.maori.nz in MrReader and have a service set up to send a post to a Pythonista script to extract the word of the day and append it to a text file in my Dropbox. But every time it runs, the file shrinks by a number of bytes, usually between 2 and 25. I can't see any correlation between the size of the appended string and the size of the shrinkage.

      What am I doing wrong? I'm betting it's something simple I've missed but I dunno.

      #coding: utf-8
      import sys
      import dropboxlogin
      from dropbox import rest
      import console
      import locale
      import webbrowser
      
      dropboxlogin.app_key = '...'
      dropboxlogin.app_secret = '...'
      
      DB_FOLDER = '/flashcards/'
      DB_FILE = 'Māori.cards.txt'
      
      def main():
          locale.setlocale(locale.LC_ALL, '')
          
          #extract the word of the day from the RSS text
          rss = sys.argv[1]
          new_word = rss.split('.')[0]
          new_word = new_word.replace(': ', ' :: ')
          #this gives us text like "ngeru :: cat"
          
          try:
              db = dropboxlogin.get_client()
              ff, md = db.get_file_and_metadata(DB_FOLDER + DB_FILE)
              wordlist = ff.read().decode('utf-8').splitlines()
              ff.close()
              print(md) #to check our starting size
              wordlist.append(new_word)
              wordlist = list(set(wordlist))
              wordlist.sort(cmp=locale.strcoll)
              md = db.put_file(DB_FOLDER + DB_FILE, u'\n'.join(wordlist), overwrite=True)
              print(md) #to see how much we've shrunk
              console.hud_alert('added {}'.format(new_word))
          except rest.ErrorResponse as e:
              console.alert('Error - add_maori.py', message='{}\n'.format(e))
          
          webbrowser.open('mrreader://')
      
      if __name__ == '__main__':
          main()
      
      • ccc

        I would try...

        DB_FOLDER = '/flashcards/'
        DB_FILE = 'Māori.cards.txt'
        DB_FILEPATH = DB_FOLDER + DB_FILE  # use DB_FILEPATH in your main program
        
        # ...
                print('before', len(wordlist), len(''.join(wordlist)))
                wordlist = list(set(wordlist))  # is the set() operation removing data?
                print(' after', len(wordlist), len(''.join(wordlist)))
        
        • roosterboy197

          Here are the results of running that several times, with the string I added and the bytes returned from the Dropbox metadata.

          # "whatitoka :: door"
          ('bytes - before', 1157)
          ('before', 60, 1091)
          ('after', 60, 1091)
          ('bytes - after', 1150)
          # 17 characters, -7 bytes
          
          # "awa :: river"
          ('bytes - before', 1150)
          ('before', 60, 1083)
          ('after', 60, 1083)
          ('bytes - after', 1142)
          # 12 characters, -8 bytes
          
          # "hapa :: dinner"
          ('bytes - before', 1142)
          ('before', 60, 1078)
          ('after', 60, 1078)
          ('bytes - after', 1137)
          # 14 characters, -5 bytes
          
          # "āporo :: apple"
          ('bytes - before', 1137)
          ('before', 60, 1073)
          ('after', 60, 1073)
          ('bytes - after', 1132)
          # 14 characters, -5 bytes
          
          # "hgtj :: gfrd" - random characters
          ('bytes - before', 1132)
          ('before', 60, 1066)
          ('after', 60, 1066)
          ('bytes - after', 1125)
          # 12 characters, -7 bytes
          
          # "hgtjfj :: hytgfrd" - random characters
          ('bytes - before', 1125)
          ('before', 59, 1064)
          ('after', 59, 1064)
          ('bytes - after', 1122)
          # 17 characters, -3 bytes
          

          I can't really see any pattern here. After seeing the byte difference was -5 for both 14-character strings, I tested using same length strings as earlier examples (17 and 12) but got different results.
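One relationship does hold across all six runs above: the reported 'bytes - after' equals the character count of the joined text once the newlines are added back. The sketch below only redoes that arithmetic on the logged figures; `runs` and `chars_with_newlines` are names made up for the illustration. A possible reading, which is an assumption and not something the thread confirms, is that the upload is being sized by character count instead of by UTF-8 byte count.

```python
# (lines, joined_chars, bytes_after) copied from the six runs logged above.
# joined_chars is len(''.join(wordlist)), so it excludes the newlines.
runs = [
    (60, 1091, 1150),  # "whatitoka :: door"
    (60, 1083, 1142),  # "awa :: river"
    (60, 1078, 1137),  # "hapa :: dinner"
    (60, 1073, 1132),  # "aporo :: apple" run
    (60, 1066, 1125),  # "hgtj :: gfrd"
    (59, 1064, 1122),  # "hgtjfj :: hytgfrd"
]


def chars_with_newlines(lines, joined_chars):
    # u'\n'.join(wordlist) puts one newline between each pair of lines,
    # so it adds (lines - 1) characters to the joined length.
    return joined_chars + (lines - 1)


for lines, joined_chars, bytes_after in runs:
    total = chars_with_newlines(lines, joined_chars)
    print(lines, joined_chars, total, bytes_after, total == bytes_after)
```

If that reading is right, the shrinkage would track the number of multi-byte characters in the file and not the length of the appended word, which would explain why no pattern shows up against string length.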

          • roosterboy197

            Yeah, now I'm really confused. I woke up this morning thinking "hmm, maybe it's the sort", added in just one print statement and now my output looks like this:

            # "hgtjfj :: hytgfrd"
            ('bytes - before', 1122)
            ('before', 59, 1062)
            ('after', 58, 1045)
            ('sorted', 58, 1045)
            ('bytes - after', 1102)
            # 17 characters, -20 bytes
            # "hgtjfj :: hytgfrd"
            ('bytes - before', 1102)
            ('before', 58, 1044)
            ('after', 57, 1027)
            ('sorted', 57, 1027)
            ('bytes - after', 1083)
            # 17 characters, -19 bytes
            # "hgtjfj :: hytgfrd"
            ('bytes - before', 1083)
            ('before', 57, 1027)
            ('after', 56, 1010)
            ('sorted', 56, 1010)
            ('bytes - after', 1065)
            # 17 characters, -18 bytes
            # "hgtjfj :: hytgfrd"
            ('bytes - before', 1065)
            ('before', 57, 1010)
            ('after', 56, 993)
            ('sorted', 56, 993)
            ('bytes - after', 1048)
            # 17 characters, -17 bytes
            

            Now the wordlist = list(set(wordlist)) line is removing data! And it looks like the byte count of the lost data is dropping by one each time. I just don't understand.

            • roosterboy197

              And that's why I shouldn't code first thing out of bed in the morning. Of course it's doing that; I'm reusing the same input each time so changing it to a set strips out the dupe. Fixing that gives me this output:

              # "12hgtjfj :: 34hytgfrd"
              ('bytes - before', 1048)
              ('before', 56, 999)
              ('after', 56, 999)
              ('sorted', 56, 999)
              ('bytes - after', 1054)
              # 21 characters, 6 bytes
              

              So it's not the sort.
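The duplicate-stripping described in this post can be reproduced in isolation. This is a minimal sketch with a made-up three-line word list; it uses plain `sorted()` where the script uses `locale.strcoll`, which changes the ordering but not the deduplication.

```python
# Hypothetical stand-in for the file contents after an earlier test run.
wordlist = [u'ngeru :: cat', u'awa :: river', u'hgtjfj :: hytgfrd']
new_word = u'hgtjfj :: hytgfrd'  # the same test input sent again

wordlist.append(new_word)
before = len(wordlist)

# set() keeps one copy of each distinct line, so the repeated
# test string collapses back into a single entry.
deduped = sorted(set(wordlist))
after = len(deduped)

print(before, after)
```

Sending a word that is not already in the list leaves both counts equal, as in the "12hgtjfj :: 34hytgfrd" run.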

              But I just noticed that my byte count increased this time!
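Since the logs mix character counts and byte counts, it may help to see how the two differ for text with macrons. A minimal sketch, assuming the file is UTF-8 as the script's `decode('utf-8')` call implies. The truncation step is purely hypothetical: it shows what would happen if only as many bytes as there are characters were kept, and is not a confirmed behaviour of `put_file()`.

```python
line = u'\u0101poro :: apple'  # "āporo :: apple", written with an escape

encoded = line.encode('utf-8')
char_count = len(line)      # counts characters
byte_count = len(encoded)   # counts bytes; the macron vowel takes two

# Hypothetical: keep only char_count bytes of the encoded text.
truncated = encoded[:char_count]

print(char_count, byte_count, len(truncated))
print(repr(truncated.decode('utf-8')))
```

Under that assumption, each non-ASCII character in the text costs one byte off the end of the upload, and a long ASCII-only addition could still make the file grow overall.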
