omz:forum

    • Register
    • Login
    • Search
    • Recent
    • Popular

    Welcome!

    This is the community forum for my apps Pythonista and Editorial.

    For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.


    Script crashes when making bulk upload into local server (Synology NAS)

    Pythonista
    3
    8
    2148
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • aldoblack
      aldoblack last edited by

      I am writing a script to upload files and photos into my Synology NAS server and during testing the script crashes after a period of time. I have around 1400 photos in my phone and it always crashes between 900-1000. At first I thought that it might be some corrupt file that makes this crash and tested starting from file 800 and onwards but it does not crash.

      Any idea what causes this crash when trying to upload files between 900-1000? There are no error codes, the app just closes.

      1 Reply Last reply Reply Quote 0
      • ccc
        ccc last edited by ccc

        Pythonista might not be deleting memory that you are freeing up. You might add a deliberate del variable_name to suggest that the garbage collector free up large memory allocations like images and videos.

        1 Reply Last reply Reply Quote 0
        • aldoblack
          aldoblack last edited by

          @ccc tried but still crashes. This is the code:

          import appex
          import photos
          import requests
          import os
          import editor
          import sys
          from objc_util import *
          
          from datetime import datetime
          
          
          class Synology(object):
              def __init__(self, url, port, account, passwd):
                  """
                  
                  :param url: IP/URL or Synology. So far it only supports HTTP.
                  :param port:
                  :param account:
                  :param passwd:
                  :returnv:
                  """
                  self.url = url
                  self.port = port
                  
                  self.account = account
                  self.passwd = passwd
          
                  self.sid = None
          
                  # self.session = requests.Session()
          
              def login(self):
                  print("Logging in...")
          
                  param = {
                      "version": "2",
                      "method": "login",
                      "account": self.account,
                      "passwd": self.passwd,
                      "session": "FileStation",
                      "format": "cookie"
                  }
          
                  url = f"http://{self.url}:{self.port}/webapi/auth.cgi?api=SYNO.API.Auth"
          
                  r = requests.get(url, param, verify=False)
                  r = r.json()
          
                  if not r["success"]:
                      raise Exception("Authenticatoin failed. Note that account with 2FA enabled does not work.")
          
                  print("Logging successful.")
          
                  self.sid = r["data"]["sid"]
                  return self.sid
          
              def upload_file(self, files):
                  upload_url = f"http://{self.url}:{self.port}/webapi/entry.cgi?api=SYNO.FileStation.Upload&version=2&method=upload&_sid={self.sid}"
          
                  args = {
                      "path": "/home/b",
                      "create_parents": "true",
                      "overwrite": "true"
                  }
          
                  # Ignpre this ugly code. I am trying to figure out if I can do bulk upload.
                  file = {'file': (files[0]["fname"], files[0]["payload"], 'application/octet-stream')}
                  # file = [('files', (x["fname"], x["payload"], "application/octet-stream")) for x in files]
          
                  r = requests.post(upload_url, data=args, files=file, verify=False)
          
                  del upload_url
                  del args
                  del file
                  del r
                  del files
          
          
          
          def main():
              # if not appex.is_running_extension():
              #   print("Sript is intended to be un from sharing extension.")
              #   return
          
              print("Starting...")
              s = Synology(url="1.1.1.1", port="1000", account="1", passwd="1")
              s.login()
          
          
              startTime = datetime.now()
          
              all_photos = photos.get_assets()
          
              # images = []
          
              for idx, asset in enumerate(all_photos):
          
                  if idx % 100 == 0:
                      print(f"Uploading... {idx}/{len(all_photos)}")
          
                  #fname = str(ObjCInstance(asset).valueForKey_('filename'))
                  #payload = asset.get_image_data(original=True)
          
                  # images = [{"fname": fname, "payload": payload}]
          
                  s.upload_file([{"fname": str(ObjCInstance(asset).valueForKey_('filename')), "payload": asset.get_image_data(original=True)}])
                  del asset
                  # print(fname)
          
              print(datetime.now() - startTime)
          
          if __name__ == "__main__":
              main()
          
          1 Reply Last reply Reply Quote 0
          • JonB
            JonB last edited by

            I believe you will need to do the following:

            1. ObjCInstance(asset) should be on its own line, assigned to a variable, so that you can del the instance after it is created. You might also need to delete the instance method (if I recall, the release method has a reference cycle of some sort).

            2. when dealing with many images, you MUST have an with objc_util.autorelease_pool(): wrapping the code that grabs the asset and data, otherwise the asset won't actually get released (even if you del it).

            1 Reply Last reply Reply Quote 0
            • JonB
              JonB last edited by

              Aha,
              https://forum.omz-software.com/topic/3844/photos-asset-get_image_data-is-there-a-memory-leak/6

              I am not on my iPad, so not sure if this but was ever fixed.... But when the ObjCInstance is retained, the retain cached method creates a reference cycle. It is necessary to modify objc_util to avoid the leak by deleting the cached methods of the objc instance.

              So,what I think you need to do is:
              A) wrap your upload line in an autorelease_pool() context handler
              B) inside that handler, also add:

              ObjCInstance(asset)._cached_methods = {} 
              del asset # probably not needed
              

              I think it is possible that making the change to your objc_util.py effectively does the same thing, but I have not tried it recently.

              1 Reply Last reply Reply Quote 0
              • JonB
                JonB last edited by

                ok. turns out that autoreleasepool() does not work -- it works once, but not upon repeated script run. This appears to be fairly robust, and does not leak:

                
                from objc_util import *
                #from objc_util import autoreleasepool #this doesnt work properly
                import photos
                import sys, os
                
                try:
                   sys.path+=os.path.expanduser('~')
                   from objc_hacks.memstatus import _get_taskinfo
                except:
                   pass # if memstatus is not installed,or not supported device
                
                import gc
                from PIL import Image
                all_photos=photos.get_assets()
                for idx, asset in enumerate(all_photos):
                      pool=ObjCClass('NSAutoreleasePool').new()
                      try:
                         if idx % 10 == 0:
                            print(f"Uploading... {idx}/{len(all_photos)}")
                            if _get_taskinfo:
                               print(_get_taskinfo().resident_size/1024/1024)
                         fname = str(ObjCInstance(asset).valueForKey_('filename'))
                         payload = asset.get_image_data(original=True)
                         #do the actual upload....
                         # upload(....)
                         
                         # this is the important bit, all objcinstances must have cached methods cleared manually in order to get gc'd
                         ObjCInstance(asset)._cached_methods.clear()
                
                      finally:
                         pool.drain()
                         pool._cached_methods.clear()
                         del pool
                         ObjCClass('NSAutoreleasePool').releaseAllPools()
                
                
                
                

                They keys:

                1. using NSReleasePool manually, which requires a context manager or try/finally to ensure it gets releaseAllPools at the end. (the releaseAllPools could go at the end if the script, maybe, but it needs to happen before gc gets called, or script is run again, i think)

                2. inside the pool, any ObjCInstance's that get created must have
                  ._cached_methods.clear() called to break the ref cycle.
                  I tried with and without del the asset and payload, that is not necessary, but not harmful either

                1 Reply Last reply Reply Quote 2
                • JonB
                  JonB last edited by ccc

                  Just to close out on my investigations here, here is a cleaned up version with a proper context manager to replace the objc_util version, and some extra fluff removed. .

                  
                  from objc_util import *
                  #from objc_util import autoreleasepool #this doesnt work properly
                  import photos
                  import sys, os,gc
                  import contextlib
                  
                  
                  try:
                     sys.path+=os.path.expanduser('~')
                     from objc_hacks.memstatus import _get_taskinfo
                  except:
                     pass # if memstatus is not installed,or not supported device
                  
                  from PIL import Image
                  all_photos=photos.get_assets()
                  
                  '''Replacement for objc_util.autoreleasepool that doesnt crash when rerunning script.
                  objc_util's has two issues:   first, a bug in ObcCInstance.__del__ means that  NSAutoreleasePool will have release called, which is never proper for NSAutoreleasePool.  This wouldnt be an issue if the object is alive, since those calls get ignired, it seems.  The second issue affects all ObjCInstances:  the _method_cache prevents timely garbage collection due to a ref cycle.  So ObjCInstances stick around -- and thus never get release called -- until gc is run.   Problem is, once the pool has been drained, we are not guaranteed to have it around later.  Since objc_util doesnt retain it, after draining, it might get released, and then when gc finally breaks the ref cycle, release gets called, and that pointer might be a new object at that point, so release causes a crash'''
                  @contextlib.contextmanager
                  def autoreleasepool():
                     pool = ObjCClass('NSAutoreleasePool').new()
                     try:
                        yield
                     finally:
                        pool.drain()
                        pool._cached_methods.clear() #break ref cycle
                        pool.__del__=None #prevent release from being called
                     
                  
                  for idx, asset in enumerate(all_photos):
                        with autoreleasepool():
                           ''' debugging '''
                           if _get_taskinfo:
                              print(f'\nSTART {_get_taskinfo().resident_size/1024/1024:3.2f} MB')
                              
                           ''' get the data'''
                           fname = str(ObjCInstance(asset).valueForKey_('filename'))
                           payload = asset.get_image_data(original=True)
                           
                           ''' Do some actual work here
                                ... insert upload, save, etc, code
                           ''' 
                  
                           '''Clean up'''
                           del payload # not actually needed         
                           ObjCInstance(asset)._cached_methods.clear() #required !!!
                           del asset # not actually needed
                            
                           ''' debugging '''
                           if _get_taskinfo:
                              print(f'END.  {_get_taskinfo().resident_size/1024/1024:3.2f} MB')
                  
                  

                  As written, the memory is basically constant each cycle:

                  START 76.20 MB
                  END.  76.52 MB
                  
                  START 76.20 MB
                  END.  76.52 MB
                  
                  START 76.20 MB
                  END.  76.94 MB
                  
                  START 76.20 MB
                  END.  76.79 MB
                  
                  START 76.20 MB
                  END.  76.85 MB
                  

                  If i comment out the with autoreleasepool() line, essentially not using a release pool, it should now be obvious why looping phassets never works -- despite deleting the objc objects, the actual PHAsset's data does not get cleared without draining the pool, and we gain several MB per cycle

                  START 94.90 MB
                  END.  96.81 MB
                  
                  START 96.81 MB
                  END.  98.68 MB
                  
                  START 98.68 MB
                  END.  99.94 MB
                  
                  START 99.94 MB
                  END.  100.99 MB
                  
                  START 101.09 MB
                  END.  102.03 MB
                  
                  START 102.02 MB
                  END.  102.89 MB
                  

                  If I use the pool, but comment out the two del lines:

                  START 63.28 MB
                  END.  65.14 MB
                  
                  START 63.70 MB
                  END.  64.77 MB
                  
                  START 63.51 MB
                  END.  62.46 MB
                  
                  START 62.36 MB
                  END.  62.43 MB
                  
                  START 62.35 MB
                  END.  66.07 MB
                  
                  START 64.17 MB
                  END.  66.01 MB
                  

                  we see that there is no net growth, although there is somewhat more variation from cycle to cycle (because the bytesIO object payload is still in scope at the start -- if the meat of the loop is moved to a function, the results are the same with or without the del) . This shows that reference counting gc does do its job -- deleting the bytesIO object (payload) is not necessary (this was a point of contention with @ccc in a previous thread) -- when it falls out of scope, the memory is cleared.

                  Finally, to see the effect of the objc_util bug that prevents ObjCInstances from getting cleared when they fall out of scope, I comment out the
                  ObjCInstance(asset)._cached_methods.clear() ! line:

                  since ObjCInstance objects themselves are not that big, we see a slow leak, growing maybe a 0.1 MB every few hundred images. I seemed to get more crashes if starting/stopping the fhe script many times, but that could be my imagination. So I conclude, this probably not needed unless you are creating many tens of thousands of ObjCInstances, or you need to ensure that the objects release is called at a specific time for some other reason.

                  1 Reply Last reply Reply Quote 2
                  • ccc
                    ccc last edited by ccc

                    Awesome detailed work @JonB . The small leak might become significant after hundreds of loops in an appex environment.

                    1 Reply Last reply Reply Quote 0
                    • First post
                      Last post
                    Powered by NodeBB Forums | Contributors