omz:forum




    Have .csv file's URL

    Pythonista
    file handling
    • DaveGadgeteer
      DaveGadgeteer last edited by

      (Noob) I have a URL for an Internet of Things data log of my rainfall. If I open it in a browser, it displays as a 3-column spreadsheet.

      I'd like to get the data into an array so I can process a summary into another array, which I would then plot.

      The URL examples I've found so far don't seem to relate to reading data.

      Can someone point me in a useful direction? Perhaps there's an example that does something similar?

      Dave

      • ccc
        ccc last edited by ccc

        import csv

        filename = 'rainfall.csv'  # hypothetical path to a local .csv file
        with open(filename, newline='') as in_file:
            for row in csv.reader(in_file):
                print(', '.join(row))
        

        Like in https://github.com/cclauss/Ten-lines-or-less/blob/master/world_bank_data.py

        • DaveGadgeteer
          DaveGadgeteer last edited by ccc

          Thanks!
          I'm still not getting the URL connected to the file properly. With the following, I get the error urlopen not defined, even though request.py illustrates it explicitly, and changing requests to request doesn't help. (I don't have a zip file to deal with, so I can't follow the linked example directly.)

          import csv, io, requests
          
          url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
          
          
          with urlopen(url, data=None) as in_file:
              for row in csv.reader(in_file):
                  print(', '.join(row))
          
          • DaveGadgeteer
            DaveGadgeteer last edited by ccc

            Trying a different variation

            import csv, io, requests, urllib.request
            
            url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
            
            in_file = urllib.request.urlopen(url, data=None)
            for row in csv.reader(in_file):
                print(', '.join(row))
            

            I got the error:
            line 9, in <module>
            for row in csv.reader(in_file):
            _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

            • JonB
              JonB last edited by

              http://stackoverflow.com/questions/18897029/read-csv-file-from-url-into-python-3-x-csv-error-iterator-should-return-str
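
              In short, the fix described there is to decode the byte stream before handing it to csv.reader. A rough sketch of that idea (TextIOWrapper is just one way to do the decoding, not necessarily what the answers there use):

              import csv
              import io
              import urllib.request

              url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

              # urlopen() returns a binary stream; TextIOWrapper decodes it so that
              # csv.reader receives str lines instead of bytes.
              with urllib.request.urlopen(url) as response:
                  for row in csv.reader(io.TextIOWrapper(response, encoding='utf-8')):
                      print(', '.join(row))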

              • ccc
                ccc last edited by ccc

                Always easier with requests...

                import csv, requests
                
                url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
                filename = url.split('/')[-1]
                with open(filename, 'wb') as out_file:
                    out_file.write(requests.get(url).content)
                
                # Open in text mode ('r', not 'rb'), otherwise csv raises:
                # _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
                with open(filename, 'r', newline='') as in_file:
                    for row in csv.reader(in_file):
                        print(', '.join(row))
                
                • ccc
                  ccc last edited by

                  Or if you want to do it all in RAM...

                  #!/usr/bin/env python3
                  
                  import csv, io, requests
                  
                  url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
                  with io.StringIO(requests.get(url).text) as mem_file:
                      for row in csv.reader(mem_file):
                          print(', '.join(row))
                  
                  • omz
                    omz last edited by

                    @ccc I just learned that csv.reader accepts any iterator, and not just file-like objects, so you could make this slightly shorter:

                    import csv, requests
                    
                    url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
                    for row in csv.reader(requests.get(url).text.splitlines()):
                        print(', '.join(row))
                    
                    • DaveGadgeteer
                      DaveGadgeteer last edited by

                      Thank you all so very much!
                      I would never have found a solution without your help.
                      The following does what I wanted, almost. Instead of printing the rows, I need the data in an array, but maybe the reader gives me rows of strings that I can parse.

                      Harder to solve: the SparkFun website is very overloaded, so it often returns a 503 error and fails. It would be nice to have the software display a message, wait a little, and then try again. But maybe the easiest fix is just to run my own server that isn't so busy.

                      # Download and display csv data from rain gauge
                      import codecs
                      import csv
                      from contextlib import closing

                      import requests

                      url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

                      with closing(requests.get(url, stream=True)) as r:
                          reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'))
                          for row in reader:
                              print(row)
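
                      To get the rows into a list ("array") instead of printing them, the same loop can simply append; a rough sketch along the lines of the snippet above (note that every value is still a string, so int()/float() conversion would come afterwards):

                      import codecs
                      import csv
                      from contextlib import closing

                      import requests

                      url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'

                      rows = []
                      with closing(requests.get(url, stream=True)) as r:
                          reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'utf-8'))
                          header = next(reader)   # first row holds the column names
                          for row in reader:
                              rows.append(row)    # each row is a list of strings

                      print(header)
                      print(rows[:3])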
                      • omz
                        omz last edited by omz

                        Here's an idea for how you could deal with the server errors, and also a starting point for parsing and plotting your data.

                        import csv
                        import requests
                        import time
                        import dateutil.parser
                        import matplotlib.pyplot as plt
                        
                        url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv?page=1'
                        
                        r = requests.get(url)
                        retry_count = 0
                        while r.status_code != 200 and retry_count < 10:
                        	print('status code %i, retrying...' % r.status_code)
                        	retry_count += 1
                        	time.sleep(2)
                        	r = requests.get(url)
                        
                        if r.status_code == 200:
                        	dates = []
                        	tips = []
                        	lines = r.text.splitlines()[1:] # Strip header line
                        	for row in csv.reader(lines):
                        		dates.append(dateutil.parser.parse(row[2]))
                        		tips.append(int(row[1]))
                        
                        	plt.plot_date(dates, tips, fmt='-')
                        	plt.show()
                        else:
                        	print('Failed to load data')
                        
                        • ccc
                          ccc last edited by ccc

                          @DaveGadgeteer Perhaps the best approach would be to separate the data download from the data parsing as done in https://forum.omz-software.com/topic/4011/have-csv-file-s-url/6. That way, you do not have to hit the URL resource/server so often.
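
                          For example, the download step could live in a small helper that only hits the server when there is no local copy yet (refresh() is just an illustrative name, not something from this thread):

                          import os

                          import requests

                          url = 'http://data.sparkfun.com/output/YGa69ObX6WFj9mYa4EmW.csv'
                          filename = url.split('/')[-1]

                          def refresh(url, filename):
                              """Download the CSV only if no local copy exists yet."""
                              if not os.path.exists(filename):
                                  with open(filename, 'wb') as out_file:
                                      out_file.write(requests.get(url).content)

                          refresh(url, filename)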

                          You can then read the local file and convert it into a list of weather_readings ...

                          import collections, csv
                          
                          filename = 'YGa69ObX6WFj9mYa4EmW.csv'
                          
                          with open(filename, 'r') as in_file:
                              data = []  # a list of weather_readings
                              weather_reading = None
                              for row in csv.reader(in_file):
                                  if weather_reading:
                                      data.append(weather_reading(*row))
                                  else:  # create a custom datatype from the header record
                                      weather_reading = collections.namedtuple('weather_reading', row)
                          
                          print('\n'.join(str(x) for x in data))
                          # weather_reading(time='1952', tips='773', timestamp='2017-04-30T20:42:59.017Z')
                          
                          • JonB
                            JonB last edited by

                            Another possibility is numpy.genfromtxt and numpy.recfromcsv, both of which either guess field types and autoconvert or let you specify column types. If you are doing any sort of analysis on the data, you want it in a format that numpy can use.
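
                            For the "specify column types" route, a rough sketch with numpy.genfromtxt (the field names and dtypes are my guesses from the header shown earlier, and the encoding argument needs NumPy 1.14 or later):

                            import numpy

                            filename = 'YGa69ObX6WFj9mYa4EmW.csv'

                            # Spell out each column's name and dtype instead of letting numpy guess.
                            col_types = [('time', 'i8'), ('tips', 'i8'), ('timestamp', 'U30')]
                            a = numpy.genfromtxt(filename, delimiter=',', skip_header=1,
                                                 dtype=col_types, encoding='utf-8')
                            print(a[0])
                            print(a.dtype)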

                            • ccc
                              ccc last edited by ccc

                              import numpy
                              filename = 'YGa69ObX6WFj9mYa4EmW.csv'
                              a = numpy.recfromcsv(filename)
                              print(a[0])
                              print(a.dtype)
                              

                              Works nicely but the third field is bytes instead of datetime64[ms].
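
                              One way to get a real datetime column out of that, as a rough sketch (the field name timestamp comes from the header shown earlier; the trailing 'Z' is stripped because numpy.datetime64 is timezone-naive):

                              import numpy

                              filename = 'YGa69ObX6WFj9mYa4EmW.csv'
                              a = numpy.recfromcsv(filename)

                              # Decode the bytes column and convert it to datetime64[ms].
                              timestamps = numpy.array(
                                  [numpy.datetime64(t.decode('utf-8').rstrip('Z')) for t in a.timestamp],
                                  dtype='datetime64[ms]')
                              print(timestamps[0], timestamps.dtype)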
