omz:forum

    BeautifulSoup Bug

    Pythonista
    • n8henrie142

      I have a script that is running perfectly on my Mac, but giving me an error in Pythonista.

      BeautifulSoup is throwing an AttributeError: 'NoneType' object has no attribute 'next_element' on finding all data points in an HTML table: soup.find('table').find_all('td').

      I can verify that soup appears correct and has the td that I'm looking for. I can print soup.find('table') in the console and it is correct. I can break it down to table = soup.find('table'); table.find_all('td'); and it still doesn't work. I've tried changing to the old .findAll instead of .find_all and that doesn't work either.

      In fact, even soup.find('table').find('td') works correctly; the error only appears when I change .find('td') to .find_all('td').

      find_all does work in some contexts, e.g. `bs4.BeautifulSoup(requests.get('http://omz-software.com').content).find('p').find_all('a')` runs fine.

      I can verify the identical code (synced by Dropbox) works fine on Python 2.7.8 in OS X.

      Has anyone run into this?

      • JonB

        Have you tried saving the soup and loading it on OS X? The user agent might be different, so you might be comparing different soups.

        Also, it is possible that bs4 is an older version on Pythonista.
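A minimal sketch of the comparison suggested here, assuming the page bytes are already in hand (for example from `requests.get(url).content`). The helper names and the file path are hypothetical. Saving the raw bytes and comparing a hash on each device shows quickly whether both machines are parsing the same input.

```python
import hashlib


def fingerprint(html_bytes):
    """Return a SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(html_bytes).hexdigest()


def save_page(html_bytes, path):
    """Write the raw bytes to `path` (hypothetical file) and return their digest."""
    with open(path, "wb") as f:
        f.write(html_bytes)
    return fingerprint(html_bytes)
```

If the digests printed on the two devices differ, the two soups were built from different HTML, and the file saved on one device can be copied to the other to parse identical input.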

        • n8henrie142

          Thanks for the response.

          Same version on both.

          $ python -c 'import bs4; print(bs4.__version__)'
          4.3.2
          

          I think the thing that seals it as a bug is that soup.find('table').find('td') works, but soup.find('table').find_all('td') throws an error, on the same soup object.

          • briarfox

            I'd make sure you are getting the same web page on both devices before building your soup. Maybe you are pulling a mobile version on your iPad. I think this is what JonB means by a different user agent.

            • n8henrie142

              I understand what he means, and I'll check, but I don't think that would explain in any way why .find('td') would have a result but .find_all('td') would cause an error. It wouldn't even make sense if it came up empty (it should at least find the result that .find() found), but it should definitely not cause an error.

              • n8henrie142

                As suspected, I wrote the HTML from Pythonista to a pickle file, loaded it on OS X, converted it to soup, and had no problem using find_all('td') on OS X.

                I also used difflib to inspect the differences between the HTML content of the Pythonista file and that downloaded on OS X, and as far as I can tell the only differences are timestamps (as the content was downloaded minutes apart).
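A rough sketch of the kind of difflib comparison described above, assuming both pages are available as text. The function name and the two file labels are hypothetical; the exact script used in the thread is not shown.

```python
import difflib


def html_diff(a_text, b_text, a_name="pythonista.html", b_name="osx.html"):
    """Return unified-diff lines between two HTML documents (empty list if equal)."""
    a_lines = a_text.splitlines()
    b_lines = b_text.splitlines()
    return list(difflib.unified_diff(
        a_lines, b_lines, fromfile=a_name, tofile=b_name, lineterm=""))
```

Lines starting with `-` come from the first document and lines starting with `+` from the second, so timestamp-only differences are easy to spot by eye.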

                • briarfox

                  That's really odd. I've been using bs4 for a while with no issues. Can you set up a gist of the page so I can try it?

                  • n8henrie142

                    Unfortunately, I would have done that already, except that it's a password-protected site I use for work. I haven't been able to replicate it yet on a couple of other sites, but I'll try to find a public site that has the same bug.

                    • n8henrie142

                      Getting all td from the table at w3schools.com/html/html_tables.asp works fine.

                      Here's my traceback.

                      2014-10-27 13:12:43 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR    There was an error.
                      2014-10-27 13:12:44 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR    'NoneType' object has no attribute 'next_element'
                      Traceback (most recent call last):
                        File "/var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py", line 28, in <module>
                          print(len(table.find('td').find_all('td')))
                        File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1180, in find_all
                          return self._find_all(name, attrs, text, limit, generator, **kwargs)
                        File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 497, in _find_all
                          return ResultSet(strainer, result)
                        File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1610, in __init__
                          super(ResultSet, self).__init__(result)
                        File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 494, in <genexpr>
                          result = (element for element in generator
                        File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1198, in descendants
                          current = current.next_element
                      AttributeError: 'NoneType' object has no attribute 'next_element'
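The traceback places the failure in `descendants`, at `current = current.next_element`. The toy model below is only a hypothesis about how `.find('td')` could succeed while `.find_all('td')` fails on the same object: it is not bs4's code, the `Node` class and helper functions are invented for illustration, and it does not explain why the chain of elements would be broken on Pythonista. It assumes that `find` stops at the first match, while `find_all` walks every descendant until it reaches a stop node.

```python
class Node(object):
    """Hypothetical stand-in for a parsed element linked by next_element."""

    def __init__(self, name):
        self.name = name
        self.next_element = None


def link(nodes):
    """Chain the nodes together in document order."""
    for a, b in zip(nodes, nodes[1:]):
        a.next_element = b


def descendants(first, stop_node):
    """Walk the chain until stop_node; fails if the chain ends early."""
    current = first
    while current is not stop_node:
        yield current
        current = current.next_element  # AttributeError when current is None


def find(first, stop_node, name):
    """Return the first matching node, stopping the walk as soon as it is found."""
    for node in descendants(first, stop_node):
        if isinstance(node, Node) and node.name == name:
            return node
    return None


def find_all(first, stop_node, name):
    """Return every matching node, which requires walking the whole chain."""
    return [n for n in descendants(first, stop_node)
            if isinstance(n, Node) and n.name == name]
```

In this model, if a link after the first `td` is `None`, `find` returns before reaching it, whereas `find_all` runs into it and raises the same kind of `AttributeError` as in the traceback.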
                      
                      • JonB

                        Is it possible for you to "sanitize" the HTML so it no longer contains any work info? I.e., just strip out the text and replace it with random text?

                        Have you tried pickling the soup itself (mmmm, pickle soup), going either from OS X to Pythonista or vice versa?
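One possible way to do the sanitizing suggested above, sketched with only the standard library. Everything here is an assumption for illustration: the `TextScrubber` class and `scrub` function are hypothetical names, and the code is written for Python 3 (on Python 2.7 the parser module is named `HTMLParser`). It re-emits the tags as they appeared and replaces letters and digits in text with random lowercase letters. Attribute values and comments are left untouched, so they would still need a manual check, and there is no guarantee the scrubbed page still triggers the error.

```python
import random
import string
from html.parser import HTMLParser


class TextScrubber(HTMLParser):
    """Re-emit markup unchanged while randomizing letters and digits in text."""

    def __init__(self, seed=0):
        HTMLParser.__init__(self, convert_charrefs=False)
        self.out = []
        self.rng = random.Random(seed)

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Keep whitespace and punctuation so the layout stays recognizable.
        self.out.append("".join(
            self.rng.choice(string.ascii_lowercase) if ch.isalnum() else ch
            for ch in data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def scrub(html_text, seed=0):
    """Return html_text with its text content replaced by random letters."""
    parser = TextScrubber(seed)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Running `find_all('td')` on the scrubbed output in Pythonista would show whether the error depends on the page's structure or on its text.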
