Beautiful Soup - Searching the Tree
In this chapter, we shall discuss different methods in Beautiful Soup for navigating the HTML document tree in different directions - going up and down, sideways, and back and forth.
We shall use the following HTML string in all the examples in this chapter −
html = """
<html><head><title>TutorialsPoint</title></head>
<body>
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">TutorialsPoint has an excellent collection of tutorials on:
<a href="https://tutorialspoint.com/Python" class="lang" id="link1">Python</a>,
<a href="https://tutorialspoint.com/Java" class="lang" id="link2">Java</a> and
<a href="https://tutorialspoint.com/PHP" class="lang" id="link3">PHP</a>;
Enhance your Programming skills.</p>
<p class="tutorial">...</p>
"""
You can navigate the parse tree by using a tag's name as an attribute. For example, soup.head returns the <head> element:
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
print(soup.head.prettify())
Output
<head>
 <title>
  TutorialsPoint
 </title>
</head>
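As a minimal sketch of attribute-style navigation, the snippet below uses a short stand-in document (not the chapter's html string; the variable names are illustrative). It shows that tag names can be chained, that only the first matching tag is returned, and that a tag that does not exist yields None.

```python
from bs4 import BeautifulSoup

# Stand-in document for illustration only
doc = (
    "<html><head><title>TutorialsPoint</title></head>"
    "<body><p><b>One</b></p><p><b>Two</b></p></body></html>"
)
soup = BeautifulSoup(doc, "html.parser")

title_text = soup.title.string   # string inside the first <title> tag
first_b = soup.body.b            # chained lookup: first <b> anywhere under <body>
missing = soup.table             # no <table> in the document, so this is None

print(title_text)
print(first_b)
print(missing)
```

To reach the second <b> tag, a search method such as find_all() is needed, since attribute access stops at the first match.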
Going down
A tag may contain strings or other tags. The .contents property of a Tag object returns a list of all its direct children.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
# .contents is already a list of the tag's direct children
print(tag.contents)
Output
[<title>TutorialsPoint</title>]
The returned object is a list, although in this case, there is only a single child tag enclosed in head element.
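Because .contents is an ordinary Python list, it supports len() and indexing. The sketch below illustrates this on a small stand-in document (an assumption for the example, not the chapter's html string).

```python
from bs4 import BeautifulSoup

# Stand-in document for illustration only
doc = "<head><title>TutorialsPoint</title></head>"
soup = BeautifulSoup(doc, "html.parser")

head_children = soup.head.contents      # list of the direct children of <head>
first_child = head_children[0]          # the <title> tag
title_children = first_child.contents   # list holding the title's single string

print(len(head_children))
print(first_child.name)
print(title_children)
```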
.children
The .children property gives access to the same direct children as .contents, but as an iterator rather than a list. Below, the children of the body tag are collected into a list with list().
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.body
print(list(tag.children))
Output
['\n', <p class="title"><b>Online Tutorials Library</b></p>, '\n', <p class="story">TutorialsPoint has an excellent collection of tutorials on: <a class="lang" href="https://tutorialspoint.com/Python" id="link1">Python</a>, <a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a> and <a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>; Enhance your Programming skills.</p>, '\n', <p class="tutorial">...</p>, '\n']
Instead of getting them as a list, you can iterate over a tag's children using the .children generator −
Example
tag = soup.body
for child in tag.children:
    print(child)
Output
<p class="title"><b>Online Tutorials Library</b></p>
<p class="story">TutorialsPoint has an excellent collection of tutorials on: <a class="lang" href="https://tutorialspoint.com/Python" id="link1">Python</a>, <a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a> and <a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>; Enhance your Programming skills.</p>
<p class="tutorial">...</p>
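The children of a tag include the whitespace strings between elements, as the '\n' entries in the earlier list output show. One possible way to keep only the tags is to filter with isinstance(), sketched below on a stand-in document (the document and variable names are assumptions for illustration).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Stand-in document for illustration only
doc = """<body>
<p class="title">One</p>
<p class="story">Two</p>
</body>"""
soup = BeautifulSoup(doc, "html.parser")

all_children = list(soup.body.children)   # tags and newline strings
tag_children = [child for child in all_children if isinstance(child, Tag)]
class_names = [child["class"][0] for child in tag_children]

print(len(all_children))   # counts the newline strings too
print(class_names)
```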
.descendants
The .contents and .children attributes only consider a tag's direct children. The .descendants attribute lets you iterate over all of a tag's children, recursively: its direct children, the children of its direct children, and so on.
The BeautifulSoup object is at the top of the hierarchy of all the tags. Hence its .descendants property includes all the elements in the HTML string.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
print(soup.descendants)
The .descendants attribute returns a generator, which can be iterated with a for loop. Here, we list the descendants of the head tag.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
for element in tag.descendants:
    print(element)
Output
<title>TutorialsPoint</title>
TutorialsPoint
The head tag contains a title tag, which in turn encloses the NavigableString object TutorialsPoint. So the <head> tag has only one child, but two descendants: the <title> tag and the <title> tag's child. Similarly, the BeautifulSoup object has very few direct children (the <html> tag and the whitespace string before it), but many descendants.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tags = list(soup.descendants)
print(len(tags))
Output
27
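The descendant count includes strings as well as tags. As a rough sketch, the two kinds can be separated with isinstance(); the document below is a stand-in chosen for brevity, not the chapter's html string.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Stand-in document for illustration only
doc = "<p><b>Hello</b><i>Python</i></p>"
soup = BeautifulSoup(doc, "html.parser")

# Walk the whole tree once per category
tag_names = [el.name for el in soup.descendants if isinstance(el, Tag)]
strings = [str(el) for el in soup.descendants if isinstance(el, NavigableString)]

print(tag_names)
print(strings)
```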
Going Up
Just as the .children and .descendants properties let you move down the document, Beautiful Soup offers the .parent and .parents properties to move up from a tag.
.parent
Every tag and every string has a parent tag that contains it. You can access an element's parent with the parent attribute. In our example, the <head> tag is the parent of the <title> tag.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.title
print(tag.parent)
Output
<head><title>TutorialsPoint</title></head>
Since the title tag contains a string (a NavigableString), the parent of that string is the title tag itself.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.title
string = tag.string
print(string.parent)
Output
<title>TutorialsPoint</title>
.parents
You can iterate over all of an element's parents with .parents. This example uses .parents to travel from an <a> tag buried deep within the document, to the very top of the document. In the following code, we track the parents of the first <a> tag in the example HTML string.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.a
print(tag.string)
for parent in tag.parents:
    print(parent.name)
Output
Python
p
body
html
[document]
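One use of .parents is to build a readable path from the document root down to an element. The following is only a sketch on a stand-in document; the path format and variable names are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Stand-in document for illustration only
doc = '<html><body><p>See <a id="link1" href="#">Python</a></p></body></html>'
soup = BeautifulSoup(doc, "html.parser")

link = soup.a
# .parents yields the nearest ancestor first and ends with the BeautifulSoup object
ancestors = [parent.name for parent in link.parents]

# Drop the final "[document]" entry, reverse to root-first order, then add the tag itself
path = " > ".join(list(reversed(ancestors[:-1])) + [link.name])

print(ancestors)
print(path)
```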
Sideways
Tags that are direct children of the same parent are called siblings. When a document is pretty-printed, siblings appear at the same indentation level. Consider the following HTML snippet:
<p>
   <b>Hello</b>
   <i>Python</i>
</p>
Inside the outer <p> tag, <b> and <i> share the same parent, so they are siblings. Beautiful Soup lets you navigate between elements at the same level.
.next_sibling and .previous_sibling
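These attributes return the next and the previous element at the same level, respectively. Note that a sibling can be a string as well as a tag; in the example below the tags are adjacent, with no whitespace between them.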
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", 'html.parser')

tag1 = soup.b
print("next:", tag1.next_sibling)
tag2 = soup.i
print("previous:", tag2.previous_sibling)
Output
next: <i>Python</i>
previous: <b>Hello</b>
The <b> tag has no sibling to its left and the <i> tag has no sibling to its right, so .previous_sibling and .next_sibling return None in those cases.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", 'html.parser')

# <b> is the first child, so nothing precedes it
tag1 = soup.b
print("previous:", tag1.previous_sibling)
# <i> is the last child, so nothing follows it
tag2 = soup.i
print("next:", tag2.next_sibling)
Output
previous: None
next: None
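When the markup contains line breaks between tags, the immediate sibling of a tag is usually a whitespace string rather than the neighbouring tag. The sketch below, on a stand-in document, contrasts .next_sibling with the find_next_sibling() method, which returns only tags when called without arguments.

```python
from bs4 import BeautifulSoup

# Stand-in document with newlines between the tags (illustration only)
doc = """<p>
<b>Hello</b>
<i>Python</i>
</p>"""
soup = BeautifulSoup(doc, "html.parser")

bold = soup.b
raw_next = bold.next_sibling          # the newline string between <b> and <i>
tag_next = bold.find_next_sibling()   # skips strings and returns the next tag

print(repr(raw_next))
print(tag_next)
```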
.next_siblings and .previous_siblings
If there are two or more siblings to the right or left of a tag, they can be navigated with the .next_siblings and .previous_siblings attributes respectively. Both return a generator object, so a for loop can be used to iterate over them.
Let us use the following HTML snippet for this purpose −
<p>
   <b>Excellent</b>
   <i>Python</i>
   <u>Tutorial</u>
</p>
Use the following code to traverse next and previous sibling tags.
Example
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>Excellent</b><i>Python</i><u>Tutorial</u></p>", 'html.parser')

tag1 = soup.b
print("next siblings:")
for tag in tag1.next_siblings:
    print(tag)

print("previous siblings:")
tag2 = soup.u
for tag in tag2.previous_siblings:
    print(tag)
Output
next siblings:
<i>Python</i>
<u>Tutorial</u>
previous siblings:
<i>Python</i>
<b>Excellent</b>
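As the output above suggests, .previous_siblings walks backwards, so the nearest sibling comes first. A short sketch that collects sibling tag names into lists makes the ordering explicit (variable names are illustrative).

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p><b>Excellent</b><i>Python</i><u>Tutorial</u></p>", "html.parser"
)

# Siblings after <b>, in document order
after_b = [sibling.name for sibling in soup.b.next_siblings]
# Siblings before <u>, nearest first (reverse document order)
before_u = [sibling.name for sibling in soup.u.previous_siblings]

print(after_b)
print(before_u)
```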
Back and forth
In Beautiful Soup, the next_element property returns the string or tag that was parsed immediately after the current element, and the previous_element property returns the one parsed immediately before it. The result sometimes coincides with next_sibling or previous_sibling, but often differs: for a tag that has contents, next_element is its first child rather than its next sibling.
.next_element and .previous_element
Example
from bs4 import BeautifulSoup

# html is the document string defined at the start of this chapter
soup = BeautifulSoup(html, 'html.parser')

tag = soup.find("a", id="link3")
print(tag.next_element)

tag = soup.find("a", id="link1")
print(tag.previous_element)
Output
PHP
TutorialsPoint has an excellent collection of tutorials on:
The next_element after <a> tag with id = "link3" is the string PHP. Similarly, the previous_element returns the string before <a> tag with id = "link1".
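To make the difference between element and sibling navigation concrete, the sketch below compares the two on a small stand-in document (an assumption for illustration, not the chapter's html string).

```python
from bs4 import BeautifulSoup

# Stand-in document for illustration only
soup = BeautifulSoup("<p><b>Hello</b><i>Python</i></p>", "html.parser")

bold = soup.b
italic = soup.i

sibling_after_b = bold.next_sibling        # the <i> tag, same level as <b>
element_after_b = bold.next_element        # the string inside <b>, parsed right after <b> opened
element_before_i = italic.previous_element # the string parsed just before <i>

print(sibling_after_b)
print(element_after_b)
print(element_before_i)
```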
.next_elements and .previous_elements
These attributes of the Tag object return generators over all the tags and strings that come after and before it in the document, respectively.
Next elements example
tag = soup.find("a", id="link1")
for element in tag.next_elements:
    print(element)
Output
Python
,
<a class="lang" href="https://tutorialspoint.com/Java" id="link2">Java</a>
Java
and
<a class="lang" href="https://tutorialspoint.com/PHP" id="link3">PHP</a>
PHP
; Enhance your Programming skills.
<p class="tutorial">...</p>
...
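Since .next_elements yields both tags and strings, starting with the tag's own contents, it can be filtered when only the text is of interest. The following sketch does this on a stand-in document; the document and variable names are assumptions for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Stand-in document for illustration only
doc = '<p><a id="link1">Python</a>, <a id="link2">Java</a> and more</p>'
soup = BeautifulSoup(doc, "html.parser")

start = soup.find("a", id="link1")
# Keep only the strings among the elements parsed after the start tag
following_text = [
    str(element)
    for element in start.next_elements
    if isinstance(element, NavigableString)
]

print(following_text)
```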
Previous elements example
tag = soup.find("body")
for element in tag.previous_elements:
    print(element)
Output
<html><head><title>TutorialsPoint</title></head>