
Access or select tags, items, div, class, id, etc. using XPaths | Scrapping Guide

When learning web scraping or web automation, it's very important to understand XPath and how to use it to best effect.
You may be surprised to learn that XPath version 1.0 was published in 1999. Newer versions have been released since then, but current browsers natively support only XPath 1.0.


Learning XPath is easy as well as interesting.
The rest of this blog post walks through the structure of XPath expressions.


Structure of XPaths

There is no need to go deep into XPath if you primarily want to learn web scraping or web automation, but remembering the two points below will make your life easier.
  • Most XPaths used in scraping start with two forward slashes, "//", which match an element anywhere in the page rather than only at the top of the document.
  • After the two forward slashes comes the element name, such as "p" or "div".
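
To see what the leading "//" does, here is a minimal sketch using Python's standard-library ElementTree. ElementTree supports only a subset of XPath and needs the path written as ".//p" rather than "//p", so treat this as an approximation of what a browser or a full XPath engine does. The page markup below is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, used only for illustration.
PAGE = """
<html>
  <body>
    <p>top-level paragraph</p>
    <div>
      <p>nested paragraph</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)
body = root.find("body")

# ".//p" is ElementTree's spelling of "//p": every <p> at any depth.
anywhere = [p.text for p in body.findall(".//p")]

# "./p" takes a single step: only <p> elements that are direct children.
direct_children = [p.text for p in body.findall("./p")]

print(anywhere)
print(direct_children)
```

The first list contains both paragraphs, while the second contains only the top-level one, which is why "//" is the usual starting point when you don't know how deeply an element is nested.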


Examples of XPath:

Here are 3 examples that will make XPath easier to learn and the whole thing as easy as pie.

Example 1:

//div/a/text()

This is a very easy example. This XPath targets the text directly inside every anchor (a) tag that is a direct child of a div element, anywhere on the page.
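
Example 1 can be tried out with a rough sketch like the one below. It assumes Python's standard-library ElementTree, which has no text() step, so the code selects the anchor elements with ".//div/a" and reads their .text instead. The sample markup and link names are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two links directly inside a div, one nested deeper,
# and one outside any div.
PAGE = """
<body>
  <div><a href="/home">Home</a><a href="/about">About</a></div>
  <div><p><a href="/deep">Nested link</a></p></div>
  <a href="/outside">Outside any div</a>
</body>
"""

root = ET.fromstring(PAGE)

# Stand-in for //div/a/text(): <a> elements that are direct children of a <div>.
# ElementTree cannot evaluate text(), so .text is read from each match.
link_texts = [a.text for a in root.findall(".//div/a")]

print(link_texts)
```

Only "Home" and "About" are returned. The nested link sits inside a p, so it is not a direct child of a div, and the last link is not inside a div at all.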

Example 2:


//div[@id='main-section']


The line above targets the div with id='main-section'. It returns at most one result, because an id is supposed to be unique within a page. The case is different if you want to target a class.
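
A small sketch of Example 2, again assuming the standard-library ElementTree (hence the ".//" prefix) and a made-up page. The id 'no-such-id' is hypothetical and is there only to show what happens when nothing matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two divs that have different ids.
PAGE = """
<body>
  <div id="header">Header</div>
  <div id="main-section">Main content</div>
</body>
"""

root = ET.fromstring(PAGE)

# Attribute predicate: keep only divs whose id equals 'main-section'.
matches = root.findall(".//div[@id='main-section']")

# find() returns the first match, or None when no element matches.
missing = root.find(".//div[@id='no-such-id']")

print(len(matches), matches[0].text)
print(missing)
```

On a well-formed page the list has a single entry, and a lookup for an id that does not exist gives None rather than an error, so it is worth checking the result before using it.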

Example 3:


//div[@class='social-icons']

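This targets all the div elements whose class attribute is exactly 'social-icons'. Note that a div with additional classes, such as class='social-icons large', would not match this exact comparison.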
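
The sketch below illustrates Example 3 and one caveat: [@class='social-icons'] compares the whole attribute value, so an element carrying extra classes is skipped. It assumes the standard-library ElementTree, which does not support XPath functions such as contains(), so the looser match is emulated in plain Python. The markup and the extra class name 'large' are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: three social divs, one of which has a second class.
PAGE = """
<body>
  <div class="social-icons">Twitter</div>
  <div class="social-icons">Facebook</div>
  <div class="social-icons large">Instagram</div>
  <div class="footer">Footer</div>
</body>
"""

root = ET.fromstring(PAGE)

# Exact attribute match, as in //div[@class='social-icons'].
exact = [d.text for d in root.findall(".//div[@class='social-icons']")]

# Looser match on one class among several. In full XPath 1.0 this is
# usually written with contains(); here it is emulated by splitting the
# class attribute into its individual class names.
by_token = [
    d.text
    for d in root.iter("div")
    if "social-icons" in d.get("class", "").split()
]

print(exact)
print(by_token)
```

The exact match returns "Twitter" and "Facebook", while the token-based match also picks up "Instagram". Unlike an id, a class can legitimately appear on many elements, so expect a list of results.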


If you love Python too and want to scrape and automate the web with it, I recommend tools such as Scrapy and Selenium. Selenium offers a total of 8 ways to locate or access an element: ID, name, class name, tag name, link text, partial link text, CSS selector, and XPath.


