Playing with webdriver

This content originally appeared on DEV Community and was authored by DEV Community

WebDriver has been a persistently alluring technology since I discovered it a couple of years ago. However, plain HTTP clients have always been sufficient for my needs.

I recently wanted to pull some data off my local church's website, but I couldn't log in with any HTTP client. So I threw Etaoin, a Clojure WebDriver library, at the problem, and it worked marvelously.

To follow along, you will need a username and password for a congregate site. I suspect the routes are identical across those sites.

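;; Getting started
;; Requires etaoin on the classpath and geckodriver on your PATH.
(require '[etaoin.api :as api])
(def base-url "https://mysite.com") ; site root, no trailing slash
(def user "...")
(def pass "...")
(def ff (api/firefox))              ; launches a Firefox session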

Logging in is easy.

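(api/go ff (str base-url "/members/login/"))
;; Keyword queries match element IDs, so this fills #username and #password
(api/fill-multi ff {:username user :password pass})
;; Press Enter in the password field to submit the form
(api/submit ff {:id "password"})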

After submitting the login form, I'm not sure how to verify that the members' landing page has loaded, so for now I advise just waiting a few seconds. If you're following along, the browser is right in front of you, and you can eyeball it. I would appreciate any suggestions for improvement here.
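One possible improvement over a fixed sleep is to poll until the browser has left the login page. The sketch below assumes that a successful login redirects away from /members/login/ (I haven't confirmed this for every site). It uses etaoin's wait-predicate and get-url; the helper names logged-in-url? and wait-for-login are hypothetical.

```clojure
(require '[clojure.string :as s]
         '[etaoin.api :as api])

;; Hypothetical helper: treat any URL outside the login route as
;; "logged in". Assumes the site redirects after a successful login.
(defn logged-in-url? [url]
  (not (s/includes? url "/members/login")))

;; Hypothetical helper: poll the browser's current URL until it leaves
;; the login page. wait-predicate throws if `timeout` seconds pass first.
(defn wait-for-login [driver timeout]
  (api/wait-predicate #(logged-in-url? (api/get-url driver))
                      {:timeout timeout}))
```

With this loaded, calling (wait-for-login ff 10) right after api/submit replaces the manual wait. If the landing page has a distinctive element, api/wait-visible with a query for that element is another option.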

We're in, so what now?

Well, I have trouble remembering names and faces. What if I had a flashcard system to help me memorize them? We can build one from the member directory.

Let's navigate to the directory page and inspect it before proceeding.

(api/go ff (str base-url "/members/directory"))

It looks like each directory entry can be identified by the album class.

[Screenshot: the directory page's HTML]

Let's dig into an album tag.

<div class="album">
  <a href="/members/directory/family/XXX">
    <span class="album-img">
      <img src="image-url" alt="...">
    </span>
    <span class="album-title">Doe, John</span>
  </a>
</div>

We'll need two functions: one takes an album and returns the text of its album-title element, and the other returns its image source.

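(require '[clojure.string :as s])

;; Returns the text of the album's <span class="album-title">, e.g. "Doe, John"
(defn get-album-title [album-entry]
  (->> {:class "album-title"}
       (api/child ff album-entry)
       (api/get-element-text-el ff)))

;; Returns an absolute URL for the album's <img>
(defn get-album-image [album-entry]
  (as-> album-entry $
    (api/child ff $ {:tag "img"})
    (api/get-element-attr-el ff $ :src)
    (s/replace $ #"\?.*" "")   ; drop any query string
    (str base-url $)))         ; the src is relative, so prefix the site root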

If you've scraped web pages with a regular HTTP client, this code may look familiar. If not, here's how it works.

album-entry is a WebDriver reference to a DOM element like the <div class="album"> tag we inspected earlier, children and all. The child function finds the sub-element we want, and a get-element-* function then returns the string we need.
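The string handling at the end of get-album-image can be tried on its own, without a browser. The sketch below pulls it into a hypothetical normalize-src function. The check for an already-absolute src is my addition (in case a site serves images from another host) and isn't part of the original code; the /uploads/ path is made up for illustration.

```clojure
(require '[clojure.string :as s])

;; Hypothetical helper mirroring the tail of get-album-image:
;; strip the query string, then prefix the site root unless the
;; src is already an absolute http(s) URL (a guard added here).
(defn normalize-src [base-url src]
  (let [path (s/replace src #"\?.*" "")]
    (if (s/starts-with? path "http")
      path
      (str base-url path))))

(normalize-src "https://mysite.com" "/uploads/doe.jpg?w=200&h=200")
;=> "https://mysite.com/uploads/doe.jpg"
```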

Let's put it together.

;; Find every album on the page and extract a [name image-url] pair from each
(->> {:class "album"}
     (api/query-all ff)
     (mapv (juxt get-album-title get-album-image)))

;=>
[["Doe, John" "https://mysite.com/path/to/image.jpg"]
 ["Doe, Jane" "https://mysite.com/path/to/image2.jpg"] ...]

At this point, I'm starting to lose interest, but I like having options, so let's convert this to JSON and print it to the console. You can see the project here.
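The conversion step itself isn't shown, so here is a minimal sketch. It assumes the org.clojure/data.json library is on the classpath and that the [name image-url] pairs from the query-all pipeline are at hand; the ->records and directory->json names are hypothetical.

```clojure
(require '[clojure.data.json :as json])

;; Hypothetical helper: turn [name image-url] pairs into maps so the
;; JSON has named fields instead of bare two-element arrays.
(defn ->records [pairs]
  (mapv (fn [[full-name image]] {:name full-name :image image}) pairs))

;; Serialize to a JSON string. data.json escapes "/" by default,
;; so turn that off to keep the URLs readable.
(defn directory->json [pairs]
  (json/write-str (->records pairs) :escape-slash false))

;; Usage, with `directory` bound to the result of the query-all pipeline:
;; (println (directory->json directory))
```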

Perhaps I will revisit and finish the project in another post.


APA

DEV Community | Sciencx (2022-03-07T16:26:16+00:00) Playing with webdriver. Retrieved from https://www.scien.cx/2022/03/07/playing-with-webdriver/
