How You Get Ready-Made Python from ChatGPT - After a Few Tries 😊

Can you give me some Python code which can be executed from the command line as python3 script.py URL DOMAIN, where the URL argument is a link to a web page and the DOMAIN argument is a domain name such as blogger? What the script should do is dump out the links on the page (referred to by the URL argument) which contain the domain DOMAIN.

Sure! ... but, that dumped out all links.

Can you give me some Python code which can be executed from the command line as python3 script.py URL DOMAIN, where the URL argument is a link to a web page and the DOMAIN argument is a domain name such as blogger? What the script should do is dump out the links on the page (referred to by the URL argument) which contain the string DOMAIN between periods (whole word).

No good 😒 When I ran it as python3 links.py URL blogspot, it printed nothing but blogspot.com, one per line, as many times as there were links on the page.

That didn't work. We want to get the page referenced by URL and then dump out only those links from that page that contain the string given by the DOMAIN argument as a whole word.


I apologize for the confusion. Let's modify the script to fetch the page referenced by the URL and then extract links that contain the specified domain as a whole word. Here's the updated Python script:

And that worked perfectly.
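The script from that round isn't reproduced here, so what follows is only a rough sketch of one way to read "whole word": split the link's hostname on periods and check whether DOMAIN is one of the pieces. The helper name host_has_label and the sample links are made up for illustration; this is not ChatGPT's actual output.

```python
from urllib.parse import urlparse


def host_has_label(href, word):
    """Return True if `word` is one dot-separated label of the link's hostname."""
    # urlparse(...).hostname is None for relative links, so fall back to "".
    host = urlparse(href).hostname or ""
    return word.lower() in host.lower().split(".")


# Made-up sample links, standing in for the hrefs scraped from a page.
links = [
    "https://example.blogspot.com/2023/07/post.html",
    "https://notblogspot.com/page",
    "/relative/path",
]
matching = [link for link in links if host_has_label(link, "blogspot")]
print(matching)
```

With this reading, example.blogspot.com matches blogspot, but notblogspot.com does not, and relative links are skipped because they have no hostname.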

Now, instead of just dumping out the links, is it possible to dump out javascript code such that the links are contained in an array variable so that they can be processed by a foreach loop easily?


import sys
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

def get_links_with_domain(url, domain):
    try:
        response = requests.get(url)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        parsed_domain = f".{domain}."

        links = []
        for link in soup.find_all('a', href=True):
            if parsed_domain in urlparse(link['href']).netloc.lower():
                links.append(link['href'])

        return links

    except requests.exceptions.RequestException as e:
        print(f"Error fetching the URL: {e}")
        return []

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python3 script.py URL DOMAIN")
        sys.exit(1)

    url = sys.argv[1]
    domain = sys.argv[2]

    links_with_domain = get_links_with_domain(url, domain)

    if links_with_domain:
        print("var linksArray = [")
        for link in links_with_domain:
            print(f' "{link}",')
        print("];")
    else:
        print(f"No links found on the page containing '{domain}' as a whole word.")
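One thing to watch: the script prints each link between double quotes as-is, so a link that itself contains a double quote or a backslash would produce broken JavaScript. A possible alternative, sketched here as a suggestion and not part of the ChatGPT answer, is to let the standard json module do the quoting, since a JSON array of strings is also a valid JavaScript array literal. The function name to_js_array and the sample links are hypothetical.

```python
import json


def to_js_array(links, name="linksArray"):
    """Render a list of strings as a JavaScript array declaration."""
    # json.dumps escapes quotes and backslashes inside each string.
    return f"var {name} = {json.dumps(links, indent=2)};"


# Made-up sample links; the second one contains double quotes on purpose.
print(to_js_array([
    "https://a.blogspot.com/",
    'https://b.blogspot.com/?q="x"',
]))
```

The printed variable can then be pasted into a page and walked with linksArray.forEach(...), the same way as the output of the script above.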
