
Python Web Crawling: Static Crawling, selenium Mouse Control, Keyboard Control, and Combining selenium with bs4

👩‍🎓인텔리감자🥔 2023. 5. 10. 17:49

## select 

Like find_all(), select() returns every matching result as a list.

select_one() returns only a single result.

A class is written with a period (.), an id with a hash (#), and a descendant tag with a space.

 

print(soup.select("p")) # p tags
print(soup.select(".d")) # tags whose class is d
print(soup.select("p.d")) # p tags whose class is d
print(soup.select("#i")) # tags whose id is i
print(soup.select("p#i")) # p tags whose id is i

print(soup.select("body p")) # p tags that are descendants of body
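
The selectors above can be tried without a network connection. The following is a minimal sketch that assumes a small, made-up HTML string (not a page from this post) and uses the built-in "html.parser" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this example only.
html = """
<html><body>
  <p class="d">first</p>
  <p id="i">second</p>
  <div class="d"><p>third</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

all_p = [tag.text for tag in soup.select("p")]        # every p tag
class_d = [tag.name for tag in soup.select(".d")]     # any tag whose class is d
p_class_d = [tag.text for tag in soup.select("p.d")]  # p tags whose class is d
p_id_i = [tag.text for tag in soup.select("p#i")]     # p tag whose id is i
nested = [tag.text for tag in soup.select("div p")]   # p tags that are descendants of div
first_p = soup.select_one("p").text                   # first match only
missing = soup.select_one("span")                     # None when nothing matches

print(all_p, class_d, p_class_d, p_id_i, nested, first_p, missing)
```

Note that select() always returns a list (possibly empty), while select_one() returns either a single tag or None.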

 

Crawling example

Inspect the page with the F12 developer tools.

 

 

 

 

 

Dynamic crawling

 

https://chromedriver.chromium.org/downloads

 


Download the ChromeDriver version that matches your installed Chrome version.

If the versions do not match, Selenium will not start.

In a cmd window, run pip install selenium.

 

 

 

# How to access the DOM with Selenium


Returns a single object (like bs4's find())
find_element

Returns a list of objects (like bs4's find_all())
find_elements

 

 

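# Connecting to a web page


from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://www.naver.com"

# Selenium 4.10+ takes the driver path through Service;
# older 4.x releases also accepted webdriver.Chrome("chromedriver")
driver = webdriver.Chrome(service=Service("chromedriver"))

driver.get(url) # open the URL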

 

 

## css_selector

Works the same way as bs4's select()

 

url = "https://pjt3591oo.github.io"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

selected = driver.find_element(by = By.CSS_SELECTOR, value = "div.p")

print(selected)
print(selected.tag_name)
print(selected.text)

selected = driver.find_elements(By.CSS_SELECTOR, "div.p")
print(selected)

 

 

Accessing a missing element

- Unlike bs4, find_element raises an error when the element does not exist

NoSuchElementException

 

Mouse control

url = "https://pjt3591oo.github.io"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

selected = driver.find_element(by = By.CSS_SELECTOR, value = "div.p a") 
print(selected) 
print(selected.text)

selected.click()

 

Error page

If you load the DOM tree on the main page and then move to another page, the elements fetched earlier can no longer be used (Selenium raises StaleElementReferenceException).

So it is best to avoid using click for page navigation where possible.

click is recommended for cases where the data changes inside the page without the page itself changing.

 

 

Keyboard control

url = "https://pjt3591oo.github.io/search"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

selected = driver.find_element(By.CSS_SELECTOR, "input#search-box") 
selected.send_keys("test")

 

Enter key

selected.send_keys(Keys.ENTER)

 

Combining selenium and bs4

page_source: returns the HTML of the page currently loaded in the web browser

url = "https://pjt3591oo.github.io"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

soup = BeautifulSoup(driver.page_source, "lxml")

print(soup.select("div"))

 

url = "https://pjt3591oo.github.io/search"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

selected = driver.find_element(By.CSS_SELECTOR, "input#search-box") 
selected.send_keys("test")
selected.send_keys(Keys.ENTER)

soup = BeautifulSoup(driver.page_source, "lxml")
items = soup.select("ul#search-results li")

for item in items:
    title = item.select_one("h3").text
    description = item.select_one("p").text
    print(title)
    print(description)
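
The parsing half of this example can be checked on its own. The sketch below assumes a made-up HTML string standing in for driver.page_source, a hypothetical helper named parse_results, and the built-in "html.parser" in place of "lxml". It also guards against items that lack an h3 or p tag, because select_one() returns None in that case and .text would then fail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
page_source = """
<ul id="search-results">
  <li><h3>Post A</h3><p>About A</p></li>
  <li><h3>Post B</h3></li>
</ul>
"""


def parse_results(html):
    """Return (title, description) pairs from the search result list."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select("ul#search-results li"):
        title = item.select_one("h3")
        description = item.select_one("p")
        results.append((
            title.text if title else "",              # empty string when h3 is missing
            description.text if description else "",  # empty string when p is missing
        ))
    return results


print(parse_results(page_source))
```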

 

Example: search Naver for 고슴도치 (hedgehog), then open its 지식백과 (Knowledge Encyclopedia) entry

Method 1

# Search Naver for 고슴도치 (hedgehog), then open its 지식백과 entry

url = "https://www.naver.com/"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

driver.implicitly_wait(3) # implicit wait

search = driver.find_element(By.CSS_SELECTOR, "input#query") 

search.send_keys("고슴도치")
search.send_keys(Keys.ENTER)

post = driver.find_element(By.CSS_SELECTOR, "a.area_text_title")

post.click()

Method 2

url = "https://www.naver.com/"

driver = webdriver.Chrome("chromedriver")

driver.get(url)

driver.implicitly_wait(3) # implicit wait

search = driver.find_element(By.CSS_SELECTOR, "input#query") 

search.send_keys("고슴도치")
search.send_keys(Keys.ENTER)

selected = driver.find_element(By.CSS_SELECTOR, "div.title_area a")

selected.click()

 

 

Implicit wait

driver.implicitly_wait(3) # waits up to 3 seconds for an element to be found

 

import time
time.sleep(1)