[Python] Introduction to the BeautifulSoup Library and Basic Usage

에코프로.AI


AI_HitchHiker 2024. 8. 20. 19:51

https://www.sixfeetup.com/blog/an-introduction-to-beautifulsoup

If you would like to study HTML, CSS, and related topics beforehand, click the link below.

[Reference] HTML, CSS, XML, JSON

The BeautifulSoup library
- A library for easily scraping information from web pages
  - Parses XML files as well as HTML
  - Builds a parse tree and provides functions to navigate and search HTML and XML files
  - Official documentation
    - https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start
- Key features
  - HTML / XML parsing
  - Simple, intuitive API
  - Support for multiple parsers (html.parser, lxml, xml, etc.)
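
The parser is chosen by the second argument to the BeautifulSoup constructor. As a minimal sketch, assuming a hypothetical helper named make_soup (it is not part of bs4), the code below prefers lxml and falls back to the built-in html.parser when lxml is not installed:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Hypothetical helper: try the preferred parser, fall back to html.parser."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is not installed
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>Hello, <b>soup</b></p>")
print(soup.p.get_text())
```

Different parsers can build slightly different trees from malformed markup, so it is usually safer to name the parser explicitly than to rely on whichever one happens to be installed.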

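Installation

  pip install beautifulsoup4

The package name on PyPI is beautifulsoup4; bs4 is the name used in import statements. The bs4 package on PyPI is only a placeholder that pulls in beautifulsoup4, so installing beautifulsoup4 directly is enough.

Installing the XML parser

pip install lxml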
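
A quick way to confirm the installation is to import the package and print its version. Note that the library is installed as beautifulsoup4 but imported as bs4:

```python
import bs4

# Installed as "beautifulsoup4", imported as "bs4"
print(bs4.__version__)
```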

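Fetching the Naver home page (feat. requests)

import requests

url = 'https://www.naver.com'
# A timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)

if response.status_code == 200:
  # response.text holds the page's HTML as a string
  result = response.text
  print(result)
else:
  print('Failed : ', response.status_code)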
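
The HTML string returned by requests can be handed straight to BeautifulSoup. The sketch below is one possible way to combine the two; fetch_soup and page_title are hypothetical helper names chosen for illustration, and the 10-second timeout is an arbitrary assumption:

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Hypothetical helper: download a page and return it as a parsed tree."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")


def page_title(soup):
    """Return the text of <title>, or None when the page has no title tag."""
    return soup.title.string if soup.title is not None else None
```

With network access, page_title(fetch_soup('https://www.naver.com')) would return the title of the page fetched above.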

Basic BeautifulSoup usage

Installation

%pip install beautifulsoup4

Creating the test HTML

from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

Loading the HTML with BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())

Extracting specific tags

print('soup.title : ', soup.title)
print('soup.title.name : ', soup.title.name)

print('soup.title.string : ', soup.title.string)
print('soup.title.text : ', soup.title.text)

print('soup.title.parent.name : ', soup.title.parent.name)

Accessing tag attributes

print('soup.p : ', soup.p)
print('soup.p["class"] : ', soup.p["class"])

print('soup.a : ', soup.a)
print('soup.a["class"] : ', soup.a["class"])
print('soup.a["href"] : ', soup.a["href"])
print('soup.a["id"] : ', soup.a["id"])
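
Two details of attribute access are easy to miss: class is a multi-valued attribute, so it comes back as a list, and indexing a missing attribute raises KeyError. A small sketch using a made-up one-line document:

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/elsie" class="sister big" id="link1">Elsie</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

print(link["class"])                  # list of class names, not a single string
print(link["id"])                     # single-valued attributes are plain strings
print(link.get("title"))              # missing attribute: None instead of KeyError
print(link.get("title", "no title"))  # or a default of your choice
```

Using .get() is the safer choice when scraping real pages, where an attribute may be absent on some tags.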

Using the find() method
- Returns the first matching tag

print('soup.find("a") : ', soup.find("a"))
print('soup.find("a", string="Lacie") : ', soup.find("a", string="Lacie"))
print('soup.find(id="link3") : ', soup.find(id="link3"))
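
When nothing matches, find() returns None, so the result should be checked before it is used. A minimal sketch, where href_of is a hypothetical helper written for this example:

```python
from bs4 import BeautifulSoup

html = '<p><a href="http://example.com/lacie" id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")


def href_of(soup, link_id):
    """Hypothetical helper: return the href of the <a> with this id, or None."""
    tag = soup.find("a", id=link_id)
    # find() returns None when no tag matches, so guard before indexing
    return tag["href"] if tag is not None else None


print(href_of(soup, "link2"))
print(href_of(soup, "link9"))
```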

Using the find_all() method
- Returns all matching tags as a list

print('soup.find_all("a") : ', soup.find_all("a"))

soup.find_all("a") : [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
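
find_all() also accepts filters. The sketch below, on a small made-up document, narrows the search with class_ (the trailing underscore avoids the Python keyword class) and caps the number of results with limit:

```python
from bs4 import BeautifulSoup

html = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
<a href="http://example.com/other" id="link4">Other</a>
"""
soup = BeautifulSoup(html, "html.parser")

sisters = soup.find_all("a", class_="sister")  # only links with class "sister"
names = [a.get_text() for a in sisters]
first_two = soup.find_all("a", limit=2)        # stop after two matches
missing = soup.find_all("table")               # no match: an empty result

print(names)
print(len(first_two), len(missing))
```

Unlike find(), find_all() never returns None; an empty result can be looped over safely.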

Selecting tags with CSS selectors
- Using the select_one() method
  - Returns only the first matching tag object

soup.select_one('p.title')

Using the select() method
- Returns all matching tags as a list

soup.select("a.sister")

links = soup.select("a.sister")
for link in links:
  print(link["href"])
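
Beyond tag.class, the selector string can use other standard CSS forms. A short sketch on a made-up fragment showing an id selector, a child combinator, and an attribute suffix match:

```python
from bs4 import BeautifulSoup

html = """<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.select_one("#link2")            # id selector
children = soup.select("p.story > a")        # <a> tags directly inside p.story
by_attr = soup.select('a[href$="tillie"]')   # href ending in "tillie"
nothing = soup.select_one("table")           # no match: None

print(by_id.get_text())
print([a["id"] for a in children])
print(by_attr[0]["id"], nothing)
```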

Using the attrs of a matched tag

soup.select_one("a").attrs

print(soup.select_one("a").attrs["href"])
print(soup.select_one("a").attrs["class"])
print(soup.select_one("a").attrs["id"])

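Distinguishing between a tag's .string and .text
- Both are used to extract the text inside a tag
  - .string : use when the tag has a single child text node
    - Returns None when the tag has more than one child; if the only child is another tag, .string is taken from that child
  - .text : returns the text of the tag and all of its descendants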

print('soup.select_one("a").string : ', soup.select_one("a").string)
print('soup.select_one("a").text : ', soup.select_one("a").text)
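
For a tag with a single text child, such as the link above, both give the same result. The difference shows up on tags with mixed content, as in this sketch on a made-up fragment:

```python
from bs4 import BeautifulSoup

html = """<p class="title"><b>The Dormouse's story</b></p><p class="story">Once upon a time <a id="link1">Elsie</a> lived.</p>"""
soup = BeautifulSoup(html, "html.parser")
title_p, story_p = soup.find_all("p")

# The only child of title_p is <b>, so .string is taken from <b>
print(title_p.string)
# story_p has several children (text, <a>, text), so .string is None
print(story_p.string)
# .text concatenates the text of every descendant
print(story_p.text)
```

.text is a property that gives the same result as calling get_text() with no arguments.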

License: Attribution, NonCommercial, NoDerivatives
