웹개발자의 기지개

[Python] Crawling Shopping Mall Products, Part 1

웹개발자 워니 2023. 6. 19. 16:47

 

[Python virtual environment setup]
Set up the virtual environment in the F:\python_basic folder.
Running F:\>python -m venv python_basic creates the F:\python_basic folder
with its own Lib, Scripts, and Include folders, among others.
--------------------------------------------------------------------
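
The same environment can also be created from Python through the standard-library venv module. This is only a minimal sketch and not part of the original steps; the helper name create_env is hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment, similar to `python -m venv <env_dir>`."""
    # create_env is a hypothetical helper name used for illustration only.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


# Example call, using the folder from this post:
# create_env(r"F:\python_basic")
```

On Windows the new folder contains Scripts, Lib, and Include, plus a pyvenv.cfg file that marks it as a virtual environment.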

1. Install Python 3. The venv module used here requires Python 3.3 or later.

The steps below use F:\python_basic as the base folder.


F:\>cd python_basic

F:\python_basic>cd Scripts

F:\python_basic\Scripts>pip list    --> lists the installed libraries.

2. Install the required libraries with pip and verify them.

F:\python_basic\Scripts>pip install beautifulsoup4
F:\python_basic\Scripts>pip install requests
F:\python_basic\Scripts>pip install pandas
F:\python_basic\Scripts>pip install openpyxl
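
Besides reading the pip list output by eye, the installed versions can be checked from Python. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later); the names REQUIRED and installed_versions are hypothetical.

```python
from importlib import metadata

# Distribution names exactly as passed to pip in the commands above.
REQUIRED = ["beautifulsoup4", "requests", "pandas", "openpyxl"]


def installed_versions(names):
    """Map each distribution name to its version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs another pip install run inside the environment.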


3. Activate the virtual environment.

F:\python_basic\Scripts>activate
(python_basic) F:\python_basic\Scripts>
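
The (python_basic) prefix in the prompt shows that the environment is active. The same thing can be confirmed from inside Python; the sketch below relies on the documented behavior of sys.prefix and sys.base_prefix for environments created with venv, and the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```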

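4. Run the crawling script.

(python_basic) F:\python_basic\Scripts>cd ..

(python_basic) F:\python_basic>python craw1.py


5. Check the resulting Excel file.
Open craw1_excel_gmarket.xlsx. (The craw1.py listing in this post writes its output to craw1_excel.xlsx.)

craw1_url.xlsx is the input file of Gmarket product URLs. (The craw1.py listing in this post hard-codes its URLs in itemList.)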
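
The layout of craw1_url.xlsx is not shown in this post. As a rough sketch, assuming the workbook has a header row with a column named "url" (an assumption, as are the helper names clean_urls and load_urls), the URLs could be loaded with pandas like this:

```python
def clean_urls(values):
    """Keep http(s) strings, stripped and de-duplicated in original order."""
    seen = set()
    urls = []
    for value in values:
        # Empty Excel cells arrive as NaN (a float), so skip non-strings.
        if not isinstance(value, str):
            continue
        url = value.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def load_urls(path, column="url"):
    """Read product URLs from one column of an Excel sheet."""
    # pandas is imported here so that clean_urls stays usable on its own.
    import pandas as pd

    frame = pd.read_excel(path)  # .xlsx files are read through openpyxl
    return clean_urls(frame[column].tolist())


# Example (assumes the file sits next to the script):
# itemList = load_urls("craw1_url.xlsx")
```

Filtering in a separate function keeps blank rows and repeated URLs from triggering extra requests.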

 

 

 

[ Preparation before crawling ]

Install Python first.

 

# Check the installed pip packages
pip list

pip install beautifulsoup4
pip install requests
pip install pandas
pip install openpyxl

 

[ craw1.py ]

 

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import datetime
 
itemList = ["http://item.gmarket.co.kr/item?goodscode=1784246790", "http://item.gmarket.co.kr/Item?goodscode=2254667413"]
 
goods_names = []
prices1 = []
prices2 = []
baesongs = []
img_urls = []
 
 
for item in itemList:
    res = requests.get(item)
    soup = bs(res.content, 'html.parser')
 
    # Product name
    goods_name = soup.find('h1', attrs={"class" : "itemtit"}).get_text()
 
    price = []  # price[0]: regular price, price[1]: discounted price
    priceList = soup.find_all('strong', attrs={"class":"price_real"})
    for pr in priceList:
        price.append(pr.get_text())
 
    # Shipping fee
    baesong = soup.find('span', attrs={"class":"txt_emp"}).get_text()
    if baesong != "무료배송":  # "무료배송" means free shipping
        baesong = soup.find('em', attrs={"class":"txt_default"}).get_text()
        # Trim the "배송비" (shipping fee) label and the "원" (won) unit
        baesong = baesong.strip("배송비").strip("원")
 
    price[0] = price[0].strip("원")
    price[1] = price[1].strip("원")
 
    # Image
    #img = soup.find('ul', attrs={"class" : "viewer"}).select("li.on a img")
    img_url = soup.find('ul', attrs={"class" : "viewer"}).find("img")
 
    #print(goods_name)
    #print(price[0])
    #print(price[1])
    #print(baesong)
    #print(img_url['src'])
    #print()
 
 
    goods_names.append(goods_name)
    prices1.append(price[0])
    prices2.append(price[1])
    baesongs.append(baesong)
    img_urls.append(img_url['src'])
 
 
 
 
dt_now = datetime.datetime.now()
today = datetime.datetime.strftime(dt_now,'%Y-%m-%d')
excelSheet = "G마켓 " + today
 
df = pd.DataFrame()
df['URL'] = img_urls        # image URL
df['제품명'] = goods_names   # product name
df['정상가'] = prices1       # regular price
df['할인가'] = prices2       # discounted price
df['배송비'] = baesongs      # shipping fee
 
df.to_excel('./craw1_excel.xlsx', sheet_name=excelSheet)
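
The script above assumes that every element is found on the page and that prices arrive as text such as "12,900원". If the page layout changes, soup.find returns None and the .get_text() call raises an error. The helpers below are a defensive sketch under those assumptions; text_or_none and parse_won are hypothetical names and not part of the original script.

```python
import re


def text_or_none(node):
    """Return the stripped text of a BeautifulSoup node, or None."""
    # soup.find(...) returns None when nothing matches, so guard first.
    if node is None:
        return None
    return node.get_text(strip=True)


def parse_won(text):
    """Extract the first amount from text such as '12,900원' as an int."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None  # e.g. "무료배송" (free shipping) contains no digits
    return int(match.group().replace(",", ""))


# Possible use inside the loop:
# goods_name = text_or_none(soup.find('h1', attrs={"class": "itemtit"}))
# fee = parse_won(baesong)
```

Note that str.strip("원") removes those characters from both ends of the string rather than removing a suffix, so pulling out the digits with a regular expression is less dependent on the exact label text. Storing prices as integers also lets Excel sort and sum the columns.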

 

 

 

 

 

Reference: https://library.gabia.com/contents/9239/

 

 
