Downloading Files from the Web Using Python

Requests is a versatile HTTP library in Python with various applications. One of them is downloading a file from the web using the file's URL.

Installation: First of all, you need to install the requests library. You can install it directly with pip by typing the following command:

pip install requests

Or download the library's source directly and install it manually.

Downloading files

Python3

    # import the requests library
    import requests

    # URL of the image to be downloaded is defined as image_url
    image_url = 'https://www.python.org/static/community_logos/python-logo-master-v3-TM.webp'

    # send an HTTP request to the server and save
    # the HTTP response in a response object called r
    r = requests.get(image_url)

    with open('python_logo.webp', 'wb') as f:
        # save the received content as a .webp file in binary format:
        # write the contents of the response (r.content)
        # to a new file opened in binary mode
        f.write(r.content)

This small piece of code downloads the Python logo image from the web. Now check your current working directory (usually the folder you ran the script from) and you will find the image saved as python_logo.webp. All we need is the URL of the image source. (You can get the URL of an image by right-clicking on it and choosing an option such as Copy Image Address or Open Image in New Tab; the exact wording depends on the browser.)
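The example hard-codes the output name python_logo.webp. If you would rather reuse the name that already appears at the end of the URL, a small helper can derive it. This is a minimal sketch: filename_from_url is a hypothetical helper name, and it only looks at the URL path, so it ignores any file name a server might suggest in a Content-Disposition header.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of url, or default if there is none."""
    # urlsplit separates the path from the query string and fragment,
    # so "report.pdf?version=2" yields just ".../report.pdf"
    path = urlsplit(url).path
    # unquote turns percent-escapes such as "%20" back into characters
    name = PurePosixPath(unquote(path)).name
    # an empty name (URL ends in "/") or ".." is not a usable file name
    if name in ("", ".."):
        return default
    return name


image_url = "https://www.python.org/static/community_logos/python-logo-master-v3-TM.webp"
print(filename_from_url(image_url))  # python-logo-master-v3-TM.webp
```

You would then open the file with open(filename_from_url(image_url), 'wb') instead of a fixed name. Because only the last path segment is kept, the result never contains a directory separator.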

Downloading large files

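The HTTP response content (r.content) is simply a bytes object that holds the entire file data in memory. For large files, keeping all of that data in a single object can exhaust the available memory. To overcome this problem, we make a few changes to our program.

Since we cannot hold all the file data in a single object, we use the r.iter_content method to load the data in chunks, specifying the chunk size.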

    r = requests.get(URL, stream=True)

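Setting the stream parameter to True causes only the response headers to be downloaded at first, while the connection remains open. This avoids reading the whole content into memory at once for large responses. Each time r.iter_content is iterated, one chunk of at most chunk_size bytes is loaded. Here is an example:

Python3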

    import requests

    file_url = 'http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf'

    r = requests.get(file_url, stream=True)

    with open('python.pdf', 'wb') as pdf:
        for chunk in r.iter_content(chunk_size=1024):
            # writing one chunk at a time to the pdf file
            if chunk:
                pdf.write(chunk)
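The example above will save whatever the server sends, so a 404 error page would end up in python.pdf. Also, Requests can only release a streamed connection once the body is fully consumed or the response is closed. A slightly more defensive sketch, assuming requests 2.18 or later (where a Response can be used in a with statement), closes the response, sets a timeout and raises on HTTP errors. download_file is a hypothetical helper name, and the 1 MiB default chunk size is an arbitrary choice.

```python
import requests


def download_file(url: str, dest: str, chunk_size: int = 1024 * 1024,
                  timeout: float = 30.0) -> int:
    """Stream url into the file dest and return the number of bytes written."""
    written = 0
    # stream=True defers downloading the body; the with block closes the response
    with requests.get(url, stream=True, timeout=timeout) as r:
        # raise requests.HTTPError for 4xx/5xx instead of saving an error page
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
                    written += len(chunk)
    return written


if __name__ == "__main__":
    download_file("http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf",
                  "python.pdf")
```

Note that with stream=True the timeout applies to connecting and to each individual socket read, not to the download as a whole, so a slow but steady transfer of a large file is not cut off.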

Downloading videos

In this example, we want to download all the video lectures available on a lecture archive web page, which lists all the materials for the course. So we first scrape the web page to extract all the video links and then download the videos one by one.

Python3

    # requires: pip install requests beautifulsoup4 html5lib
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    '''
    URL of the archive web page which provides links to
    all video lectures. It would have been tiring to
    download each video manually.
    In this example, we first crawl the web page to extract
    all the links and then download the videos.
    '''

    # specify the URL of the archive page here
    archive_url = 'https://public.websites.umich.edu/errors/404.html'

    def get_video_links():
        # create response object
        r = requests.get(archive_url)

        # create beautiful-soup object
        soup = BeautifulSoup(r.content, 'html5lib')

        # find all links on the web page
        links = soup.find_all('a')

        # keep the links ending with .mp4 and resolve them against the page URL
        video_links = [urljoin(archive_url, link.get('href', ''))
                       for link in links if link.get('href', '').endswith('mp4')]

        return video_links

    def download_video_series(video_links):
        for link in video_links:
            # iterate through all links in video_links
            # and download them one by one

            # obtain the filename by splitting the url and taking
            # the last string
            file_name = link.split('/')[-1]

            print("Downloading file: %s" % file_name)

            # create response object
            r = requests.get(link, stream=True)

            # download started
            with open(file_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    if chunk:
                        f.write(chunk)

            print("%s downloaded!\n" % file_name)

        print("All videos downloaded!")
        return

    if __name__ == "__main__":
        # getting all video links
        video_links = get_video_links()

        # download all videos
        download_video_series(video_links)
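The scraper above keeps only href values that end in mp4. Real archive pages often mix relative paths, absolute URLs, query strings such as ?dl=1, upper-case extensions and a tags with no href at all. The sketch below handles those cases with urllib.parse.urljoin and drops duplicates. collect_video_links is a hypothetical helper that takes the raw href values rather than a BeautifulSoup object, so it can be tested without the network.

```python
from typing import Iterable, List, Optional
from urllib.parse import urljoin, urlsplit


def collect_video_links(page_url: str,
                        hrefs: Iterable[Optional[str]],
                        suffix: str = ".mp4") -> List[str]:
    """Resolve hrefs against page_url and keep unique links whose path ends with suffix."""
    seen = set()
    video_links = []
    for href in hrefs:
        # a tags without an href attribute give None (or an empty string)
        if not href:
            continue
        # urljoin handles relative ("lec1.mp4", "/media/lec1.mp4") and absolute links
        url = urljoin(page_url, href)
        # compare only the path, case-insensitively, so "LEC2.MP4?dl=1" still matches
        if urlsplit(url).path.lower().endswith(suffix.lower()) and url not in seen:
            seen.add(url)
            video_links.append(url)
    return video_links
```

With BeautifulSoup you would call it as collect_video_links(archive_url, [a.get('href') for a in soup.find_all('a')]). Keep in mind that link.split('/')[-1] on a URL with a query string keeps the query in the file name, so the query should be stripped before naming the local file.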

The advantages of using the Requests library to download web files are:

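One can easily download entire web directories by iterating recursively through the website!
This is a browser-independent method and is much faster than saving each file by hand through a browser!
One can simply scrape a web page to get all the file URLs on it and then download all the files with a single command (see Implementing Web Scraping in Python with BeautifulSoup).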
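The first point above, downloading a whole web directory by walking the site recursively, can be sketched as a breadth-first crawl that stays under the starting URL. To keep the sketch independent of the network, crawl_directory (a hypothetical name) takes a get_links callable that returns the href values found on a page; in practice that callable would fetch the page with requests.get and parse it with BeautifulSoup. Treating URLs that end in / as sub-directories to visit is an assumption that matches typical auto-generated directory listings, and start_url is assumed to end with /.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urljoin


def crawl_directory(start_url: str,
                    get_links: Callable[[str], Iterable[str]],
                    suffixes: Tuple[str, ...] = (".pdf", ".mp4"),
                    max_pages: int = 100) -> List[str]:
    """Breadth-first walk of the pages under start_url, collecting file URLs."""
    # only URLs beginning with this prefix count as "inside" the directory
    prefix = start_url if start_url.endswith("/") else start_url + "/"
    seen = {start_url}
    queue = deque([start_url])
    files = []
    pages_visited = 0
    while queue and pages_visited < max_pages:
        page = queue.popleft()
        pages_visited += 1
        for href in get_links(page):
            url = urljoin(page, href)
            # skip links that leave the directory or were already handled
            if not url.startswith(prefix) or url in seen:
                continue
            seen.add(url)
            if url.lower().endswith(suffixes):
                files.append(url)  # a file we want to download
            elif url.endswith("/"):
                queue.append(url)  # a sub-directory listing to visit later
    return files
```

The max_pages limit guards against crawling an unexpectedly large site. The returned URLs can then be passed to a download loop such as download_video_series.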

This blog is contributed by Nikhil Kumar.