Thought leadership from the most innovative tech companies, all in one place.

How to Scrape Everything From Instagram Using Python

A guide on scraping from Instagram using Python.

image

Python has many great packages to scrape pages on the internet. One of them is instaloader. But instaloader is especially useful to extract data like videos, images, profile pictures, posts, and comments from Instagram Application.

**Instaloader **is a Python package to download pictures or videos along with their captions and other metadata like comments, like count, posts… from Instagram. Instaloader reveals its internally used methods and structures as a Python module and also allowing to further customize obtaining media and metadata from Instagram Application.

With Instaloader Module we can;

  • downloads public and private profiles, hashtags, user stories, feeds, and saved media,

  • downloads comments, **geotags, **and **captions **of each post,

  • automatically detects profile name changes and renames the target directory accordingly,

  • allows fine-grained customization of filters and where to store downloaded media,

  • download all pictures and videos of a profile, as well as the profile picture.

To install **Instaloader **just write this command to Terminal

pip3 install instaloader

After installation just create a python file and put these codes and run it :) with this code you can;

  • Get users Profile Picture

  • Download Users’ post data like pictures, media, or bio between from time to time

  • Download hashtag post data

  • Get user’s all follower names

  • Get user’s all following names

  • Get user’s post’s comments

  • and finally, get users’ all post data with comments.

import instaloader
from datetime import datetime
from itertools import dropwhile, takewhile
import csv

class GetInstagramProfile():
    def __init__(self) -> None:
        self.L = instaloader.Instaloader()

    def download_users_profile_picture(self,username):
        self.L.download_profile(username, profile_pic_only=True)

    def download_users_posts_with_periods(self,username):
        posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
        SINCE = datetime(2021, 8, 28)
        UNTIL = datetime(2021, 9, 30)

        for post in takewhile(lambda p: p.date > SINCE, dropwhile(lambda p: p.date > UNTIL, posts)):
            self.L.download_post(post, username)

    def download_hastag_posts(self, hashtag):
        for post in instaloader.Hashtag.from_name(self.L.context, hashtag).get_posts():
            self.L.download_post(post, target='#'+hashtag)

    def get_users_followers(self,user_name):
        '''Note: login required to get a profile's followers.'''
        self.L.login(input("input your username: "), input("input your password: ") )
        profile = instaloader.Profile.from_username(self.L.context, user_name)
        file = open("follower_names.txt","a+")
        for followee in profile.get_followers():
            username = followee.username
            file.write(username + "\n")
            print(username)

    def get_users_followings(self,user_name):
        '''Note: login required to get a profile's followings.'''
        self.L.login(input("input your username: "), input("input your password: ") )
        profile = instaloader.Profile.from_username(self.L.context, user_name)
        file = open("following_names.txt","a+")
        for followee in profile.get_followees():
            username = followee.username
            file.write(username + "\n")
            print(username)

    def get_post_comments(self,username):
        posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
        for post in posts:
            for comment in post.get_comments():
                print("comment.id  : "+str(comment.id))
                print("comment.owner.username  : "+comment.owner.username)
                print("comment.text  : "+comment.text)
                print("comment.created_at_utc  : "+str(comment.created_at_utc))
                print("************************************************")

    def get_post_info_csv(self,username):
        with open(username+'.csv', 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            posts = instaloader.Profile.from_username(self.L.context, username).get_posts()
            for post in posts:
                print("post date: "+str(post.date))
                print("post profile: "+post.profile)
                print("post caption: "+post.caption)
                print("post location: "+str(post.location))

                posturl = "https://www.instagram.com/p/"+post.shortcode
                print("post url: "+posturl)
                writer.writerow(["post",post.mediaid, post.profile, post.caption, post.date, post.location, posturl,  post.typename, post.mediacount, post.caption_hashtags, post.caption_mentions, post.tagged_users, post.likes, post.comments,  post.title,  post.url ])

                for comment in post.get_comments():
                    writer.writerow(["comment",comment.id, comment.owner.username,comment.text,comment.created_at_utc])
                    print("comment username: "+comment.owner.username)
                    print("comment text: "+comment.text)
                    print("comment date : "+str(comment.created_at_utc))
                print("\n\n")


if __name__=="__main__":
    cls = GetInstagramProfile()
    #cls.download_users_profile_picture("best_gadgets_2030")
    #cls.download_users_posts_with_periods("best_gadgets_2030")
    #cls.download_hastag_posts("gadgets")
    #cls.get_users_followers("best_gadgets_2030")
    #cls.get_users_followings("best_gadgets_2030")
    #cls.get_post_comments("laydline")
    cls.get_post_info_csv("coolest.gadget")

When you call the **download_hastag_posts(“gadgets”) **methods. It will create a folder. The folder name will be #gadgets and its contents will look like this picture.

image

When you call the get_post_info_csv(“coolest.gadget”) It will create a file called coolest.gadget.csv . The CSV file will look like the below picture:

image




Continue Learning