
PHP - Applying a request header with cURL?


I have been trying to get usable results when fetching image URLs from an image site (Pixiv). The input links are artwork pages; for example, this link works with the PHP below:

https://www.pixiv.net/en/artworks/116849074

Retrieving the relevant links via pattern matching is no problem, but the links, even when correctly formatted, return 403s, as the site is configured to thwart outside access (probably to preserve bandwidth).

I did stumble across an option to pass a valid "request header" in order to get things to work: https://www.reddit.com/r/Rlanguage/comments/ytgtun/im_trying_to_use_downloadfile_but_i_get_a_403/?rdt=55917

However, so far this does not seem to work (the original example was in R; I'm using PHP to try to replicate the behavior).

My code so far looks like this (the main focus is on the PHP side; the rest is just JS to ease things should I get it to work):

<!DOCTYPE html>
<html>
<head>
    <title>Image Retrieval</title>
</head>
<body>
    <form method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>">
        <label for="url">Enter the URL:</label>
        <input type="text" id="url" name="url">
        <button type="submit">Submit</button>
    </form>
    <?php
    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        $url = $_POST["url"];
        $options = [
            'http' => [
                'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
                            "Referer: https://accounts.pixiv.net\r\n",
            ],
        ];
        $context = stream_context_create($options);
        $html = file_get_contents($url, false, $context);
        $pattern = '/image" href="(.*?)"/';   //find downscaled master img (always in jpg format)
        //$pattern = '/"original":"(.*?)"/'; //find original image (usually only works when logged in)
        preg_match($pattern, $html, $matches);
        $imageUrl = $matches[1];

        echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
    }
    ?>

    <script>
        var imageLink = document.getElementById("image-link");
        if (imageLink) {
            window.location.href = imageLink.href;
        }
    </script>

    <!-- Autofill if querystring exists -->
    <script>
        var urlParams = new URLSearchParams(window.location.search);
        var pixivUrl = urlParams.get('pixivurl');
        if (pixivUrl) {
            var urlInput = document.getElementById('url');
            if (urlInput) {
                urlInput.value = pixivUrl;
            }
            var form = document.querySelector('form');
            if (form) {
                form.submit();
            }
        }
    </script>
</body>
</html>

I'm fairly certain something specific is needed to pass a request header properly, but I have never had to use that feature, so I'm at a bit of a loss.

Thanks in advance

Solution:

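You are fetching a Pixiv artwork page server-side and extracting an image URL from it, but the requests come back as 403 Forbidden, most likely because Pixiv restricts access from outside its own site.

One thing to try is sending additional headers so that the HTTP request mimics a browser request more closely. The version below points the Referer at https://www.pixiv.net/ and adds Accept and Accept-Language headers. Note that these headers apply only to the request PHP makes with file_get_contents; when the script redirects the browser to the image link, the browser sends its own headers, so a 403 on the image itself may persist.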

<!DOCTYPE html>
<html>
<head>
    <title>Image Retrieval</title>
</head>
<body>
    <form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
        <label for="url">Enter the URL:</label>
        <input type="text" id="url" name="url">
        <button type="submit">Submit</button>
    </form>
    <?php
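    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        $url = $_POST["url"] ?? '';
        // Headers that mimic a regular browser request
        $options = [
            'http' => [
                'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0\r\n" .
                            "Referer: https://www.pixiv.net/\r\n" .
                            "Accept-Language: en-US,en;q=0.5\r\n" .
                            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n",
            ],
        ];
        $context = stream_context_create($options);
        // Only fetch Pixiv URLs; file_get_contents returns false on failure (for example on a 403)
        $html = (strpos($url, 'https://www.pixiv.net/') === 0)
            ? file_get_contents($url, false, $context)
            : false;
        $pattern = '/image" href="(.*?)"/';   // find downscaled master img (always in jpg format)
        if ($html !== false && preg_match($pattern, $html, $matches)) {
            // Escape the extracted URL before echoing it into the page
            $imageUrl = htmlspecialchars($matches[1], ENT_QUOTES);
            echo '<p>Image Link: <a id="image-link" href="' . $imageUrl . '">' . $imageUrl . '</a></p>';
        } else {
            echo '<p>Could not retrieve an image link from that URL.</p>';
        }
    }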
    ?>

    <script>
        var imageLink = document.getElementById("image-link");
        if (imageLink) {
            window.location.href = imageLink.href;
        }
    </script>

    <!-- Autofill if querystring exists -->
    <script>
        var urlParams = new URLSearchParams(window.location.search);
        var pixivUrl = urlParams.get('pixivurl');
        if (pixivUrl) {
            var urlInput = document.getElementById('url');
            if (urlInput) {
                urlInput.value = pixivUrl;
            }
            var form = document.querySelector('form');
            if (form) {
                form.submit();
            }
        }
    </script>
</body>
</html>
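
Since the title asks about cURL, the same headers can also be sent with PHP's cURL extension instead of a stream context. The following is a minimal sketch, not a confirmed fix for Pixiv: the function names buildBrowserHeaders and fetchWithHeaders are hypothetical, the header values are copied from the code above, and it assumes the cURL extension is enabled.

```php
<?php
// Hypothetical helper (name is illustrative): builds the extra request
// headers, using the same values as the stream-context version above.
function buildBrowserHeaders(string $referer): array
{
    return [
        'Referer: ' . $referer,
        'Accept-Language: en-US,en;q=0.5',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    ];
}

// Hypothetical helper: fetches $url with cURL and returns the response body,
// or null if the transfer fails or the HTTP status is not 200.
function fetchWithHeaders(string $url, string $referer): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => 15,     // give up after 15 seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
        CURLOPT_HTTPHEADER     => buildBrowserHeaders($referer),
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($body !== false && $status === 200) ? $body : null;
}

// Usage (not executed here):
// $html = fetchWithHeaders('https://www.pixiv.net/en/artworks/116849074', 'https://www.pixiv.net/');
```

Because these headers are attached only to requests made by PHP, the extracted image link would also have to be fetched through fetchWithHeaders (and then served by the script) for the Referer to apply to the image request. Whether Pixiv accepts such a request is an assumption, not something confirmed here.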
