Developing a Script for Bypassing GeeTest CAPTCHA in Python: From Idea to Execution
Introduction — or why cracking GeeTest CAPTCHA is nothing like a new Haval
Lately you’ll find Chinese goods and services in virtually every niche. And when you hear “this is a Chinese development,” you might smile and recall the Internet of the 90s: “Glasses n-n-needed?” Yet despite the jokes, one thing the Chinese have actually nailed is anti-bot protection, in particular GeeTest CAPTCHA, a system that many optimisers have shed salty tears over while trying to bypass it.
Why did the Chinese pivot from imported cars to hardcore spam protection? One guess (admittedly subjective) is this: GeeTest is used not only for exports but also internally in China, which means they really build it for themselves. The official description: GeeTest CAPTCHA is a modern protection system, widely used across web services to prevent automated requests. Its core is a dynamic puzzle slider: the user drags a piece of an image into a cut-out.
I got curious about how this CAPTCHA works — and what pitfalls you’ll run into when writing a solver. Let’s dig in.
Quick note — I’ll focus on bypassing GeeTest via a captcha-solving service (my choice: 2Captcha).
How GeeTest CAPTCHA Works — a challenge for even veteran developers
GeeTest consists of two main protection layers:
- The slider component. On each request the server dynamically generates a unique background image with a “hole” plus a puzzle fragment. This complicates use of pre-built solutions. The user drags the fragment so it fits exactly. During dragging the system records:
  - final position of the puzzle piece
  - the trajectory of the slider movement
  - time intervals between actions
- Behavioral data analysis. This component doesn’t act alone; it’s integrated at every stage: how the user moved the mouse, how the drag was performed, cursor jitter, etc. These subtle behaviours matter.
Finally, server validation: after the drag the browser sends movement and positioning data to the server, which compares it to expected parameters.
This multi-level strategy makes it much harder for a bot to imitate a human user and raises the bar for an automated bypass.
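To picture the three signals listed above (final position, trajectory, time intervals), here is a purely illustrative sketch. The DragSample structure and the summarize_drag helper are hypothetical; GeeTest's real payload format is not described in this article and will certainly look different.

```python
from dataclasses import dataclass


@dataclass
class DragSample:
    x: float   # horizontal slider offset in pixels
    y: float   # vertical cursor offset (jitter) in pixels
    t_ms: int  # milliseconds since the drag started


def summarize_drag(samples):
    """Derive the three signals from the text: final position, path, intervals."""
    if not samples:
        raise ValueError("a drag needs at least one sample")
    # Time gaps between consecutive samples; a bot often produces uniform gaps.
    intervals = [b.t_ms - a.t_ms for a, b in zip(samples, samples[1:])]
    return {
        "final_x": samples[-1].x,
        "trajectory": [(s.x, s.y) for s in samples],
        "intervals_ms": intervals,
    }


# A made-up drag: four samples with uneven timing and a little vertical jitter.
drag = [
    DragSample(0, 0, 0),
    DragSample(40, 1, 120),
    DragSample(95, -1, 310),
    DragSample(112, 0, 480),
]
summary = summarize_drag(drag)
```

A perfectly straight path with identical intervals is exactly the kind of pattern the behavioral layer is there to catch.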
The technology above characterises GeeTest version 4 (GeeTest V4). Its predecessor, version 3, lacked invisible-mode checking and had a simpler behavioral analysis. In practice, both V3 and V4 are much tougher to beat than, say, reCAPTCHA (fortunately GeeTest is less common in Europe).
What are the quirks — and why automating this CAPTCHA is not trivial
With a standard CAPTCHA like reCAPTCHA you might: find the widget on the page, extract certain static parameters, send them to a solving service, wait for the result. Static parameters = easier automation.
With GeeTest it’s not so straightforward. It mixes static and dynamic parameters which must be retrieved each time the CAPTCHA loads.
Example:
- GeeTest V3 uses static parameters: websiteURL (page URL) and the gt value. Dynamic: challenge, generated on page load (must be fresh, or the CAPTCHA is invalid).
- GeeTest V4 replaces gt and challenge with an initParameters object, which must include captcha_id (site-specific configuration ID).
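To make the V3/V4 difference concrete, here is a minimal sketch of how the two parameter sets could be packaged as task objects for a solving service. The field names and the GeeTestTaskProxyless task type follow my reading of 2Captcha's public API and should be checked against its current documentation; the helper names build_v3_task, build_v4_task and wrap_request are hypothetical.

```python
import json


def build_v3_task(website_url, gt, challenge):
    """Task object for GeeTest V3: static gt plus a freshly fetched challenge."""
    return {
        "type": "GeeTestTaskProxyless",  # assumed task type name
        "websiteURL": website_url,
        "gt": gt,                # static, site-specific
        "challenge": challenge,  # dynamic, must be fetched right before sending
    }


def build_v4_task(website_url, captcha_id):
    """Task object for GeeTest V4: gt/challenge give way to initParameters."""
    return {
        "type": "GeeTestTaskProxyless",  # assumed task type name
        "websiteURL": website_url,
        "version": 4,
        "initParameters": {"captcha_id": captcha_id},
    }


def wrap_request(api_key, task):
    """Body of a create-task request: the API key plus the task object."""
    return {"clientKey": api_key, "task": task}


if __name__ == "__main__":
    demo = wrap_request("YOUR_API_KEY", build_v4_task("https://example.com", "0" * 32))
    print(json.dumps(demo, indent=2))
```

Keeping the builders free of network calls means the payload can be inspected before anything is actually sent.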
Technically this seems simple — but remember: these parameters are not merely in the HTML of the page. They are often generated after interaction with the widget. That means you must emulate user behaviour (which itself might raise red flags in GeeTest’s system) and often use proxies. So each added requirement creates another layer of complexity.
I’m going to try to bypass GeeTest on a test page (where it’s less aggressive and probably doable without proxies) — but in real deployment remember: proxies may be necessary.
Getting ready for implementation of the bypass
With the theory covered, let’s move to the hands-on part: how to actually build the bypass.
Here’s what you’ll need:
- Python 3: download from python.org and install, making sure “add to PATH” is checked.
- pip: usually installed with Python. Verify with pip --version.
- Required Python libraries: requests and selenium.
- ChromeDriver: a separate utility that lets Selenium control Google Chrome. Install steps:
  - find your Chrome version (“About Chrome”)
  - download the matching ChromeDriver from the official site
  - either put chromedriver in a folder on your PATH or specify its path in your Selenium code
- API key from a captcha-solving service (I’ll show how to use 2Captcha below).
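For the last ChromeDriver step, here is a minimal sketch of both options, assuming Selenium 4 (where the driver path is passed through a Service object). The helper names resolve_chromedriver and make_driver are hypothetical, and Selenium is imported lazily so the path check can run without a browser installed.

```python
import shutil
from pathlib import Path


def resolve_chromedriver(explicit_path=None):
    """Return a usable chromedriver path, or None if none is found."""
    if explicit_path is not None:
        # Option 2: a path given in code; accept it only if the file exists.
        return str(explicit_path) if Path(explicit_path).is_file() else None
    # Option 1: look for chromedriver in the folders listed on PATH.
    return shutil.which("chromedriver")


def make_driver(explicit_path=None):
    """Start Chrome through Selenium 4 (needs selenium and Chrome installed)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = resolve_chromedriver(explicit_path)
    if path is None:
        raise FileNotFoundError("chromedriver not found; check PATH or the given path")
    return webdriver.Chrome(service=Service(executable_path=path))
```

Calling make_driver() with no argument relies on PATH, while make_driver("/path/to/chromedriver") uses the explicit location.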
The Python script — from reading, to trying
Below is an explanation of what the script does and how it works (the full code is not repeated here).
Broad strokes:
- Imports: re, time, json, argparse, requests, plus Selenium modules.
- Constants: API_KEY, CREATE_TASK_URL, GET_TASK_RESULT_URL for 2Captcha.
- Functions:
  - extract_geetest_v3_params(html): takes HTML, uses regex to find gt and challenge.
  - extract_geetest_v4_params(html): extracts captcha_id from HTML.
  - get_geetest_v3_params_via_requests(website_url): for demo pages returns static sample values (avoids a failing split()).
  - auto_extract_params(website_url): determines the version (v3 or v4), initializes the Selenium driver, loads the page, possibly clicks the widget for v4, and returns the driver object, the version identifier and the required parameters.
  - create_geetest_v3_task(...) / create_geetest_v4_task(...): create a task via the 2Captcha API for the respective version.
  - get_task_result(task_id, retry_interval=5, max_retries=20): polls 2Captcha until the solution is ready or the attempts run out.
  - main(): orchestrates argument parsing (--website-url, optional proxy specs), sets up proxyless or proxy mode, calls auto_extract_params, creates the task, waits for the result, then injects the solution into the page via driver.execute_script, waits 30 sec, quits the driver.
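As an illustration of the extraction functions, here is a minimal regex-based sketch. It reuses the names extract_geetest_v3_params and extract_geetest_v4_params from the list above, but the bodies are my own guess rather than the original code. It assumes the values appear in the page source as quoted 32-character literals next to their keys, which will not hold for every site; detect_geetest_version is a hypothetical helper.

```python
import re

# Assumption: values are quoted literals next to their key, for example
#   gt: "..."   or   "challenge": "..."   or   captcha_id: '...'
_GT_RE = re.compile(r"""\bgt\b["']?\s*:\s*["']([0-9a-f]{32})["']""")
_CHALLENGE_RE = re.compile(r"""\bchallenge\b["']?\s*:\s*["']([0-9a-z]{32,34})["']""")
_CAPTCHA_ID_RE = re.compile(r"""\bcaptcha_id\b["']?\s*:\s*["']([0-9a-f]{32})["']""")


def extract_geetest_v3_params(html):
    """Return {'gt', 'challenge'} or None if either value is missing."""
    gt = _GT_RE.search(html)
    challenge = _CHALLENGE_RE.search(html)
    if gt is None or challenge is None:
        return None  # explicit None instead of crashing on a failed lookup
    return {"gt": gt.group(1), "challenge": challenge.group(1)}


def extract_geetest_v4_params(html):
    """Return {'captcha_id'} or None if the ID is not in the page source."""
    match = _CAPTCHA_ID_RE.search(html)
    return {"captcha_id": match.group(1)} if match else None


def detect_geetest_version(html):
    """Try V4 first, then V3; return (version, params) or (None, None)."""
    params = extract_geetest_v4_params(html)
    if params is not None:
        return "v4", params
    params = extract_geetest_v3_params(html)
    if params is not None:
        return "v3", params
    return None, None
```

Returning None rather than raising lets the caller fall back to the browser-driven path when the parameters only appear after interaction with the widget.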
What it accomplishes:
- Runs the browser with ChromeDriver via Selenium, loads the page, extracts the necessary dynamic parameters.
- Sends the task to 2Captcha, receives the solution JSON.
- Injects the solution into the web page (hidden form fields or innerHTML replacement) to “trick” the site into believing the CAPTCHA is passed.
- Leaves the browser open for 30 seconds so you can visually verify that “CAPTCHA passed” appears.
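The “send the task, receive the solution” step boils down to a polling loop. Here is a minimal sketch of it, not the original function: the endpoint URL and the reply fields (errorId, status, solution) are assumptions based on my reading of 2Captcha's API, and the api_key, post_json and sleep parameters are hypothetical additions so the loop can be exercised without network access.

```python
import time

GET_TASK_RESULT_URL = "https://api.2captcha.com/getTaskResult"  # assumed endpoint


def post_json_with_requests(url, payload):
    """Default transport; requests is imported lazily to keep the loop testable offline."""
    import requests

    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


def get_task_result(task_id, api_key, retry_interval=5, max_retries=20,
                    post_json=post_json_with_requests, sleep=time.sleep):
    """Poll until the task is ready; return the solution dict or raise."""
    payload = {"clientKey": api_key, "taskId": task_id}
    for _ in range(max_retries):
        reply = post_json(GET_TASK_RESULT_URL, payload)
        if reply.get("errorId", 0) != 0:
            raise RuntimeError(f"solver error: {reply.get('errorCode', 'unknown')}")
        if reply.get("status") == "ready":
            return reply["solution"]
        sleep(retry_interval)  # still processing: wait before asking again
    raise TimeoutError(f"task {task_id} not solved after {max_retries} polls")
```

With the defaults, the loop gives up after roughly 100 seconds (20 polls, 5 seconds apart), which is worth raising if the dynamic challenge tends to expire more slowly than the solver responds.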
Conclusion
In this article I’ve walked through how GeeTest CAPTCHA works and attempted to show that you can bypass it even with minimal programming skills (yes, Python counts!). But caution: you must carefully extract all the parameters and correctly emulate browser behaviour, or you may spend hours stuck with a dynamic challenge that refuses to validate (been there, done that).