Job descriptions are full of useful signal if you know how to read them at scale. Instead of skimming postings one by one, you can write a small Python tool that reads hundreds of them, counts which technical skills appear most often, and turns that into a ranked table and a chart. It is a genuinely practical project: the same technique works for tracking what employers want, validating which library to learn next, or feeding a dashboard.
In this tutorial, we will build a tech job skill analyzer from scratch. It loads a set of job postings, uses regular expressions to detect skills reliably (without the false positives a naive keyword search produces), ranks them by how many postings mention each one, and finally visualizes the result with matplotlib.
Learn also: Web Scraping tutorials in Python if you want to gather your own job posting dataset from the web.
Let's get started. The only third-party library we need is matplotlib:
$ pip install matplotlib
The naive approach to detecting skills is to check if "go" in description. This breaks immediately: "go" matches "google", "ago", and "category", and "r" matches almost everything. We avoid this by mapping each skill to a regular expression with word boundaries (\b), so we only match the skill as a standalone token.
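To make the difference concrete, here is a minimal sketch that compares the substring check with a word-boundary regex on three made-up sentences. The helper names naive_match and boundary_match are only illustrative and are not part of the analyzer itself.

```python
import re

# Made-up sample sentences; only the last one really mentions Go.
samples = [
    "Experience with Google Cloud",
    "Posted 3 days ago in this category",
    "Backend services written in Go",
]

def naive_match(text):
    # Plain substring test: "go" anywhere in the text counts as a hit.
    return "go" in text.lower()

GO_PATTERN = re.compile(r"\bgo\b", re.IGNORECASE)

def boundary_match(text):
    # \b demands a word boundary on both sides of "go".
    return GO_PATTERN.search(text) is not None

for text in samples:
    print(f"{text!r}: naive={naive_match(text)}, regex={boundary_match(text)}")
```

The substring test flags all three sentences, while the regex flags only the last one.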
Create a file named skill_analyzer.py:
import re
import json
from collections import Counter
# Map each skill to the regex pattern that matches it.
# Word boundaries (\b) prevent false positives like matching
# "go" inside "google" or "java" inside "javascript".
SKILL_PATTERNS = {
    "Python": r"\bpython\b",
    "SQL": r"\bsql\b",
    "JavaScript": r"\b(javascript|js)\b",
    "TypeScript": r"\b(typescript|ts)\b",
    "Java": r"\bjava\b",
    "Go": r"\b(golang|go)\b",
    "Rust": r"\brust\b",
    "C++": r"c\+\+",
    "AWS": r"\baws\b",
    "Docker": r"\bdocker\b",
    "Kubernetes": r"\b(kubernetes|k8s)\b",
    "React": r"\breact\b",
    "Machine Learning": r"\b(machine learning|ml)\b",
    "Pandas": r"\bpandas\b",
}
A few patterns are worth noting. \bjava\b needs no extra lookahead: there is no word boundary between "java" and "script", so the trailing \b already stops it from matching inside "javascript". The c\+\+ pattern escapes the plus signs, since + is a special character in regex, and it has no trailing \b because a boundary cannot match between "+" and a following space. Short aliases such as go, js, ts, and ml are matched case-insensitively, so they can still fire on ordinary text (the verb in "ready to go", for example); tighten or drop them if your data is noisy. You can extend this dictionary with any skill you want to track.
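When extending the dictionary, skills that contain punctuation need a little care, because \b only matches between a word character and a non-word character. The sketch below shows one possible way to handle names like C# and Node.js with lookarounds. EXTRA_PATTERNS and matched_skills are hypothetical names introduced for this example, not part of the original script.

```python
import re

# Hypothetical additions to SKILL_PATTERNS.
# r"\bc#\b" would never match "C# developer": "#" and the space after it
# are both non-word characters, so there is no boundary between them.
EXTRA_PATTERNS = {
    "C#": r"(?<!\w)c#(?!\w)",      # lookarounds stand in for \b
    "Node.js": r"\bnode\.?js\b",   # matches "Node.js" and "NodeJS", not "nodes"
}

def matched_skills(text, patterns):
    """Return the sorted skill names whose pattern occurs in text."""
    return sorted(
        skill for skill, pat in patterns.items()
        if re.search(pat, text, re.IGNORECASE)
    )

print(matched_skills("C# and Node.js services", EXTRA_PATTERNS))
print(matched_skills("Objective-C and nodes in a cluster", EXTRA_PATTERNS))
```

Running a few positive and negative sentences through a helper like this before adding a pattern is a cheap way to catch surprises.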
We will store postings as a simple JSON list, where each entry has a title and a description. This keeps the data source separate from the logic, so you can later swap in a dataset you scraped or exported yourself:
def load_jobs(path):
    """Load job postings from a JSON file (a list of {title, description})."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
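Real datasets are rarely as tidy as a hand-written sample, so it may help to filter out malformed entries before analysis. The following is a sketch under that assumption; validate_jobs is a hypothetical helper, not something the tutorial's script defines.

```python
import json

def validate_jobs(jobs):
    """Keep only entries that are dicts with a non-empty string description."""
    valid = []
    for i, job in enumerate(jobs):
        if not isinstance(job, dict):
            print(f"Skipping entry {i}: expected an object, got {type(job).__name__}")
            continue
        description = job.get("description")
        if not isinstance(description, str) or not description.strip():
            print(f"Skipping entry {i}: missing or empty description")
            continue
        valid.append(job)
    return valid

# Inline JSON stands in for the contents of a file returned by load_jobs().
raw = json.loads(
    '[{"title": "Data Analyst", "description": "SQL and Python"},'
    ' {"title": "No text"}, "oops"]'
)
clean = validate_jobs(raw)
print(f"Kept {len(clean)} of {len(raw)} entries")
```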
Create a sample_jobs.json file next to your script to test with:
[
{"title": "Senior Data Engineer", "description": "We need strong Python and SQL skills, experience with AWS, Docker and building data pipelines with Pandas."},
{"title": "Machine Learning Engineer", "description": "Python, machine learning frameworks, SQL for data access, and Kubernetes for deployment."},
{"title": "Backend Developer", "description": "Go and Python backend services, Docker, AWS, REST APIs."},
{"title": "Frontend Engineer", "description": "JavaScript, TypeScript and React for building modern web apps."},
{"title": "Full Stack Developer", "description": "JavaScript, TypeScript, React, Python, SQL and Docker experience required."},
{"title": "Data Analyst", "description": "SQL is essential, Python and Pandas for analysis, plus dashboarding."},
{"title": "DevOps Engineer", "description": "Kubernetes, Docker, AWS, Go and Python scripting."},
{"title": "Systems Programmer", "description": "C++ and Rust for high performance systems, plus some Python tooling."},
{"title": "ML Platform Engineer", "description": "Python, machine learning, Kubernetes, AWS, Docker, SQL."},
{"title": "Junior Python Developer", "description": "Python, SQL basics, willingness to learn React and JavaScript."}
]
Now we scan every posting and tally each skill. We compile the patterns once up front (compiling inside the loop would be wasteful). Each pattern is tested once per posting with search(), which only reports whether a match exists, so a skill mentioned three times in one description still counts as one posting:
def count_skills(jobs):
    """Return a Counter of how many postings mention each skill."""
    counts = Counter()
    compiled = {skill: re.compile(pat, re.IGNORECASE)
                for skill, pat in SKILL_PATTERNS.items()}
    for job in jobs:
        text = f"{job.get('title', '')} {job.get('description', '')}"
        for skill, pattern in compiled.items():
            # search() stops at the first match, so each posting
            # contributes at most 1 to a skill's count.
            if pattern.search(text):
                counts[skill] += 1
    return counts
We combine the title and description into one searchable string so a skill named only in the job title is still caught. collections.Counter does the heavy lifting and gives us a handy most_common() method later.
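As a quick sanity check of those two properties (repeats count once, and title-only mentions are caught), here is a small self-contained sketch. It uses a two-skill subset of the patterns and a hypothetical count_postings function that mirrors the counting logic, with made-up postings.

```python
import re
from collections import Counter

# A two-skill subset keeps the example short.
PATTERNS = {
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    "SQL": re.compile(r"\bsql\b", re.IGNORECASE),
}

def count_postings(jobs):
    counts = Counter()
    for job in jobs:
        text = f"{job.get('title', '')} {job.get('description', '')}"
        # Each skill is yielded at most once per posting.
        counts.update(skill for skill, pat in PATTERNS.items() if pat.search(text))
    return counts

jobs = [
    {"title": "Python Developer", "description": "Python, Python and more Python."},
    {"title": "SQL Analyst", "description": "Reporting and dashboards."},  # skill only in title
    {"title": "Office Manager"},  # no description key at all
]
counts = count_postings(jobs)
print(counts)
```

The first posting repeats Python four times but adds only one to its count, and the entry without a description is handled by the .get() default rather than raising a KeyError.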
A raw count is useful, but a percentage is easier to reason about ("Python appears in 90% of postings" lands harder than "Python: 9"). This function converts counts into a sorted list of tuples:
def rank_skills(counts, total_jobs):
    """Turn raw counts into a sorted list of (skill, count, percentage)."""
    ranked = []
    for skill, count in counts.most_common():
        pct = round((count / total_jobs) * 100, 1)
        ranked.append((skill, count, pct))
    return ranked
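To see the arithmetic on numbers that do not divide evenly, the sketch below runs the same calculation on made-up counts for seven postings. The function is restated in compact form so the example runs on its own.

```python
from collections import Counter

def rank_skills(counts, total_jobs):
    # Same calculation as above, written as a list comprehension.
    return [
        (skill, count, round(count / total_jobs * 100, 1))
        for skill, count in counts.most_common()
    ]

# Made-up counts: 7 postings in total.
counts = Counter({"Python": 5, "SQL": 2, "Rust": 1})
ranked = rank_skills(counts, 7)
for skill, count, pct in ranked:
    print(skill, count, pct)

total_pct = sum(pct for _, _, pct in ranked)
print(f"Sum of percentages: {total_pct:.1f}")
```

Note that the percentages are not expected to sum to 100: each one is the share of postings mentioning that skill, and a single posting usually mentions several skills.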
Add a main block to load the data, run the analysis, and print a clean table:
if __name__ == "__main__":
    jobs = load_jobs("sample_jobs.json")
    counts = count_skills(jobs)
    ranked = rank_skills(counts, len(jobs))
    print(f"Analyzed {len(jobs)} job postings\n")
    print(f"{'Skill':<18}{'Postings':<10}{'% of jobs'}")
    print("-" * 38)
    for skill, count, pct in ranked:
        print(f"{skill:<18}{count:<10}{pct}%")
Run it:
$ python skill_analyzer.py
Output:
Analyzed 10 job postings

Skill             Postings  % of jobs
--------------------------------------
Python            9         90.0%
SQL               6         60.0%
Docker            5         50.0%
AWS               4         40.0%
Kubernetes        3         30.0%
JavaScript        3         30.0%
React             3         30.0%
Pandas            2         20.0%
Machine Learning  2         20.0%
Go                2         20.0%
TypeScript        2         20.0%
Rust              1         10.0%
C++               1         10.0%
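If the ranking is meant to feed a spreadsheet or dashboard rather than a terminal, the same list of tuples can be written out as CSV with the standard library. This is a sketch: write_ranked_csv and the column names are assumptions made for the example, and the rows are a hand-copied subset of the output above.

```python
import csv
import io

def write_ranked_csv(ranked, fileobj):
    """Write (skill, count, percentage) tuples as CSV rows with a header."""
    writer = csv.writer(fileobj)
    writer.writerow(["skill", "postings", "percent_of_jobs"])
    writer.writerows(ranked)

ranked = [("Python", 9, 90.0), ("SQL", 6, 60.0), ("C++", 1, 10.0)]

# StringIO keeps the example free of side effects; a real file works the same way.
buffer = io.StringIO()
write_ranked_csv(ranked, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("skills_ranked.csv", "w", newline="", encoding="utf-8"); the newline="" argument is what the csv module documentation recommends to avoid blank lines on Windows.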
A table is fine, but a horizontal bar chart communicates the ranking instantly. Create a second file, make_chart.py, that imports our functions and plots the top 10:
import matplotlib
matplotlib.use("Agg") # non-interactive backend, good for saving files
import matplotlib.pyplot as plt
from skill_analyzer import load_jobs, count_skills, rank_skills
jobs = load_jobs("sample_jobs.json")
ranked = rank_skills(count_skills(jobs), len(jobs))
# Take the top 10 skills for a clean chart
top = ranked[:10]
skills = [r[0] for r in top]
percentages = [r[2] for r in top]
plt.figure(figsize=(10, 6))
# Reverse the lists so the highest value sits at the top of the chart
plt.barh(skills[::-1], percentages[::-1], color="#306998")
plt.xlabel("Percentage of job postings (%)")
plt.title("Top 10 In-Demand Tech Skills")
plt.tight_layout()
plt.savefig("skills_chart.png", dpi=120)
print("Chart saved to skills_chart.png")
Run it with python make_chart.py and you will get a skills_chart.png file ranking the skills by demand. We slice with [::-1] because barh draws from the bottom up, and we want the most in-demand skill at the top.
The sample data keeps the tutorial reproducible, but the tool is built to handle real postings. To analyze the actual market, replace sample_jobs.json with your own dataset, whether you scrape it yourself (respecting each site's robots.txt) or export it from a source you already have access to.
If you want to try the analyzer against live listings, an aggregator like Motion Recruitment's tech jobs board is a good source of real, current descriptions across roles like data engineering, DevOps, and machine learning. Collect the title and description text into the same JSON shape, point the script at it, and the rest of the pipeline works unchanged.
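Getting collected data into that JSON shape is usually a small mapping step. The sketch below assumes the raw rows are dictionaries with the hypothetical keys job_title and body; your source will almost certainly use different field names, so treat them as placeholders.

```python
import json

def to_job_records(raw_rows, title_key="job_title", description_key="body"):
    """Map raw rows to the {title, description} shape the analyzer expects."""
    records = []
    for row in raw_rows:
        # split() + join() collapses newlines and runs of spaces.
        title = " ".join(str(row.get(title_key, "")).split())
        description = " ".join(str(row.get(description_key, "")).split())
        if description:  # skip listings with no usable text
            records.append({"title": title, "description": description})
    return records

# Made-up rows standing in for scraped or exported data.
raw_rows = [
    {"job_title": "Data  Engineer\n", "body": "Python,\n SQL and   AWS."},
    {"job_title": "Empty listing", "body": "   "},
]
records = to_job_records(raw_rows)
print(json.dumps(records, indent=2))
```

Saving the result with json.dump to a file and passing that path to load_jobs should then be enough to run the analysis on it.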
Running this kind of analysis across a large, real dataset surfaces a clear picture of the 2026 market, and it lines up with what industry research is reporting. An analysis of more than 800,000 US tech job postings published between January 2025 and March 2026 found that Python and SQL remain the most in-demand programming languages, appearing in 46% and 45% of listings respectively, with Java following at 21% and JavaScript at 19%.
The interesting takeaway for anyone building a learning roadmap is what sits alongside the top language. Coding skills increasingly appear in job ads across finance, healthcare, and manufacturing, not just traditional tech industries. A skill analyzer like this one makes that visible: Python rarely appears alone. It clusters with SQL, cloud tools, and containerization, which is exactly the combination employers are paying for. If your own analysis shows Python sitting next to Docker, AWS, and SQL again and again, that co-occurrence is your study plan written by the market itself.
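One way to check that kind of co-occurrence in your own data is to count how often each pair of skills shows up in the same posting. The sketch below assumes you already have one list of detected skills per posting, which the analyzer as written does not return (it only keeps totals), so skills_per_job and count_pairs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_pairs(skills_per_job):
    """Count how many postings mention each unordered pair of skills."""
    pairs = Counter()
    for skills in skills_per_job:
        # sorted() gives every pair one canonical order, e.g. ("AWS", "Python").
        pairs.update(combinations(sorted(set(skills)), 2))
    return pairs

# Made-up detections for three postings.
skills_per_job = [
    ["Python", "SQL", "AWS", "Docker"],
    ["Python", "SQL"],
    ["Python", "Docker", "AWS"],
]
pairs = count_pairs(skills_per_job)
for (a, b), n in pairs.most_common(3):
    print(f"{a} + {b}: {n}")
```

Sorting each posting's skills before pairing means ("Python", "SQL") and ("SQL", "Python") are counted as the same pair.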
You now have a working, extensible tool that reads job postings, detects skills with proper regex matching, ranks them by demand, and visualizes the result. From here you could add co-occurrence analysis (which skills appear together), track changes over time by saving snapshots, or expand the skill dictionary to cover frameworks and soft skills.
The full code is split cleanly into reusable functions, so dropping in a larger dataset is the only change needed to turn this from a demo into something genuinely useful for your own career research.
Happy coding! ♥