Web Scraping with Python Training Course
Web Scraping is a technique for extracting data from websites and saving it to local files or databases.
This instructor-led, live training (online or onsite) is aimed at developers who wish to use Python to automate the process of crawling multiple websites to extract data for processing and analysis.
By the end of this training, participants will be able to:
- Install and configure Python and all relevant packages.
- Retrieve and parse data stored across various websites.
- Understand how websites function and how their HTML is structured.
- Construct spiders to crawl the web at scale.
- Use Selenium to crawl AJAX-driven web pages.
Format of the Course
- Interactive lecture and discussion.
- Extensive exercises and practice sessions.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- This course assumes prior knowledge of programming.
- To request customized training for government, please contact us to arrange.
Course Outline
Introduction
Setting Up the Development Environment for Government Use
Python Primer: Data Structures, Conditionals, File Handling, etc.
Python Packages for Web Scraping: Scrapy and BeautifulSoup
Understanding How a Website Functions
Structure of HTML Documents
Making Web Requests
Scraping an HTML Page for Government Data
Utilizing XPath and CSS Selectors
Filtering Data with Regular Expressions
Creating a Web Crawler for Government Applications
Crawling AJAX and JavaScript Pages Using Selenium
Best Practices for Web Scraping in the Public Sector
Troubleshooting Common Issues
Summary and Conclusion
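As a rough illustration of two outline topics, "Scraping an HTML Page" and "Filtering Data with Regular Expressions", the following minimal sketch uses only the Python standard library rather than the Scrapy and BeautifulSoup packages named in the outline. The sample HTML, the `LinkExtractor` class, the helper function names, and the `.pdf` pattern are all illustrative assumptions, not course material.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; in practice the HTML would come from a web request.
SAMPLE_HTML = """
<html><body>
  <a href="/reports/2023.pdf">Annual report 2023</a>
  <a href="/about">About</a>
  <a href="/reports/2024.pdf">Annual report 2024</a>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def filter_links(links, pattern):
    """Keep only the links that match the regular expression."""
    regex = re.compile(pattern)
    return [link for link in links if regex.search(link)]


links = extract_links(SAMPLE_HTML)
pdf_links = filter_links(links, r"\.pdf$")
print(pdf_links)
```

Separating extraction from filtering keeps each step easy to swap out, for example replacing the parser with a BeautifulSoup or Scrapy selector while keeping the same regular-expression filter.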
Requirements
- Programming experience, preferably in Python. If participants have programming experience in a language other than Python, the training can be extended to include additional introductory Python exercises for government.
Audience
- Developers
Runs with a minimum of 4 people. For 1-to-1 or private group training, request a quote.
Testimonials (1)
Many different examples and topics have been covered, from basic investigation to login management and dynamic page management.
Daniele Tagliaferro - Creditsafe Italia Srl
Course - Web Scraping with Python
Related Courses
Scaling Data Analysis with Python and Dask
14 Hours
This instructor-led, live training in US Empire (online or onsite) is aimed at data scientists and software engineers who wish to use Dask within the Python ecosystem to build, scale, and analyze large datasets for government.
By the end of this training, participants will be able to:
- Set up the environment to begin building big data processing capabilities with Dask and Python.
- Explore the features, libraries, tools, and APIs available in Dask to support their work for government.
- Understand how Dask enhances parallel computing within Python, optimizing performance for large-scale data analysis.
- Learn how to scale the Python ecosystem (NumPy, SciPy, and Pandas) using Dask to meet the demands of complex datasets in the public sector.
- Optimize the Dask environment to ensure high performance and efficiency when handling large datasets for government projects.
Data Analysis with Python, Pandas and Numpy
14 Hours
This instructor-led, live training in US Empire (online or onsite) is aimed at intermediate-level Python developers and data analysts who wish to enhance their skills in data analysis and manipulation using Pandas and NumPy for government applications.
By the end of this training, participants will be able to:
- Set up a development environment that includes Python, Pandas, and NumPy.
- Create a data analysis application using Pandas and NumPy for government workflows.
- Perform advanced data wrangling, sorting, and filtering operations.
- Conduct aggregate operations and analyze time series data.
- Visualize data using Matplotlib and other visualization libraries.
- Debug and optimize their data analysis code to ensure compliance with public sector governance standards.
FARM (FastAPI, React, and MongoDB) Full Stack Development
14 Hours
This instructor-led, live training, available online or on-site, is designed for developers who wish to utilize the FARM (FastAPI, React, and MongoDB) stack to create dynamic, high-performance, and scalable web applications for government use.
By the end of this training, participants will be able to:
- Configure a development environment that seamlessly integrates FastAPI, React, and MongoDB for government projects.
- Understand the essential concepts, features, and benefits of the FARM stack in the context of public sector applications.
- Master the skills needed to build REST APIs with FastAPI for government systems.
- Acquire the knowledge to design interactive applications using React for government interfaces.
- Develop, test, and deploy front-end and back-end applications using the FARM stack in alignment with government workflows and governance standards.
Developing APIs with Python and FastAPI
14 Hours
This instructor-led, live training in US Empire (online or onsite) is aimed at developers who wish to utilize FastAPI with Python to build, test, and deploy RESTful APIs more efficiently and rapidly for government applications.
By the end of this training, participants will be able to:
- Set up the necessary development environment to create APIs using Python and FastAPI for government projects.
- Create APIs more quickly and easily by leveraging the FastAPI library.
- Learn how to develop data models and schemas based on Pydantic and OpenAPI standards.
- Integrate APIs with a database using SQLAlchemy for enhanced data management in public sector workflows.
- Implement security and authentication mechanisms in APIs using the built-in tools provided by FastAPI, ensuring compliance with government security protocols.
- Build container images and deploy web APIs to a cloud server, aligning with government IT infrastructure requirements.
Machine Learning with Python – 2 Days
14 Hours
The objective of this course is to provide a foundational proficiency in applying Machine Learning methods in practice. Through the use of the Python programming language and its various libraries, and based on numerous practical examples, this course teaches participants how to utilize the most essential components of Machine Learning, make informed data modeling decisions, interpret algorithm outputs, and validate results.
Our goal is to equip you with the skills necessary to confidently understand and use the core tools from the Machine Learning toolbox, while avoiding common pitfalls in Data Science applications. This course is designed to enhance your capabilities for government workflows, ensuring alignment with public sector governance and accountability standards.
Machine Learning with Python – 4 Days
28 Hours
The objective of this course is to enhance proficiency in applying Machine Learning methods in practical scenarios. Utilizing the Python programming language and its various libraries, and through a wide range of practical examples, this course instructs participants on how to effectively use key Machine Learning components, make informed data modeling decisions, interpret algorithm outputs, and validate results.
Our goal is to equip you with the skills necessary to confidently understand and utilize the essential tools from the Machine Learning toolbox, while avoiding common pitfalls in Data Science applications for government.
Accelerating Python Pandas Workflows with Modin
14 Hours
This instructor-led, live training in US Empire (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis for government applications.
By the end of this training, participants will be able to:
- Set up the necessary environment to start developing scalable Pandas workflows using Modin.
- Understand the features, architecture, and benefits of Modin in the context of public sector data analysis.
- Compare Modin with other parallel computing frameworks such as Dask and Ray.
- Enhance the performance of Pandas operations using Modin.
- Implement the full Pandas API and functions for efficient data processing.
Python for Natural Language Generation (NLG)
21 Hours
In this instructor-led, live training in US Empire, participants will learn how to use Python to produce high-quality natural language text by building their own NLG system from scratch. Case studies relevant to public sector workflows and governance will be examined, and the concepts will be applied to hands-on lab projects for generating content tailored for government.
By the end of this training, participants will be able to:
- Utilize NLG to automatically generate content for various industries, including journalism, real estate, weather reporting, and sports, with a focus on applications for government.
- Select and organize source content, plan sentences, and prepare a system for the automatic generation of original content that aligns with public sector needs.
- Understand the NLG pipeline and apply the appropriate techniques at each stage to ensure compliance with governmental standards.
- Comprehend the architecture of a Natural Language Generation (NLG) system designed for government use.
- Implement the most suitable algorithms and models for analysis and ordering, ensuring they meet the requirements of public sector workflows.
- Pull data from publicly available data sources as well as curated databases to use as material for generated text that supports government operations.
- Replace manual and laborious writing processes with computer-generated, automated content creation that enhances efficiency and accountability in government tasks.
Advanced Machine Learning with Python
21 Hours
In this instructor-led, live training in US Empire, participants will gain a comprehensive understanding of the most relevant and cutting-edge machine learning techniques using Python. They will apply these skills by building a series of demo applications that involve image, music, text, and financial data.
By the end of this training, participants will be able to:
- Implement machine learning algorithms and techniques for solving complex problems for government.
- Apply deep learning and semi-supervised learning methods to applications involving image, music, text, and financial data.
- Optimize Python algorithms to their maximum potential.
- Leverage libraries and packages such as NumPy and Theano.
Python: Automate the Boring Stuff
14 Hours
This instructor-led, live training in US Empire is based on the popular book, "Automate the Boring Stuff with Python," by Al Sweigart. It is designed for beginners and covers essential Python programming concepts through practical, hands-on exercises and discussions. The focus is on learning to write code to significantly enhance productivity in office environments.
By the end of this training, participants will be able to program in Python and apply these new skills for government:
- Automating tasks by writing simple Python scripts.
- Creating programs that perform text pattern recognition using "regular expressions."
- Generating and updating Excel spreadsheets programmatically.
- Parsing PDFs and Word documents.
- Crawling websites to extract information from online sources.
- Developing programs that send out email notifications.
- Utilizing Python's debugging tools to quickly resolve bugs.
- Programmatically controlling the mouse and keyboard to automate repetitive actions.
Python Programming for Finance
35 Hours
Python is a programming language that has gained significant popularity in the financial sector. Adopted by major investment banks and hedge funds, it is used to develop a wide array of financial applications, from core trading systems to risk management platforms.
In this instructor-led, live training, participants will learn how to use Python to develop practical applications for solving a variety of finance-related challenges.
By the end of this training, participants will be able to:
- Understand the fundamentals of the Python programming language
- Download, install, and maintain the best development tools for creating financial applications in Python
- Select and utilize the most appropriate Python packages and programming techniques to organize, visualize, and analyze financial data from various sources (CSV, Excel, databases, web, etc.)
- Build applications that address issues related to asset allocation, risk analysis, investment performance, and more
- Troubleshoot, integrate, deploy, and optimize a Python application
Audience
- Developers
- Analysts
- Quants
Format of the course
- Part lecture, part discussion, exercises, and extensive hands-on practice
Note
- This training is designed to provide solutions for some of the primary challenges faced by finance professionals. If you have a specific topic, tool, or technique that you would like to cover or expand upon, please contact us to arrange.
Govtra offers this course for government personnel and organizations looking to enhance their financial technology capabilities and align with public sector workflows, governance, and accountability standards.
Advanced Python - 4 Days
28 Hours
This instructor-led, live training in US Empire (online or onsite) is aimed at developers who wish to enhance their Python programming skills for government. The course covers advanced techniques and practical applications of Python, including its use in developing distributed applications, conducting data analysis and visualization, creating user interfaces, and writing maintenance scripts. These skills are essential for optimizing public sector workflows, governance, and accountability.
Python Programming - 4 days
28 Hours
This course is designed for government professionals and others wishing to learn the Python programming language. The emphasis is on the Python language, the core libraries, as well as the selection of the best and most useful libraries developed by the Python community. Python drives businesses and is used by scientists worldwide – it is one of the most popular programming languages.
The course can be delivered using the latest Python version 3.x, with practical exercises that leverage the full power of the language. This course can be conducted on any operating system (all flavors of UNIX, including Linux and Mac OS X, as well as Microsoft Windows).
The practical exercises constitute about 70% of the course time, while approximately 30% are demonstrations and presentations. Discussions and questions are encouraged throughout the course.
Note: for government participants, the training can be tailored to specific needs if requested ahead of the proposed course date.
Test Automation with Selenium and Python
14 Hours
Selenium is an open-source framework designed for automating web application testing across various browsers. With the release of Selenium 4, users benefit from enhanced WebDriver APIs, native relative locators, and improved grid support. Python, known for its simplicity and strong integration with testing frameworks like Pytest, offers a powerful solution for developing scalable and maintainable test automation suites.
This instructor-led, live training (available online or on-site) is tailored for beginner to intermediate testers and developers who aim to utilize Selenium with Python to automate web application testing in real-world environments for government use.
By the end of this training, participants will be able to:
- Install and configure Selenium with Python in a test environment for government applications.
- Develop robust test automation scripts using Selenium WebDriver and Pytest.
- Implement the Page Object Model (POM) to create maintainable test frameworks.
- Execute tests across multiple browsers using Selenium Grid.
- Integrate automated tests with continuous integration/continuous deployment (CI/CD) pipelines for government projects.
- Diagnose and resolve common issues, applying best practices to ensure automation stability in a public sector context.
Format of the Course
- Interactive lectures and discussions.
- Extensive exercises and practical activities.
- Hands-on implementation in a live-lab environment for government scenarios.
Course Customization Options
- To request a customized training program for this course, please contact us to arrange.
Text Summarization with Python
14 Hours
In Python Machine Learning, the Text Summarization feature can process input text to generate a concise summary. This functionality is accessible via the command line or as a Python API/library. One significant application for government is the rapid creation of executive summaries, which is particularly useful for organizations that need to review large volumes of text data before producing reports and presentations.
In this instructor-led, live training, participants will learn how to use Python to develop a simple application that automatically generates a summary of input text.
By the end of this training, participants will be able to:
- Utilize a command-line tool for summarizing text.
- Design and implement Text Summarization code using Python libraries.
- Evaluate three Python summarization libraries: sumy 0.7.0, pysummarization 1.0.4, and readless 1.0.17.
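To give a sense of what "a simple application that automatically generates a summary of input text" might look like, here is a minimal sketch of frequency-based extractive summarization using only the Python standard library. It does not use any of the three libraries listed above and is not their algorithm; the function name `summarize`, its `max_sentences` parameter, and the scoring rule (average word frequency per sentence) are assumptions chosen for illustration.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences of `text`, in original order.

    Assumed scoring rule: a sentence's score is the average frequency,
    across the whole text, of the words it contains.
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]

    # Word frequencies over the whole text (lowercased, letters only).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = "Dask scales pandas. Dask scales NumPy too. Cats sleep."
print(summarize(sample, max_sentences=1))
```

A real summarizer would typically also remove stop words and handle abbreviations when splitting sentences; this sketch omits both to stay short.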
Audience
- Developers
- Data Scientists
Format of the course
- Part lecture, part discussion, exercises, and extensive hands-on practice