Course Outline

Progression in protein structure prediction: advancing from homology modeling to deep learning methodologies
AlphaFold’s contribution to structural biology, pharmaceutical development, and functional characterization
Establishing realistic expectations: capabilities, constraints, and points for experimental integration
Practical Exercise: Navigating the AlphaFold Protein Structure Database (AFDB) interface and conducting initial sequence queries

Neural network framework: Evoformer, structure module, and attention-based sequence modeling
Generation of Multiple Sequence Alignments (MSA) and template matching using databases such as PDB, UniRef, and BFD
Explanation of confidence metrics: predicted local distance difference test (pLDDT, a per-residue score) and predicted aligned error (PAE)
Practical Exercise: Mapping AlphaFold’s workflow stages using a representative protein sequence to trace MSA and template inputs

Official deployment channels: AlphaFold DB, public API, Colab notebooks, and local or GPU-enabled environments
Establishing a reproducible Colab environment: installing dependencies, allocating GPU resources, and formatting inputs
Preparation of protein sequences: managing FASTA structure, chain handling, and multi-domain considerations
Practical Lab: Deploying the official AlphaFold Colab notebook, uploading a custom FASTA file, and initiating the first prediction run
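As a rough illustration of the sequence preparation topic in this module, the sketch below parses FASTA text into one sequence per chain and flags characters outside the 20 standard amino acids before submission. The helper names `parse_fasta` and `find_invalid_residues` and the example sequences are hypothetical, written for this outline; they are not part of AlphaFold or its Colab notebook.

```python
from typing import Dict, List

# The 20 standard amino acids in one-letter code.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; each record is one chain."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may wrap across lines; collect the pieces.
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def find_invalid_residues(sequence: str) -> List[int]:
    """Return 1-based positions of characters outside the standard alphabet."""
    return [i for i, aa in enumerate(sequence, start=1) if aa not in STANDARD_AA]


# Hypothetical two-chain input; chain_B contains a non-standard 'X'.
example = """>chain_A hypothetical example
MKTAYIAKQR
QISFVKSHFS
>chain_B hypothetical example
MKXLV
"""

chains = parse_fasta(example)
for name, seq in chains.items():
    print(name, len(seq), find_invalid_residues(seq))
```

Checking inputs this way before a run is a design choice rather than a requirement: it surfaces wrapped lines, duplicate headers, and ambiguous residue codes early, before GPU time is spent.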

Navigating AFDB: understanding organism coverage, structure quality indicators, and available download formats (PDB and mmCIF coordinate files with per-residue pLDDT, plus PAE JSON files)
Cross-referencing AFDB data with UniProt, PDB, and functional databases such as GO, KEGG, and CATH
Managing large-scale datasets: addressing batch prediction limits, citation guidelines, and data licensing requirements for government and public sector use
Practical Exercise: Extracting high-confidence AFDB models for a target pathway and preparing files for downstream analysis
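One way the extraction exercise could be approached is sketched below. It assumes a model in PDB format in which per-residue pLDDT is stored in the B-factor column, as in AFDB coordinate files. The function names, the synthetic demo records, and the cutoff of 70 are illustrative assumptions; the cutoff follows the commonly used "confident" band and should be adjusted to the analysis at hand.

```python
from statistics import mean
from typing import List


def ca_plddt(pdb_text: str) -> List[float]:
    """Collect per-residue pLDDT from the B-factor field of C-alpha atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): atom name 13-16, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def fraction_confident(scores: List[float], cutoff: float = 70.0) -> float:
    """Fraction of residues at or above the pLDDT cutoff (assumed: 70)."""
    return sum(s >= cutoff for s in scores) / len(scores)


def demo_atom_line(serial: int, name: str, resseq: int, plddt: float) -> str:
    """Build a synthetic ATOM record (hypothetical data, for the demo only)."""
    return (f"ATOM  {serial:5d} {name:<4s} ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Four synthetic residues, each with an N and a CA atom.
demo_plddt = [95.0, 88.5, 42.0, 76.5]
lines = []
for i, score in enumerate(demo_plddt, start=1):
    lines.append(demo_atom_line(2 * i - 1, "N", i, score))
    lines.append(demo_atom_line(2 * i, "CA", i, score))
demo_pdb = "\n".join(lines)

scores = ca_plddt(demo_pdb)
mean_plddt = mean(scores)
confident = fraction_confident(scores)
print(f"mean pLDDT = {mean_plddt:.1f}, fraction >= 70: {confident:.2f}")
```

For a pathway-level selection, the same two summary values could be computed for each downloaded model and used to rank or filter the set before downstream analysis.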

Analyzing pLDDT heatmaps to identify structured cores, disordered regions, and low-confidence domains
Decoding PAE matrices to detect domain boundaries, intra-chain and inter-chain interactions, and regions whose relative placement is uncertain
Determining prediction reliability based on sequence coverage, evolutionary depth, and known structural homologs
Practical Exercise: Evaluating pLDDT and PAE outputs for a multi-domain protein, identifying low-confidence regions, and planning subsequent mutagenesis or validation targets
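The PAE reading described in this module can be made concrete with a small sketch. Given a square PAE matrix and a candidate domain boundary, it compares the mean error within each block to the mean error between blocks; a between-block value much larger than the within-block values suggests two domains whose relative orientation is poorly defined. The function names, the toy matrix, and the boundary index are hypothetical, and real PAE matrices would be read from the prediction output rather than typed in.

```python
from statistics import mean
from typing import Dict, Sequence


def block_mean(pae: Sequence[Sequence[float]], rows: range, cols: range) -> float:
    """Mean PAE over the sub-matrix defined by the row and column ranges."""
    return mean(pae[i][j] for i in rows for j in cols)


def summarize_split(pae: Sequence[Sequence[float]], boundary: int) -> Dict[str, float]:
    """Compare within-domain and between-domain PAE for a two-domain split.

    `boundary` is the 0-based index of the first residue of the second domain.
    """
    first = range(0, boundary)
    second = range(boundary, len(pae))
    return {
        "within_first": block_mean(pae, first, first),
        "within_second": block_mean(pae, second, second),
        # PAE is not symmetric, so both off-diagonal blocks are averaged.
        "between": mean([block_mean(pae, first, second),
                         block_mean(pae, second, first)]),
    }


# Toy 4-residue matrix in angstroms: two tight pairs, poorly placed
# relative to each other.
toy_pae = [
    [0.0, 2.0, 20.0, 22.0],
    [2.0, 0.0, 18.0, 20.0],
    [20.0, 18.0, 0.0, 4.0],
    [22.0, 20.0, 4.0, 0.0],
]

summary = summarize_split(toy_pae, boundary=2)
print(summary)
```

Scanning `boundary` across the sequence and looking for the split with the largest gap between the "between" and "within" values is one simple heuristic for proposing domain boundaries; it is offered here as an assumption-laden starting point, not as an established algorithm.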

Repository structure: examining core modules, data pipelines, and configuration files
Modifying inputs: implementing custom MSAs, overriding templates, and adjusting confidence thresholds
Performance optimization techniques: reducing runtime, managing memory, and saving checkpoints
Practical Lab: Running a modified AlphaFold pipeline in Colab with a custom template constraint and exporting refined PDB files

Utilizing predicted models to guide mutagenesis, crystallization efforts, and cryo-EM grid planning
Functional annotation: mapping active sites, preparing ligand docking studies, and predicting interaction interfaces
Addressing limitations and verification: determining when predictions are reliable, when experimental validation is necessary, and identifying common pitfalls
Workshop: Designing an experimental validation workflow for a predicted structure and mapping AI-derived outputs to wet-lab assays

Consolidating key concepts: architecture, interpretation, and practical deployment strategies
Capstone: Participants select a protein of interest, execute or retrieve a prediction, interpret confidence metrics, and outline a research application plan
Open Q&A session, troubleshooting common errors, and distribution of resources
Next steps: introduction to AlphaFold 3, RoseTTAFold, trRosetta, and other community tools under active development

Requirements

Demonstrated knowledge of protein architecture and conformational dynamics.
Foundational proficiency in molecular biology principles, including amino acid sequences, folding mechanisms, and structure file formats such as PDB and mmCIF.
Capability to operate within browser-based computational environments and execute code cells effectively.

Audience

Biologists, molecular researchers, and specialists in structural biology.
Experimental scientists requiring computational structure predictions to inform laboratory protocols.
Life science professionals leveraging artificial intelligence-driven modeling to support hypothesis formulation and experimental strategy.