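High-throughput experimental techniques continue to transform systems biology research, and researchers are understandably eager to harness these new technologies. However, studying protein-protein interactions on these platforms raises many production and bioinformatics challenges. In predicting protein-protein interaction sites, issues such as feature extraction, feature representation, prediction algorithms, and analysis of results have become increasingly problematic. The research community needs robust, efficient prediction methods that infer protein interface residues from a protein's primary sequence and/or 3D structure if it is to accelerate its research and publication. Machine-learning-based methods currently attract the most attention for predicting protein interaction sites. This review describes the state of the whole pipeline when machine-learning strategies are used to infer protein interaction sites.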
2022-02-25 19:02:28 567KB Bioinformatics; machine learning; protein
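Per-residue interface prediction is usually framed as classifying each residue from features of its sequence neighborhood. The following is a minimal sketch of one common feature-extraction step. It assumes a sliding window of one-hot-encoded residues over the primary sequence. The window size, the extra padding/unknown slot, and all function names are illustrative assumptions, not the pipeline the review describes.

```python
# Sliding-window one-hot encoding of a protein sequence (illustrative sketch).
# Assumptions: 20 standard amino acids plus one slot for padding/non-standard
# residues; half_window is a hypothetical default, not taken from the review.

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}
PAD = len(ALPHABET)        # slot 20: positions outside the sequence or unknown residues
SLOT = len(ALPHABET) + 1   # 21 features per window position


def one_hot(residue):
    """Return a 21-element one-hot vector; None or unknown residues map to PAD."""
    vec = [0] * SLOT
    vec[INDEX.get(residue, PAD)] = 1
    return vec


def window_features(seq, center, half_window=3):
    """Concatenate one-hot vectors for positions center-half_window..center+half_window."""
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        residue = seq[pos] if 0 <= pos < len(seq) else None
        features.extend(one_hot(residue))
    return features


def encode_sequence(seq, half_window=3):
    """One feature vector per residue, ready for any per-residue classifier."""
    return [window_features(seq, i, half_window) for i in range(len(seq))]


if __name__ == "__main__":
    vectors = encode_sequence("MKTAYIAK", half_window=2)
    print(len(vectors), len(vectors[0]))  # 8 residues, 5 * 21 = 105 features each
```

Each vector can then be paired with an interface/non-interface label and passed to whichever classifier is being evaluated. Padding with a dedicated slot keeps every vector the same length at the sequence ends.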
The latest book on using Python for bioinformatics programming; I hope you enjoy it. The table of contents is as follows:
Conventions
1.2.2 Python Versions
1.2.3 Code Style
1.2.4 Get the Most from This Book without Reading It All
1.2.5 Online Resources Related to This Book
1.3 WHY LEARN TO PROGRAM?
1.4 BASIC PROGRAMMING CONCEPTS
1.4.1 What Is a Program?
1.5 WHY PYTHON?
1.5.1 Main Features of Python
1.5.2 Comparing Python with Other Languages
1.5.3 How Is It Used?
1.5.4 Who Uses Python?
1.5.5 Flavors of Python
1.5.6 Special Python Distributions
1.6 ADDITIONAL RESOURCES
Chapter 2 First Steps with Python
2.1 INSTALLING PYTHON
2.1.1 Learn Python by Using It
2.1.2 Install Python Locally
2.1.3 Using Python Online
2.1.4 Testing Python
2.1.5 First Use
2.2 INTERACTIVE MODE
2.2.1 Baby Steps
2.2.2 Basic Input and Output
2.2.3 More on the Interactive Mode
2.2.4 Mathematical Operations
2.2.5 Exit from the Python Shell
2.3 BATCH MODE
2.3.1 Comments
2.3.2 Indentation
2.4 CHOOSING AN EDITOR
2.4.1 Sublime Text
2.4.2 Atom
2.4.3 PyCharm
2.4.4 Spyder IDE
2.4.5 Final Words about Editors
2.5 OTHER TOOLS
2.6 ADDITIONAL RESOURCES
2.7 SELF-EVALUATION
Chapter 3 Basic Programming: Data Types
3.1 STRINGS
3.1.1 Strings Are Sequences of Unicode Characters
3.1.2 String Manipulation
3.1.3 Methods Associated with Strings
3.2 LISTS
3.2.1 Accessing List Elements
3.2.2 List with Multiple Repeated Items
3.2.3 List Comprehension
3.2.4 Modifying Lists
3.2.5 Copying a List
3.3 TUPLES
3.3.1 Tuples Are Immutable Lists
3.4 COMMON PROPERTIES OF THE SEQUENCES
3.5 DICTIONARIES
3.5.1 Mapping: Calling Each Value by a Name
3.5.2 Operating with Dictionaries
3.6 SETS
3.6.1 Unordered Collection of Objects
3.6.2 Set Operations
3.6.3 Shared Operations with Other Data Types
3.6.4 Immutable Set: Frozenset
3.7 NAMING OBJECTS
3.8 ASSIGNING A VALUE TO A VARIABLE VERSUS BINDING A NAME TO AN OBJECT
3.9 ADDITIONAL RESOURCES
3.10 SELF-EVALUATION
Chapter 4 Programming: Flow Control
4.1 IF-ELSE
4.1.1 Pass Statement
4.2 FOR LOOP
4.3 WHILE LOOP
4.4 BREAK: BREAKING THE LOOP
4.5 WRAPPING IT UP
4.5.1 Estimate the Net Charge of a Protein
4.5.2 Search for a Low-Degeneration Zone
4.6 ADDITIONAL RESOURCES
4.7 SELF-EVALUATION
Chapter 5 Handling Files
5.1 READING FILES
5.1.1 Example of File Handling
5.2 WRITING FILES
5.2.1 File Reading and Writing Examples
5.3 CSV FILES
5.4 PICKLE: STORING AND RETRIEVING THE CONTENTS OF VARIABLES
5.5 JSON FILES
5.6 FILE HANDLING: OS, OS.PATH, SHUTIL, AND PATH.PY MODULE
5.6.1 path.py Module
5.6.2 Consolidate Multiple DNA Sequences into One FASTA File
5.7 ADDITIONAL RESOURCES
5.8 SELF-EVALUATION
Chapter 6 Code Modularizing
6.1 INTRODUCTION TO CODE MODULARIZING
6.2 FUNCTIONS
6.2.1 Standard Way to Make Python Code Modular
6.2.2 Function Parameter Options
6.2.3 Generators
6.3 MODULES AND PACKAGES
6.3.1 Using Modules
6.3.2 Packages
6.3.3 Installing Third-Party Modules
6.3.4 Virtualenv: Isolated Python Environments
6.3.5 Conda: Anaconda Virtual Environment
6.3.6 Creating Modules
6.3.7 Testing Modules
6.4 ADDITIONAL RESOURCES
6.5 SELF-EVALUATION
Chapter 7 Error Handling
7.1 INTRODUCTION TO ERROR HANDLING
7.1.1 Try and Except
7.1.2 Exception Types
7.1.3 Triggering Exceptions
7.2 CREATING CUSTOMIZED EXCEPTIONS
7.3 ADDITIONAL RESOURCES
7.4 SELF-EVALUATION
Chapter 8 Introduction to Object Orienting Programming (OOP)
8.1 OBJECT PARADIGM AND PYTHON
8.2 EXPLORING THE JARGON
8.3 CREATING CLASSES
8.4 INHERITANCE
8.5 SPECIAL METHODS
8.5.1 Create a New Data Type Using a Built-in Data Type
8.6 MAKING OUR CODE PRIVATE
8.7 ADDITIONAL RESOURCES
8.8 SELF-EVALUATION
Chapter 9 Introduction to Biopython
9.1 WHAT IS BIOPYTHON?
9.1.1 Project Organization
9.2 INSTALLING BIOPYTHON
9.3 BIOPYTHON COMPONENTS
9.3.1 Alphabet
9.3.2 Seq
9.3.3 MutableSeq
9.3.4 SeqRecord
9.3.5 Align
9.3.6 AlignIO
9.3.7 ClustalW
9.3.8 SeqIO
9.3.9 AlignIO
9.3.10 BLAST
9.3.11 Biological Related Data
9.3.12 Entrez
9.3.13 PDB
9.3.14 PROSITE
9.3.15 Restriction
9.3.16 SeqUtils
9.3.17 Sequencing
9.3.18 SwissProt
9.4 CONCLUSION
9.5 ADDITIONAL RESOURCES
9.6 SELF-EVALUATION
Section II Advanced Topics
Chapter 10 Web Applications
10.1 INTRODUCTION TO PYTHON ON THE WEB
10.2 CGI IN PYTHON
10.2.1 Configuring a Web Server for CGI
10.2.2 Testing the Server with Our Script
10.2.3 Web Program to Calculate the Net Charge of a Protein (CGI version)
10.3 WSGI
10.3.1 Bottle: A Python Web Framework for WSGI
10.3.2 Installing Bottle
10.3.3 Minimal Bottle Application
10.3.4 Bottle Components
10.3.5 Web Program to Calculate the Net Charge of a Protein (Bottle Version)
10.3.6 Installing a WSGI Program in Apache
10.4 ALTERNATIVE OPTIONS FOR MAKING PYTHON-BASED DYNAMIC WEB SITES
10.5 SOME WORDS ABOUT SCRIPT SECURITY
10.6 WHERE TO HOST PYTHON PROGRAMS
10.7 ADDITIONAL RESOURCES
10.8 SELF-EVALUATION
Chapter 11 XML
11.1 INTRODUCTION TO XML
11.2 STRUCTURE OF AN XML DOCUMENT
11.3 METHODS TO ACCESS DATA INSIDE AN XML DOCUMENT
11.3.1 SAX: cElementTree Iterparse
11.4 SUMMARY
11.5 ADDITIONAL RESOURCES
11.6 SELF-EVALUATION
Chapter 12 Python and Databases
12.1 INTRODUCTION TO DATABASES
12.1.1 Database Management: RDBMS
12.1.2 Components of a Relational Database
12.1.3 Database Data Types
12.2 CONNECTING TO A DATABASE
12.3 CREATING A MYSQL DATABASE
12.3.1 Creating Tables
12.3.2 Loading a Table
12.4 PLANNING AHEAD
12.4.1 PythonU: Sample Database
12.5 SELECT: QUERYING A DATABASE
12.5.1 Building a Query
12.5.2 Updating a Database
12.5.3 Deleting a Record from a Database
12.6 ACCESSING A DATABASE FROM PYTHON
12.6.1 PyMySQL Module
12.6.2 Establishing the Connection
12.6.3 Executing the Query from Python
12.7 SQLITE
12.8 NOSQL DATABASES: MONGODB
12.8.1 Using MongoDB with PyMongo
12.9 ADDITIONAL RESOURCES
12.10 SELF-EVALUATION
Chapter 13 Regular Expressions
13.1 INTRODUCTION TO REGULAR EXPRESSIONS (REGEX)
13.1.1 REGEX Syntax
13.2 THE RE MODULE
13.2.1 Compiling a Pattern
13.2.2 REGEX Examples
13.2.3 Pattern Replace
13.3 REGEX IN BIOINFORMATICS
13.3.1 Cleaning Up a Sequence
13.4 ADDITIONAL RESOURCES
13.5 SELF-EVALUATION
Chapter 14 Graphics in Python
14.1 INTRODUCTION TO BOKEH
14.2 INSTALLING BOKEH
14.3 USING BOKEH
14.3.1 A Simple X-Y Plot
14.3.2 Two Data Series Plot
14.3.3 A Scatter Plot
14.3.4 A Heatmap
14.3.5 A Chord Diagram
Section III Python Recipes with Commented Source Code
Chapter 15 Sequence Manipulation in Batch
15.1 PROBLEM DESCRIPTION
15.2 PROBLEM ONE: CREATE A FASTA FILE WITH RANDOM SEQUENCES
15.2.1 Commented Source Code
15.3 PROBLEM TWO: FILTER NOT EMPTY SEQUENCES FROM A FASTA FILE
15.3.1 Commented Source Code
15.4 PROBLEM THREE: MODIFY EVERY RECORD OF A FASTA FILE
15.4.1 Commented Source Code
Chapter 16 Web Application for Filtering Vector Contamination
16.1 PROBLEM DESCRIPTION
16.1.1 Commented Source Code
16.2 ADDITIONAL RESOURCES
Chapter 17 Searching for PCR Primers Using Primer3
17.1 PROBLEM DESCRIPTION
17.2 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION
17.2.1 Commented Source Code
17.3 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION, WITH BIOPYTHON
17.4 ADDITIONAL RESOURCES
Chapter 18 Calculating Melting Temperature from a Set of Primers
18.1 PROBLEM DESCRIPTION
18.1.1 Commented Source Code
18.2 ADDITIONAL RESOURCES
Chapter 19 Filtering Out Specific Fields from a GenBank File
19.1 EXTRACTING SELECTED PROTEIN SEQUENCES
19.1.1 Commented Source Code
19.2 EXTRACTING THE UPSTREAM REGION OF SELECTED PROTEINS
19.2.1 Commented Source Code
19.3 ADDITIONAL RESOURCES
Chapter 20 Inferring Splicing Sites
20.1 PROBLEM DESCRIPTION
20.1.1 Infer Splicing Sites with Commented Source Code
20.1.2 Sample Run of Estimate Intron Program
Chapter 21 Web Server for Multiple Alignment
21.1 PROBLEM DESCRIPTION
21.1.1 Web Interface: Front-End. HTML Code
21.1.2 Web Interface: Server-Side Script. Commented Source Code
21.2 ADDITIONAL RESOURCES
Chapter 22 Drawing Marker Positions Using Data Stored in a Database
22.1 PROBLEM DESCRIPTION
22.1.1 Preliminary Work on the Data
22.1.2 MongoDB Version with Commented Source Code
Section IV Appendices
2022-02-16 13:33:07 3.1MB Python
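A workflow for circRNA quantification, differential expression analysis, and miRNA target prediction in RNA-Seq data.
Introduction
nf-core/circrna is a bioinformatics pipeline for the quantification, miRNA target prediction, and differential expression analysis of circRNAs present in RNA sequencing data. It currently supports total RNA-Seq paired-end data mapped to the Homo sapiens Gencode reference genomes (GRCh37, GRCh38 v34). The pipeline has been developed in a modular fashion: in addition to circRNA quantification, users can choose miRNA target prediction, differential expression analysis, or both, to support hypotheses about circRNA involvement in competing endogenous RNA networks. It is built with a workflow tool that can run tasks across multiple compute infrastructures in a highly portable manner, and it ships with Docker containers, making installation simple and results highly reproducible.
Pipeline summary
By default, nf-core/circrna uses all 3 analysis modules: circrna_discovery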
2022-01-11 15:32:11 990KB workflow bioinformatics rna-seq pipeline
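The miRNA target-prediction module builds on the idea that a miRNA binds sites complementary to its seed region. The following is a minimal sketch of that idea, not how nf-core/circrna itself predicts targets. It assumes the common 7mer-m8 convention (a site complementary to miRNA nucleotides 2 to 8) and plain exact matching, and it ignores pairing energy and conservation. Because a circRNA is circular, the search wraps across the back-splice junction. The function names are hypothetical.

```python
# Exact seed-match search for miRNA sites on a circular RNA (illustrative sketch).
# Assumptions: RNA alphabet (ACGU), 7mer-m8 seed = miRNA nucleotides 2-8,
# no scoring of pairing energy or conservation.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Complement each base, then reverse, so the result reads 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, circ_seq):
    """Return 0-based start positions of 7mer-m8 sites in a circular sequence."""
    seed = mirna[1:8]                 # nucleotides 2-8 (1-based) of the miRNA
    site = reverse_complement(seed)   # matching target sequence, 5'->3'
    k = len(site)
    # Append the first k-1 bases so sites spanning the back-splice junction are found.
    extended = circ_seq + circ_seq[: k - 1]
    return [i for i in range(len(circ_seq)) if extended[i : i + k] == site]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    print(reverse_complement(let7a[1:8]))          # CUACCUC
    print(seed_sites(let7a, "AAAACUACCUCGGGG"))    # [4]
    print(seed_sites(let7a, "CCUCGGGGCUA"))        # [8], wraps across the junction
```

Restricting the start index to `range(len(circ_seq))` ensures that a site found in the wrapped tail is not reported twice.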
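Kaiju
Authors: Peter Menzel, Anders Krogh
Kaiju is a program for the taxonomic classification of high-throughput sequencing reads (e.g., Illumina or Roche/454) from whole-genome sequencing of metagenomic DNA. Reads are assigned directly to taxa using the NCBI taxonomy and a reference database of protein sequences from microbial and viral genomes. The program is described in an open-access publication. Kaiju can be installed locally (see below) or used via …; see the release notes for all versions.
License
Copyright (c) 2015-2021 Peter Menzel and Anders Krogh
Kaiju is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Kaiju is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.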
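Kaiju classifies DNA reads against a protein reference database, so each read has to be considered at the amino-acid level. The following sketch illustrates that step with a plain six-frame translation using the standard genetic code, followed by splitting the frames at stop codons. It is not Kaiju's code. The function names and the `min_len` cutoff are hypothetical, and codons containing non-ACGT bases are simply translated as `X`.

```python
# Six-frame translation with stop-codon splitting (illustrative sketch, not Kaiju's code).
# Assumptions: standard genetic code (NCBI table 1); codons with non-ACGT bases -> "X";
# min_len is a hypothetical cutoff.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_dna(seq):
    """Translate from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(read):
    """Three forward frames followed by three frames of the reverse complement."""
    read = read.upper()
    rc = read.translate(COMPLEMENT)[::-1]
    return [translate_dna(strand[offset:]) for strand in (read, rc) for offset in range(3)]


def fragments(read, min_len=3):
    """Split all six frames at stop codons; keep fragments of at least min_len residues."""
    return [frag for frame in six_frames(read) for frag in frame.split("*") if len(frag) >= min_len]


if __name__ == "__main__":
    print(six_frames("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

The resulting fragments are what a protein-level search would then compare against the reference database.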
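TOBIAS - Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
Introduction
ATAC-seq (Assay for Transposase-Accessible Chromatin using high-throughput sequencing) is a sequencing assay for investigating genome-wide chromatin accessibility. The assay uses the Tn5 transposase to insert sequencing adapters into accessible chromatin, which makes it possible to map regulatory regions across the genome. In addition, the local distribution of Tn5 insertions carries information about transcription factor binding: insertions are visibly depleted around sites bound by protein, a pattern known as a footprint. TOBIAS is a collection of command-line bioinformatics tools for performing footprinting analysis on ATAC-seq data, including:
- Correction of Tn5 insertion bias
- Calculation of footprint scores within regulatory regions
- Estimation of bound/unbound transcription factor binding sites
- Visualization of footprints within and across different conditions
For information on each tool, see ….
Installation
TOBIAS is written as a Python package and can be installed quickly via pip:
$ pip install tobias
TOBIAS can also be installed via Bio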
2021-12-20 13:22:57 4.57MB bioinformatics atac-seq footprinting Python
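Footprint scoring in TOBIAS builds on the depletion of Tn5 insertions at bound sites described above. As a rough illustration only, the sketch below scores one site by comparing the mean insertion count in the flanks with the mean in the center. The formula, window widths, and function names are assumptions rather than TOBIAS's implementation, and the sketch skips the Tn5 bias correction the toolkit performs.

```python
# Toy footprint score over a per-base Tn5 insertion profile (illustrative sketch).
# Assumption: score = mean(flank signal) - mean(center signal); this is a generic
# depletion measure, not TOBIAS's own scoring formula.


def mean(values):
    """Arithmetic mean; returns 0.0 for an empty list to avoid division by zero."""
    return sum(values) / len(values) if values else 0.0


def footprint_score(signal, center_width, flank_width):
    """Higher scores indicate stronger depletion of insertions at the site center."""
    mid = len(signal) // 2
    start = mid - center_width // 2
    end = start + center_width
    center = signal[start:end]
    flanks = signal[max(0, start - flank_width):start] + signal[end:end + flank_width]
    return mean(flanks) - mean(center)


if __name__ == "__main__":
    bound = [5, 5, 5, 1, 1, 1, 5, 5, 5]   # insertions depleted at the center
    print(footprint_score(bound, center_width=3, flank_width=3))   # 4.0
    print(footprint_score([3] * 9, center_width=3, flank_width=3))  # 0.0, no footprint
```

A flat profile scores zero, while a dip in the center relative to the flanks yields a positive score.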
Python Programming for Biology: Bioinformatics and Beyond
2021-12-04 14:05:50 26.93MB Python
Advanced AI Techniques and Applications in Bioinformatics
2021-12-04 13:13:36 10.12MB AI
Bioinformatics material useful for learning computation and algorithms, as well as programming.
2021-11-12 04:38:12 8.55MB bioinformatics
Jin Xiong's bioinformatics tutorial: comprehensive in coverage and written in fairly accessible language.
2021-11-12 04:24:55 14.04MB Bioinformatics
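Bioinformatics: code inspired by … as well as ….
Note: functions generally use zero-based indexing, whereas … uses 1-based indexing.
I have written this code in Python 3, usually the latest version at the time each module was written. When this file was updated, that was Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54). I tend to use newly added features whenever they make me more productive. I have been writing code since 1968; computers have grown much faster in that time, but my brain has not. Subject to the license terms, you are welcome to use this code if you find it useful, but you may need to modify it to run on earlier versions of Python.
Bioinformatics Algorithms textbook tracking
1. Where in the genome does DNA replication begin?
#     Location     Description
BA1A  rosalind.py
BA1B  rosalind.py
BA1C  rosalind.py
BA1D  rosalind.py
BA1E  rosa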
2021-11-06 13:03:48 6.71MB Python
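Two of the tracked problems, BA1A (count how many times a pattern appears in a text) and BA1D (find all starting positions of a pattern), illustrate the zero-based indexing note above. The following is a minimal sketch. The function names are hypothetical and are not taken from the author's rosalind.py.

```python
# Pattern counting and matching with zero-based positions (illustrative sketch;
# function names are hypothetical and not taken from rosalind.py).


def pattern_count(text, pattern):
    """BA1A-style count: overlapping occurrences of pattern in text."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def pattern_positions(text, pattern):
    """BA1D-style matching: zero-based start positions of every occurrence."""
    k = len(pattern)
    return [i for i in range(len(text) - k + 1) if text[i:i + k] == pattern]


if __name__ == "__main__":
    print(pattern_count("GCGCG", "GCG"))                     # 2
    print(pattern_positions("GATATATGCATATACTT", "ATAT"))    # [1, 3, 9]
```

The explicit sliding window is deliberate. Python's built-in `str.count` counts only non-overlapping matches, so `"GCGCG".count("GCG")` returns 1 rather than the 2 these problems expect.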