打开APP
userphoto
未登录

开通VIP,畅享免费电子书等14项超值服

开通VIP
s05 - Finding a Motif in DNA

Problem

Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).

The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of ‘U’ in “AUGCUUCAGAAAGGUCUUACG” are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].

A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = “AUGCUUCAGAAAGGUCUUACG”, then s[2:5] = “UGCU”.

The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).

Given: Two DNA strings s and t (each of length at most 1 kbp).

Return: All locations of t as a substring of s.

Sample Dataset

GATATATGCATATACTTATAT

Sample Output

2 4 10

Solution

这个问题,俗称大海捞针,这里不管是C语言版本还是python,我都用了sliding window,这个效率不高,但目前这问题的复杂度还不至于要上高效的字符串搜索算法。

C语言版本

#include
#include
#include

#define SZ 1000

int * get_substring_index(char *t, char *s);
int is_string_equal(char *t, char *s, int string_len);

int main() {  FILE *INFILE;  INFILE = fopen('DATA/rosalind_subs.txt', 'r');
 char s[SZ], t[SZ];
 fscanf(INFILE, '%s\n%s', s, t);  
 int *pos;  pos = get_substring_index(t, s);  
 int i=0;  
 while(pos[i]) {  
   printf('%d\t', pos[i]);    i++;  }  
 printf('\n');  
 return 0;}

int
* get_substring_index(char *t, char *s) {  
 // t is a substring of s  int slen = strlen(s);  
 int tlen = strlen(t);  
 if (slen <>
   return NULL;
 int *position = NULL;  position = (int *) malloc(slen * sizeof(int));
 int i;  int j = 0;  
 for (i=0; i<=slen-tlen; i++)="" {="">
   if ( is_string_equal(t, s+i, tlen) ) {      position[j++] = i + 1;    }  }  position[j] = '\0';  
 return position;}

int
is_string_equal(char *t, char *s, int string_len) {
 if (strlen(s) < string_len="" ||="">strlen(t) < string_len)="">
     return 0;  }
 int res = strncmp(t, s, string_len);
   if (res != 0)    
      return 0;  
   return 1;}

python版本

#!/usr/bin/env python3f=open('DATA/rosalind_subs.txt', 'r')seq =  f.readlines()s = seq[0].strip()t = seq[1].strip()p = []for i in range(len(s) - len(t)):    if s[i:i+len(t)] == t:        p.append(i+1)for x in p:    print(x, end=' ')print()
本站仅提供存储服务,所有内容均由用户发布,如发现有害或侵权内容,请点击举报
打开APP,阅读全文并永久保存 查看更多类似文章
猜你喜欢
类似文章
部分字符串的实现函数
小白VC++字符串截取总结
KMP algorithm string
C++:实现split分割字符串
嵌入式软件工程师面试题
验证概率
更多类似文章 >>
生活服务
热点新闻
分享 收藏 导长图 关注 下载文章
绑定账号成功
后续可登录账号畅享VIP特权!
如果VIP功能使用有故障,
可点击这里联系客服!

联系客服