Title:
Sequences of hepatitis C virus genotypes and their use as therapeutic and diagnostic agents
Document Type and Number:
Kind Code:
A1

Abstract:
The present invention relates to a polynucleic acid composition comprising or consisting of at least one polynucleic acid containing 8 or more contiguous nucleotides corresponding to a nucleotide sequence from the region spanning positions 417 to 957 of the Core/E1 region of HCV type 3; and/or the region spanning positions 4664 to 4730 of the NS3 region of HCV type 3; and/or the region spanning positions 4892 to 5292 of the NS3/4 region of HCV type 3; and/or the region spanning positions 8023 to 8235 of the NS5 region of the BR36 subgroup of HCV type 3a; and/or the coding region of HCV type 4a starting at nucleotide 379 in the core region; and/or the coding region of HCV type 4; and/or the coding region of HCV type 5, with said nucleotide numbering being with respect to the numbering of HCV nucleic acids as shown in Table 1, and with said polynucleic acids containing at least one nucleotide difference with known HCV type 1, and/or HCV type 2 genomes in the above-indicated regions, or the complement thereof.
Inventors:
Maertens, Geert (Brugge, BE)
Stuyver, Lieven (Lede, BE)
Application Number:
09/899046
Publication Date:
01/09/2003
Filing Date:
07/06/2001
Assignee:
N.V. Innogenetics S.A.
Primary Class:
Other Classes:
424/189.100, 514/44, 536/23.720
International Classes:
(IPC1-7): C12Q001/70; A61K048/00; C07H021/04; A61K039/29
Attorney, Agent or Firm:
NIXON & VANDERHYE P.C. (8th Floor, Arlington, VA, 22201-4714, US)
Claims:
1. A composition comprising or consisting of at least one polynucleic acid containing 8 or more contiguous nucleotides selected from at least one of the following HCV sequences: an HCV type 3 genomic sequence, more particularly in any of the following regions: the region spanning positions 417 to 957 of the Core/E1 region of HCV subtype 3a, the region spanning positions 4664 to 4730 of the NS3 region of HCV type 3, the region spanning positions 4892 to 5292 of the NS3/4 region of HCV type 3, the region spanning positions 8023 to 8235 of the NS5 region of HCV subtype 3a, an HCV subtype 3c genomic sequence, an HCV subtype 2d genomic sequence, an HCV type 4 genomic sequence, the coding region of HCV subtype 5a, with said nucleotide numbering being with respect to the numbering of HCV nucleic acids as shown in Table 1, and with said polynucleic acids containing at least one nucleotide difference with known HCV polynucleic acid sequences in the above-indicated regions, or the complement thereof.

2. A composition according to claim 1, wherein said polynucleic acids correspond to a nucleotide sequence selected from any of the following HCV genomic sequences: an HCV genomic sequence as having a homology of at least 67%, preferably more than 69%, most preferably 71% or more to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 417 to 957 of the Core/E1 region; an HCV genomic sequence as having a homology of at least 65%, preferably more than 67%, most preferably 69% or more to any of the sequences as represented in SEQ ID NO 19, 21, 23, 25 or 27 in the region spanning positions 574 to 957 of the E1 region; an HCV genomic sequence, having a homology of at least 79%, more preferably at least 81%, most preferably more than 83% or more to any of the sequences as represented in SEQ ID NO 147 in the region spanning positions 1 to 378 of the Core region, an HCV genomic sequence having a homology of at least 74%, more preferably at least 76%, most preferably more than 78% or more to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 417 to 957 in the Core/E1 region; an HCV genomic sequence having a homology of at least 74%, preferably more than 76%, most preferably 78% or more to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 574 to 957 in the E1 region, an HCV genomic sequence having a homology of more than 73.5%, preferably more than 74%, most preferably 75% homology to any of the sequence as represented in SEQ ID NO 29 in the region spanning positions 4664 to 4730 of the NS3 region; an HCV genomic sequence having a homology of more than 70%, preferably more than 72%, most preferably more than 74% homology to any of the sequences as represented in SEQ ID NO 29, 31, 33, 35, 37 or 39 in the region spanning positions 4892 to 5292 in the NS3/NS4 region; an HCV genomic sequence 
having a homology of more than 95%, preferably 95.5%, most preferably 96% homology to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 in the region spanning positions 8023 to 8235 of the NS5 region; an HCV genomic sequence of the BR36 subgroup of HCV type 3a having a homology of more than 96%, preferably 96.5%, most preferably 97% homology to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 in the region spanning positions 8023 to 8192 of the NS5B region; an HCV genomic sequence having a homology of more than 79%, more preferably more than 81%, and most preferably more than 83% to the sequence as represented in SEQ ID NO 149 in the region spanning positions 7932 to 8271 in the NS5B region.

3. A composition according to claim 1, wherein said polynucleic acids correspond to a nucleotide sequence selected from any of the following HCV genomic sequences: an HCV genomic sequence having a homology of more than 85%, preferably more than 86%, most preferably more than 87% homology to any of the sequences as represented in SEQ ID NO 41, 43, 45, 47, 49, 51, 53 or 151 in the region spanning positions 1 to 573 of the Core region; an HCV genomic sequence having a homology of more than 61%, preferably more than 63%, most preferably more than 65% homology to any of the sequences as represented in SEQ ID NO 41, 43, 45, 47, 49, 51, 53, 153 or 155 in the region spanning positions 574 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 76.5%, preferably of more than 77%, most preferably of more than 78% homology with any of the sequences as represented in SEQ ID NO 55, 57, 197 or 199 in the region spanning positions 3856 to 4209 of the NS3 region; an HCV genomic sequence having a homology of more than 68%, preferably of more than 70%, most preferably of more than 72% homology with the sequence as represented in SEQ ID NO 157 in the region spanning positions 980 to 1179 of the E1/E2 region; an HCV genomic sequence having a homology of more than 57%, preferably more than 59%, most preferably more than 61% homology to any of the sequences as represented in SEQ ID NO 59 or 61 in the region spanning positions 4936 to 5296 of the NS4 region; an HCV genomic sequence having a homology of more than 93%, preferably more than 93.5%, most preferably more than 94% homology to any of the sequences as represented in SEQ ID NO 159 or 161 in the region spanning positions 7932 to 8271 of the NS5B region.

4. A composition according to claim 1, wherein said polynucleic acids correspond to a nucleotide sequence selected from any of the following HCV genomic sequences: an HCV genomic sequence having a homology of more than 66%, preferably more than 68%, most preferably more than 70% homology in the E1 region spanning positions 574 to 957 to any of the sequences as represented in SEQ ID NO 118, 120 or 122 in the region spanning positions 1 to 957 of the Core/E1 region; an HCV genomic sequence having a homology of more than 71%, preferably more than 72%, most preferably more than 74% homology to any of the sequences as represented in SEQ ID NO 118, 120 or 122 in the region spanning positions 379 to 957; an HCV genomic sequence having a homology of more than 85%, preferably more than 86%, most preferably more than 86.5% homology to any of the sequences as represented in SEQ ID NO 183, 185 or 187 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 81%, preferably more than 83%, most preferably more than 85% homology to the sequence as represented in SEQ ID NO 189 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 85%, preferably more than 87%, most preferably more than 89% homology to any of the sequences as represented in SEQ ID NO 167 or 169 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 79%, preferably more than 81%, most preferably more than 83% homology to any of the sequences as represented in SEQ ID NO 171 or 173 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 84%, preferably more than 86%, most preferably more than 88% homology to the sequence as represented in SEQ ID NO 175 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 83%, 
preferably more than 85%, most preferably more than 87% homology to the sequence as represented in SEQ ID NO 177 in the region spanning positions 379 to 957 of the E1 region, an HCV genomic sequence having a homology of more than 76%, preferably more than 78%, most preferably more than 80% homology to the sequence as represented in SEQ ID NO 179 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 84%, preferably more than 86%, most preferably more than 88% homology to the sequence as represented in SEQ ID NO 181 in the region spanning positions 379 to 957 of the E1 region; an HCV genomic sequence having a homology of more than 73%, preferably more than 75%, most preferably more than 77% homology to any of the sequences as represented in SEQ ID NO 106, 108, 110, 112, 114, or 116 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 88%, preferably more than 89%, most preferably more than 90% homology to any of the sequences as represented in SEQ ID NO 106, 108, 110, or 112 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 88%, preferably more than 89%, most preferably more than 90% homology to any of the sequences as represented in SEQ ID NO 116 or 201 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 87%, preferably more than 89%, most preferably more than 90% homology to the sequence as represented in SEQ ID NO 203 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 85%, preferably more than 87%, most preferably more than 89% homology to the sequence as represented in SEQ ID NO 114 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 86%, preferably more than 87%, 
most preferably more than 88% homology to the sequence as represented in SEQ ID NO 207 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 84%, preferably more than 86%, most preferably more than 88% homology to the sequence as represented in SEQ ID NO 209 in the region spanning positions 7932 to 8271 of the NS5 region; an HCV genomic sequence having a homology of more than 81%, preferably more than 83%, most preferably more than 85% homology to the sequence as represented in SEQ ID NO 211 in the region spanning positions 7932 to 8271 of the NS5 region.

5. A composition according to claim 1, wherein said polynucleic acids correspond to a nucleotide sequence selected from any of the following HCV genomic sequences: an HCV genomic sequence having a homology of more than 78%, preferably more than 80%, most preferably more than 82% homology to the sequence as represented in SEQ ID NO 143 in the region spanning positions 379 to 957 of the Core/E1 region; an HCV genomic sequence having a homology of more than 74%, preferably more than 76%, most preferably more than 78% homology to the sequence as represented in SEQ ID NO 143 in the region spanning positions 574 to 957; an HCV genomic sequence having a homology of more than 87%, preferably more than 89%, most preferably more than 91% homology to the sequence as represented in SEQ ID NO 145 in the region spanning positions 7932 to 8271 of the NS5B region.
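
Claims 2 to 5 compare a candidate sequence with a reference sequence over a fixed region and require the homology to exceed a stated percentage. The following Python sketch illustrates one way such a comparison could be computed. It assumes that "homology" means the percentage of identical positions between two sequences that have already been aligned to equal length, and that gap characters count as mismatches; the claims themselves do not state the calculation. The function names and the toy strings are illustrative only and are not HCV sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which both sequences carry the same residue.

    Assumption: the two sequences are already aligned and have equal length.
    A gap character ('-') never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold_percent: float,
                    strict: bool = True) -> bool:
    """Check a 'more than X%' (strict) or 'at least X%' (non-strict) criterion."""
    identity = percent_identity(seq_a, seq_b)
    if strict:
        return identity > threshold_percent
    return identity >= threshold_percent


# Toy 10-nucleotide strings that differ at a single position (9 of 10 identical).
reference = "ACGTACGTAC"
candidate = "ACGTTCGTAC"
print(percent_identity(reference, candidate))
print(meets_threshold(reference, candidate, 85.0))
```

The `strict` flag reflects the two wordings used in the claims, "more than" and "at least", which differ only when the identity equals the threshold exactly.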

6. A composition according to any of claims 1 to 5, wherein said polynucleic acid is liable to act as a primer for amplifying the nucleic acid of a certain isolate belonging to the genotype from which the primer is derived.

7. A composition according to any of claims 1 to 5, wherein said polynucleic acid is able to act as a hybridization probe for specific detection and/or classification into types of a nucleic acid containing said nucleotide sequence, with said oligonucleotide being possibly labelled or attached to a solid substrate.

8. Use of a composition according to any of claims 1 to 7 for in vitro detecting the presence of one or more HCV genotypes, more particularly for detecting the presence of a nucleic acid of any of the HCV genotypes having a nucleotide sequence as defined in any of claims 1 to 5, present in a biological sample liable to contain them, comprising at least the following steps: (i) possibly extracting sample nucleic acid, (ii) possibly amplifying the nucleic acid with at least one of the primers according to claim 6 or any other HCV type 2, HCV type 3, HCV type 4, HCV type 5 or universal HCV primer, (iii) hybridizing the nucleic acids of the biological sample, possibly under denatured conditions, and with said nucleic acids being possibly labelled during or after amplification, at appropriate conditions with one or more probes according to claim 7, with said probes being preferably attached to a solid substrate, (iv) washing at appropriate conditions, (v) detecting the hybrids formed, (vi) inferring the presence of one or more HCV genotypes present from the observed hybridization pattern.

9. A composition consisting of or comprising at least one peptide or polypeptide containing in its sequence a contiguous sequence of at least 5 amino acids of an HCV polyprotein encoded by any of the polynucleic acids according to any of claims 1 to 5.

10. A composition according to claim 9, wherein said contiguous sequence contains in its sequence at least one of the following amino acid residues: L7, Q43, M44, S60, R67, Q70, T71, A79, A87, N106, K115, A127, A190, S130, V134, G142, I144, E152, A157, V158, P165, S177 or Y177, I178, V180 or E180 or F182, R184, I186, H187, T189, A190, S191 or G191, Q192 or L192 or I192 or V192 or E192, N193 or H193 or P193, W194 or Y194, H195, A197 or I197 or V197 or T197, V202, I203 or L203, Q208, A210, V212, F214, T216, R217 or D217 or E217 or V217, H218 or N218, H219 or V219 or L219, L227 or L227, M231 or E231 or Q231, T232 or D232 or A232 or K232, Q235 or I235, A237 or T237, I242, I246, S247, S248, V249, S250 or Y250, I251 or V251 or M251 or F251, D252, T254 or V254, L255 or V255, E256 or A256, M258 or F258 or V258, A260 or Q260 or S260, A261, T264 or Y264, M265, I266 or A266, A267, G268 or T268, F271 or M271 or V271, I277, M280 or H280, I284 or A284 or L284, V274, V291, N292 or S292, R293 or I293 or Y293, Q294 or R294, L297 or I297 or Q297, A299 or K299 or Q299, N303 or T303, T308 or L308, T310 or F310 or A310 or D310 or V310, L313, G317 or Q317, L333, S351, A358, A359, A363, S364, A366, T369, L373, F376, Q386, I387, S392, I399, F402, I403, R405, D454, A461, A463, T464, K484, Q500, E501, S521, K522, H524, N528, S531, S532, V534, F536, F537, M539, I546, C1282, A1283, H1310, V1312, Q1321, P1368, V1372, V1373, K1405, Q1406, S1409, A1424, A1429, C1435, S1436, S1456, H1496, A1504, D1510, D1529, I1543, N1567, D1556, N1567, M1572, Q1579, L1581, S1583, F1585, V1595, E1606 or T1606, M1611, V1612 or L1612, P1630, C1636, P1651, T1656 or I1656, L1663, V1667, V1677, A1681, H1685, E1687, G1689, V1695, A1700, Q1704, Y1705, A1713, A1714 or S1714, M1718, D1719, A1721 or T1721, R1722, A1723 or V1723, H1726 or G1726, E1730, V1732, F1735, I1736, S1737, R1738, T1739, G1740, Q1741, K1742, Q1743, A1744, T1745, L1746, E1747 or K1747, I1749, A1750, T1751 or A1751, V1753, N1755, K1756, A1757, P1758,
A1759, H1762, T1763, Y1764, P2645, A2647, K2650, K2653 or L2653, S2664, N2673, F2680, K2681, L2686, H2692, Q2695 or L2695 or I2695, V2712, F2715, V2719 or Q2719, T2722, T2724, S2725, R2726, G2729, Y2735, H2739, I2748, G2746 or I2746, I2748, P2752 or K2752, P2754 or T2754, T2757 or P2757, with said notation being composed of a letter representing the amino acid residue by its one-letter code, and a number representing the amino acid numbering according to Kato et al., 1990 as shown in Table 1.
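
The residue notation defined at the end of claim 10 (a one-letter amino acid code followed by a polyprotein position) lends itself to a mechanical check. The Python sketch below parses such tokens and reports which of them are present in a peptide fragment whose first residue sits at a known polyprotein position. The helper names, and the seven-residue fragment used in the example, are hypothetical: the fragment was constructed for illustration and is not a sequence disclosed in this document.

```python
import re

# One of the 20 standard one-letter amino acid codes followed by a position number.
_RESIDUE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)$")


def parse_residue(token: str) -> tuple[str, int]:
    """Split a token such as 'E1606' into ('E', 1606)."""
    match = _RESIDUE.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid residue token: {token!r}")
    return match.group(1), int(match.group(2))


def residues_present(fragment: str, start_position: int,
                     tokens: list[str]) -> list[str]:
    """Return the tokens whose residue occurs at the stated position in the fragment.

    start_position is the polyprotein position of the fragment's first residue.
    """
    found = []
    for token in tokens:
        letter, position = parse_residue(token)
        offset = position - start_position
        if 0 <= offset < len(fragment) and fragment[offset] == letter:
            found.append(token)
    return found


# Constructed toy fragment assumed to span positions 189 to 195.
fragment = "TASQNWH"
print(residues_present(fragment, 189, ["T189", "Q192", "H195", "V202", "L192"]))
```

A token without a leading letter is rejected by `parse_residue`, which can help catch transcription errors in long residue lists.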

11. A composition according to any of claims 9 or 10, wherein said contiguous sequence is selected from any of the following HCV amino acid sequences: a sequence having a homology of more than 72%, preferably more than 74%, and most preferably more than 77% homology to any of the amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 in the region spanning positions 140 to 319 in the Core/E1 region; a sequence having a homology of more than 70%, preferably more than 72%, and most preferably more than 75% homology to any of the amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 in the E1 region spanning positions 192 to 319; a sequence having a homology of more than 86%, preferably more than 88%, and most preferably more than 90% homology to the amino acid sequences as represented in SEQ ID NO 148 in the region spanning positions 1 to 110 in the Core region; a sequence having a homology of more than 76%, preferably more than 78%, most preferably more than 80% to any of the amino acid sequences as represented in SEQ ID NO 30, 32, 34, 36, 38 or 40 in the region spanning positions 1646 to 1764 in the NS3/NS4 region; a sequence having a homology of more than 81.5%, preferably more than 83%, and most preferably more than 86% homology to any of the amino acid sequences as represented in SEQ ID NO 14, 16, 18, 20, 22, 24, 26 or 28 in the E1 region spanning positions 192 to 319; a sequence having a homology of more than 86%, preferably more than 88%, most preferably more than 90% to the amino acid sequence as represented in SEQ ID NO 150 in the region spanning positions 2645 to 2757 in the NS5B region;

12. A composition according to any of claims 9 or 10, wherein said contiguous sequence is selected from any of the following HCV amino acid sequences: a sequence having a homology of more than 80%, preferably more than 82%, most preferably more than 84% homology to any of the amino acid sequences as represented in SEQ ID NO 118, 120, and 122 in the region spanning positions 127 to 319, a sequence having a homology of more than 73%, preferably more than 75%, most preferably more than 78% homology in the E1 region spanning positions 192 to 319 to any of the amino acid sequences as represented in SEQ ID NO 118, 120, and 122, in the region spanning positions 127 to 319, a sequence having more than 85%, preferably more than 86%, most preferably more than 87% homology to any of the amino acid sequences as represented in SEQ ID NO 118, 120 or 122, in the region spanning positions 192 to 319.

13. A composition according to any of claims 9 or 10, wherein said contiguous sequence is selected from any of the following HCV amino acid sequences: a sequence having more than 93%, preferably more than 94%, most preferably more than 95% homology in the region spanning Core positions 1 to 191 to any of the amino acid sequences as represented in SEQ ID NO 42, 44, 46, 48, 50, 52, 54, or 152; a sequence having more than 73%, preferably more than 74%, most preferably more than 76% homology in the region spanning E1 positions 192 to 319 to any of the amino acid sequences as represented in SEQ ID NO 42, 44, 46, 48, 50, 52, 54, 154 or 156; a sequence spanning positions 1286 to 1403 of the NS3 region, with said sequence being characterized as having more than 90%, preferably more than 91%, most preferably more than 92% homology to any of the amino acid sequences represented in SEQ ID NO 56 to 58; a sequence spanning positions 1646 to 1764 of the NS3/4 region, with said sequence being characterized as having more than 66%, more particularly 68%, most particularly 70% or more homology to any of the amino acid sequences as represented in SEQ ID NO 60 or

14. A composition according to any of claims 9 to 10, wherein said contiguous sequence is selected from any of the following HCV amino acid sequences: a sequence having more than 83%, preferably more than 85%, most preferably more than 87% homology in the region spanning Core positions 1 to 319 to the amino acid sequence as represented in SEQ ID NO 144; a sequence having more than 79%, preferably more than 81%, most preferably more than 84% homology in the region spanning E1 positions 192 to 319 to the amino acid sequence as represented in SEQ ID NO 144; a sequence having more than 95%, more particularly 96%, most particularly 97% or more homology to the amino acid sequence as represented in SEQ ID NO 146, in the region spanning positions 2645 to 2757 of the NS5B region.

15. A composition according to any of claims 9 to 14, wherein said sequence is selected from the following peptides:
QPTGRSWGQ(SEQ ID NO 93)
RSEGRTSWAQ(SEQ ID NO 220)
RTEGRTSWAQ(SEQ ID NO 221)
SRRQPIPRARRTEGRSWAQ(SEQ ID NO 268)
LEWRNTSGLYVL(SEQ ID NO 83)
VNYRNASGIYHI(SEQ ID NO 126)
QHYRNISGIYHV(SEQ ID NO 127)
EHYRNASGIYHI(SEQ ID NO 128)
IHYRNASGIYHI(SEQ ID NO 224)
VPYRNASGIYHV(SEQ ID NO 84)
VNYRNASGIYHI(SEQ ID NO 225)
VNYRNASGVYHI(SEQ ID NO 226)
VNYHNTSGIYHL(SEQ ID NO 227)
QHYRNASGIYHV(SEQ ID NO 228)
QHYRNVSGIYHV(SEQ ID NO 229)
IHYRNASDGYYI(SEQ ID NO 230)
LQVKNTSSSYMV(SEQ ID NO 231)
VYEADDVILHT(SEQ ID NO 85)
VYETEHHILHL(SEQ ID NO 129)
VYEADHHIMHL(SEQ ID NO 130)
VYETDHHILHL(SEQ ID NO 131)
VYEADNLILHA(SEQ ID NO 86)
VWQLRAIVLHV(SEQ ID NO 232)
VYEADYHILHL(SEQ ID NO 233)
VYETDNHILHL(SEQ ID NO 234)
VYETENHILHL(SEQ ID NO 235)
VFETVHHILHL(SEQ ID NO 236)
VFETEHHILHL(SEQ ID NO 237)
VFETDHHIMHL(SEQ ID NO 238)
VYETENHILHL(SEQ ID NO 239)
VYEADALILHA(SEQ ID NO 240)
VQDGNTSTCWTPV(SEQ ID NO 87)
VQDGNTSACWTPV(SEQ ID NO 241)
VRVGNQSRCWVAL(SEQ ID NO 132)
VRTGNTSRCWVPL(SEQ ID NO 133)
VRAGNVSRCWTPV(SEQ ID NO 134)
EEKGNISRCWIPV(SEQ ID NO 242)
VKTGNQSRCWVAL(SEQ ID NO 243)
VRTGNQSRCWVAL(SEQ ID NO 244)
VKTGNQSRCWIAL(SEQ ID NO 245)
VKTGNVSRCWIPL(SEQ ID NO 247)
VKTGNVSRCWISL(SEQ ID NO 248)
VRKDNVSRCWVQI(SEQ ID NO 249)
VRYVGATTAS(SEQ ID NO 89)
APYIGAPLES(SEQ ID NO 135)
APYVGAPLES(SEQ ID NO 136)
AVSMDAPLES(SEQ ID NO 137)
APSLGAVTAP(SEQ ID NO 90)
APSFGAVTAP(SEQ ID NO 250)
VSQPGALTKG(SEQ ID NO 251)
VKYVGATTAS(SEQ ID NO 252)
APYIGAPVES(SEQ ID NO 253)
AQHLNAPLES(SEQ ID NO 254)
SPYVGAPLEP(SEQ ID NO 255)
SPYAGAPLEP(SEQ ID NO 256)
APYLGAPLEP(SEQ ID NO 257)
APYLGAPLES(SEQ ID NO 258)
APYVGAPLES(SEQ ID NO 259)
VPYLGAPLTS(SEQ ID NO 260)
APHLRAPLSS(SEQ ID NO 261)
APYLGAPLTS(SEQ ID NO 262)
RPRRHQTVQT(SEQ ID NO 91)
QPRRHWTTQD(SEQ ID NO 138)
RPRRHWTTQD(SEQ ID NO 139)
RPRQHATVQN(SEQ ID NO 92)
RPRQHATVQD(SEQ ID NO 263)
SPQHHKFVQD(SEQ ID NO 264)
RPRRLWTTQE(SEQ ID NO 265)
PPRIHETTQD(SEQ ID NO 266)
TISYANGSGPSDDK(SEQ ID NO 267)


16. Recombinant vector, particularly for cloning and/or expression, with said recombinant vector comprising a vector sequence, an appropriate prokaryotic, eukaryotic or viral promoter sequence followed by the nucleotide sequences as defined in claims 1 to 5, with said recombinant vector allowing the expression of any one of the HCV type 2 and/or HCV type 3 and/or type 4 and/or type 5 derived polypeptides according to any of claims 9 to 15 in a prokaryotic, or eukaryotic host, or in living mammals when injected as naked DNA, and more particularly a recombinant vector allowing the expression of any of the following HCV type 2, HCV type 3, type 4 or type 5 polypeptides spanning the following amino acid positions: a polypeptide starting at position 1 and ending at any position in the region between positions 70 and 326, more particularly a polypeptide spanning positions 1 to 70, 1 to 85, positions 1 to 120, positions 1 to 150, positions 1 to 191, positions 1 to 200, for expression of the Core protein, and positions 1 to 263, positions 1 to 326, for expression of the Core and E1 protein; a polypeptide starting at any position in the region between positions 117 and 192, and ending at any position in the region between positions 263 and 326, more particularly from positions 119 to 326, for expression of E1, or forms that have the putative membrane anchor deleted (positions 264 to 293 plus or minus 8 amino acids); a polypeptide starting at any position in the region between positions 1556 and 1688, and ending at any position in the region between positions 1739 and 1764, for expression of the NS4 regions, more particularly a polypeptide starting at position 1658 and ending at position 1711 for expression of the NS4a antigen, and more particularly, a polypeptide starting at position 1712 and ending between positions 1743 and 1972, for example 1712-1743, 1712-1764, 1712-1782, 1712-1972, 1712 to 1782 and 1902 to 1972 for expression of the NS4b protein or parts thereof.
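
The amino acid ranges in claim 16 can be related to the nucleotide ranges used in the nucleic acid claims by simple codon arithmetic. The Python sketch below assumes that nucleotide 1 is the first nucleotide of codon 1 of the polyprotein coding region; Table 1 is not reproduced in this section, so this is an assumption, although it agrees with the pairing of E1 amino acid positions 192 to 319 with nucleotide positions 574 to 957 and of Core positions 1 to 191 with nucleotide positions 1 to 573 elsewhere in the claims. The function names are illustrative.

```python
def aa_range_to_nt_range(first_aa: int, last_aa: int) -> tuple[int, int]:
    """Nucleotide span (1-based, inclusive) encoding amino acids first_aa..last_aa.

    Assumption: nucleotide 1 is the first base of codon 1.
    """
    if first_aa < 1 or last_aa < first_aa:
        raise ValueError("expected 1 <= first_aa <= last_aa")
    return 3 * (first_aa - 1) + 1, 3 * last_aa


def nt_to_aa(nt_position: int) -> int:
    """Amino acid position whose codon contains the given nucleotide."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1


print(aa_range_to_nt_range(192, 319))    # E1 region
print(aa_range_to_nt_range(1, 191))      # Core region
print(aa_range_to_nt_range(1658, 1711))  # NS4a span named in claim 16
```

Under the same assumption, `nt_to_aa` gives the reverse mapping for a single nucleotide position.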

17. A composition according to any of claims 9 to 15, wherein said polypeptide is a recombinant polypeptide expressed by means of an expression vector as defined in claim 16.

18. A composition according to any of claims 9 to 15 or 16, for use in a method for immunizing a mammal, preferably humans, against HCV comprising administering a sufficient amount of the composition possibly accompanied by pharmaceutically acceptable adjuvants, to produce an immune response, more particularly a vaccine composition including HCV type 3 polypeptides derived from the E1, Core, or NS4 region and/or type 4 and/or type 5 and/or type 2 polypeptides.

19. Antibody raised upon immunization with a composition according to any of claims 9 to 15, 17 or 18, by means of a process according to claim 18, with said antibody being reactive with any of the polypeptides as defined in any of claims 9 to 15, 17 or 18.

20. Process for detecting in vitro HCV present in a biological sample liable to contain it, comprising at least the following steps: (i) contacting the biological sample to be analyzed for the presence of HCV antibodies with any of the compositions according to claims 9 to 15, 17 or 18, preferentially in an immobilized form under appropriate conditions which allow the formation of an immune complex, wherein said polypeptide is preferentially in the form of a biotinylated polypeptide and is covalently bound to a solid substrate by means of streptavidin or avidin complexes, (ii) removing unbound components, (iii) incubating the immune complexes formed with heterologous antibodies, which specifically bind to the antibodies present in the sample to be analyzed, with said heterologous antibodies being conjugated to a detectable label under appropriate conditions, (iv) detecting the presence of said immune complexes visually or by means of densitometry and inferring the HCV serotype(s) present from the observed hybridization pattern.

21. Use of a composition according to any of claims 9 to 15, 17 or 18, for incorporation into a serotyping assay for detecting one or more serological types of HCV present in a biological sample liable to contain it, more particularly for detecting E1 and NS4 antigens or antibodies of the different types to be detected combined in one assay format, comprising at least the following steps: (i) contacting the biological sample to be analyzed for the presence of HCV antibodies or antigens of one or more serological types, with at least one of the compositions according to claims 9 to 15, 17 or 18 in an immobilized form under appropriate conditions which allow the formation of an immune complex, (wherein said polypeptide is preferentially in the form of a biotinylated polypeptide and is covalently bound to a solid substrate by means of streptavidin or avidin complexes), (ii) removing unbound components, (iii) incubating the immune complexes formed with heterologous antibodies, which specifically bind to the antibodies present in the sample to be analyzed, with said heterologous antibodies being conjugated to a detectable label under appropriate conditions, (iv) detecting the presence of said immune complexes visually or by means of densitometry and inferring the HCV serological types present from the observed binding pattern.

22. A kit for determining the presence of HCV genotypes as defined in any of claims 1 to 5 present in a biological sample liable to contain them, comprising: possibly at least one primer composition containing any primer selected from those defined in claim 6 or any other HCV type 2 and/or HCV type 3 and/or HCV type 4 and/or HCV type 5, or universal HCV primers, at least one probe composition according to claim 7, preferably in combination with other polypeptides or peptides from HCV type 1, type 2 or other types of HCV, with said probes being preferentially immobilized on a solid substrate, and more preferentially on one and the same membrane strip, a buffer or components necessary for producing the buffer enabling hybridization reaction between these probes and the possibly amplified products to be carried out, a means for detecting the hybrids resulting from the preceding hybridization, possibly also including an automated scanning and interpretation device for inferring the HCV genotype(s) present in the sample from the observed hybridization pattern.

23. A kit for determining the presence of HCV antibodies according to any of claims 9 to 15, 17 or 18 present in a biological sample liable to contain them, comprising: at least one polypeptide composition according to any of claims 9 to 15, 17 or 18, with said polypeptides being preferentially immobilized on a solid substrate, and more preferentially on one and the same membrane strip, a buffer or components necessary for producing the buffer enabling binding reaction between these polypeptides and the antibodies against HCV present in the biological sample, a means for detecting the immune complexes formed in the preceding binding reaction, possibly also including an automated scanning and interpretation device for inferring the HCV genotype present in the sample from the observed binding pattern.

Description:
[0001] The invention relates to new sequences of hepatitis C virus (HCV) genotypes and their use as therapeutic and diagnostic agents.

[0002] The present invention relates to new nucleotide and amino acid sequences corresponding to the coding region of a new type 2 subtype 2d, type-specific sequences corresponding to HCV type 3a, to new sequences corresponding to the coding region of a new subtype 3c, and to new sequences corresponding to the coding region of HCV type 4 and type 5 subtype 5a; a process for preparing them, and their use for diagnosis, prophylaxis and therapy.

[0003] The technical problem underlying the present invention is to provide new type-specific sequences of the Core, the E1, the E2, the NS3, the NS4 and the NS5 regions of HCV type 4 and type 5, as well as of new variants of HCV types 2 and 3. These new HCV sequences are useful to diagnose the presence of type 2 and/or type 3 and/or type 4 and/or type 5 HCV genotypes in a biological sample. Moreover, the availability of these new type-specific sequences can increase the overall sensitivity of HCV detection and should also prove to be useful for therapeutic purposes.

[0004] Hepatitis C viruses (HCV) have been found to be the major cause of non-A, non-B hepatitis. The sequences of cDNA clones covering the complete genome of several prototype isolates have been determined (Kato et al., 1990; Choo et al., 1991; Okamoto et al., 1991; Okamoto et al., 1992). Comparison of these isolates shows that the variability in nucleotide sequences can be used to distinguish at least 2 different genotypes, type 1 (HCV-1 and HCV-J) and type 2 (HC-J6 and HC-J8), with an average homology of about 68%. Within each type, at least two subtypes exist (e.g. represented by HCV-1 and HCV-J), having an average homology of about 79%. HCV genomes belonging to the same subtype show average homologies of more than 90% (Okamoto et al., 1992). However, the partial nucleotide sequence of the NS5 region of the HCV-T isolates showed at most 67% homology with the previously published sequences, indicating the existence of yet another HCV type (Mori et al., 1992). Parts of the 5′ untranslated region (UR), core, NS3, and NS5 regions of this type 3 have been published, further establishing the similar evolutionary distances between the 3 major genotypes and their subtypes (Chan et al., 1992).

[0005] The identification of type 3 genotypes in clinical samples can be achieved by means of PCR with type-specific primers for the NS5 region. However, the degree to which this will be successful is largely dependent on sequence variability and on the virus titer present in the serum. Therefore, routine PCR in the open reading frame, especially for type 3 and the new type 4 and 5 described in the present invention and/or group V (Cha et al., 1992) genotypes, can be predicted to be unsuccessful. A new typing system (LiPA), based on variation in the highly conserved 5′ UR, proved to be more useful because the 5 major HCV genotypes and their subtypes can be determined (Stuyver et al., 1993). The selection of high-titer isolates makes it possible to obtain PCR fragments for cloning with only 2 primers, while nested PCR requires that 4 primers match the unknown sequences of the new type 3, 4 and 5 genotypes.

[0006] New sequences of the 5′ untranslated region (5′UR) have been listed by Bukh et al. (1992). For some of these, the E1 region has recently been described (Bukh et al., 1993). Isolates with similar sequences in the 5′UR to a group of isolates including DK12 and HK10 described by Bukh et al. (1992) and E-b1 to E-b8 described and classified as type 3 by Chan et al. (1991), have been reported and described in the 5′UR, the carboxyterminal part of E1, and in the NS5 region as group IV by Cha et al. (1992; WO 92/19743), and have also been described in the 5′UR for isolate BR56 and classified as type 3 by the inventors of this application (Stuyver et al., 1993).

[0007] The aim of the present invention is to provide new HCV nucleotide and amino acid sequences enabling the detection of HCV infection.

[0008] Another aim of the present invention is to provide new nucleotide and amino acid HCV sequences enabling the classification of infected biological fluids into different serological groups unambiguously linked to types and subtypes at the genome level.

[0009] Another aim of the present invention is to provide new nucleotide and amino acid HCV sequences ameliorating the overall HCV detection rate.

[0010] Another aim of the present invention is to provide new HCV sequences, useful for the design of HCV vaccine compositions.

[0011] Another aim of the present invention is to provide a pharmaceutical composition consisting of antibodies raised against the polypeptides encoded by these new HCV sequences, for therapy or diagnosis.

[0012] The present invention relates more particularly to a composition comprising or consisting of at least one polynucleic acid containing at least 5, and preferably 8 or more contiguous nucleotides selected from at least one of the following HCV sequences:

[0013] an HCV type 3 genomic sequence, more particularly in any of the following regions:

[0014] the region spanning positions 417 to 957 of the Core/E1 region of HCV subtype 3a,

[0015] the region spanning positions 4664 to 4730 of the NS3 region of HCV type 3,

[0016] the region spanning positions 4892 to 5292 of the NS3/4 region of HCV type 3,

[0017] the region spanning positions 8023 to 8235 of the NS5 region of the BR36 subgroup of HCV subtype 3a,

[0018] an HCV subtype 3c genomic sequence,

[0019] more particularly the coding regions of the above-specified regions;

[0020] an HCV subtype 2d genomic sequence, more particularly the coding region of HCV subtype 2d,

[0021] an HCV type 4 genomic sequence, more particularly the coding region, more particularly the coding region of subtypes 4a, 4e, 4f, 4g, 4h, 4i, and 4j,

[0022] an HCV type 5 genomic sequence, more particularly the coding region of HCV type 5, more particularly the regions encoding Core, E1, E2, NS3, and NS4,

[0023] with said nucleotide numbering being with respect to the numbering of HCV nucleic acids as shown in Table 1, and with said polynucleic acids containing at least one nucleotide difference with known HCV (type 1, type 2, and type 3) polynucleic acid sequences in the above-indicated regions, or the complement thereof.
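By way of illustration only, the requirement of a stretch of 8 or more contiguous nucleotides that differs from known HCV sequences can be sketched as follows. The function name `novel_kmers`, the default `k=8`, and the toy sequences are hypothetical and are not taken from the sequence listing; exact string matching is used here as a simple stand-in for an alignment-based comparison.

```python
def novel_kmers(candidate, known_sequences, k=8):
    """Return (position, k-mer) pairs of `candidate` found in none of `known_sequences`.

    Assumption: all sequences are plain nucleotide strings on the same strand.
    Positions are 1-based, following the numbering convention of the text.
    """
    candidate = candidate.upper()
    known = set()
    for seq in known_sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            known.add(seq[i:i + k])
    result = []
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        if kmer not in known:
            result.append((i + 1, kmer))
    return result


# Toy data for illustration, not real HCV sequences.
known = ["ATGAGCACGAATCC"]
new = "ATGAGCACAAATCC"  # differs from the known sequence at position 9
for position, kmer in novel_kmers(new, known):
    print(position, kmer)
```

Every 8-mer that overlaps the single differing position is reported, while the 8-mer at position 1, which is shared with the known sequence, is not.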

[0024] It is to be noted that the nucleotide difference in the polynucleic acids of the invention may or may not involve an amino acid difference in the corresponding amino acid sequences coded by said polynucleic acids.

[0025] According to a preferred embodiment, the present invention relates to a composition comprising or containing at least one polynucleic acid encoding an HCV polyprotein, with said polynucleic acid containing at least 5, preferably at least 8 nucleotides corresponding to at least part of an HCV nucleotide sequence encoding an HCV polyprotein, and with said HCV polyprotein containing in its sequence at least one of the following amino acid residues: L7, Q43, M44, S60, R67, Q70, T71, A79, A87, N106, K115, A127, A190, S130, V134, G142, I144, E152, A157, V158, P165, S177 or Y177, I178, V180 or E180 or F182, R184, I186, H187, T189, A190, S191 or G191, Q192 or L192 or I192 or V192 or E192, N193 or H193 or P193, W194 or Y194, H195, A197 or I197 or V197 or T197, V202, I203 or L203, Q208, A210, V212, F214, T216, R217 or D217 or E217 or V217, H218 or N218, H219 or V219 or L219, L227 or I227, M231 or E231 or Q231, T232 or D232 or A232 or K232, Q235 or I235, A237 or T237, I242, I246, S247, S248, V249, S250 or Y250, I251 or V251 or M251 or F251, D252, T254 or V254, L255 or V255, E256 or A256, M258 or F258 or V258, A260 or Q260 or S260, A261, T264 or Y264, M265, I266 or A266, A267, G268 or T268, F271 or M271 or V271, I277, M280 or H280, I284 or A284 or L284, V274, V291, N292 or S292, R293 or I293 or Y293, Q294 or R294, L297 or I297 or Q297, A299 or K299 or Q299, N303 or T303, T308 or L308, T310 or F310 or A310 or D310 or V310, L313, G317 or Q317, L333, S351, A358, A359, A363, S364, A366, T369, L373, F376, Q386, I387, S392, I399, F402, I403, R405, D454, A461, A463, T464, K484, Q500, E501, S521, K522, H524, N528, S531, S532, V534, F536, F537, M539, I546, C1282, A1283, H1310, V1312, Q1321, P1368, V1372, V1373, K1405, Q1406, S1409, A1424, A1429, C1435, S1436, S1456, H1496, A1504, D1510, D1529, I1543, N1567, D1556, N1567, M1572, Q1579, L1581, S1583, F1585, V1595, E1606 or T1606, M1611, V1612 or L1612, P1630, C1636, P1651, T1656 or I1656, L1663, V1667, V1677, A1681, H1685, E1687, G1689, V1695, A1700, Q1704, Y1705, A1713, A1714 or S1714, M1718, D1719, A1721 or T1721, R1722, A1723 or V1723, H1726 or G1726, E1730, V1732, F1735, I1736, S1737, R1738, T1739, G1740, Q1741, K1742, Q1743, A1744, T1745, L1746, E1747 or K1747, I1749, A1750, T1751 or A1751, V1753, N1755, K1756, A1757, P1758, A1759, H1762, T1763, Y1764, P2645, A2647, K2650, K2653 or L2653, S2664, N2673, F2680, K2681, L2686, H2692, Q2695 or L2695 or I2695, V2712, F2715, V2719 or Q2719, T2722, T2724, S2725, R2726, G2729, Y2735, H2739, I2748, G2746 or I2746, I2748, P2752 or K2752, P2754 or T2754, T2757 or P2757, with said notation being composed of a letter representing the amino acid residue by its one-letter code, and a number representing the amino acid numbering according to Kato et al., 1990.
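The residue notation described above (one-letter code followed by the amino acid position) lends itself to a simple check against a polyprotein sequence. The following is a minimal sketch; the function names `has_residue` and `has_any_residue` and the toy fragment are hypothetical, and the sequence is assumed to start at the initiating methionine (position 1).

```python
import re


def has_residue(polyprotein, notation):
    """Check a single residue such as 'S177' against a polyprotein sequence.

    Assumption: `polyprotein` is a one-letter amino acid string whose first
    character is the initiating methionine (position 1).
    """
    match = re.fullmatch(r"([A-Z])(\d+)", notation.strip())
    if match is None:
        raise ValueError(f"unrecognized residue notation: {notation!r}")
    letter, position = match.group(1), int(match.group(2))
    if position < 1:
        raise ValueError("positions are numbered from 1")
    if position > len(polyprotein):
        return False  # the sequence is too short to contain this position
    return polyprotein[position - 1] == letter


def has_any_residue(polyprotein, alternatives):
    """Handle entries such as 'S177 or Y177' by testing each alternative."""
    return any(has_residue(polyprotein, item) for item in alternatives.split(" or "))


toy = "MSTNPKLQRK"  # hypothetical 10-residue fragment, for illustration only
print(has_residue(toy, "L7"))            # True
print(has_any_residue(toy, "Q8 or R8"))  # True
print(has_residue(toy, "Q43"))           # False, fragment is shorter than 43 residues
```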

[0026] Each of the above-mentioned residues can be found in any of FIGS. 2, 5, 7, 11 or 12 showing the new amino acid sequences of the present invention aligned with known sequences of other types or subtypes of HCV for the Core, E1, E2, NS3, NS4, and NS5 regions.

[0027] More particularly, a polynucleic acid contained in the composition according to the present invention contains at least 5, preferably 8, or more contiguous nucleotides corresponding to a sequence of contiguous nucleotides selected from at least one of HCV sequences encoding the following new HCV amino acid sequences:

[0028] new sequences spanning amino acid positions 1 to 319 of the Core/E1 region of HCV subtype 2d, type 3 (more particularly new sequences for subtypes 3a and 3c), new type 4 subtypes (more particularly new sequences for subtypes 4a, 4e, 4f, 4g, 4h, 4i and 4j) and type 5a, as shown in FIG. 5;

[0029] new sequences spanning amino acid positions 328 to 546 of the E1/E2 region of HCV subtype 5a as shown in FIG. 12;

[0030] new sequences spanning amino acid positions 1556 to 1764 of the NS3/NS4 region of HCV type 3 (more particularly for new subtype 3a sequences), and subtype 5a, as shown in FIG. 7 or 11;

[0031] new sequences spanning amino acid positions 2645 to 2757 of the NS5B region of HCV subtype 2d, type 3 (more particularly for new subtypes 3a and 3c), new type 4 subtypes (more particularly subtypes 4a, 4e, 4f, 4g, 4h, 4i and 4j) and subtype 5a, as shown in FIG. 2.

[0032] Using the LiPA system mentioned above, Brazilian blood donors with high-titer type 3 hepatitis C virus, Gabonese patients with high-titer type 4 hepatitis C virus, and a Belgian patient with high-titer HCV type 5 infection were selected. Nucleotide sequences in the core, E1, NS5 and NS4 regions which had not been reported before were analyzed in the frame of the invention. Coding sequences (with the exception of the core region) of any type 4 isolate are reported for the first time in the present invention. The NS5b region was also analyzed for the new type 3 isolates. After having determined the NS5b sequences, comparison with the Ta and Tb subtypes described by Mori et al. (1992) was possible, and the type 3 sequences could be identified as type 3a genotypes. The new type 4 isolates segregated into 10 subtypes, based on homologies obtained in the NS5 and E1 regions. New type 2 and 3 sequences could also be distinguished from previously described type 2 or 3 subtypes from sera collected in Belgium and the Netherlands.

[0033] The term “polynucleic acid” refers to a single stranded or double stranded nucleic acid sequence which may contain from at least 5 contiguous nucleotides up to the complete nucleotide sequence (e.g. at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more contiguous nucleotides). A polynucleic acid of up to about 100 nucleotides in length is often also referred to as an oligonucleotide. A polynucleic acid may consist of deoxyribonucleotides or ribonucleotides, nucleotide analogues or modified nucleotides, or may have been adapted for therapeutic purposes. A polynucleic acid may also comprise a double stranded cDNA clone which can be used for cloning purposes, or for in vivo therapy, or prophylaxis.

[0034] The term “polynucleic acid composition” refers to any kind of composition comprising essentially said polynucleic acids. Said composition may be of a diagnostic or a therapeutic nature.

[0035] The expression “nucleotides corresponding to” refers to nucleotides which are homologous or complementary to an indicated nucleotide sequence or region within a specific HCV sequence.

[0036] The term “coding region” corresponds to the region of the HCV genome that encodes the HCV polyprotein. In fact, it comprises the complete genome with the exception of the 5′ untranslated region and 3′ untranslated region.

[0037] The term “HCV polyprotein” refers to the HCV polyprotein of the HCV-J isolate (Kato et al., 1990). The adenine residue at position 330 (Kato et al., 1990) is the first residue of the ATG codon that initiates the long HCV polyprotein of 3010 amino acids in HCV-J and other type 1b isolates, and of 3011 amino acids in HCV-1 and other type 1a isolates, and of 3033 amino acids in type 2 isolates HC-J6 and HC-J8 (Okamoto et al., 1992).

[0038] This adenine is designated as position 1 at the nucleic acid level, and this methionine is designated as position 1 at the amino acid level, in the present invention. As type 1a isolates contain 1 extra amino acid in the NS5a region, coding sequences of type 1a and 1b have identical numbering in the Core, E1, NS3, and NS4 region, but will differ in the NS5b region as indicated in Table 1. Type 2 isolates have 4 extra amino acids in the E2 region, and 17 or 18 extra amino acids in the NS5 region compared to type 1 isolates, and will differ in numbering from type 1 isolates in the NS3/4 region and NS5b regions as indicated in Table 1.

TABLE 1
                                Positions       Positions       Positions       Positions
                                described in    described for   described for   described for
                                the present     HCV-J (Kato     HCV-1 (Choo     HC-J6, HC-J8
             Region             invention*      et al., 1990)   et al., 1991)   (Okamoto et al., 1992)
Nucleotides  NS5b               8023/8235       8352/8564       8026/8238       8433/8645
                                7932/8271       8261/8600       7935/8274       8342/8681
             NS3/4              4664/5292       4993/5621       4664/5292       5017/5645
                                4664/4730       4993/5059       4664/4730       5017/5083
                                4892/5292       5221/5621       4892/5292       5245/5645
                                3856/4209       4185/4528       3856/4209       4209/4762
                                4936/5292       5265/5621       4936/5292       5289/5645
             coding region of                   330/9359        1/9033          342/9439
             present invention
Amino Acids  NS5b               2675/2745       2675/2745       2676/2746       2698/2768
                                2645/2757       2645/2757       2646/2758       2668/2780
             NS3/4              1556/1764       1556/1764       1556/1764       1560/1768
                                1286/1403       1286/1403       1286/1403       1290/1407
                                1646/1764       1646/1764       1646/1764       1650/1768

[0039] Table 1: Comparison of the HCV nucleotide and amino acid numbering system used in the present invention (*) with the numbering used for other prototype isolates. For example, 8352/8564 indicates the region designated by the numbering from nucleotide 8352 to nucleotide 8564 as described by Kato et al. (1990). Since the numbering system of the present invention starts at the polyprotein initiation site, the 329 nucleotides of the 5′ untranslated region described by Kato et al. (1990) have to be subtracted, and the corresponding region is numbered from nucleotide 8023 (“8352 - 329”) to 8235 (“8564 - 329”).
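The conversion described in the legend of Table 1 can be sketched as follows. The function names are hypothetical. The second function rests on the assumption, drawn from the convention that nucleotide 1 is the A of the initiating ATG, that codon k spans nucleotides 3k-2 to 3k; it returns the amino acid positions whose codons lie entirely within a nucleotide region.

```python
KATO_5UR_LENGTH = 329  # nucleotides preceding the ATG in Kato et al. (1990), per the text


def kato_to_invention(position):
    """Convert a Kato et al. (1990) nucleotide position to the numbering used here."""
    converted = position - KATO_5UR_LENGTH
    if converted < 1:
        raise ValueError("position lies in the 5' untranslated region")
    return converted


def complete_codons(start, end):
    """Return (first, last) amino acid positions whose codons lie fully in start..end.

    Assumes nucleotide 1 is the A of the initiating ATG, so that codon k
    spans nucleotides 3k-2 to 3k.
    """
    first = (start + 4) // 3  # smallest k with 3k-2 >= start
    last = end // 3           # largest k with 3k <= end
    return first, last


region = (kato_to_invention(8352), kato_to_invention(8564))
print(region)                    # (8023, 8235)
print(complete_codons(*region))  # (2675, 2745)
```

Under these assumptions the sketch reproduces the NS5b entries of Table 1, both for the nucleotide region 8023/8235 and for the corresponding amino acid positions 2675/2745.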

[0040] The term “HCV type” corresponds to a group of HCV isolates of which the complete genome shows more than 74% homology at the nucleic acid level, or of which the NS5 region between nucleotide positions 7932 and 8271 shows more than 74% homology at the nucleic acid level, or of which the complete HCV polyprotein shows more than 78% homology at the amino acid level, or of which the NS5 region between amino acids at positions 2645 and 2757 shows more than 80% homology at the amino acid level, to polyproteins of the other isolates of the group, with said numbering beginning at the first ATG codon or first methionine of the long HCV polyprotein of the HCV-J isolate (Kato et al., 1990). Isolates belonging to different types of HCV exhibit homologies, over the complete genome, of less than 74% at the nucleic acid level and less than 78% at the amino acid level. Isolates belonging to the same type usually show homologies of about 92 to 95% at the nucleic acid level and 95 to 96% at the amino acid level when belonging to the same subtype, and those belonging to the same type but different subtypes preferably show homologies of about 79% at the nucleic acid level and 85-86% at the amino acid level.

[0041] More preferably the definition of HCV types is concluded from the classification of HCV isolates according to their nucleotide distances calculated as detailed below:

[0042] (1) based on phylogenetic analysis of nucleic acid sequences in the NS5b region between nucleotides 7935 and 8274 (Choo et al., 1991) or 8261 and 8600 (Kato et al., 1990) or 8342 and 8681 (Okamoto et al., 1991), isolates belonging to the same HCV type show nucleotide distances of less than 0.34, usually less than 0.33, and more usually of less than 0.32, and isolates belonging to the same subtype show nucleotide distances of less than 0.135, usually of less than 0.13, and more usually of less than 0.125, and consequently isolates belonging to the same type but different subtypes show nucleotide distances ranging from 0.135 to 0.34, usually ranging from 0.1384 to 0.2477, and more usually ranging from 0.15 to 0.32, and isolates belonging to different HCV types show nucleotide distances greater than 0.34, usually greater than 0.35, and more usually of greater than 0.358, more usually ranging from 0.1384 to 0.2977.

[0043] (2) based on phylogenetic analysis of nucleic acid sequences in the core/E1 region between nucleotides 378 and 957, isolates belonging to the same HCV type show nucleotide distances of less than 0.38, usually of less than 0.37, and more usually of less than 0.364, and isolates belonging to the same subtype show nucleotide distances of less than 0.17, usually of less than 0.16, and more usually of less than 0.15, more usually less than 0.135, more usually less than 0.134, and consequently isolates belonging to the same type but different subtypes show nucleotide distances ranging from 0.15 to 0.38, usually ranging from 0.16 to 0.37, and more usually ranging from 0.17 to 0.36, more usually ranging from 0.133 to 0.379, and isolates belonging to different HCV types show nucleotide distances greater than 0.34, 0.35, 0.36, usually more than 0.365, and more usually of greater than 0.37,

[0044] (3) based on phylogenetic analysis of nucleic acid sequences in the NS3/NS4 region between nucleotides 4664 and 5292 (Choo et al., 1991) or between nucleotides 4993 and 5621 (Kato et al., 1990) or between nucleotides 5017 and 5645 (Okamoto et al., 1991), isolates belonging to the same HCV type show nucleotide distances of less than 0.35, usually of less than 0.34, and more usually of less than 0.33, and isolates belonging to the same subtype show nucleotide distances of less than 0.19, usually of less than 0.18, and more usually of less than 0.17, and consequently isolates belonging to the same type but different subtypes show nucleotide distances ranging from 0.17 to 0.35, usually ranging from 0.18 to 0.34, and more usually ranging from 0.19 to 0.33, and isolates belonging to different HCV types show nucleotide distances greater than 0.33, usually greater than 0.34, and more usually of greater than 0.35.
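The distance criteria (1) to (3) can be illustrated with a minimal sketch. It uses, for each region, the first (broadest) cut-off stated above for the same subtype and for the same type. The names `CUTOFFS`, `classify_pair` and `classify_consensus` are hypothetical, and the treatment of a distance lying exactly on a cut-off is an assumption, since the text leaves it open. The consensus helper reflects the preference, stated in this description, for sequence information from at least 2 regions.

```python
# (same-subtype cut-off, same-type cut-off) per region, taken from criteria (1) to (3)
CUTOFFS = {
    "NS5B": (0.135, 0.34),
    "Core/E1": (0.17, 0.38),
    "NS3/NS4": (0.19, 0.35),
}


def classify_pair(region, distance):
    """Classify a pairwise nucleotide distance obtained in the given region."""
    if distance < 0:
        raise ValueError("a nucleotide distance cannot be negative")
    subtype_cutoff, type_cutoff = CUTOFFS[region]
    if distance < subtype_cutoff:
        return "same subtype"
    if distance < type_cutoff:
        return "same type, different subtype"
    return "different type"  # assumption: a value on the cut-off falls here


def classify_consensus(distances):
    """Return a classification only if at least 2 regions are given and all agree."""
    calls = {classify_pair(region, d) for region, d in distances.items()}
    if len(distances) < 2 or len(calls) != 1:
        return "inconclusive"
    return calls.pop()


print(classify_pair("NS5B", 0.0637))
print(classify_consensus({"NS5B": 0.22, "Core/E1": 0.28}))
print(classify_consensus({"NS5B": 0.22}))
```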

TABLE 2
Molecular evolutionary distances
Region      Core/E1, 579 bp     E1, 384 bp          NS5B, 340 bp        NS5B, 222 bp
Isolates*   0.0017-0.1347       0.0026-0.2031       0.0003-0.1151       0.000-0.1323
            (0.0750 ± 0.0245)   (0.0969 ± 0.0289)   (0.0637 ± 0.0229)   (0.0607 ± 0.0205)
Subtypes*   0.1330-0.3794       0.1645-0.4869       0.1384-0.2977       0.117-0.3538
            (0.2786 ± 0.0363)   (0.3761 ± 0.0433)   (0.2219 ± 0.0341)   (0.2391 ± 0.0399)
Types*      0.3479-0.6306       0.4309-0.9561       0.3581-0.6670       0.3457-0.7471
            (0.4703 ± 0.0525)   (0.6308 ± 0.0928)   (0.4994 ± 0.0495)   (0.5295 ± 0.0627)
*Figures created by the PHYLIP program DNADIST are expressed as minimum to maximum (average ± standard deviation). Phylogenetic distances for isolates belonging to the same subtype (‘isolates’), to different subtypes of the same type (‘subtypes’), and to different types (‘types’) are given.

[0045] In a comparative phylogenetic analysis of available sequences, ranges of molecular evolutionary distances for different regions of the genome were calculated, based on 19,781 pairwise comparisons by means of the DNADIST program of the phylogeny inference package PHYLIP version 3.5C (Felsenstein, 1993). The results are shown in Table 2 and indicate that although the majority of distances obtained in each region fit with classification of a certain isolate, only the ranges obtained in the 340 bp NS5B region are non-overlapping and therefore conclusive. However, as was performed in the present invention, it is preferable to obtain sequence information from at least 2 regions before final classification of a given isolate.
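To indicate what a pairwise molecular evolutionary distance represents, a minimal sketch is given below. It applies the simple Jukes-Cantor correction to two aligned sequences; this is a stand-in chosen for brevity. The substitution model actually used with DNADIST is not stated here, so the sketch is not expected to reproduce the figures of Table 2. The function name and the toy sequences are hypothetical.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences of equal length.

    Positions where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    if differing == 0:
        return 0.0
    p = differing / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy sequences for illustration, not real HCV data.
print(round(jukes_cantor_distance("ACGTACGT", "ACGTACGA"), 4))
```

The corrected distance is always somewhat larger than the raw proportion of differing sites, because it allows for multiple substitutions at the same position.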

[0046] Designation of a number to the different types of HCV and HCV types nomenclature is based on chronological discovery of the different types. The numbering system used in the present invention might still fluctuate according to international conventions or guidelines. For example, “type 4” might be changed into “type 5” or “type 6”.

[0047] The term “subtype” corresponds to a group of HCV isolates of which the complete polyprotein shows a homology of more than 90% both at the nucleic acid and amino acid levels, or of which the NS5 region between nucleotide positions 7932 and 8271 shows a homology of more than 90% at the nucleic acid level to the corresponding parts of the genomes of the other isolates of the same group, with said numbering beginning with the adenine residue of the initiation codon of the HCV polyprotein. Isolates belonging to the same type but different subtypes of HCV show homologies of more than 74% at the nucleic acid level and of more than 78% at the amino acid level.

[0048] The term “BR36 subgroup” refers to a group of type 3a HCV isolates (BR36, BR33, BR34) that are 95%, preferably 95.5%, most preferably 96% homologous to the sequences as represented in SEQ ID NO 1, 3, 5, 7, 9, 11 in the NS5b region from position 8023 to 8235.

[0049] It is to be understood that extremely variable regions like the E1, E2 and NS4 regions will exhibit lower homologies than the average homology of the complete genome of the polyprotein.

[0050] Using these criteria, HCV isolates can be classified into at least 6 types. Several subtypes can clearly be distinguished in types 1, 2, 3 and 4: 1a, 1b, 2a, 2b, 2c, 2d, 3a, 3b, 4a, 4b, 4c, 4d, 4e, 4f, 4g, 4h, 4i and 4j based on homologies of the 5′ UR and coding regions including the part of NS5 between positions 7932 and 8271. An overview of most of the reported isolates and their proposed classification according to the typing system of the present invention as well as other proposed classifications is presented in Table 3.

TABLE 3
HCV CLASSIFICATION
OKAMOTO MORI NAKAO CHA PROTOTYPE
1a I I Pt GI HCV-1, HCV-H, HC-J1
1b II II KI GIII HCV-J, HCV-BK, HCV-T, HC-JK1, HC-J4, HCV-CHINA
1c HC-G9
2a III III K2a GIII HC-J6
2b IV IV K2b GIII HC-J8
2c S83, ARG6, ARG8, I10, T983
2d NE92
3a V V K3 GIV E-b1, Ta, BR36, BR33, HD10, NZL1
3b VI K3 GIV HCV-TR, Tb
3c BE98
4a Z4, GB809-4
4b Z1
4c GB116, GB358, GB215, Z6, Z7
4d DK13
4e GB809-2, CAM600, CAM736
4f CAM622, CAM627
4g GB549
4h GB438
4i CAR4/1205
4j CAR1/501
4k EG29
5a GV SA3, SA4, SA1, SA7, SA11, BE95
6a HK1, HK2, HK3, HK4

[0051] The term “complement” refers to a nucleotide sequence which is complementary to an indicated sequence and which is able to hybridize to the indicated sequences.
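As an illustration of the term “complement”, a strand written 5′ to 3′ hybridizes to its reverse complement, which can be derived as sketched below. The function name and the example sequence are hypothetical; U is accepted as input so that RNA sequences are handled as well, with the result written as DNA.

```python
# Translation table for Watson-Crick pairing; U (RNA) pairs with A like T does.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Return the complementary strand of `sequence`, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGAGCAC"))  # GTGCTCAT
```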

[0052] The composition of the invention can comprise many combinations. By way of example, the composition of the invention can comprise:

[0053] two (or more) nucleic acids from the same region or,

[0054] two nucleic acids (or more), respectively from different regions, for the same isolate or for different isolates,

[0055] or nucleic acids from the same regions and from at least two different regions (for the same isolate or for different isolates).

[0056] The present invention relates more particularly to a polynucleic acid composition as defined above, wherein said polynucleic acid corresponds to a nucleotide sequence selected from any of the following HCV type 3 genomic sequences:

[0057] an HCV genomic sequence having a homology of at least 67%, preferably more than 69%, more preferably 71%, even more preferably more than 73%, or most preferably more than 76% to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33 sequences) in the region spanning positions 417 to 957 of the Core/E1 region as shown in FIG. 4;

[0058] an HCV genomic sequence having a homology of at least 65%, preferably more than 67%, more preferably more than 69%, even more preferably more than 70%, most preferably more than 74% to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33 sequences) in the region spanning positions 574 to 957 of the E1 region as shown in FIG. 4;

[0059] an HCV genomic sequence having a homology of at least 79%, more preferably at least 81%, most preferably more than 83% to the sequence as represented in SEQ ID NO 147 (representing positions 1 to 346 of the Core region of HCV type 3c, sequence BE98) in the region spanning positions 1 to 378 of the Core region as shown in FIG. 3;

[0060] an HCV genomic sequence of HCV type 3a having a homology of at least 74%, more preferably at least 76%, most preferably more than 78% to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33 sequences) in the region spanning positions 417 to 957 in the Core/E1 region as shown in FIG. 4;

[0061] an HCV genomic sequence of HCV type 3a having a homology of at least 74%, preferably more than 76%, most preferably 78% or more to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 (HD10, BR36 or BR33 sequences) in the region spanning positions 574 to 957 in the E1 region as shown in FIG. 4;

[0062] an HCV genomic sequence having a homology of more than 73.5%, preferably more than 74%, most preferably 75% to the sequence as represented in SEQ ID NO 29 (HCC153 sequence) in the region spanning positions 4664 to 4730 of the NS3 region as shown in FIG. 6;

[0063] an HCV genomic sequence having a homology of more than 70%, preferably more than 72%, most preferably more than 74% to any of the sequences as represented in SEQ ID NO 29, 31, 33, 35, 37 or 39 (HCC153, HD10, BR36 sequences) in the region spanning positions 4892 to 5292 in the NS3/NS4 region as shown in FIG. 6 or 10;

[0064] an HCV genomic sequence of the BR36 subgroup of HCV type 3a having a homology of more than 95%, preferably 95.5%, most preferably 96% to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 (BR34, BR33, BR36 sequences) in the region spanning positions 8023 to 8235 of the NS5 region as shown in FIG. 1;

[0065] an HCV genomic sequence of the BR36 subgroup of HCV type 3a having a homology of more than 96%, preferably 96.5%, most preferably 97% to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 (BR34, BR33, BR36 sequences) in the region spanning positions 8023 to 8192 of the NS5B region as shown in FIG. 1;

[0066] an HCV genomic sequence of HCV type 3c being characterized as having a homology of more than 79%, more preferably more than 81%, and most preferably more than 83% to the sequence as represented in SEQ ID NO 149 (BE98 sequence) in the region spanning positions 7932 to 8271 in the NS5B region as shown in FIG. 1.

[0067] Preferentially the above-mentioned genomic HCV sequences depict sequences from the coding regions of all the above-mentioned sequences.
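The homology criteria listed above can be illustrated by a minimal sketch that computes the percentage of identical positions between two sequences over a numbered region. Treating “homology” as percent identity is an assumption made for this illustration, as is the requirement that both sequences are numbered from the A of the initiating ATG and are aligned without gaps over the region. The function names and the toy sequences are hypothetical.

```python
def region_identity(query, reference, start, end):
    """Percent identity over positions start..end (1-based, inclusive).

    Assumptions: both strings are numbered from the A of the initiating ATG
    and are aligned without gaps over the region of interest.
    """
    if start < 1 or end < start:
        raise ValueError("invalid region")
    if end > min(len(query), len(reference)):
        raise ValueError("region extends beyond the available sequence")
    q = query[start - 1:end].upper()
    r = reference[start - 1:end].upper()
    matches = sum(1 for a, b in zip(q, r) if a == b)
    return 100.0 * matches / len(q)


def meets_homology(query, reference, start, end, minimum_percent):
    """True if the identity over the region is at least `minimum_percent`."""
    return region_identity(query, reference, start, end) >= minimum_percent


# Toy 20-nucleotide sequences differing at positions 9 and 20; not real HCV data.
reference = "ATGAGCACGAATCCTAAACC"
query = "ATGAGCACAAATCCTAAACT"
print(region_identity(query, reference, 1, 20))     # 90.0
print(meets_homology(query, reference, 1, 20, 67))  # True
```

For a criterion worded as “more than” a given percentage, the comparison in `meets_homology` would be made strict.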

[0068] According to the nucleotide distance classification system (with said nucleotide distances being calculated as explained above), said sequences of said composition are selected from:

[0069] an HCV genomic sequence being characterized as having a nucleotide distance of less than 0.44, preferably of less than 0.40, most preferably of less than 0.36 to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 417 to 957 of the Core/E1 region as shown in FIG. 4;

[0070] an HCV genomic sequence being characterized as having a nucleotide distance of less than 0.53, preferably less than 0.49, most preferably of less than 0.45 to any of the sequences as represented in SEQ ID NO 19, 21, 23, 25 or 27 in the region spanning positions 574 to 957 of the E1 region as shown in FIG. 4;

[0071] an HCV genomic sequence being characterized as having a nucleotide distance of less than 0.15, preferably less than 0.13, and most preferably less than 0.11 to the sequence as represented in SEQ ID NO 147 in the region spanning positions 1 to 378 of the Core region as shown in FIG. 3;

[0072] an HCV genomic sequence of HCV type 3a being characterized as having a nucleotide distance of less than 0.3, preferably less than 0.26, most preferably of less than 0.22 to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 417 to 957 in the Core/E1 region as shown in FIG. 4;

[0073] an HCV genomic sequence of HCV type 3a being characterized as having a nucleotide distance of less than 0.35, preferably less than 0.31, most preferably of less than 0.27 to any of the sequences as represented in SEQ ID NO 13, 15, 17, 19, 21, 23, 25 or 27 in the region spanning positions 574 to 957 in the E1 region as shown in FIG. 4;

[0074] an HCV genomic sequence of the BR36 subgroup of HCV type 3a being characterized as having a nucleotide distance of less than 0.0423, preferably less than 0.042, most preferably less than 0.0362 to any of the sequences as represented in SEQ ID NO 5, 7, 1, 3, 9 or 11 in the region spanning positions 8023 to 8235 of the NS5 region as shown in FIG. 1;

[0075] an HCV genomic sequence of HCV type 3c being characterized as having a nucleotide distance of less than 0.255, preferably of less than 0.25, more preferably of less than 0.21, most preferably of less than 0.17 to the sequence as represented in SEQ ID NO 149 in the region spanning positions 7932 to 8271 in the NS5B region as shown in FIG. 1.

[0076] In the present application, the E1 sequences encoding the antigenic ectodomain of the E1 protein, which does not overlap the carboxyterminal signal-anchor sequences of E1 disclosed by Cha et al. (1992; WO 92/19743), in addition to the NS4 epitope region, and a part of the NS5 region are disclosed for 4 different isolates: BR33, BR34, BR36, HCC153 and HD10, all belonging to type 3a (SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 35, 37 or 39).

[0077] Also within the present invention are new subtype 3c sequences (SEQ ID NO 147, 149) of the isolate BE98 in the Core and NS5 regions (see FIGS. 3 and 1).

[0078] Finally the present invention also relates to a new subtype 3a sequence as represented in SEQ ID NO 217 (see FIG. 1).

[0079] Also included within the present invention are sequence variants of the polynucleic acids as selected from any of the nucleotide sequences as given in any of the above mentioned SEQ ID numbers, with said sequence variants containing either deletions and/or insertions of one or more nucleotides, mainly at the extremities of oligonucleotides (either 3′ or 5′), or substitutions of some non-essential nucleotides by others (including modified nucleotides and/or inosine). For example, a type 1 or 2 sequence might be modified into a type 3 sequence by replacing some nucleotides of the type 1 or 2 sequence with type-specific nucleotides of type 3 as shown in FIG. 1 (NS5 region), FIG. 3 (Core region), FIG. 4 (Core/E1 region), FIGS. 6 and 10 (NS3/NS4 region).

[0080] According to another embodiment, the present invention relates to a polynucleic acid composition as defined above, wherein said polynucleic acids correspond to a nucleotide sequence selected from any of the following HCV type 5 genomic sequences:

[0081] an HCV genomic sequence as having a homology of more than 85%, preferably more than 86%, most preferably more than 87% homology to any of the sequences as represented in SEQ ID NO 41, 43, 45, 47, 49, 51, 53 (PC sequences) or 151 (BE 95 sequence) in the region spanning positions 1 to 573 of the Core region as shown in FIG. 9 and 3 ;

[0082] an HCV genomic sequence having a homology of more than 61%, preferably more than 63%, more preferably more than 65%, even more preferably more than 66% and most preferably more than 67% (for instance 69 and 71%) to any of the sequences as represented in SEQ ID NO 41, 43, 45, 47, 49, 51, 53 (PC sequences), 153 or 155 (BE95, BE100 sequences) in the region spanning positions 574 to 957 of the E1 region as shown in FIG. 4;

[0083] an HCV genomic sequence having a homology of more than 76.5%, preferably of more than 77%, most preferably of more than 78% homology with any of the sequences as represented in SEQ ID NO 55, 57, 197 or 199 (PC sequences) in the region spanning positions 3856 to 4209 of the NS3 region as shown in FIG. 6 or 10;

[0084] an HCV genomic sequence having a homology of more than 68%, preferably of more than 70%, most preferably of more than 72% homology with the sequence as represented in SEQ ID NO 157 (BE95 sequence) in the region spanning positions 980 to 1179 of the E1/E2 region as shown in FIG. 13;

[0085] an HCV genomic sequence having a homology of more than 57%, preferably more than 59%, most preferably more than 61% homology to any of the sequences as represented in SEQ ID NO 59 or 61 (PC sequences) in the region spanning positions 4936 to 5296 of the NS4 region as shown in FIG. 6 or 10;

[0086] an HCV genomic sequence having a homology of more than 93%, preferably more than 93.5%, most preferably more than 94%, to any of the sequences as represented in SEQ ID NO 159 or 161 (BE95 or BE96 sequences) in the region spanning positions 7932 to 8271 of the NS5B region as shown in FIG. 1.

[0087] Preferentially the above-mentioned genomic HCV sequences depict sequences from the coding regions of all the above-mentioned sequences.
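By way of illustration only, the regional homology criteria of paragraphs [0081] to [0086] can be sketched as a threshold check. The sketch assumes that homology is taken as percent identity over aligned positions, with gap columns ignored; this section does not specify the calculation, so that definition, the function names and the region labels are hypothetical. Only the broadest ("more than") threshold of each region is encoded.

```python
# Illustrative sketch only: percent identity is an ASSUMED stand-in for the
# "homology" figure; the names below are hypothetical, not from the application.

# Broadest thresholds (in %) taken from paragraphs [0081]-[0086] for type 5.
TYPE5_MIN_HOMOLOGY = {
    "Core (1-573)": 85.0,
    "E1 (574-957)": 61.0,
    "NS3 (3856-4209)": 76.5,
    "E1/E2 (980-1179)": 68.0,
    "NS4 (4936-5296)": 57.0,
    "NS5B (7932-8271)": 93.0,
}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical bases between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_type5_threshold(region: str, query: str, reference: str) -> bool:
    """True if the query exceeds the broadest homology threshold for the region."""
    return percent_identity(query, reference) > TYPE5_MIN_HOMOLOGY[region]


# Toy 10-base sequences (not real HCV data): 9 of 10 positions agree.
toy_reference = "ACGTACGTAC"
toy_query = "ACGTACGTAA"
core_ok = meets_type5_threshold("Core (1-573)", toy_query, toy_reference)
ns5b_ok = meets_type5_threshold("NS5B (7932-8271)", toy_query, toy_reference)
```

In practice the query and reference would first be aligned over the stated nucleotide positions; the toy strings above only exercise the comparison logic.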

[0088] According to the nucleotide distance classification system (with said nucleotide distances being calculated as explained above), said sequences of said composition are selected from:

[0089] a nucleotide distance of less than 0.53, preferably less than 0.51, more preferably less than 0.49 for the E1 region to the type 5 sequences depicted above;

[0090] a nucleotide distance of less than 0.3, preferably less than 0.28, more preferably of less than 0.26 for the Core region to the type 5 sequences depicted above;

[0091] a nucleotide distance of less than 0.072, preferably less than 0.071, more preferably less than 0.070 for the NS5B region to the type 5 sequences as depicted above.
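The distance criteria of paragraphs [0089] to [0091] can likewise be sketched as a comparison against the broadest cut-off per region. The distance calculation referred to as "explained above" lies outside this passage, so the Jukes-Cantor correction used here is an assumed stand-in chosen for brevity and may differ from the method actually used; all function and variable names are hypothetical.

```python
import math

# Broadest distance cut-offs from paragraphs [0089]-[0091] for type 5.
TYPE5_MAX_DISTANCE = {"E1": 0.53, "Core": 0.30, "NS5B": 0.072}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance (ASSUMED model, not taken from the source)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def within_type5_distance(region: str, distance: float) -> bool:
    """True if the distance falls below the broadest cut-off for the region."""
    return distance < TYPE5_MAX_DISTANCE[region]


# Toy 10-base sequences (not real HCV data) differing at one site.
toy_distance = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
core_within = within_type5_distance("Core", toy_distance)
ns5b_within = within_type5_distance("NS5B", toy_distance)
```

A distance obtained with a different substitution model can be passed directly to `within_type5_distance`; only the cut-off values are taken from the text.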

[0092] Isolates with sequences in the 5′UR similar to those of a group of isolates including SA1, SA3, and SA7, described in the 5′UR by Bukh et al. (1992), have been reported and described in the 5′UR and NS5 region as group V by Cha et al. (1992; WO 92/19743). This group of isolates belongs to type 5a as described in the present invention (SEQ ID NO 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 151, 153, 155, 157, 159, 161, 197 and 199).

[0093] Also included within the present invention are sequence variants of the polynucleic acids as selected from any of the nucleotide sequences as given in any of the above given SEQ ID numbers, with said sequence variants containing either deletions and/or insertions of one or more nucleotides, mainly at the extremities of oligonucleotides (either 3′ or 5′), or substitutions of some non-essential nucleotides (i.e. nucleotides not essential to discriminate between different genotypes of HCV) by others (including modified nucleotides and/or inosine). For example, a type 1 or 2 sequence might be modified into a type 5 sequence by replacing some nucleotides of the type 1 or 2 sequence with type-specific nucleotides of type 5 as shown in FIG. 3 (Core region), FIG. 4 (Core/E1 region), FIG. 10 (NS3/NS4 region), FIG. 14 (E1/E2 region).

[0094] Another group of isolates including BU74 and BU79 having similar sequences in the 5′UR to isolates including Z6 and Z7 as described in the 5′UR by Bukh et al. (1992), have been described in the 5′UR and classified as a new type 4 by the inventors of this application (Stuyver et al., 1993). Coding sequences, including core, E1 and NS5 sequences of several new Gabon