INSDC Environment Sample Sequences

Citation

European Bioinformatics Institute (EMBL-EBI), GBIF Helpdesk (2024). INSDC Environment Sample Sequences. Version 1.97. European Nucleotide Archive (EMBL-EBI). Occurrence dataset https://doi.org/10.15468/mcmd5g accessed via GBIF.org on 2024-08-12.

Description

This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: environmental_sample=True & host=""
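As a rough illustration of what such a query could look like, the sketch below builds a request URL against the ENA portal API base given above. The `search` endpoint, the `result=sequence` data type, the `format` and `limit` parameters, and the `AND` query syntax are assumptions made for this example, not details stated in the dataset description; the adapter's actual request may differ.

```python
from urllib.parse import urlencode

# Base URL taken from the dataset description.
ENA_PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api/"


def build_search_url(limit=10):
    """Build an illustrative search URL for environmental samples without a host.

    Assumptions (not from the dataset description): the "search" endpoint,
    result type "sequence", TSV output, and the "AND" query operator.
    """
    params = {
        "result": "sequence",
        "query": 'environmental_sample=true AND host=""',
        "format": "tsv",
        "limit": limit,
    }
    return ENA_PORTAL_API + "search?" + urlencode(params)


url = build_search_url()
print(url)
```

The URL is only constructed here, not fetched, so the sketch runs without network access.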

EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).

The data was then processed as follows:

1. Human sequences were excluded.

2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individuals (or to groups of organisms of the same species in the same sample). Only one record was kept for each combination of scientific name and sample accession number.

3. Contigs and whole genome shotgun (WGS) records were added individually.

4. Records with missing information were excluded. Only records associated with a specimen voucher, or records containing both a location and a date, were kept.

5. Records associated with the same voucher were aggregated together.

6. Many of the remaining records were individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession number was missing. To identify these potential duplicates, we grouped all the remaining records by scientific_name, collection_date, location, country, identified_by, collected_by and sample_accession (when available), and then excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978

7. To improve the matching of the EBI scientific names to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
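The completeness filter (step 4) and the large-group exclusion (step 6) can be sketched as follows. This is a minimal illustration on plain dictionaries, not the embl-adapter implementation: the `specimen_voucher` key and the helper names are hypothetical, while the grouping fields and the threshold of 50 come from the description above.

```python
from collections import Counter

# Grouping fields listed in step 6 of the description.
GROUP_KEYS = (
    "scientific_name", "collection_date", "location", "country",
    "identified_by", "collected_by", "sample_accession",
)
MAX_GROUP_SIZE = 50  # groups with more records than this are excluded


def is_complete(record):
    """Step 4: keep records with a voucher, or with both a location and a date.

    The "specimen_voucher" key is an assumed field name for this sketch.
    """
    has_voucher = bool(record.get("specimen_voucher"))
    has_place_and_date = bool(record.get("location")) and bool(record.get("collection_date"))
    return has_voucher or has_place_and_date


def group_key(record):
    """Build the grouping key; missing values (e.g. no sample_accession) become ""."""
    return tuple(record.get(key) or "" for key in GROUP_KEYS)


def drop_large_groups(records, max_size=MAX_GROUP_SIZE):
    """Step 6: exclude every record belonging to a group larger than max_size."""
    sizes = Counter(group_key(r) for r in records)
    return [r for r in records if sizes[group_key(r)] <= max_size]
```

For example, 51 records sharing the same values for all grouping fields would all be dropped, while a group of exactly 50 would be kept.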

More information is available here: https://github.com/gbif/embl-adapter#readme

The mapping used to convert the EMBL data into the Darwin Core Archive format is available here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md

Taxonomic Coverages

Geographic Coverages

Worldwide

Bibliographic Citations

Contacts

European Bioinformatics Institute (EMBL-EBI)
originator
email: datasubs@ebi.ac.uk
homepage: http://www.ebi.ac.uk
GBIF Helpdesk
metadata author
email: helpdesk@gbif.org
European Bioinformatics Institute (EMBL-EBI)
administrative point of contact
email: datasubs@ebi.ac.uk
homepage: http://www.ebi.ac.uk