[EN] Awesome list: a toolkit for text analysis of the Armenian language
- Eastern Armenian National Corpus Electronic Library provides a full view of works by classical authors (these books are in the public domain because their authors died more than 70 years ago). The corpus contains 4547379 words from 104 books by 12 authors.
- Named entity recognition. pioNer — trained data for Armenian NER using Wikipedia. This corpus provides the gold standard for automatically generated annotated datasets using GloVe models for Armenian. Along with the datasets, 50-, 100-, 200-, and 300-dimensional GloVe word embeddings trained on a collection of Armenian texts from Wikipedia, news, blogs, and encyclopedias have been released.
- The Polyglot library for Python supports language detection, named entity extraction (using Wikipedia data), morphological analysis, transliteration, and sentiment analysis for Armenian.
- Kevin Bougé Stopword Lists Page includes the Armenian language.
- Ranks NL Stopword Lists Page includes the Armenian language.
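As a rough illustration of how such a stopword list is typically used, here is a minimal Python sketch. The stopword set below is a tiny hand-picked subset chosen for this example (an assumption, not the contents of either page above), and the function name is hypothetical.

```python
import re

# Tiny illustrative subset of Armenian stopwords (an assumption for this
# sketch; a real list would be loaded from one of the pages above).
ARMENIAN_STOPWORDS = {"և", "ու", "է", "են", "որ", "այս"}


def remove_stopwords(text, stopwords=ARMENIAN_STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


# "Armenia and Georgia are neighbouring countries"
sample = "Հայաստանը և Վրաստանը հարևան երկրներ են"
filtered = remove_stopwords(sample)
print(filtered)
```

Filtering of this kind is usually a preprocessing step before counting word frequencies or training a model, so that very common function words do not dominate the results.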
If you know of new useful tools and guides, please share them with us!
Image author Aparna Melaput
#opendata #armenia #language #tools #digitalhumanities
[EN] WorldPop is an open spatial demographic data and research project at the University of Southampton. Its WorldPop Hub data catalog aggregates many open demographic datasets, including a number of Armenia-related ones.
For example:
- The spatial distribution of population in 2020, Armenia
- National boundaries, Armenia
And many other datasets are available as GeoTIFF files.
All datasets are available under the CC-BY open license.
We have started uploading this data to our data catalog, along with other geodata, statistics, and demographic datasets.
Feel free to share any other Armenia-related data sources that are useful for data analysis and research.
#opendata #geodata #demographics #worldpop
[EN] And here are some new inspiring 3D models, this time of Geghard Monastery alongside Ani. They are published on the Open Heritage website. Open Heritage 3D is a project dedicated to making primary 3D cultural heritage data open and accessible, and to making it easier for publishers to share these data.
CyArk, which led the documentation of both sites, is one of the most authoritative organizations in the field of digital cultural heritage preservation. Interestingly, Geghard Monastery was immortalized by high schoolers from the TUMO Center for Creative Technologies during a two-week workshop run by CyArk.
Feel free to share if you know any other interesting data sources aimed at preserving cultural heritage.
#opendata #armenia #history #architecture
[EN] Main international and national data sources on Armenia 🇦🇲
Here is a reminder of the well-known international data sources where you can find key indicators and datasets about Armenia, along with some additions to the list:
- World Bank Statistics. Published data on the main economic and social indicators of the country: https://data.worldbank.org/country/AM
- Key Indicators of the Economy of Armenia, published on the Asian Development Bank portal: https://kidb.adb.org/economies/armenia
- UN Statistics. Data are aggregated from various international databases and survey results and grouped by theme: economy, demography, agriculture and products, climate, unemployment and employment, telecommunication, information technology, etc. Search here: http://data.un.org/Explorer.aspx
- Key health and mortality indicators for Armenia on the World Health Organization's portal: https://data.who.int/countries/051
- Climate Change Indicators by the IMF, by country, including Armenia: https://climatedata.imf.org/pages/country-data
- Biodiversity datasets from GBIF, a Global Core Biodata Resource: https://www.gbif.org/country/AM/about
- The Humanitarian Data Exchange portal hosts 93 datasets related to Armenia, including geospatial datasets: https://data.humdata.org/group/arm
There is a lot of overlap and duplication, but the data can form an excellent basis for enriching existing datasets and creating new, high-quality, and useful ones.
If you know of other sources, please let us know about them in the chat: https://yangx.top/opendataamchat.
Natural Language Processing can not only enhance our communication and language knowledge, but also strengthen historical studies.
Marcella Tambuscio and Tara Lee Andrews, in their paper "Geolocation and Named Entity Recognition in Ancient Texts: A Case Study about Ghewond’s Armenian History", apply Named Entity Recognition (NER) to Ghewond’s Armenian History. This helps to draw the ‘big picture’ of Armenian history in that period and to match historical toponyms with their contemporary counterparts. The outcomes and the reproducible, validated results of applying the model are published on GitHub. We have also added them to our data catalog.
We believe that such studies are going to become more common, making ancient texts more available to a wider public and to the professional community. Tell us if you are aware of similar efforts in the field!
#opendata #armenia #history #language
You may be surprised, but Armenia holds one of the leading positions in Eastern Europe and Central Asia, second only to Ukraine and slightly ahead of Kazakhstan and Russia, according to the Global Data Barometer 2022.
Armenia’s assessment, compiled by Georgia-based experts, shows that the country’s main relative strength, which contributes to its relatively high index (44.6/100), is its public finance data. On the other hand, the weakest of Armenia’s capabilities is the state of its open data. We will spare no effort to strengthen this dimension of Armenia’s data culture, which should lead to deeper societal changes, greater awareness and self-reflection, and more responsive and effective policy.
We are glad to announce that we have uploaded many new datasets to the Open Data Armenia data catalog.
Datasets were aggregated from the following sources:
- WorldPop - global population geodata catalog
- Global Forest Watch Open data portal of Armenia - forest-related geodata collected by REC Caucasus
- World Bank data catalog - world statistics and surveys
- The Armenian Soil Information System (ArmSIS) - soil geodata from Armenian National Agrarian University
- Institute of Geological Sciences geoportal - geology-related geodata
The total number of datasets on the portal is 702.
We will do our best to collect and upload more Armenian and Armenian-related datasets into our open data catalog from international and local Armenian data sources.
Source code and raw data from these data sources are also available at the code repository https://github.com/opendataam/opendataam-bulk
#opendata #opensource #datasets
More Armenian open geodata is available: this time, 85 datasets from the Scientific Network for the Caucasus Mountain Regions (https://data.opendata.am/organization/sustcaucasus).
These datasets are map layers covering Armenia, neighboring countries, and the Caucasus as a whole.
#opendata #geodata #datasets
Open data is a promising source of knowledge about our past. For example, thanks to the efforts of Public Radio, we can relive the everyday life of our parents and grandparents and understand them better.
Public Radio has published all of its broadcasts from the 1920s onward. On the website you can find songs as well as programs sorted by genre. Archival photographs and the biographies and works of musicians and singers are also available.
These data are not only fascinating but also useful for solving language tasks, since they make it possible to combine existing texts with flawless readings of them (for example, in the case of literary broadcasts).
Do you know of similar archives, and what do you think about their practical applicability?
We are always looking for more Armenian textual data: collections of free-to-use texts, especially under an open licence. We have already collected more than 200k texts from the ARLIS database of Armenian laws (23 GB uncompressed). But laws are very specific texts, so more data is needed for any advanced applications that could be created in the planned open data competitions.
If you are aware of any other source of Armenian texts, please drop us a note in the chat https://yangx.top/opendataamchat
#texts #datasets #helpneeded
Armenian legislation database from ARLIS - Data Catalog Armenia
Armenian legislation database extracted from the ARLIS website (arlis.am) with all metadata and texts of Armenian laws and other legal documents. The dataset is relatively big, about 23 GB...
To build a community that explores and creates open data, we are also planning Armenian-language educational programs. In the meantime, we are glad to be able to share educational materials prepared by our partners, such as this podcast from Boon TV.
Sona Balasanyan, Executive Director of the CRRC-Armenia Foundation, and Lusine Kalantaryan, Head of the Labour Statistics Division of the Statistical Committee, discussed open data, its applicability, its beneficiaries, the relationship between the state and researchers, and the problems of the field.
YouTube: https://www.youtube.com/watch?v=Xu7I51_MlzY
Spotify: https://open.spotify.com/show/0kebHzx0Gzsx3m8vYgdzEZ
DATA podcast: Open data in Armenia | Sona Balasanyan | Lusine Kalantaryan
As part of the "Data for Accountable and Transparent Action" (DATA) program, Sona Balasanyan, Executive Director of the CRRC-Armenia Foundation, and Lusine Kalantaryan, Head of the Labour Statistics Division of the Statistical Committee, talk about open data, its applicability…
We are waiting for the geodata on Armenia.
In 2021, the Asian Development Bank (ADB) launched the project "Armenia: Supporting the Establishment of National Standardised Spatial Data Infrastructure". Its aim is to support the Armenian Cadastre Committee in the creation of a National Standardised Spatial Data Infrastructure by the end of 2023 [1].
According to the latest tender documentation, this portal will be built using open geospatial standards (CSW, WFS, WMS, and the geospatial data formats GML, GeoPackage, SHP, and GeoJSON). It will be created using open source products. We don't know which ones, but most likely it will be a combination of GeoNetwork, GeoNode, and GeoServer, which implement Open Geospatial Consortium standards, or other similar open source geoportal software.
The new portal should be at maparmenia.am. It is not available yet, but we are eagerly waiting for it, and we hope that not only open source but also open data will be one of the priorities of the Cadastre Committee, and that spatial data will be available under permissive licences like Creative Commons CC-BY 4.0.
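To give a feel for what CSW, WFS, and WMS support means in practice, here is a minimal Python sketch that builds the standard GetCapabilities request each of these services answers. The base URL is a placeholder (the real service address of the future portal is not known), and the function name and the version numbers chosen are assumptions for illustration.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the actual service URL of the portal is not known.
BASE_URL = "https://example.org/geoserver/ows"

# Commonly used protocol versions (an assumption; a server may offer others).
SERVICE_VERSIONS = {"WMS": "1.3.0", "WFS": "2.0.0", "CSW": "2.0.2"}


def capabilities_url(base_url, service, version):
    """Build an OGC GetCapabilities request URL for the given service."""
    params = {
        "service": service,
        "version": version,
        "request": "GetCapabilities",
    }
    return f"{base_url}?{urlencode(params)}"


for service, version in SERVICE_VERSIONS.items():
    print(capabilities_url(BASE_URL, service, version))
```

The XML document returned for such a request lists the layers, feature types, or metadata records a server offers, which is usually the first thing a client or a data harvester asks for.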
[1] https://www.adb.org/projects/54388-001/main
#opendata #geodata #spatial
The Armenian state has been making commitments to the Open Government Partnership since 2012. Via this link you can see what has been done during previous periods and what we should expect to see in 2024. Unfortunately, the 2020-2022 window was missed, most likely due to the consequences of the pandemic and war.
On the portal, you can also read the history of the development of open government tools in Armenia and find information about state priorities in this domain. Most of the plans for 2022-2024 deal with transparent governance rather than open data; however, there is a clear focus on spatial data development and e-budgets.
What do you think about the mentioned aims? Are they reachable and ambitious enough? What would you add or postpone?
OGP-Armenia National Action Plan 2022-2024
The OGP is a voluntary, multi-stakeholder international initiative that aims to secure concrete commitments from governments to their citizenry to promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance.…
Open data can come in extremely handy in transforming how we deal with existing unstructured information and, therefore, our lifestyles. At Open Data Armenia, making a real impact with this powerful tool is one of our key priorities.
The Open Data Impact portal presents examples of open data being used in large-scale projects across the globe. You may already know some of them without realizing that they became possible only thanks to open data.
Which projects on the page inspire you most? If you have other ideas in mind, do not hesitate to share them with the community!
An article published in 2021 in the authoritative journal PLOS ONE [1] makes the case for government efforts to publish open data. Building on surveys conducted across the European Union, the authors show that open government data is a trust-enabler for Millennials and Generation Z, an effect that is also enhanced by citizen satisfaction. Public officers are therefore encouraged to expand the implementation of open data strategies as a way to strengthen democratic attachment in younger generations.
We firmly believe that these findings could be even more salient in Armenia where there are no serious alternatives to government-enabled open data yet.
[1] Gonzálvez-Gallego, N., & Nieto-Torrejón, L. (2021). Can open data increase younger generations’ trust in democratic institutions? A study in the European Union. PLOS ONE, 16(1), e0244994. https://doi.org/10.1371/journal.pone.0244994
Can open data increase younger generations’ trust in democratic institutions? A study in the European Union
Scholars and policy makers are giving increasing attention to how young people are involved in politics and their confidence in the current democratic system. In a context of a global trust crisis in the European Union, this paper examines if open government…
One of the drivers of our motivation to create an open data platform for Armenia was the presence of enormous amounts of unstructured information. Without efforts to make it open (which means usable, not just available), Armenia, like any other society, wastes huge potential.
Today we want to share one of the lists of sources we will be working with closely in our more focused projects dedicated to the preservation of Armenian heritage. The Society for Armenian Studies has compiled a comprehensive guide to digital resources covering Armenian matters. Some of them are published in our catalog, but deeper work is needed to transform these huge amounts of information into data suitable for analysis.
The sources are mostly dispersed collections of manuscripts and texts, which leaves a huge amount of work to be done on matching and systematizing them.
We are sharing a wonderful example of how, with some effort and collective work, a socially useful data-driven project can be created.
https://www.armaqi.org/ is an independent air quality measurement system run with the help of an open community.
The story of how the project came about is described on Habr: https://habr.com/ru/articles/755586/.
ArmAQI - Ambient air quality in Yerevan and Armenia. Smog in Yerevan
ArmAQI - Air pollution in Yerevan and Armenia in real time. Smog in Yerevan