Software Repository

We make the source code of selected research software publicly available. Please note that our software and code are freely available for non-commercial use only.

DATA MANAGEMENT AND ANALYTICS

  SOFTWARE DETAILS
22 MIDAS Publication (ACM SIGMOD 2021)

This software is built on top of CATAPULT and enables efficient and effective maintenance of the canned patterns of a visual graph query interface as the underlying collection of small- or medium-sized data graphs evolves. Specifically, MIDAS adopts a selective maintenance strategy that guarantees progressive gain of coverage of the patterns without sacrificing diversity and cognitive load.

Download
21 SSA Publication (ICDE 2021)

This software implements privacy-preserving query services for strong simulation queries in the database outsourcing paradigm. In such a paradigm, clients send their queries to a third-party service provider (SP), who holds the outsourced large graph data, and the SP computes the query answers. However, as the SP may not always be trusted, the sensitive information in the clients’ queries, most importantly the query structures, should be protected. This software adopts strong simulation as a practical query semantics for this paradigm.

Download
20 ShapeNet Publication (AAAI 2021)

This software implements a novel algorithm called ShapeNet, which embeds shapelet candidates of different lengths into a unified space for shapelet selection. The network is trained using our cluster-wise triplet loss, which considers the distance between the anchor and multiple positive (negative) samples and the distance among positive (negative) samples. Then, it computes representative and diversified final shapelets rather than directly using all the embeddings for model building, to avoid spending a large fraction of the computation on non-discriminative shapelet candidates. A classical classifier (e.g., SVM) is then adopted.

Download
19 BSPCover Publication (IEEE TKDE 2021)

Time-series shapelets are discriminative subsequences, recently found effective for time series classification (TSC). It is evident that the quality of shapelets is crucial to the accuracy of TSC. However, the majority of research has focused on building accurate models from some shapelet candidates. This software implements a novel, efficient shapelet discovery method, called BSPCOVER, to discover a set of high-quality shapelet candidates for model building.

Download
18 PANE Publication (VLDB 2021)

Given a graph where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node to a compact vector, which can be used in downstream machine learning tasks. PANE is an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of common prediction tasks.

Download
17 G-CARE Publication (SIGMOD 2020)

This software realizes the world's first framework for benchmarking graph cardinality estimation techniques for subgraph matching queries.

Download
16 LATTE Publication (SIGMOD 2020)

This software is a user-friendly visual interface for constructing Solidity smart contracts. It is targeted at end users who do not have programming skills or a background in Solidity. The system can also serve expert users, who can generate the initial code using LATTE and then adapt it to their needs.

Download
15 NRP Publication (VLDB 2020)

Homogeneous network embedding (HNE) maps the graph structure in the vicinity of a node to a compact, fixed-dimensional feature vector. This software focuses on HNE for massive graphs, e.g., with billions of edges. At this scale, most existing approaches fail, as they incur either prohibitively high costs or severely compromised result utility. Our proposed solution, called Node-Reweighted PageRank (NRP), is based on the classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values.
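
For readers unfamiliar with PPR, the following is a minimal sketch of how the PPR values from one source node can be approximated by power iteration. It is not the NRP implementation: it omits node reweighting and the derivation of embedding vectors, and the function name, the restart probability `alpha`, and the handling of nodes without out-neighbours are assumptions made for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Approximate PPR values from `source` by power iteration.

    adj    -- dict mapping every node to a list of its out-neighbours
              (each neighbour must also be a key of adj)
    alpha  -- restart probability (an assumed, commonly used default)
    """
    nodes = list(adj)
    ppr = {v: 0.0 for v in nodes}
    ppr[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass always returns to the source
        for u in nodes:
            share = (1.0 - alpha) * ppr[u]
            if adj[u]:
                for v in adj[u]:
                    nxt[v] += share / len(adj[u])
            else:
                # Assumption: a walk stuck at a dead end jumps back to the source.
                nxt[source] += share
        ppr = nxt
    return ppr


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    print(personalized_pagerank(graph, source=0))
```

Repeating this for every source node yields the pairwise PPR matrix, which is exactly the step that becomes too expensive on graphs with billions of edges.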

Download
14 PPKWS Publication (IEEE ICDE 2020)

This software implements a new keyword search framework, called public-private keyword search (PPKWS), on the public-private graph model. PPKWS consists of three major steps: partial evaluation, answer refinement, and answer completion.

Download
13 BigIndex Publication (TKDE 2020)

This software implements a generic ontology-based indexing framework for keyword search for graphs.

Download
12 FROST Publication (ACM TIST 2020)

The facility relocation (FR) problem, which aims to optimize the placement of facilities to accommodate changes in users’ locations, has a broad spectrum of applications. Despite the significant progress made by existing solutions to the FR problem, they all assume each user is stationary and represented as a single point. Unfortunately, in reality, objects (e.g., people, animals) are mobile. Consequently, these efforts may fail to identify superior solutions to the FR problem. For the first time, this software takes the movement history of users into account to address this limitation.

Download
11 NEURON Publication (SIGMOD 2019)

NEURON is a novel system that facilitates natural language interaction with a relational query execution plan (QEP), which represents an execution strategy for an SQL query, to enhance its understanding. It accepts an SQL query (which may include joins, aggregation, and nesting, among other things) as input, executes it, and generates a simplified natural language description (in both text and voice form) of the execution strategy deployed by the underlying RDBMS. Furthermore, it facilitates understanding of various features related to a QEP through a natural language question answering (NLQA) framework. NEURON, the world's first of its kind, is a tool that can greatly enhance students' learning of the query processing topic.

Software
10 CATAPULT Publication (ACM SIGMOD 2019)

This software automatically selects canned patterns for a visual graph query interface designed for a large collection of small- or medium-sized data graphs (e.g., chemical compounds). Given a data graph collection and a pattern budget, it automatically selects the canned patterns to be displayed on a GUI by optimizing coverage, diversity, and cognitive load of the patterns in the underlying data repository. CATAPULT is a core component for realizing plug-and-play visual graph query interfaces.

Download
9 TEA/TEA+ Publication (ACM SIGMOD 2019)

This software implements two novel local graph clustering algorithms based on heat kernel PageRank (HKPR) to address the efficiency and accuracy limitations of existing local clustering techniques. Specifically, these algorithms provide non-trivial theoretical guarantees on the relative error of HKPR values and on the time complexity. The basic idea is to use deterministic graph traversal to produce a rough estimate of the exact HKPR vector, and then exploit Monte Carlo random walks to refine the results in an optimized and non-trivial way.
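
As a rough illustration of the Monte Carlo side of this idea only, the sketch below estimates an HKPR vector by sampling random walks whose lengths follow a Poisson distribution with mean `t` (the heat constant). It is not the TEA/TEA+ algorithm: the deterministic traversal phase, the refinement step, and the error guarantees are all omitted, and the function names and defaults are assumptions made for illustration.

```python
import math
import random


def sample_poisson(t, rng):
    """Draw from Poisson(t) with Knuth's multiplication method (fine for small t)."""
    limit = math.exp(-t)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def hkpr_monte_carlo(adj, seed, t=5.0, walks=10000, rng=None):
    """Estimate the HKPR vector of `seed` from the end points of random walks.

    adj -- dict mapping each node to a list of neighbours
    t   -- heat constant; walk lengths are drawn from Poisson(t)
    """
    rng = rng or random.Random(0)
    counts = {}
    for _ in range(walks):
        node = seed
        for _ in range(sample_poisson(t, rng)):
            neighbours = adj[node]
            if not neighbours:  # assumption: a walk simply stops at a dead end
                break
            node = rng.choice(neighbours)
        counts[node] = counts.get(node, 0) + 1
    # The fraction of walks ending at v estimates the HKPR value of v.
    return {v: c / walks for v, c in counts.items()}


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(hkpr_monte_carlo(graph, seed=0, t=3.0))
```

Pure sampling of this kind needs many walks to reach a small relative error, which is the cost that a deterministic first phase is meant to reduce.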

Download
8 PANDA Publication (VLDB J 2017, VLDB 2018)

This software implements a novel graph querying paradigm called partial topology-based network search and a query processing system called PANDA to efficiently find the top-k matches of a partial topology query (PTQ) on a single machine. A PTQ is a disconnected query graph containing multiple connected query components. PTQs allow an end user to formulate queries without demanding precise information about the complete topology of a query graph.

Download
7 AutoG Publication (VLDB J 2017, VLDB 2016)

This software implements a novel framework for subgraph query autocompletion (called AUTOG). Given an initial query q and a user’s preference as input, AUTOG returns ranked query suggestions Q′ as output. Users may choose a query from Q′ and iteratively apply AUTOG to compose their queries.

Download
6 PINOCCHIO Publication (TKDE 2016)

The location selection (LS) problem, which aims to mine the optimal location from a set of candidates to place a new facility such that a score (i.e., benefit or influence on some given objects) can be maximized, has drawn significant research attention in recent years. State-of-the-art LS techniques assume each object is static and can only be influenced by a single facility. However, in reality, objects (e.g., people, vehicles) are mobile and are influenced by multiple facilities, which prevents classical LS solutions from selecting accurate results. This software takes mobility and probability factors into consideration to address the aforementioned limitations. Specifically, given a set of candidate locations, it aims to mine the optimal location, i.e., the one that can influence the largest number of moving objects.

Download
5 DUALSIM Publication (SIGMOD 2016)

Subgraph enumeration is important for many applications such as subgraph frequencies, network motif discovery, graphlet kernel computation, and studying the evolution of social networks. Recently, efforts to enumerate all subgraphs in a large-scale graph have seemed to enjoy some success by partitioning the data graph and exploiting distributed frameworks such as MapReduce and distributed graph engines. However, we notice that all existing distributed approaches have serious performance problems for subgraph enumeration due to the explosive number of partial results. DUALSIM is a disk-based, single-machine, parallel subgraph enumeration solution that can handle massive graphs without maintaining exponential numbers of partial results. Specifically, it implements a novel concept of the dual approach for subgraph enumeration, which swaps the roles of the data graph and the query graph. DUALSIM outperforms the state-of-the-art methods by up to orders of magnitude, while those methods fail for many queries due to explosive intermediate results.

Download
4 Structure-Preserving Query Service Publication (ICDE 2015, TKDE 2015)

This software implements the first practical private approach for subgraph query services, asymmetric structure-preserving subgraph query processing, where the data graph is publicly known and the query structure/topology is kept secret. Such a query service is useful when the query computation is outsourced to a third-party service provider.

Download
3 ASTERIX Publication (SIGIR 2017, SIGMOD 2013)

Existing XML keyword search (XKS) engines primarily suffer from two limitations. First, although the smallest lowest common ancestor (SLCA) algorithm (or a variant, e.g., ELCA) is widely accepted as a meaningful way to identify subtrees containing the query keywords, SLCA typically performs poorly on documents with missing elements, i.e., (sub)elements that are optional, or appear in some instances of an element type but not all. Second, since keyword search can be ambiguous with multiple possible interpretations, it is desirable for an XKS engine to automatically expand the original query by providing a classification of different possible interpretations of the query w.r.t. the original results. However, existing XKS systems do not support such result-based query expansion. ASTERIX is an innovative XKS engine that addresses these limitations.
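
To make the SLCA notion concrete, here is a naive sketch that computes SLCA nodes from Dewey labels (tuples encoding the path from the root). It illustrates the baseline semantics only, not how ASTERIX handles missing elements or query expansion; the function names and the input layout are assumptions made for illustration, and the brute-force enumeration is chosen for clarity rather than speed.

```python
from itertools import product


def lca(labels):
    """Lowest common ancestor of Dewey labels = their longest common prefix."""
    prefix = labels[0]
    for label in labels[1:]:
        n = 0
        while n < min(len(prefix), len(label)) and prefix[n] == label[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def slca(matches):
    """Return the SLCA nodes for a keyword query.

    matches -- one list per keyword, each holding the Dewey labels (tuples)
               of the nodes that contain that keyword
    """
    # Every way of picking one match per keyword yields a candidate LCA.
    candidates = {lca(combo) for combo in product(*matches)}
    # Keep only candidates that are not a proper ancestor of another candidate.
    return sorted(
        c for c in candidates
        if not any(o != c and o[:len(c)] == c for o in candidates)
    )


if __name__ == "__main__":
    keyword_a = [(0, 0, 0), (0, 1, 0)]
    keyword_b = [(0, 0, 1), (0, 1, 1, 0)]
    print(slca([keyword_a, keyword_b]))  # [(0, 0), (0, 1)]
```

In the example, the root `(0,)` also contains both keywords but is discarded because its descendants `(0, 0)` and `(0, 1)` already do. If an optional element holding one of the keywords were absent from one subtree, the SLCA would shift to a different ancestor, which is the kind of behaviour the first limitation above refers to.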

Download
2 Generalized Subgraph Search Publication (CIKM 12)

This software implements a new type of graph query, which injectively maps the edges of the query to paths of the graphs in a given database, where the length of each path is constrained by a threshold specified by the weight of the corresponding query edge.

Download
1 MustBlend Publication (DASFAA 2013, ICDE 09, ICDE 06)

MUSTBLEND (MUlti-Source Twig BLENDer) is a novel visual XML querying paradigm where visual query formulation and processing are interleaved. A key practical feature of MUSTBLEND is its portability, as it does not employ any special-purpose storage, indexing, or query cost estimation schemes.

Download

 

COMPUTATIONAL SYSTEMS BIOLOGY AND BIOINFORMATICS

  SOFTWARE DETAILS
8 TROVE Publication (Bioinformatics 2017)

Cancer hallmarks, a concept that seeks to explain the complexity of cancer initiation and development, provide a new perspective on studying cancer signaling which could lead to a greater understanding of this complex disease. However, to the best of our knowledge, there is currently a lack of tools that support such hallmark-based study of the cancer signaling network, thereby impeding the gain of knowledge in this area. TROVE is novel, user-friendly software that facilitates hallmark annotation, visualization and analysis in cancer signaling networks. It can be used to build further network-based analytics applications for cancer.

Download
7 TINTIN Publication (ACM BCB 2017)

A network-based approach that ranks a given set of networks based on their "similarity" to a reference network. TINTIN exploits target feature-based network similarity in order to determine if two networks are similar. Specifically, it leverages topological and dynamic features of targets to compute similarity distances between signaling networks and rank them accordingly. TINTIN is useful for addressing problems such as target prioritization and drug target repositioning.

Download
6 TAPESTRY Publication (ACM BCB 2016)

Target prioritization ranks molecules in biological networks according to a score that seeks to identify molecules that fulfill particular roles (e.g., drug targets). TAPESTRY is a network-based approach that prioritizes candidate targets in a given signaling network with unknown targets by utilizing knowledge (target characteristics) gained from curated targets in another set of signaling networks. It exploits a knowledge base of characterization models and predictive topological features of a set of signaling networks (candidate networks) with curated targets. Given a signaling network G with unknown targets, TAPESTRY identifies the candidate network most similar to G and selects its characterization model as the prioritization model for computing a topological feature-based rank of each candidate node in G. Then, a dynamic feature-based rank is computed for these nodes by leveraging the time-series curves of the ODEs associated with the edges in G. Finally, these two ranks are integrated and used for prioritizing candidate targets.

Download
5 TENET Publication (Bioinformatics 2015)

A network-based approach that characterizes known targets in signaling networks using topological features. TENET first computes a set of topological features and then leverages a support vector machine-based approach to identify predictive topological features that characterize known targets. A characterization model is generated that specifies which topological features are important for discriminating the targets and how these features should be combined to quantify the likelihood of a node being a target.

Download
4 DUALALIGNER Publication (Bioinformatics 2014)

DualAligner performs dual network alignment, in which both region-to-region alignment, where a whole subgraph of one network is aligned to a subgraph of another, and protein-to-protein alignment, where individual proteins in the networks are aligned to one another, are performed to achieve higher-accuracy network alignments. Dual network alignment is achieved in DualAligner via background information provided by a combination of Gene Ontology annotation information and protein interaction network data.

Download
3 DiffNet Publication (Methods 2014)

The study of genetic interaction networks that respond to changing conditions is an emerging research problem. Bandyopadhyay et al. (2010) proposed a technique to construct a differential network (dE-MAP network) from two static gene interaction networks in order to map the interaction differences between them under an environment or condition change (e.g., a DNA-damaging agent). This differential network is then manually analyzed to conclude that DNA repair is differentially affected by the condition change. Unfortunately, manual construction of a differential functional summary from a dE-MAP network that summarizes all pertinent functional responses is time-consuming, laborious and error-prone, impeding large-scale analysis on it. DiffNet is a novel data-driven algorithm that leverages Gene Ontology (GO) annotations to automatically summarize a dE-MAP network to obtain a high-level map of functional responses due to a condition change.

Download
2 FACETS Publication (Bioinformatics 2012)

FACETS is a novel PPI network decomposition algorithm to make sense of the deluge of interaction data using Gene Ontology (GO) annotations. It finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity.

Download
1 BIDEL Publication (DASFAA 2007)

Warehousing heterogeneous, dynamic biological data is a key technique for biological data integration as it greatly improves performance. However, it requires complex maintenance procedures to update the warehouse in light of changes to the sources. Consequently, a key issue to address is how to detect changes to the underlying biological data sources. BIDEL is software for detecting exact changes to biological annotations. In our approach, we transform heterogeneous biological data into XML format and then detect changes between two versions of the XML representation of the biological data.

Download

 
